This month I'm shifting from my usual data-centric writing to put together some tools and processes to help students and professionals jump into the job search. I spent 19 years with one company, so when it came time to search, it was like being a student again. Over the past month, I've met with dozens of students and talked to two classes about the job search. Notably, several people have asked me "how" to go about a search after shifting careers. So, what better way to help than to set aside my regular data posts and put together a framework? I'll also have supporting materials and examples in the resource section of my blog. Given this is a data blog, be ready for data commentary. It's surprising how much of this industry revolves around data and technology!
I think it's important to note that a change of any kind will take some courage. You will likely be stepping out of your comfort zone. You will have to network with people, take aptitude tests, negotiate salary, and interview. If something scares you in the process, write it down so you can devise a plan around it. If interviewing scares you, do practice questions. If networking scares you, use some simple networking techniques. Change happens if you make it happen in the job search, and discomfort is short-lived. You can do this. Sometimes it will be slow, and you might struggle at first. Your path is different from everyone else's, and you may sometimes be discouraged. Finding your first or subsequent job is about sampling. If you haven't read the book "Range", you should. Most people need to sample and try new things. That's how we learn.
Throughout this series, I'll talk about values. Understanding your values is something I believe in deeply. So deeply that I frequently ask others what values they have and what they need from an organization. Note, I said need. Finding and evaluating companies and people on values is the most powerful thing you can do.
The framework that I usually lay out for people is the following:
This will be a four-part series, with one article released each week!
Step 1: Learn
Guides and Processes
I cannot stress the importance of using something as your guide. This blog series will hopefully give you a high-level framework, inspire you, and enable you with a tool you hadn't thought about, but there is a massive amount of fantastic literature. The one I recommend to people is "Amplify your Job Search" by Jeff Ton. Choose your own adventure here. Not every process works for everyone; you can mix and match some of them to create the best outcome for yourself.
I needed to update my resume when I started thinking about a change. I didn't realize how far your network, your process, and the right tools and coaches can take you. In my case, I hired a coach, Lisa, to help me. She opened my eyes to how to think about my resume, my LinkedIn profile, and my network. Her perspective was that this is a marketing effort: your resume, and every other tool, is a component of a marketing plan. Your resume is the first part of your toolkit, but it's an evolving creature. So before I started winging applications at companies or calling contacts, she had me build a toolkit that I'll lay out in this series.
Along the way, I think it's important to reflect and think. Jeff talks about this in his book, and several people have recommended it. Your journal can be anything, digital, voice recording, or paper notes, as long as you have something. I would write down answers to these questions weekly and reflect on what I was doing.
1) How did I feel about my work, projects, etc., today?
2) Are there specific things I'm not getting or want?
3) What should I try next or contemplate?
As you write, a clear picture of what you're thinking about starts to form. From that information, you can derive your values and how you need to align with a company or leader.
At their core, your values are intimately related to your needs, and people frequently confuse values with beliefs. Your beliefs are derived from experiences and function as a contextual framework for your understanding of the world; your values transcend that context and are part of you. In that sense, beliefs are external factors, while values are internal factors. But why is that important?
To understand why, let's explore what happens when your values are misaligned. Pretend you have a friend who loves collaboration and relating to people, but they take a job in a high-pressure sales organization. You hang out on the weekends and slowly watch them grow angry, tired, depressed, frustrated, and sad. Each week takes its toll, but the money is so good they can't pass it up. You talk to them about their values and find that they are family, collaboration, harmony, and other things their job makes impossible.
Values are the framework for your decision-making process. In decision science, values are known to drive goals and motivations. Values are generally part of unconscious behavior, meaning you'll act on them even if you haven't thought about them. If you are misaligned with them in the workplace, your decision-making suffers, and you suffer as a result.
Now that you understand values, the hard part comes. You have to do some real thinking to figure out what your values are. Write these down, as many or as few as you want, but try to find 6-8 that resonate with you. Those will be your uncompromising values. Try to answer questions like:
Want to know what some of mine are?
I can't write a post on my data blog without discussing data. Online searching is a data game, so there are a few things to remember. The advent of tools like LinkedIn means that some recruiters are slammed with hundreds of resumes for every job posted. That means you're playing a game of percentages, trying to increase your chances just enough to get into the top X% that gets interviewed. Applicant tracking systems (ATS) are technologies that human resource teams use to weed out candidates who aren't a fit. Unfortunately, the technology is built on top of a flawed process with resumes. In most cases, it uses a bag-of-words technique: simple word matching and frequency counting. There's more complexity, but that's the gist. For example, if the job description has the word "leadership" in it and your resume has it too, that might increase your chances. If the description mentions the word several times, you have it once, and someone else has it three times, that might increase their chances over yours. The key is that an algorithm determines whether you make it to the next step. The best way to ensure that you have the right words is to use a tool like Jobscan that shows you how a particular ATS might rate you. Remember, the first phase is all about increasing your odds. More on that when we build your toolkit.
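To make the bag-of-words idea concrete, here's a toy sketch in Python. Real ATS scoring is proprietary and far more sophisticated; this only illustrates the word-matching-and-frequency mechanic described above, with made-up sentences as data:

```python
import re
from collections import Counter

def keyword_overlap(job_description: str, resume: str) -> float:
    """Toy bag-of-words score: the fraction of the job description's
    word occurrences that also appear in the resume."""
    jd = Counter(re.findall(r"[a-z]+", job_description.lower()))
    cv = Counter(re.findall(r"[a-z]+", resume.lower()))
    matched = sum(min(count, cv[word]) for word, count in jd.items())
    total = sum(jd.values())
    return matched / total if total else 0.0

jd_text = "Seeking leadership in data analytics. Leadership and SQL required."
# A resume that echoes the posting's words scores higher than one that doesn't:
print(keyword_overlap(jd_text, "Led analytics teams. Leadership in SQL projects."))
print(keyword_overlap(jd_text, "Experienced carpenter."))
```

Notice the model has no notion of meaning: "Led" doesn't match "leadership" at all, which is exactly why tailoring your resume's wording to the posting moves the needle.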
Stay tuned for the next article.
Those of us who work with data for a living will, at some point, find ourselves at the other end of a conversation about Artificial Intelligence (AI). If you're like me, that's likely at a bar, and you've just heard someone say "AI," "Machine Learning," or "Skynet." In my case, I can't help but engage and ask what they're doing with AI. Then comes the dreaded..."AI is here, and it will replace all our jobs." Well, my friends, here are five ways to argue the difference between AI and ML.
1) At some point, someone will mention this "new" field of AI. Put on your glasses and say, "The pursuit of AI is old; it's credited to guys like Alan Turing, Allen Newell, Herbert Simon, and John McCarthy from the 1950s." Walk away in silence. Kidding aside, this is an old pursuit. Alan Turing devised the Turing Test in 1950 and published Computing Machinery and Intelligence. But progress takes time. It would be several more years before Newell, Simon, and Cliff Shaw created the Logic Theorist and presented it at John McCarthy's Dartmouth Summer Research Project on Artificial Intelligence. The Logic Theorist is thought to be the first AI program. From there, DARPA would fund AI research, the first autonomous car would be built in 1986, Deep Blue would defeat Garry Kasparov in 1997, Siri and Watson showed up in 2011, and then Microsoft introduced Tay, and the ethics challenge of AI began. These are all stories in themselves, but they're all part of the evolution of AI.
2) People debate the difference between clever ML and real AI. It's complex, but the easiest way to simplify it: Machine Learning (ML) is a branch of the broad field of AI, but not all AI is ML. In practice, the label matters less than the problem and the approach to solving it. If you're trying to automate password resets in a call center, you might not need a superintelligence.
3) At some point, the topic of deep learning will come up as the way to create the newest human-like AI or Skynet. The truth is that deep learning has accelerated progress in AI and ML, but it's likely not the end state. Deep learning is powerful, its continued evolution drives many AI breakthroughs, and the need for better deep learning models has pushed computing to the next level. But deep learning models, while straightforward to train, are also easy to fool. There's a lot of ongoing research into new algorithms and approaches that can address deep learning's weaknesses.
4) People also love to debate whether something is "really" AI. From an academic perspective, ML is part of AI, and a clever chatbot is an application of AI. What your friends are usually debating, though, is a different functional area: Artificial General Intelligence (AGI), a true thinking machine. That area is still being explored, and no true thinking systems exist today, not in the sense of AGI.
5) Many people think that Google, Amazon, the government, and others have made it past AGI and into a form of intelligence beyond humans. Superintelligence, a state beyond AGI, has not been achieved. Outside of science fiction, we aren't even close, and we likely have decades before we get there. AI's best use today is helping humans with routine tasks, helping us remember things and do jobs that require a high degree of computation. Symbolic, conceptual, and other areas of thinking are still the human brain's realm.
Now venture into data social gatherings, bars, and dinners, confident in your newfound debate skills. If you happen to be in an establishment where people aren't talking about data, bring it up yourself!
NASA is a bedrock of innovation. In one place, some of our greatest engineers build technology that can take us to unexplored worlds. But NASA isn't immune to problems, and we can learn from its failures. On September 23, 1999, the Mars Climate Orbiter burned to pieces. The mission had exemplary talent, and the hardware was getting the right inputs. So why on earth did it burn up? The team that designed the navigation system used the metric system (newtons) to calculate force, but the engineers who built the spacecraft built it around the English system (pounds). There was no conversion; as a result, the Orbiter missed its intended orbit by roughly 90 miles (about 144 kilometers). It's too bad. The Orbiter would have been part of a communications relay on Mars for further missions.
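The fix for this class of bug is boring but effective: make units explicit and normalize at the boundary. Here's a minimal sketch; the function and field names are mine, not NASA's, and the only hard fact used is the standard conversion of 1 pound-force to about 4.448 newtons:

```python
LBF_TO_NEWTONS = 4.4482216  # 1 pound-force expressed in newtons

def thrust_in_newtons(value: float, unit: str) -> float:
    """Normalize a thrust reading to SI before any downstream math,
    and refuse readings whose unit we don't recognize."""
    if unit == "N":
        return value
    if unit == "lbf":
        return value * LBF_TO_NEWTONS
    raise ValueError(f"unknown unit: {unit!r}")

# The same number reported by two teams in two unit systems is not
# the same physical quantity -- it differs by a factor of ~4.45:
print(thrust_in_newtons(100.0, "lbf"))
print(thrust_in_newtons(100.0, "N"))
```

Carrying the unit alongside the number (and failing loudly on anything unexpected) is exactly the kind of check that would have flagged the Orbiter mismatch at integration time.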
How does that have anything to do with digital transformation? The problems NASA experienced with the Orbiter are similar to those that companies going through transformation efforts and scaling out data platforms face today. I've seen several challenges identical to the Orbiter's and have taken similar lessons from them. Here are five that jumped out at me while reading about the Orbiter.
1. Test Your Governance Framework
If you're like me, you ask, "How could that happen?" It wasn't that NASA didn't have a governance framework; it has a very robust one. It was that a function of the framework didn't work as intended. Lockheed Martin was the contractor brought in to do the navigation work, and they simply used a different unit of measurement. The lesson here is that if you have a framework to catch problems, verify from time to time that it's actually working. The TSA does this: baggage scanners artificially flag planted threat images to keep screeners' attention sharp. Their system ensures the process is working by introducing fake data, measuring agent success, and confirming that operations continue to work.
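You can apply the same TSA-style self-test to a data governance gate: periodically inject "canary" records with known outcomes and alarm if the gate misses one. A minimal sketch follows; the validation rule and field names are hypothetical, standing in for whatever quality checks your framework actually enforces:

```python
def validate(record: dict) -> bool:
    """Hypothetical quality gate: accept only records with a known
    unit of measurement and a non-negative thrust value."""
    return record.get("unit") in {"N", "lbf"} and record.get("thrust", -1.0) >= 0

def governance_self_test() -> bool:
    """Inject canary records with known outcomes; return False (alarm)
    if the gate's verdict ever disagrees with the expected one."""
    canaries = [
        ({"thrust": 100.0, "unit": "N"}, True),           # known-good record
        ({"thrust": -5.0, "unit": "N"}, False),           # planted defect: bad value
        ({"thrust": 100.0, "unit": "furlongs"}, False),   # planted defect: bad unit
    ]
    return all(validate(record) == expected for record, expected in canaries)

print(governance_self_test())
```

Run on a schedule, a check like this tells you not just that your data passed, but that the machinery judging the data still works, which is the lesson NASA's framework missed.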
2. You Have to Communicate
"What we've got here is a failure to communicate." - Cool Hand Luke.
It's evident that, regardless of the systems, there wasn't a communication channel between the engineers. I've seen this too. The people building the technology in a company are isolated from the people using it. Guess what happens: you end up with similar results. Agile approaches seek to remove some of these barriers and get the builders working with the users, or, in a scaled framework, get the teams communicating rather than routing everything through a single mastermind. Communication is a key capability for any team looking to transform their organization.
3. Know The Collaboration Blind Spots
Large, complex projects and initiatives require collaboration. We know that; if you want to send a rocket to the moon, one person isn't going to accomplish it. Some studies tell us what happens when collaboration isn't working. Lisa Kwan of HBR writes about the blind spots created by people's need to feel secure in their work. When leaders ask for collaboration and the goals are not shared, there can be issues with how groups perceive each other. Her belief is that when groups grow concerned about their security, they go on the defensive and guard their territory to reduce the threat. I'm not sure that's what happened between NASA and Lockheed Martin, but it prompted me to check that I have true collaboration and not teams in a defensive stance.
4. No Code is Perfect
I've had engineers and developers who like to tell people their code is crap and that they could do better. Let's call that perfect engineer "Ace" for this discussion. Perhaps Ace could do better, and in the proper context, I love the work of trying to make things better. But I have a basic rule: if the code meets the requirements, working beats perfect. I also wasn't fond of the environment it fostered when people belittled others' abilities. It's hard to get people to separate themselves from the code they wrote when peers give feedback, especially unexpectedly. My biggest challenge was that Ace had extreme blind spots, believing their own code was infallible while telling others how bad theirs was. Ace was also cognitively rigid; once Ace locked into an idea, that was it. But Ace was coached and became a leader. As leaders, QAs, and reviewers know, all code has bugs. No code is perfect, so your best defense is to test. Ace can help write great software, but you have to assume it will not be perfect, even with all the talk. Leaders need to know that code is about getting close to perfect; you have to roll with some imperfections to have a chance of hitting your goals.
5. Learn from Mistakes
Here's an easy one. NASA learns from its mistakes, and it doesn't stop trying. They plan, build, execute, and then try to understand. I love the retrospective concept in Scrum. If you don't have a formal committee or project review, consider holding retros on your digital initiatives. At a minimum, you'll keep learning how to evolve and become a better organization. Digital transformations don't happen overnight, so you will likely have many retros on your journey.
In Depth | Mars Climate Orbiter – NASA Solar System Exploration
Why Trying to Make People Collaborate Freaks Them Out (hbr.org)
How to Build Digital Dexterity Into Your Workforce (hbr.org)
It's been a whirlwind of a month! I started a new job as the Chief Data Officer at CSpring consulting, so I can spend more time working on data problems with clients! To go with that theme, I thought we would explore the topic of Data Strategy today. Specifically, what is a good data strategy, and how can you use it to help your business?
The first thing to consider about strategy is that it is an aspirational model. Many leaders get hung up on the how, but that's more tactical. The second thing to understand, which is hard for many technologists, is that the business already has strategic imperatives; in most cases, you're not generating new ones. You're trying to align the technology or data strategy to the company, which is more about strategic alignment than new strategy. You may need to think entrepreneurially or strategically, but your aim shouldn't be to change the business's strategy unless you're a technology or data product company.
In most cases, technology and data are enabling capabilities. Therefore, you must think strategically about aligning them to the business. For instance, if your company's strategic plan is to help mid-market companies develop data capabilities that allow them to grow, then your data strategy is more of a tributary of that. Your endeavor should be to create a way to align data to deliver against that business strategy. A small company's data strategy might be some accounting in QuickBooks, and a larger company may need full analytics and a predictive modeling platform.
Why do we need data strategies, though? Can't we take the company's big goals and deal with issues as they come? The answer is complicated. Some companies can leverage a more reactive approach; for others, that's a recipe for failure. Companies that fail to recognize the importance of an aligned data strategy will ultimately find issues with data silos, poor quality, or inefficiencies. Data strategies depict the alignment and call out these challenges as hurdles to address. Data strategy, on the whole, is a function of your business strategy, business model, business goals, and technological capabilities.
Any MBA knows the basics for developing a strategic roadmap. Define the goals, map the current state, propose a future state, and implement it. Data strategy, on the whole, is the same. Once done, you should be able to put the strategy on a page.
Here are some thought-provoking questions as you build your strategy:
Elements of Data Strategy
As you continue to form your strategy, keeping these four core tactical questions in mind is essential.
This topic is easy to continue to expand on and grow. I'll post more about it in the future, but for now, here are some examples and links to help:
I had a great conversation about data strategy with a colleague the other day. While we were talking through the complexities of forming a data strategy, it occurred to me that many business leaders who don't map a data strategy skip it because they think they don't need it. They don't recognize its relationship to the business model. I've written about offensive and defensive data strategies before, but this is more foundational: how data strategy relates to business strategy. I'm not naive enough to think that all businesses need a data strategy. If you're a sole proprietor who builds sheds for a living, you likely don't need to spend the time on one. But if you're planning to go C-corp and grow like crazy across the nation, a data strategy could be very beneficial. As a general rule, the larger a company's size and scale, the more it needs a data strategy that scales with it. Back to my one-person company: no scale is needed, so the data strategy is "don't invest right now." There are always exceptions, but that, in concept, is the basis for this article.
Does a Business Strategy need a Data Strategy?
Are data assets, and the ability to derive value from that data, a strategic capability you need or want in your business model? That's the core question to ask yourself when thinking about the business. It can take several forms, not just a singular item. If the business model needs customer adoption of services, the data strategy might seek ways to enable insights into customer channels. If the business needs to know whether a product will work in the market, the data strategy may seek to run experiments and capture data. Ultimately, the business strategy provides the north star, and the data strategy provides the decision framework for where to invest in data. Is data an asset? That's a critical element you need to frame up in your data strategy.
One mistake I've encountered when talking to other leaders is, "Let's hire a data scientist, and they can get value from all our data." That's not how it works, or at least shouldn't be the approach. There are stories of companies investing millions in data science teams only to be frustrated because the team isn't creating a return. Gartner's Nick Heudecker estimates that 85% of data science initiatives fail, a significant number. A well-formed data strategy aligns with the business objectives and empowers teams to make decisions on data collection, analysis, and sharing within the framework of the strategy. That's what Signet Bank, now known as Capital One, did.
The Tale of Signet Bank
The 1990s gave birth to a data revolution. In banking, we saw the first fraud neural nets, data warehousing from Ralph Kimball and Bill Inmon, and an interesting story about a bank that Richard Fairbank and Nigel Morris shifted toward analytical thinking. Banks at the time did everything as a singular offering; the credit card offered to you was the same as everyone else's. The standing belief was that no customer would stand for different treatment. Fairbank and Morris had extensive backgrounds in analytical thinking and believed that a bank with the right mindset could win in the market. Signet Bank's leadership team felt that modeling profitability, not just default rates, would allow them to steal profitable customers away from other banks. They just didn't know how to do it, until Fairbank and Morris found them.
The business strategy was to capture customers from competitors by understanding profitability. The data strategy Fairbank and Morris formed was to acquire data as if it were an asset. Since the data didn't exist, Signet would have to run experiments to develop a model, and models need data. If you treat data as an asset, you might have to pay to get it, which is what Signet did to learn which credit terms would generate a return. In the short term, Signet's charge-off rate went from 2.9% to 6%. But as the models became better, they adapted their offering and became very profitable. Not only did they win the long game they played, but they also boasted some of the best customer retention and lifetime value.
Signet spun off its credit card division into Capital One and made Fairbank and Morris the company's leaders. Capital One went early and deep on its data strategy, plowing the way for those of us coming later. It might not be the strategy for everyone, but it's a great example of aligning business strategy and data strategy into a superordinate goal for the win!
My dad once gave me the book "Effective Problem Solving" by Marvin Levine. The book covers the creative problem-solving process through various puzzles, logic problems, and problem-solving principles. It's a great read, and if you're looking to sharpen your mind, check out the crypto arithmetic section at the end of this article. Levine covers some important concepts that may shape your approach if you're building a program around digital transformation or data modernization.
A few weeks ago, I discussed with some colleagues the problems of getting people to use data. Using data and getting data are not the same thing. If you've read any of my data literacy material, you know the difference. Training people on tools and building data platforms is the "if we build it, they will come" model, and it rarely works. Newer approaches combat that with better training, changing cultures, looking for outcomes over outputs, and investing in competency centers for data. It's easy to see the drive for data literacy programs. An excellent data literacy program helps you build skills in your people and makes them more effective problem solvers with data. These skills tend to be grouped into four categories: collection, evaluation, application, and management (more on these in the following article and my upcoming talk at https://www.groupfuturista.com/FOWA2022.2/). A data literacy program generally needs to address each category so that individuals can use data and tools effectively in decision-making. Training teams in these categories uplifts your company and helps shift it toward a data-driven culture. But does that mean everyone will become data literate soon, and you'll have an army of citizen data scientists running around? Unfortunately, the answer is likely no, because of the concept of intimate engagement.
In Levine's book, he doesn't cover big data strategies or complex data architecture, but he does cover what I think might be the most important concept in data: intimate engagement. Intimate engagement is how we used to find our next BI analyst or champion in the line of business. My team and I would look for people to train in the company who we thought "had the knack for data." After all, building skills and centers of influence within existing teams pays back tenfold compared to convincing a business partner to invest in additional resources. We didn't seek out people who could code or had a background in data. Don't get me wrong, those are great, and when we had those resources, we used them, but we were looking for something much simpler: people with a natural inclination to dig into concepts and lean into problems. That is precisely what Levine talks about in his book.
When something doesn't work right, some people seek to understand the problem, and others look for someone else to figure it out. The first group are your experts in problem framing. In the book, Levine gives an example of a stuck car seat. Some individuals say, "Oh well, someone will have to figure this out for me." Others lean in, look at what's blocking the seat, and find the bottle lodged underneath it. That second behavior is intimate engagement. It doesn't mean those people will always be able to fix the problem, but they lean in to understand it. Those are the individuals you want to invest in and train in data; they are the diamonds in the rough in your enterprise. They can accelerate your data literacy and digital transformation efforts at a rate much faster than you're going.
I have had numerous experiences finding the next data talent by looking for those individuals. If you want to learn more about intimate engagement, problem-solving, and data, feel free to reach out to me!
Now, as promised, here's some Fun with Crypto Arithmetic:
S + M = MZ
In this case, M must be 1: the sum of two single digits can never reach 20, so a two-digit result has to start with 1. So we know that S + 1 = 1Z. The only digit that can be added to 1 to get a two-digit number is 9; therefore 9 + 1 = 10, which means S = 9, M = 1, and Z = 0. We solved it!
Here's a harder one for you to try!
SEND + MORE = MONEY
Click Here For Answer
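If you'd rather check your work by brute force, the same reasoning can be automated. This is a sketch, not Levine's method: it simply tries every assignment of distinct digits to the eight letters, discarding any with a leading zero, until the sum works out:

```python
from itertools import permutations

def solve_send_more_money():
    """Brute-force the cryptarithm SEND + MORE = MONEY.
    Returns (SEND, MORE, MONEY) for the unique valid assignment."""
    letters = "SENDMORY"  # the 8 distinct letters in the puzzle
    for digits in permutations(range(10), len(letters)):
        a = dict(zip(letters, digits))
        # Leading letters can't be zero.
        if a["S"] == 0 or a["M"] == 0:
            continue
        send = a["S"] * 1000 + a["E"] * 100 + a["N"] * 10 + a["D"]
        more = a["M"] * 1000 + a["O"] * 100 + a["R"] * 10 + a["E"]
        money = (a["M"] * 10000 + a["O"] * 1000 + a["N"] * 100
                 + a["E"] * 10 + a["Y"])
        if send + more == money:
            return send, more, money
    return None

print(solve_send_more_money())  # run it to reveal the answer
```

Brute force works here because the search space is small; the pencil-and-paper route, reasoning about carries the way we did for S + M = MZ, is far more satisfying.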
I was reading an article on master data while pontificating on how to solve some issues at the office and ran across the old golden-customer concept in master data. For those unfamiliar, one of the concepts in master data management is that you should have a single source of truth for customer information. When customer data enters two systems, there should be a framework to reconcile them. For example, suppose system A lists me as Mike Butler with an SSN of 555-55-5555, and system B lists me as Michael Butler with an SSN of 123-45-6789. The golden-record model should support a way to know which is correct, store that information, and allow other systems to update themselves from the golden record. Ideally, you create this with a process rather than a direct application of technology. A business process, backed by technology, that ensures creating a new customer always follows the same steps regardless of platform is an excellent way to enable golden records. It allows the primary system to act as a data hub for customer data, ensuring an accurate golden record. All work with a customer starts in one system and flows into the others. This has been a key enabler of digital transformation, but it's changing!
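As a toy illustration of the survivorship side of a golden record, here's a sketch in Python. The source ranking, field names, and merge rule are invented for the example; real MDM platforms use richer matching, trust scores, and stewardship workflows:

```python
# Hypothetical trust ranking: lower number = more trusted source.
SOURCE_RANK = {"crm": 0, "billing": 1, "legacy": 2}

def golden_record(candidates):
    """Merge duplicate customer records field by field, taking each
    value from the most trusted source that actually has it."""
    golden = {}
    for rec in sorted(candidates, key=lambda r: SOURCE_RANK[r["source"]]):
        for field, value in rec.items():
            if field != "source" and field not in golden and value:
                golden[field] = value
    return golden

system_a = {"source": "billing", "name": "Mike Butler", "ssn": "555-55-5555"}
system_b = {"source": "crm", "name": "Michael Butler", "ssn": ""}
print(golden_record([system_a, system_b]))
```

Here the CRM wins the name conflict because it's ranked more trustworthy, but its empty SSN field falls through to the billing system's value, which is exactly the per-field reconciliation the golden-record model calls for.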
Early in my career in analytics, I had a team that had built many fantastic tools. They would set up models, reports, and reviews with the business. Every tool and process did a great job showcasing the capabilities we had with our data. But we were missing something. Teams didn't fully grasp the concepts behind reading and understanding data. We would complete one analysis, only to hear it repeated to executives with an incorrect interpretation. We would have misunderstandings on reports when leaders tried to create their own curated versions and broke simple logical rules. The problem was that we were giving people tools without making them data literate. We determined that without more people who understood data, we were never going to achieve success with self-service BI/analytics.
We set out thinking about how to develop programs and engagement models to help people grow and build data literacy. We had a moderately data-literate executive team, very focused on defensive data needs (auditing, risk, financials, etc.), but middle management and staff were struggling. Fortunately, communication is easy: you just have to get everyone on the same page, talk and listen the right amount, document everything in a way everyone can understand, etc. Super easy (note the sarcasm). We developed a training program and selected individuals from all around the organization who had the "gift." Maybe it wasn't a gift, but they had some training and a good foundation for us to build on. These individuals were to become our influencers. We infected them with our thinking, gave them access to the data programs' architects and directors, taught them the basics of SQL, and made them part of the team. With this group, we started creating a culture that could help other groups. At the time, I had two incredible leaders, Jamie Hines (Eavey) and Nathan Maxfield, committed to building and growing this practice. There's still more work to do; if you don't maintain a literacy program, data literacy decays. Self-service started moving forward after that, incremental progress but a good step.
Writing this article got me thinking about the challenges of digital transformation and the need for a data-driven culture to steer decisions. One of the key enablers of that culture is data literacy. Gartner defines data literacy as "the ability to read, write and communicate data in context, including an understanding of data sources and constructs, analytical methods and techniques applied, and the ability to describe the use case, application and resulting value." I define it as the ability to communicate effectively with data.
A few things to note about a data literacy program:
Hopefully, this is helpful. I had a data interpretation experience yesterday that had me thinking about this all day!
Over the weekend, I started to review various cloud platforms and the data platforms and changes that have occurred in them. As you think about digital transformation and the amount of data needed, you also need to think about how you will collect and store data, and there are many options out there.
The most robust offerings sit with Amazon AWS, but Azure and Google have so many offerings and unique services that there's no clear "better" cloud. The AWS juggernaut in cloud data infrastructure has something for every situation, but Google and Azure are close. I would caution you against thinking all the clouds are the same. They have different offerings and different services, and while they may be similar, one may be better at specific tasks than another. The matrix depicted below is a generalization of the types of data services; it was mainly something I was working on for myself, and I felt it would be good to share!
You can spin up anything you want through the marketplace features of many clouds, and these are no different. The distinction here is the native tools in the cloud offering that are managed by that cloud provider and not a third party.
Data strategies are helping to reimagine digital transformation in organizations around the world, but what of data ethics? Many companies embark on offensive data strategies that lack processes involving data ethics. Many years ago, data ethics was a concept reserved for academia and research papers. For-profit companies used data sparingly and took some risks in the use and management of data. That's all shifted, though. Digital transformation brings forward a flood of information and data. Deciding how to use data requires ethical understanding. Using data has direct business consequences: it takes on legal, regulatory, and reputational risks, and those risks, if not properly managed, can directly impact the bottom line.
Perhaps the most relevant example we see in the market today is the cyber defense of data. Laws are catching up, and companies are being forced to take precautions to defend and protect the data they collect from customers. David McCandless & Tom Evans demonstrate the challenges around safeguarding data in this incredible visualization. Many of these events spark ethical questions around the role these companies played and their part in protecting that data. For instance, why did some of them even need to store data in an un-obfuscated form? Work in encryption and in creating synthetic data from real data sets is an expanding field that seeks to address these challenges.
A great story related to business ethics is Target's use of data to predict pregnancy. Charles Duhigg of the New York Times interviewed Target's Andrew Pole and learned what Target was attempting to do. Pole lays out the strategy Target used to mine data from women who registered for baby registries. They found patterns, of course, like purchasing lotion, scent-free soap, supplements, and other items common among those on the registries. Ultimately, they found that about 25 products could reasonably predict pregnancy with about 87% accuracy. They ended up sending ads to a girl still in high school, and her father became very upset with Target. It turns out Target was accurate, but accuracy aside, is it ethical to use data to infer personal information like that? While targeted marketing can be ethical, in this case several ethical questions, like why market to a minor, come into play.
Ethical use of data begins at the top of the organization. It should be a part of your data strategy and of conversations with executives and the board. The esoteric nature of data ethics means it may be essential to have individuals who understand ethics challenge the use, storage, and management of data in an organization. Offensive strategies particularly need ethical scrutiny, because offensive systems tend to shift towards multiple versions of the truth (MVoTs), data meshes, and strategies that encourage use of and experimentation with data. The ethical management of data, dissemination, access controls, and other policies have to shift left so that the MVoTs don't create ethical challenges with AI, ML, and other strategic data uses.
AI is a specific use case that gets brought up in ethical studies. Regardless of the debate over whether it's "really" AI, AI introduces a few things that make people uncomfortable. If data weren't ethically collected and managed, the data the AI uses can be a problem. If the algorithm is difficult to explain, regulatory and legal challenges, depending on the industry, can be concerning as well. Explainable AI is a set of tools and technologies that helps explain how models arrived at their results. This is very important when AI makes decisions without any human intervention. If poor data, unethically collected data, or unexplainable models are used in building an AI, it can impact the business's bottom line.
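To make the explainability idea concrete, here is a small Python sketch of permutation importance, one common model-agnostic explanation technique: shuffle one input feature and measure how much the model's accuracy drops. Everything here (the toy loan "model," the feature names, and the synthetic data) is invented for illustration and is not taken from any real system.

```python
import random

random.seed(0)

# Toy "black box" lender: approves when 2*income - debt > 50.
# Income drives the decision, debt matters less, zip_code not at all.
def model(row):
    return 1 if 2 * row[0] - row[1] > 50 else 0

# Synthetic applicants: (income, debt, zip_code)
data = [(random.uniform(0, 100), random.uniform(0, 100), random.choice([1, 2, 3]))
        for _ in range(500)]
labels = [model(row) for row in data]

def accuracy(rows):
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)

def importance(col):
    """Permutation importance: shuffle one column, measure the accuracy drop."""
    shuffled = [list(r) for r in data]
    column = [r[col] for r in shuffled]
    random.shuffle(column)
    for r, v in zip(shuffled, column):
        r[col] = v
    return accuracy(data) - accuracy(shuffled)

for name, col in [("income", 0), ("debt", 1), ("zip_code", 2)]:
    print(name, round(importance(col), 3))
```

Shuffling `zip_code` never changes a decision, so its importance is exactly zero, while `income`, which the toy model weights most heavily, shows the largest drop. That kind of readout is what lets you explain to a regulator, or a customer, what a model actually relied on.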
Ultimately, data strategies have to wrap ethics as part of the transparency of the use of data within the organization. Senior leaders need to understand how data is being used, not myopically, but the governance and strategy. Keeping leadership at the table for data strategy conversations, getting an ethics expert involved for perspective, and keeping your governance model connected to the ethical use of data will enhance your data strategy.
How Target Figured Out A Teen Girl Was Pregnant Before Her Father Did (forbes.com)
A Practical Guide to Building Ethical AI (hbr.org)
Explainable AI | Google Cloud
World’s Biggest Data Breaches & Hacks — Information is Beautiful
Image Credit: Photo 56298104 / Data © Rawpixelimages | Dreamstime.com
If the last 20 years have given us anything, it's a diverse and (at times) confusing array of data platforms and solutions, from on-prem databases to Hadoop to more cloud technologies than I could list here if I tried. At this point, I probably learn about 1 or 2 new ones a week without researching. The strategic challenge that ensues: what platform best enables your digital transformation culture? Do you even need a platform? If you're under the gun like many other companies, you're probably thinking about transformations and whether your data strategy needs a refresh. Like many of the great answers in life, it depends. This article will break down some common platforms in use today and apply those strategies to the situation.
There are several types of platforms out there: warehouse, lake, lakehouse, and mesh are among the most common. Each comes with benefits and drawbacks. There are also other design patterns for data platforms, such as fabrics, factories, and others. A few key concepts found across the platforms connect to the organization's strategy. For instance, monolithic vs. distributed architecture is an essential characteristic that maps back to the goals of the enterprise. If the enterprise pursues an offensive data strategy, empowering product teams, a mesh or fabric provides a better means of doing so than a warehouse or lake technology. At the same time, the tradeoff with governance and risk strategies becomes important. Distributed data architectures like a mesh distribute everything, including how the data has to be governed, which means everyone has to help steer the ship. They produce Multiple Versions of the Truth (MVoT), versus a monolithic control model like a singular data warehouse, which encourages a Single Source of Truth (SSoT).
When looking at data strategies for platforms, you may find that multiple strategies are needed. For instance, some data, like financial data, needs a warehouse strategy, while some product and user-interaction data may work better stored and used closer to the product, with access and integrations back to other data channels. This is similar to the data mart concepts of the early 2000s, where data was processed and stored in different data marts that were rolled up into cubes. I'm sure we all miss explaining to the C-Suite why OLAP versus OLTP was important to consider in data strategies at the time. Lakehouses are an exciting hybridization: a monolithic structure that uses concepts from both warehouse and lake architecture. The pattern of hybridization continues as data platforms change and evolve. Check out the links at the bottom of this article for notes on the history of data warehouses and future trends.
Quick Notes When Re-Thinking Your Data Platform
What is a Data Lakehouse? - Databricks
A Brief History of the Data Warehouse - DATAVERSITY
Data Mesh: Design, Benefits, Hype, and Reality | LinkedIn
As the topic of digital transformation looms over many companies worldwide, there have been many books, articles, and guides written about how to take the transformation journey. One of the common themes found in digital transformation efforts is the need to use data. Some companies have a working strategy that's evolving, while some companies are trying to decide what that even means. Companies have to develop a data strategy, at the executive or board level, that lays out the guiding model for how data is to be used at the company. The problem is that the topic of data, and the strategies associated with it, can become complex quickly. Many companies become overwhelmed by the idea of data lakes, data warehouses, data meshes, analytics, machine learning, AI, data ingestion, and all the other buzzwords surrounding data.
Data Offence and Defense
The simplest way I have found to help my peers understand the differences between strategies is to approach it as offense and defense. Leandro DalleMule and Thomas H. Davenport have written some excellent articles in Harvard Business Review on these topics; I'll post them in the references below. Essentially, think of your data strategy as a spectrum: on one end you focus on defense, on the other offense. This framework is easy for me to explain when working with groups that are not data practitioners, and it's a way to abstract away the technical complexities and focus on the company's direction.
A defensive strategy with data focuses on security, privacy, regulatory compliance, and integrity. Data is thought of as something that has to be protected and used for its intended purposes. This strategy has a control orientation around data and is commonly found in highly regulated environments like health care. An offensive strategy implies a focus on using the data for competitive advantages or differentiation. In this strategy, data analytics and enrichment are the main focus, with the company's orientation focused on deriving value from the data. Offensive strategies require a more flexible risk posture in the use of data, which means data silos and sources of the truth start to vary depending on the transformations imposed on the data.
There isn't a better strategy. Many times when I explain this, people think they have to be on the offensive. In business, yes, we play offense and defense, and data strategy is the same. But your company is better at one and likely focused more on one. Having a balanced approach can sound great but ultimately creates confusion in the company's decision-making framework. For instance, if you were to announce that the strategy is balanced between offense and defense, the team will try to use that principle to determine where to focus efforts, and they won't be able to: they don't know if they're supposed to focus more on offense or more on defense. When I worked with Enterprise Architecture and developed principles so that teams could work in more agile fashions, the same concepts appeared. A security approach that is "balanced" doesn't create a direction and a goal for a delivery team; it creates confusion about how to align to the goals of the company. Principles of design or strategy give leaders an orientation for making decisions. Regardless of the company's strategy, it's essential to know which direction the strategy leans from the executive team.
Applying Scaled Thinking
With my experience and background in working with architecture and data, I have found that enterprise scale is a huge factor in how these strategies play out. For instance, an offensive strategy at a massive company with 50,000 employees looks different from one at a company with 500 people. My experience comes from medium-sized companies and start-ups, so operations at the scale of a company like Walmart change the dynamic of the strategy. I used to laugh when people would say big data, because most companies had medium data, for which I have no formal definition. Your tools and technologies might not need a Google-scale framework to be effective, or you might not have enough data to build a strategy around analytics. If you only have ten customers, you might just need a download of customer data and a spreadsheet. That's important to keep in mind as you work with executives or teams on the data strategy for your company.
Innovation and Digital Transformation
There is a caveat to my experiences with strategy. In my opinion, the digital transformation movement requires an offensive approach. Data silos, or data gardens, walled off by departments and teams for protection, will limit digital transformation capability. If the organizational principles prioritize avoiding data risks over managing data to empower the business, digital transformation efforts will suffer. That's not to imply a company shouldn't protect privacy, security, and other defensive concerns, but if the focal strategy isn't offensive, it will be hard to win the transformation game on defense alone. I'm a big believer in making sure the principles and superordinate goals are in place to allow self-empowered teams to drive forward. Adan Pope and Peter Buonfiglio discuss aspects of this transformational challenge in their book "Respect the Weeds," which is a fantastic read for anyone thinking about digital transformation.
Protecting people from fraud is one of my favorite activities, but it can be complicated. Occam’s razor states that the simplest explanation is often the best one. As a data practitioner, I've always had to make sure I keep that in mind. Complexity usually comes with expense, errors, and unexplainable models. It doesn’t matter if you work in data engineering, software, or analytics, the simplest solution is usually the best.
There are a lot of complex algorithms and models out there to detect fraud. I'll cover many of them later in this blog, but before I do, we have to hit some of the ones I think are pretty amazing given their simplicity. These are actual models I've used to detect and expose fraud in my career. All three of these incredibly simple models are approaches I've used to clean data or, in some cases, even find certain types of potential fraud.
So there are a lot of different algorithms out there that can be used for fraud detection, but the simple ones are sometimes the best to get started with. I've known data practitioners who try to jump straight into learning the newest and most advanced algorithms, and while you need to do that too, you should also make sure you pay attention to the simple ones!
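As one illustration of how far a simple technique can go, here is a sketch in Python of a Benford's-law first-digit test, a classic screen for fabricated figures. The amounts below are invented for the example; in practice you'd run this over something like expense reports or invoice amounts and treat a large deviation as a prompt to investigate, never as proof of fraud.

```python
import math
from collections import Counter

# Benford's law: in many naturally occurring datasets, the leading
# digit d appears with probability log10(1 + 1/d).
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def first_digit(x):
    return int(str(abs(int(x)))[0])

def benford_deviation(amounts):
    """Mean absolute gap between observed and expected first-digit
    frequencies. Larger values warrant a closer look."""
    counts = Counter(first_digit(a) for a in amounts)
    n = len(amounts)
    return sum(abs(counts.get(d, 0) / n - benford[d]) for d in range(1, 10)) / 9

# Invented amounts clustered just under a $5,000 approval limit...
suspicious = [4800, 4900, 4950, 4700, 4850, 4990, 4880, 4920]
# ...versus invented amounts spread across orders of magnitude.
ordinary = [12, 190, 35, 1042, 7, 310, 88, 2600, 54, 130]
print(benford_deviation(suspicious) > benford_deviation(ordinary))  # → True
```

The "suspicious" list all starts with the digit 4, which Benford's law says should happen only about 10% of the time, so its deviation score dwarfs the ordinary list's. Simple, explainable, and cheap to run at scale.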
Links and References
In my career, I can't count the number of times someone has said: "garbage in, garbage out." It's true of any activity that takes an input and transforms it into an output. Frankly, garbage data is a buzzkill when you're working with visualizations, machine learning, or any number of data topics. You may be in the midst of your latest breakthrough to help with customer attrition, only to realize you have so many issues with your data that the algorithm might be wrong. It's why data cleansing, modeling, and initial forms of analysis center on understanding the data and assessing its quality. There are ultimately 2 types of garbage data: known and unknown. This post is particularly interested in the unknown, and specifically in bias introduced when collecting data.
Some of my favorite authors, Morewedge and Kahneman, discuss biases as a process in the brain that creates predictable errors. These errors result from the brain developing energy-saving shortcuts and are a perfectly normal part of human behavior. Unfortunately, biases can also create problems for our data collection processes. As an example, let's say you're trying to collect data on people's knowledge of the solar system, and you ask a question like, "Given the billions of moons likely in the cosmos, how many moons are there in our solar system?" That seems like a harmless question, but you've just created a bias in the midst of writing the inquiry: an anchoring effect. Anchoring bias occurs when you give an individual a starting value and then ask them to predict a value. The individual will adjust from the initial amount rather than draw on their knowledge of the solar system. This is a problem in data science because sometimes we're working with a data set we have little data governance knowledge of. If an anchoring bias was in effect when the data you're asked to analyze was collected, what consequences will that have on the analysis results?
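To see how an anchor could distort a collected data set, here is a small, entirely synthetic simulation in Python. The "true value," the noise, and the adjustment range are all invented for illustration: one group of respondents answers from its own knowledge, the other starts at the anchor planted in the question and adjusts only partway back.

```python
import random

random.seed(42)
TRUE_VALUE = 200          # pretend true answer to the survey question
ANCHOR = 1_000_000_000    # the "billions" planted in the question's wording

def unanchored_response():
    # Respondents guess around the true value with some noise.
    return random.gauss(TRUE_VALUE, 50)

def anchored_response():
    # A classic model of anchoring: start at the anchor and adjust
    # only partway toward one's own estimate.
    adjustment = random.uniform(0.99, 0.999999)
    return ANCHOR - adjustment * (ANCHOR - unanchored_response())

control = [unanchored_response() for _ in range(1000)]
anchored = [anchored_response() for _ in range(1000)]

def mean(xs):
    return sum(xs) / len(xs)

print(round(mean(control)))   # close to TRUE_VALUE
print(round(mean(anchored)))  # pulled far above it by the anchor
```

Even though the simulated respondents adjust more than 99% of the way back, the anchored group's mean lands orders of magnitude above the unanchored group's, and any downstream analysis of that column would quietly inherit the distortion.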
Let's discuss another example, one that combines confirmation bias and selection bias. Suppose that you will collect data on innovation because you're trying to prove that individuals who are good at math are good at innovation. You collect data but pick out people you know are good at math and give them a test to measure creative output. You find that they score high, so you conclude that good math people are great innovators. Well, you may be correct, but your data collection process selected only people you thought were good at math, and you only sought to confirm the belief rather than disprove it, which is bad science!
In Machine Learning, Software Engineering, and Artificial Intelligence (we can debate the meaning of this later), we find a society that continues to use algorithms to drive decisions. Where collecting knowledge about the solar system might be harmless, many uses are not. Decisions to provide an individual a loan, show a particular ad, train chatbots, detect fraud, prescribe medical treatment, and others can profoundly affect people. It's exciting to think about using algorithms to find "the signal in the noise," but when we think about data, we have to be careful of "the noise in the signal."
Akter, S., McCarthy, G., Sajib, S., Michael, K., Dwivedi, Y. K., D’Ambra, J., & Shen, K. N. (2021). Algorithmic bias in data-driven innovation in the age of AI. International Journal of Information Management, 60, 102387. https://doi.org/10.1016/j.ijinfomgt.2021.102387
Morewedge, C. K., & Kahneman, D. (2010). Associative processes in intuitive judgment. Trends in Cognitive Sciences, 14(10), 435–440.
Don't be alarmed, but I think you might be one.
Data hitchhikers are people like me: when I need some help, I go out and ask for it through a tangible and relevant example. Sometimes it's not enough to know that you could get from point A to point B; you want someone to take you, or to tell you how they've done it before. Tangible and relevant is the key to data hitchhiking. Stories about working with data have to be relevant, in that it's easy to see yourself in the shoes of those doing the data work. They also have to be tangible, so the most recent theory or algorithm isn't a good fit for a hitchhiker; it doesn't connect to other concepts yet and is therefore difficult to relate to. Hitchhikers are applied data technologists who have a wealth of knowledge from seeing more of the world of data and the ability to go new places by connecting.
I consider myself more an applied data scientist than an academic one, and I feel like many people who land on this site will find the same. They're applied learners in the fields of data science, and even if they have an academic background, they find application more interesting than theory. Hopefully this site will help someone with a problem or concept, or just let them kill 30 minutes on a coffee break when they need an escape from work.
In full disclosure, most of this site is the result of a course on marketing and data analysis. While I've toyed with blogs in the past, I've not committed to one in many years, so hopefully someone gets something out of it!
Beer and Diapers.
A study performed in 1992 by Thomas Blischok gave rise to the world of data mining and product correlations. Blischok was studying the buying patterns of customers for Osco Drug and found a number of correlations between products during his analysis, the most famous being a correlation between the purchase of beer and diapers in the same transaction. They concluded that fathers, on the way home from work, were stopping to pick up diapers and, while there, would buy the week's beer supply as well.
This is my favorite example of what market basket analysis, also known as affinity analysis, can do and how it can teach us about our customers. Analyzing our customers' buying patterns can increase our understanding of which products can be sold together. It can be used across a wide array of industries, including retail, banking, music, and a variety of others.
How does it work?
I'm going to pull from my background in R and banking to explain. In retail you may analyze individual transactions when someone checks out, but in banking we can analyze the customers themselves and the "basket" of products they have with us. We can do this to target efforts around increasing cross-sell ratios.
Suppose you work for a bank that has 6 different types of products and you want to sell more mortgages. Traditionally, you might have your entire sales force spend time on cold calls and advertising. That's still valid, but your customer data may have a story for you: some mortgage prospects may already be within your existing relationships, and pursuing them might strengthen your relationships with your customers!
We can feed customer profiles into R and build association rules. An association rule simply takes a subset of the products (the antecedent) and looks at the confidence that customers holding it also hold another subset (the consequent).
For example, create a file with the following structure:
customer1, item1, item2, item3
It may take some tweaking, but the following R code should allow you to run a basic analysis on your data set.
library(arules)  # provides read.transactions, apriori, and %pin%
mytransactions = read.transactions(file = "Banking.csv", format = "basket", sep = ",")
itemFrequencyPlot(mytransactions, support = .25, cex.names = 0.8, xlim = c(0, 0.3), type = "relative", horiz = TRUE, col = "dark red", las = 1, xlab = paste("Prop of Market Baskets Containing Item", "\n(Item Relative Freq or Support)"))
rules = apriori(mytransactions, parameter = list(support = .0025, confidence = .05))
Mortgage.rules = subset(rules, subset = rhs %pin% "Mortgage")
Mortgage.rules.top = head(sort(Mortgage.rules, decreasing = TRUE, by = "lift"), 5)
inspect(Mortgage.rules.top)  # print the top rules
Your final output would look something like this:
How do you interpret it?
There are 5 columns in the output.
lhs - Left Hand Side of the rules (also called antecedent)
rhs - Right Hand Side of the rules (also called consequent)
support - the frequency of the itemset being analyzed. For example, support of .01 indicates that 1 in 100 customers have the itemset.
confidence - estimate of the probability of the consequent/rhs given the antecedent/lhs
lift - the ratio of confidence to expected confidence, or the change in probability given the antecedent. Generally, anything over 1 indicates a positive association.
Therefore, in line 1 above, we can interpret the results as follows: customers with an itemset of car loan, checking, credit card, and savings plus a mortgage are 3.3% of the records in the dummy file. The probability of a mortgage given the itemset is .52, with a lift of 1.07, which means there is a potential gain if we were to target these customers with a direct sales effort. By creating a file for your team of those customers with only a car loan, checking, credit card, and savings, you can create a targeted sales list.
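If you want to check these metrics by hand rather than rely on the R output, they're easy to compute directly. Here's a sketch in Python (the product names and baskets are made up for the example) that derives support, confidence, and lift for one rule:

```python
# Toy customer "baskets" of bank products (invented data).
baskets = [
    {"checking", "savings", "credit card", "mortgage"},
    {"checking", "savings", "mortgage"},
    {"checking", "credit card"},
    {"checking", "savings", "credit card", "mortgage"},
    {"savings", "credit card"},
    {"checking", "savings"},
    {"checking", "savings", "credit card"},
    {"checking", "mortgage"},
]

def support(itemset):
    """Fraction of baskets containing every item in the itemset."""
    return sum(itemset <= b for b in baskets) / len(baskets)

def confidence(lhs, rhs):
    """P(rhs | lhs): support of both sides over support of the lhs."""
    return support(lhs | rhs) / support(lhs)

def lift(lhs, rhs):
    """Confidence relative to the rhs's baseline frequency;
    values over 1 suggest a positive association."""
    return confidence(lhs, rhs) / support(rhs)

lhs, rhs = {"checking", "savings"}, {"mortgage"}
print(support(lhs | rhs), confidence(lhs, rhs), lift(lhs, rhs))
```

With these invented baskets, the rule {checking, savings} → {mortgage} comes out to support 0.375, confidence 0.6, and lift 1.2, the same three numbers apriori reports for each rule it finds.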
The business pitch: spend time with the customers in the itemset and you increase your sales efficiency and cross-sell potential. Customers with higher cross-sell ratios tend to have higher retention rates as well.
This is a great set of tools for marketers, sales teams, and anyone looking to increase their cross-sell capabilities. There are more advanced techniques, but hopefully I gave you a taste of what it means.
We can use market basket techniques to create recommender systems as well, which I'll discuss in a later post. Recommenders are the reason that Amazon, Netflix, Google, and other services seem to have you figured out when you log in and offer you 5 products you've been thinking about. Did they read your mind? Yes and no, but now you know one of the ways they do it!
<i>Originally posted on LinkedIn at https://www.linkedin.com/post/edit/5995707413066498048</i>
This is the second article in my getting-started-with-analytics series. We're going to spend some time discussing the fundamental segments of analytics and an example of each. There are numerous frameworks for categorizing types of analytics. As I was writing this article, I reviewed 3 different books on analytics, making sure my comments hold academic merit, and each classified the segments of analytics in different ways.
In my research, the types of analytics land somewhere between 3 and 5. One of the more common breakdowns has 4 types, generally considered an evolution starting with descriptive and moving through diagnostic, predictive, and prescriptive.
Descriptive analytics seeks to answer "what happened." This type of analytics reviews data using a lot of traditional research approaches. Generally, classical or Bayesian statistical methods are used to learn about the data set. An example would be the average amount of money in a bank account on a monthly basis; grab the standard deviation and you have some idea of how much money someone keeps in their account and how much spending is occurring. The example below is a quick block of R code showing how to get descriptive statistics from the iris data in R.
Example in R(Descriptive Statistics):
summary(iris)  # descriptive statistics for each column
pairs(iris[1:4], main = "Iris Data", pch = 21, bg = c("blue", "yellow", "brown")[unclass(iris$Species)])
Diagnostic analytics seeks to answer "why did something happen." I like to think of this as seeking potential causation for a decision. Depending on the individual, they may characterize this as descriptive, but for discussion purposes it's broken out into its own category. An example of diagnostic analytics might be a review of web click patterns on a website: building a decision tree lets you see how someone is moving through the site as well as what's being presented to them. This tree allows you to analyze the activity that has occurred. The example below provides a short review of how a decision tree model and plot can be built.
Example in R(Decision Tree Visualization):
library(rpart)  # decision tree package
model.rpart = rpart(Species ~ ., data = iris, method = "class")
plot(model.rpart, margin=.08, branch=.5, compress=TRUE, uniform=FALSE, main="IRIS Dataset Classification Tree")
text(model.rpart, fancy=TRUE,minlength=1, splits = TRUE, use.n=TRUE, all=TRUE)
Predictive analytics seeks to answer what will happen in the future. It's been around for a while with regression and time series analysis, but new techniques are evolving every day. This branch of analytics focuses on predicting what you may buy or do; think of Google and what they're doing. There are a number of areas within predictive analytics, such as predicting values versus predicting classes. An example may be a sales forecast for your business. The code example below extends the decision tree model above to try to predict the species of a plant from the iris data set.
Example in R(Iris Decision Tree Prediction):
library(rpart)
model.rpart = rpart(Species ~ ., data = iris, method = "class")
Predicted = cbind(iris, predict = predict(model.rpart, iris, type="class"))
Prescriptive analytics seeks to determine what should be done. A stock portfolio optimization model is an example of this branch of analytics. Not only do we want to predict what might happen, but we want the model to tell us how we should allocate our portfolio as well. The code provided below is a simplified version of a stock portfolio optimization problem in R.
Example in R(Stock Portfolio Optimization):
library(stockPortfolio)  # provides getReturns, stockModel, optimalPort
stocks = getReturns(c("MSFT", "JPM", "GM", "VZ", "WFC", "PSX"), start="2014-01-01", end="2014-12-31")
model1 = stockModel(stocks)
optPort = optimalPort(model1, shortSell = TRUE)
I wrote the code above to facilitate understanding; it isn't something to copy/paste and run at your business. My hope is that the concepts were useful, as I found it difficult at the beginning of my own journey to find anything with examples. This space continues to evolve, and we'll explore more specific analytical approaches in later articles. Good luck with your own analytics!
Originally posted to https://www.linkedin.com/post/edit/getting-started-analytics-types-mike-butler