Summary
Organizations of all sizes are striving to become data driven, starting in earnest with the rise of big data a decade ago. With the never-ending growth in data sources and methods for aggregating and analyzing them, the use of data to direct the business has become a requirement. Randy Bean has been helping enterprise organizations define and execute their data strategies since before the age of big data. In this episode he discusses his experiences and how he approached the work of distilling them for his book "Fail Fast, Learn Faster". This is an entertaining and enlightening exploration of the business side of data with an industry veteran.
Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
- Struggling with broken pipelines? Stale dashboards? Missing data? If this resonates with you, you’re not alone. Data engineers struggling with unreliable data need look no further than Monte Carlo, the world’s first end-to-end, fully automated Data Observability Platform! In the same way that application performance monitoring ensures reliable software and keeps application downtime at bay, Monte Carlo solves the costly problem of broken data pipelines. Monte Carlo monitors and alerts for data issues across your data warehouses, data lakes, ETL, and business intelligence, reducing time to detection and resolution from weeks or days to just minutes. Start trusting your data with Monte Carlo today! Visit dataengineeringpodcast.com/impact today to save your spot at IMPACT: The Data Observability Summit a half-day virtual event featuring the first U.S. Chief Data Scientist, founder of the Data Mesh, Creator of Apache Airflow, and more data pioneers spearheading some of the biggest movements in data. The first 50 to RSVP with this link will be entered to win an Oculus Quest 2 — Advanced All-In-One Virtual Reality Headset. RSVP today – you don’t want to miss it!
- Atlan is a collaborative workspace for data-driven teams, like Github for engineering or Figma for design teams. By acting as a virtual hub for data assets ranging from tables and dashboards to SQL snippets & code, Atlan enables teams to create a single source of truth for all their data assets, and collaborate across the modern data stack through deep integrations with tools like Snowflake, Slack, Looker and more. Go to dataengineeringpodcast.com/atlan today and sign up for a free trial. If you’re a data engineering podcast listener, you get credits worth $3000 on an annual subscription
- Your host is Tobias Macey and today I’m interviewing Randy Bean about his recent book focusing on the use of big data and AI for informing data driven business leadership
Interview
- Introduction
- How did you get involved in the area of data management?
- Can you start by discussing the focus of the book and what motivated you to write it?
- Who is the intended audience, and how did that inform the tone and content?
- Businesses and their officers have been aiming to be "data driven" for years. In your experience, what are the concrete goals that are implied by that term?
- What are the barriers that organizations encounter in the pursuit of those goals?
- How have the success rates (real and imagined) shifted in recent years as the level of sophistication of the tools and industry for data management has increased?
- What is the state of data initiatives in leading corporations today?
- What are the biggest opportunities and risks that organizations focus on related to their use of data?
- At what level(s) of the organization do lessons around data ethics need to be embedded?
- You have been working with large companies for many years to help them with their adoption of "big data". How has your work on this book shifted or clarified your perspectives on the subject?
- What are the main lessons or ideas that you hope readers will take away from the book?
- What are the most interesting, innovative, or unexpected ways that you have seen big data applied to business?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on this book?
- What are your predictions for the next decade of big data and AI?
Contact Info
- @RandyBeanNVP on Twitter
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
Links
- Fail Fast, Learn Faster: Lessons in Data-Driven Leadership in an Age of Disruption, Big Data, and AI (affiliate link)
- Harvard Business Review
- MIT Sloan Review
- New Vantage Partners
- COBOL
- Moneyball
- Weapons of Math Destruction
- The Seven Roles of the Chief Data Officer
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to the Data Engineering Podcast, the show about modern data management. Have you ever woken up to a crisis because a number on a dashboard is broken and no one knows why? Or sent out frustrating Slack messages trying to find the right dataset? Or tried to understand what a column name means? Our friends at Atlan started out as a data team themselves and faced all this collaboration chaos. They started building Atlan as an internal tool for themselves. Atlan is a collaborative workspace for data driven teams, like GitHub for engineering or Figma for design teams. By acting as a virtual hub for data assets ranging from tables and dashboards to SQL snippets and code, Atlan enables teams to create a single source of truth for all of their data assets and collaborate across the modern data stack through deep integrations with tools like Snowflake, Slack, Looker, and more.
Go to dataengineeringpodcast.com/atlan today. That's A-T-L-A-N, and sign up for a free trial. If you're a data engineering podcast listener, you get credits worth $3,000 on an annual subscription. When you're ready to build your next pipeline and want to test out the projects you hear about on the show, you'll need somewhere to deploy it. So check out our friends over at Linode. With their managed Kubernetes platform, it's now even easier to deploy and scale your workflows or try out the latest Helm charts from tools like Pulsar, Pachyderm, and Dagster. With simple pricing, fast networking, object storage, and worldwide data centers, you've got everything you need to run a bulletproof data platform.
Go to dataengineeringpodcast.com/linode today. That's L-I-N-O-D-E, and get a $100 credit to try out a Kubernetes cluster of your own. And don't forget to thank them for their continued support of this show. Your host is Tobias Macey. And today, I'm interviewing Randy Bean about his recent book focusing on the use of big data and AI for informing data driven business leadership. So, Randy, can you start by introducing yourself? Hi, Tobias. Nice to be with you today. I'm Randy Bean. I'm author of the book,
[00:02:12] Unknown:
Fail Fast, Learn Faster: Lessons in Data-Driven Leadership in an Age of Disruption, Big Data, and AI. For the past decade, I've been a frequent columnist for Harvard Business Review, MIT Sloan Review, Forbes, and for two years wrote a monthly column for the Wall Street Journal on big data. 20 years ago, I founded the company, New Vantage Partners, where we're advisors and strategic consultants to Fortune 1000 companies on the use of data, basically. How do organizations become data driven? How do they leverage data as an enterprise asset, how do they build a data culture, and how do they learn to innovate with data in their business?
[00:02:58] Unknown:
And so do you remember how you first got involved in the space of data and data management and working with businesses to understand the utility and the capacity for data to act as a motivator of change?
[00:03:10] Unknown:
I sure do. I basically trained in college as a liberal arts major and maybe that's why I got into writing. So I studied literature and history and the classics and art history, and not really anything related to technology. But when it came time to get a job, basically, the jobs were mostly in technology related fields. So I was hired by a major bank, Bank of Boston, which is now part of Bank of America, and hired to be trained as a COBOL and assembler programmer, which I was. And to my surprise, they thought I was, like, very good at it, which astonished me. In any event, I was really more interested in the data, the inputs and the outputs than the programming per se. I mean, the program was really moving data around, and I was responsible for an application called deposit accounting history.
And what I quickly discovered was that they had vast repositories of customer information and deposit history for 7 years. And I asked the executives that were responsible for this. I said, what do you do with the 7 years of customer history? And they said, well, the regulators make us hold on to it for 7 years, and then we're free to destroy it. And I was like, what? Oh my god. You know, this is such an opportunity to mine this information, to understand customer behaviors. And really from that point forward, I was on a mission of sorts to make organizations
[00:04:40] Unknown:
think more about the data and the information they had and how they could learn from it. Yeah. It's definitely interesting that they were seeing it more as a burden and a chore to have to keep the data around and not as a, you know, potential gold mine as most companies would view it these days. Although there is the potential for data to become potentially sort of toxic and become a liability, so it's always a useful perspective to keep in mind. Yeah. And maybe at some point, we can talk a little bit about the toxicity
[00:05:08] Unknown:
side of data because in my book, I have a chapter on data ethics and even have some quotes in there as they pertain to Facebook that seem to be very timely given the events of the past few days.
[00:05:21] Unknown:
Absolutely. And so digging more into the book itself, I'm wondering if you can just give an overview of the particular focus that you put into it and what motivated you to choose now to actually sit down and write it and, you know, who the intended audience is? I wrote the book now. I really wrote it as a legacy project in terms of,
[00:05:43] Unknown:
hey. You know, I've been in the field for a generation, and I've seen a lot of things attempted, some successful, some failures. And what are the major takeaways that I've learned from that experience? And the last winter was going into a second COVID winter. And I said, well, I can't travel anywhere. I mean, it's gonna be long, dark, cold days in the northeast. And I've been asked from time to time over the years by publishers, why don't you write a book given you've written all of these articles? And I said, oh, absolutely not. But I consented, and once I agreed to it, the difference between writing a book and writing articles is basically that articles are about 1,000 words, and you can crank them out in various themes. A book publisher was looking for something that was a minimum of 50,000 words. So the first thing I did was conceptualize it in terms of what were the major themes I wanted to address, and there were 10 themes that I addressed in the book, 10 chapters. And so, basically, I thought of it as 10 5,000-word essays, but I had also used a number of case studies that I've written over the years based upon a number of organizations. So I was able to draw upon those case studies. So in essence, I was writing 10 2,500-word thematic essays and pulling together this information to support the argument. So that's the reason for the book and how I approached the book. I really wrote it for 3 audiences.
First of all, I wanted to basically educate a broader audience of general users on why data really matters and why it matters now. That's a major theme in the book. In addition to that, I really wanted to reach senior business decision makers, board members, C-suite executives who often see data as just another project or something that should be pushed off to the corner, whereas I see it as central to everything that we see and do and think about. You know, just in day to day living, you're hearing about positivity rates and development of new vaccines and political polls and Nielsen ratings and how much various movies or television shows did at the box office, sales figures for different products and services. So everything in a sense relates to data.
So I wanted business executives really to understand the value and why it should be central to their organizations, and why companies like Amazon and Facebook, depending upon your view on that, have really developed highly data driven businesses and developed substantial market share as a consequence of that. And then lastly, I wanted to write for the practitioners, and that's why I pulled together roughly 25 case studies from leading Fortune 1000 companies to show how leading companies were approaching various issues around data management and analytics and data driven AI and steps they've taken to build a data culture. So it's intended for multiple audiences in that degree.
It's ambitious in trying to really elevate the data discussion.
[00:08:49] Unknown:
Yeah. And to your point about everything these days, having some element of data being part of it, you know, even in just your day to day life, not even actually in sort of working in the technical sphere. And one of the things that's probably tangential to this conversation, but that I always take note of is how people are generally unable or unwilling to sort of understand the semantics of that data where you see the number and you take it at face value, but nobody really stops to dig into how was this number obtained, what was the processing for it, what is the, you know, sourcing of the data, you know, how did they perform the analytics? And that's interesting and useful and important information as well that people say, oh, you know, taking the vaccine rates, for example, you know, it's 95% effective.
And then, you know, in the case of the Johnson and Johnson vaccine, because of the different timing of when it happened, you know, it was 75% effective or something like that. And it's like, well, those numbers taken in isolation are essentially meaningless because you're not taking into account the broader context of how they were obtained and where.
[00:09:54] Unknown:
Yeah. One of the things I've learned is context is really everything, and it's been escalated in terms of the importance of context. You know, what I used to say is that you went into a corporate boardroom and you presented a set of numbers in terms of how many customers the organization had. And different areas of the organization would come to very different conclusions in terms of the actions that could be taken. So it was all through the lens in which they viewed these things. But these days, you can take the exact same numbers, which, you know, often you'll see in the news, and they can be used to represent virtually entirely opposite points of view. In other words, you could say, here's the numbers, and one group says the conclusion of that is everything is, you know, up. And another group will say, you know, that means everything is down. The glass is half full. The glass is half empty, black or white, anything that's an exact opposite. So, you know, you can have the data, but the interpretation of data can be highly subjective, highly selective, and also people can always cherry pick the data to support, you know, their particular arguments.
One other thing in response to your previous question, the other reason why I wrote the book is because I wanted to create a sense of urgency and emphasize that now more than ever, data is important. And if you don't mind, I'd like to read from the first chapter of the book, which really sets the stage. The book begins as follows. The world is in a race to become data driven now more than ever. The warp speed effort to organize scientific and epidemiological data from across the globe in our heroic effort to find a COVID-19 vaccine has illustrated the urgency and existential nature of this quest.
We need data, science, facts, knowledge, and insight to make informed, wise, and critical decisions. Now more than ever, data matters and having good data matters tremendously.
[00:12:02] Unknown:
Absolutely. And to the point of the focus of the book and part of the subtitle being that it's lessons in data driven leadership, and the sort of overall goal of being data driven is something that businesses and leaders have been touting as one of their top priorities for a number of years now, you know, at least the past decade, if not before. And I'm wondering what you see as what are the concrete goals that that typically entails when somebody says, I want to become data driven? Or is it something where they hear this, they say, oh, yes, I want to be data driven, but they don't understand what that actually means and what is required of them to achieve that outcome.
[00:12:42] Unknown:
Yeah. Precisely. So let me frame it in this context. You know, one of the premises of the book is that leaders will have to act faster. They'll have to think differently. They'll have to consider the ethical consequences, and they'll have to embrace change to basically become data driven organizations. But what happens from my experience with large Fortune 1000 companies is two things. First of all, many of them pay lip service to the notion of becoming data driven because they hear about it and they see it. You know, they see, you know, the notion of Moneyball and professional sports and how it's made professional sports more data driven. So they say, yep, you know, we have to become data driven.
We'll name a chief data officer, but then they think it's over and done. They're not prepared to do the hard work, which is really hard work and requires persistence and sticking with it over many, many, many, many years, sometimes decades to be quite frank. I tell the story sometimes of going into an organization and meeting with the president of the consumer insurance business. This was about 6 years ago, and this executive said to me, we'd like to bring you in because our goal is to become data driven. So we've allocated 60 days to this effort. So we'd like to hire you so that our Fortune 1000 company will be data driven within 60 days.
You know? I basically smiled, thanked him, and walked out the door. So that's often the mindset that happens with large organizations: they just don't really understand the magnitude of the challenge because data is an asset that flows across an entire organization from production to consumption to creation of new sources or new calculations along the way. And, traditionally, companies have not been organized to manage data as an asset. There haven't been incentives for people to share their data. You know, it's amazing talking to the major pharmaceutical companies who were very siloed. Often, they'd have 100 different drug programs, and they had individual teams scattered around the world that were focused on these individual drug development and clinical trials and activities of that kind.
But there was no reason, no incentive, no mandate to share the data more broadly across the entire organization, across the enterprise. And for many of these firms, it was only in the context of COVID that they started for the first time to begin to share this data. And through that, they saw correlations related to COVID, but also to many other things that they had never anticipated. So it's a different way of thinking and acting for most organizations. And when you look at legacy companies, legacy Fortune 1000 companies that have existed for decades, generations, and in many cases, over a century, that type of transformation and change doesn't come easily.
[00:15:53] Unknown:
Yeah. It's definitely humorous to think that some executives are under the impression that 60 days is sufficient to achieve something like that. But the siloing of data is definitely one of the sort of long running challenges of people who are working in the field. And even in newer companies where they say, oh, yes. We're going to be data driven. We are going to use all the latest tools. Because of the organizational structures, things inevitably end up in silos. And so that's been one of the challenges, particularly in the past 2 to 3 years in the sort of technology layer of understanding how to manage those silos, whether it's with the advent of data mesh and treating, you know, the data as a product that, you know, has APIs at these organizational boundaries or using data discovery and data cataloging to be able to find those different data sources across those silos and then be able to access and query across them. So it's definitely interesting the way that, I forget which sort of apocryphal law it is, but where the communication patterns of the software mimic the communication patterns of the organization, and the same is true of data.
[00:17:00] Unknown:
Yeah. And I give a few case study examples in the book about the challenges organizations continue to face around data preparation and data quality and ETL related types of activities. And I do relate one anecdote in the book, which is kind of ironic. About 10 years ago, I was reading in a college alumni magazine or thumbing through one, and I saw that a former housemate of mine had been appointed assistant secretary of defense for research and development with something like a $1 trillion budget or some extraordinary figure. And it listed that he had 3 mandates, and one of them at the time, this was 2012 or so, was to develop a plan for the defense department to leverage big data. Okay? You know, it didn't say anything more than that. It just said, you know, develop this comprehensive big data plan with a multibillion dollar budget.
So I reached out to him, said, hey. You know, it's been a number of years. How you doing? Congratulations on the appointment. Saw what you're doing around big data, and, you know, here's actually a couple of my articles from the Wall Street Journal, and here's what I'm doing. And he said, oh, great. Can you come down to the Pentagon next week to speak to a small group of folks? And I said, sure. Why not? I've been a tourist. I've never been a guest. So the following week, I flew down to Washington DC, went to the Pentagon, went through many layers of security where they confiscated virtually everything that I had with me, and I walked into a room of about 18 people, 6 of them were in full stars and bars generals, another 6 were in camouflage fatigues, and another 6 were in business suits and ties. And I looked around, I said, I have no idea who the decision maker is here. And then they said, the reason we asked you to come here is because we execute these campaigns, and we need to have the very best data, and we're spending too much time on data preparation and not enough time on analysis and decision making. We're spending 80% of our time on data preparation and data cleansing.
And we'd like to learn how Fortune 1000 companies are doing it. You know, we assume that they're spending 80% of their time on analysis. Well, you know, I had to disabuse them of that notion and share with them that their experience was consistent with most large companies. The irony of it for me was that I was used to talking to large companies that when they use the term campaign, they were talking about increasing their customer retention, increasing their customer acquisition, things of that kind. And I came quickly to realize when they were talking about campaigns here in this context of the Pentagon, they were talking about having data to make very precise decisions about drone strikes.
So same terminology, but very different context and with much greater consequences.
[00:20:06] Unknown:
Yeah. It's funny how consistent that 80% number has been through, you know, the past decade plus despite all of the technological innovations. And there have definitely been a number of arguments that have been made and that I agree with, at least in some part, that the data preparation is part of the analysis because if you don't understand the data, then you can't actually make an effective analysis. And so you shouldn't be trying to get rid of the preparation phase. You should just be, you know, actually investing in that preparation phase and, you know, not relegate that to say, you know, let the intern do the preparation so that the important people can do the analysis.
[00:20:43] Unknown:
That's a great point and good insight.
[00:20:45] Unknown:
So digging more into the barriers that organizations encounter in the pursuit of being data driven, I'm wondering if you can share some of the sort of common challenges that they run into and if there has been any shift in the success rates, either real or imagined, of these different efforts to improve their capacity for data analysis and data storage and, you know, being able to propagate that up through the different layers of business to actually truly become data driven?
[00:21:14] Unknown:
Yeah. I'd say a few different things. So I'm gonna share with you some data, a data person sharing the data, from an annual survey that we've been conducting for the past 10 years among senior C-suite executives of Fortune 1000 companies, so the respondents tend to be, these days, chief data officers, chief analytic officers, so roughly 76% of the respondents are chief data and analytics officers, and the others include CEOs, chief digital officers, and chief information officers. And we asked, yes or no. Are you driving innovation with data? 48.5% said yes. The balance greater than 50% said no.
Are you competing on data and analytics? 41.2% said yes. Nearly 60% said no. Are you managing data as a business asset? 39.3% said yes, over 60% said no. Have you forged a data culture? 24.4% said yes, roughly 75% said no. And have you created a data driven organization? 24% said yes, and 76% said no. So what that highlights, you know, you could view it from a couple different lenses. Some people have heard those numbers and said, wow. You know, that's really bad. We've really done a poor job. But the other side of that is there's a tremendous opportunity. There's a tremendous opportunity for organizations to learn, to get better, to improve, and that creates a tremendous opportunity for the data profession. It means that there's gonna be a demand for people and expertise and skill for the next decade or the next multiple decades. So tremendous opportunity.
I also think that these numbers reflect a growing realization and awareness within organizations of exactly what they're good at and what they're not good at. Because when we asked these same questions several years ago, the numbers were higher, and people have said to me, oh, you know, have we gotten worse? Well, you know, in some respects, new sources of data proliferate, so the challenges continue to mount. But at the same time, you know, a decade ago, of the same organizations that we surveyed, 12% had appointed a chief data officer at that point in time, and now it's up to 65%. So I think organizations are becoming more self aware and more self critical and more realistic about exactly what their data capabilities are. So that's good news.
The other piece of data that I'd share with you to your question was we asked these same organizations, what's the principal challenge to becoming data driven? And only 7.8% said it was technology, and 92.2% said it related to people and business process and culture. So, you know, the major challenge with most organizations, you know, there's an abundance of some really good technologies, but it's all of the issues around organizational alignment and change management and communication that are holding many organizations back. I sometimes relate the story because it happens time and time again in one form or another, where I go into a leading Fortune 1000 company and I meet with their data teams, and they tell me about the robust capabilities that they've created, and they're justifiably proud of those capabilities.
Then I'll meet with the technology organization, and they'll also talk to me about the data engineering and the platforms they've created, and they're justifiably proud. Then I meet with line-of-business leaders, and they'll say, you know, we don't have confidence in the data we're receiving. We're not receiving the data that we need to make decisions when we need to make those decisions. We have concerns about the timeliness of the data that we're receiving. So there continues to be this gap between expectations and ability to deliver measurable business value with the capabilities that have been created. So I don't think this is a bad thing because there are a lot of capabilities that have been created, but it's really these cultural issues, communication, common understanding, bridging the gap.
So often I talk to data and technology folks, and they will tell me sometimes, they'll say, well, the business folks don't get it because we're building a platform for the long term that will service their needs for the long term. But the business folks have to meet often daily, weekly results and quarterly measurement. You know, they have to report to the financial markets and to the Street on a quarterly basis. So they're very much living in the moment, and they need to answer the questions that they need to answer when they need to answer them. And often, they don't know the precise form because the questions can change. And, you know, often people can use the same language to mean different things.
So they're in an environment where they're kind of living these issues real time, and it's great that long term capabilities and platforms are being created. But I often encourage my data and technology friends and brethren, and having come out of that world, is to put yourself in the shoes of the end business user, and what can you do in terms of creating self-service capabilities or speaking in business terminology that they understand or asking them what are the most critical business questions that they're not able to answer today and what specific pieces of data are needed to answer those questions. Because sometimes they don't need all of the data. They may just need a very small subset of data. So it's really breaking down those barriers and creating a common understanding and a trusted working relationship that seems to bring the breakthroughs and success for those organizations that are able to make some progress.
[00:27:22] Unknown:
Yeah. It's definitely always interesting when you talk to somebody who's very deep in the weeds on a technical basis about, you know, what are your highest priorities and what are you doing to achieve them, and then you talk to the business person. And those two priorities are wildly divergent of, you know, I wanna make sure the system has high uptime and that I'm, you know, processing the data as quickly as possible, but, you know, who cares if you're not answering the questions that the business person needs? Because, you know, if the business isn't succeeding, then you're not gonna have the funding to be able to embark on this, you know, multi month or multi year process to build out this massive, you know, fully self serve data platform. So you're figuring out what are the iterative steps to be able to say, okay. I can answer your specific needs right now. And then, you know, next, I'm going to be able to do this in a more automated fashion, and then I'm going to apply predictive analytics to be able to give you a, you know, future looking answer to that question. Yeah. I tell the story sometimes because I thought it was hilarious and indicative at the time, and it was about 10 years ago.
[00:28:21] Unknown:
And an executive was sharing with me, a data and technology executive, that they went to the president of the company and asked them for $25, 000, 000 in funding for an MDM project. When the executive said, get out of my office, and until you can come back to me and speak in terms of the business benefits and the value and how much revenue or how much cost savings or how many customers, then we can start having the conversation. But when you come to me and talk and speak in terms like MDM, you know, that's the nonstarter
[00:28:56] Unknown:
for me. Absolutely. And I think it's also informative to your point that the sort of rate of response of companies who say that they are data driven or that they're succeeding in their initiatives to be data driven has actually dropped in recent years. And as you said, I don't think it's because they have lost any capacity in being able to answer those questions so much as they have gained understanding of what it actually means to be data driven, and so their overall confidence in being able to answer that question in the affirmative has dropped despite an increase in capabilities.
[00:29:27] Unknown:
I think we're getting better all the way around, but it's a journey. You know, I don't think that there's a destination that you get to where you declare that you've become data driven. And people often ask me about, well, you know, what organizations are data driven and what are the best examples that you see out there? And I tell them time and time again that the organizations that many companies consider to be most data driven when I engage with them, they're always perpetually nervous. They're always looking over their shoulder. They're wondering how they can be better. They're fearful of new Fintech or InsurTech or Big Tech competitors.
So they're perpetually restless. And, from my experience, that's what really drives data driven companies and innovators in being data driven, organizations like Capital One, for example, in financial services. When I go into an organization and ask them about their data and analytics programs and they say, oh, we have everything under control, we're all set, I take that as a bad sign.
[00:30:40] Unknown:
In terms of the data initiatives that these companies are undertaking, I'm wondering if you can give an overview of the overall state that they're in and some of the ways that the goals and manifestations of those initiatives have changed over the past decade, from the initial hype of big data to where we are now, where data has become much more pervasive?
[00:32:10] Unknown:
Well, one of the things I'd say is that the proliferation of data and the introduction of new sources, particularly unstructured data, which some people would say constitutes 80% of all data now, has been concurrent with the growth in computing power. So, for example, in the past, organizations had to rely heavily on samples. And now organizations like American Express, if you go in for a credit card approval, can look through all of your credit history plus the general parameters for all credit card history and very quickly determine whether it's a good transaction or a fraudulent transaction. So just the sheer magnitude and processing power of what organizations can do with data has been revolutionary compared to where we were a decade ago or longer.
And then to that point about unstructured data sources, you know, you have GPS and signals and texts and documents and pictures, and organizations are just beginning to scratch the surface in understanding how those capabilities can be used. I have at least one or two case studies in the book that talk about, for example, how insurance organizations look at satellite and other imagery to address insurance claims, whereas before they had to send individuals and do complex site inspections. Now they can be done much more rapidly using satellite telemetry and other capabilities.
So there's many examples, but the computing power and the evolution of unstructured data in particular is, you know, presenting new opportunities and a whole opportunity for organizations to expand how they think of data and what constitutes data for their organization.
[00:34:09] Unknown:
As you're discussing the use of the massive computing capabilities of companies like American Express to be able to determine if a transaction is fraudulent, it also brings up the interesting difference between being able to have capacity for using data for a given product and for end user facing solutions versus being data driven as a business leader? What are some of the ways that you see that manifest where the company has capacity for being able to build out these various data products, but they're not necessarily able to gain the necessary insights at an organizational level using those same technical capacities?
[00:34:46] Unknown:
I guess I could answer that several different ways. I'll start by saying that some organizations use these additional capacities for what I'll call defensive purposes: back end, regulatory, processing efficiencies. So these are things that are largely invisible to the customer. I mean, if you process a credit card transaction, it really doesn't change anything from your perspective, whether it was done looking at a small sample or at all transactions. You just know it was approved or declined. The other side of things is when organizations are using data for revenue generating activities. Just think of Amazon or any other type of online digital company that can produce offers for you, highly personalized offers based upon your past purchase history and your shown interest.
So the degree of personalization, I think, is one area where consumers have benefited and can see the exact value of data, and that includes on mobile devices as well.
[00:35:53] Unknown:
It's definitely interesting the number of different ways that data can be used, and sometimes even the same sets of data for wildly different purposes, in terms of who it impacts and what is required as far as investment and capacity to be able to actually put that data to use. Absolutely. Continuing on that, what are some of the biggest opportunities and biggest risks that organizations are focusing on related to their use of data? And, going back to the idea of data being potentially toxic, what are the useful and beneficial assets that they have, and what are some of the ways that that data might become a liability for them?
[00:36:36] Unknown:
Yeah. I think one of the biggest issues that organizations are facing today is around the issue of data ethics. I host a series of quarterly chief data officer roundtable discussions and actually just hosted one on Tuesday of this week with chief data officers from organizations like American Express and JPMorgan and Eli Lilly and Mayo Clinic. So it's across the board in terms of a mixture of financial services companies, as well as pharmaceutical and health care companies. Mastercard would be another, and Visa is another. Data ethics has really become top of mind for these organizations, along with responsibilities in terms of data privacy. It's become an imperative for these organizations and something that's increasingly becoming a mandate of the chief data officer.
And with the hearings earlier this week on Tuesday in Washington with the Facebook whistleblower and the segment on 60 Minutes, people began talking about data and algorithms in the context of Weapons of Math Destruction, which was a book that Cathy O'Neil wrote several years ago. And I have this quote in there that actually relates to Facebook, as it turns out. It's from an April 23, 2018 Wall Street Journal book review. The article's entitled Big Data, Big Problems, and it goes on to say:
"Big Data is the big bad of our moment. Companies and governments amass enormous troves of information about our online and offline activities, so they can understand them better than we do. Recently, we learned that creepy firms like Cambridge Analytica mine big data from websites such as Facebook. Facebook itself seems increasingly creepy, grounded in lying to the public about what happens to the data it collects." So, you know, this was written in April 2018, and here it is October 2021. And now the senators and congressmen are kind of cluing in on an issue that has been a growing concern for a number of years now.
[00:39:09] Unknown:
Yeah. It's definitely an interesting case because in some cases, you know, the objective of a business like Facebook is to make more money and return value, at least in a monetary sense, to its shareholders. But also, because of its far reaching impact and the breadth of society that it interacts with, from a perspective of ethics it should have some measure of responsibility for the negative outcomes that it produces. And so it's definitely an interesting discussion and also an important aspect of ethics in terms of how it pertains to the use of data and the use of technology to analyze it and the ways that that information is then put to use in these organizations.
[00:39:56] Unknown:
Yes. And at our roundtable the other day, a couple of people cited a great example, and these were people from health care as well as financial services. You know, when you go into a bank or a credit agency or an insurance company, you have to sign a whole bunch of forms to give them permission, and it's pretty explicit in terms of how the data can be used, and it can be used for profiling and can be shared in certain contexts. But it's pretty narrow and limited within the scheme of things. And the person from one of the health care firms pointed out that when you're in a health care setting, people are typically operating under duress.
You know, we can't do this test to see whether you have something that's benign or whether you have two months to live until you sign all of this paperwork, so people will freely sign away. So that's taking it to the next stage. The third stage, which was pointed out, which is the situation with Facebook and others of the big tech companies, is that when you're agreeing to basically the terms and conditions, they're actually embedding capabilities that track every move that you make, every site that you visit. So it actually can be considered a form of spying. It's not just permission to use the data, but it's permission to basically go on your devices and follow you everywhere you go, which is often not well understood at all and raises the whole ethical question to a new level.
[00:41:33] Unknown:
The sort of level of obfuscation that goes into the terms and conditions and the intentional length of them, where people just click through, has become legend. I don't know if it's true or not, but I've heard one reference to somebody who put in their terms and conditions that by clicking agree, you agreed to sign away your firstborn to the company. Obviously, they didn't intend to actually act on that, but it's a case in point of how much people do not pay attention to those.
[00:42:03] Unknown:
Right. Yeah. Exactly. And I think that what's gonna come out of this is greater education and some guidelines, maybe in less legalese and plainer language, in terms of what exactly you're agreeing to.
[00:42:12] Unknown:
And continuing on the subject of the ethical use of data because of its potential to have both great benefit and create great harm, what are the levels in an organizational sense that are responsible for identifying and enacting the ethical use of their data and technology?
[00:42:37] Unknown:
Yeah. You know, Mastercard's really been at the forefront of this in a few respects. I mean, one thing that they've done is create the Mastercard Center for Inclusive Growth, which is a little bit different, but it's really looking at opportunities to take the data that it has and apply it for social good in the community. But the chief data office at Mastercard is really built around ethics and privacy to a significant degree. It's the only company where, to my knowledge, the chief data officer is actually a lawyer by training. And, you know, they work with a broad set of financial services institutions that they're supporting the credit card and payment transactions for.
But I wrote a paper in Harvard Business Review about two years ago with my colleague, Tom Davenport, on the seven roles of the chief data officer. And we talked about the chief data officer as the head analytics person or the data management person, in a variety of different configurations. But we also talked about the chief data officer as the chief data ethics and privacy person. You know, Mastercard is the primary example of that, and we'll see more of that in the years ahead.
[00:43:53] Unknown:
And as technologists who are working with the data, what is their sort of level of responsibility and opportunity for identifying potential ethical issues in the information that they're collecting or how it's being used?
[00:44:08] Unknown:
Yeah. You know, there's not a lot of common standards and policies surrounding the use of data within most organizations. In Europe, they've established the GDPR standards, and in California, they've established the CSCP, if I'm getting... CCPA. CCPA. Thank you for getting that exactly right. And I have done some work with Cam Kerry of the Brookings Institution in the past couple of years in terms of understanding this. He's really been focused on what should be the standards for data and what is government's role in managing that. But I'd say it's at an early stage. You know, as you know, big tech has been largely unregulated.
You know, it's hard enough to regulate financial services because, from my experience, the people that are innovating in the business work a thousand times faster than the regulators. So you can have regulation on Wall Street, but the people that are developing the trading algorithms are five steps ahead of the regulators. But, you know, regulation really has been a decade or two behind in consideration of where there should be some degrees of regulation in high tech, or big tech, I should say. And it's ironic because I think I was watching CNN the other day, and I think it was Ed Markey, the senator from Massachusetts, calling for a whole new level of regulation on big tech. And then you also had Josh Hawley, the senator from Missouri and at the completely opposite end of the political spectrum, calling for the exact same thing, but for very different reasons. So maybe it'll be something that's bipartisan, and it's certain to be coming in some shape or form.
[00:46:08] Unknown:
In terms of the project of writing this book, as you mentioned, you've been working in the space for a number of years now. You've had the opportunity to work with a number of very large and impressive organizations who are doing important work. And I'm wondering how the work on the book specifically has clarified or shifted your perspective on the overall subject of the use of data and its potential for change within the organization and its potential for having positive impact for the different companies that you're working with.
[00:46:48] Unknown:
Yeah. It's really been a learning experience for my part because I brought these perspectives to bear. But one of the things that I've been doing is speaking in large groups to a range of organizations. So, for example, last week I spoke to a few hundred people at Charles Schwab who are in the data analytics and insight organization. I've been speaking with other Fortune 1000 companies, and it's just fascinating to hear their questions, their perspectives. So just answering the questions forces me to think, and to think maybe in new and different ways and outside of ways that I was thinking when I wrote the book at the time. So it's expanded my own personal perspective in terms of the challenges as well as the opportunities that organizations face.
[00:47:30] Unknown:
And for people who are reading the book, what are the main lessons or ideas and takeaways that you hope readers will come out of it with?
[00:47:41] Unknown:
Yeah. As I mentioned at the outset, there's 10 chapters and each serves a purpose. So the first chapter, and I try to be a little ironic or entertaining in the book, is called the little history of big data, so kind of little and big. But it's really, you know, what have organizations been doing for the past generation and what are some of the lessons learned? And organizations are still facing the same issues. In other words, 30, 35 years ago, I heard people say, you know, we're trying to get insights from the data, and that's what they still say. And I described somewhere in the book the story of somebody that was completely not in the data field. They were a jazz musician. I was talking to them in a bar, and they were asking me about what I did. And I was trying to phrase it in language that anybody would understand, calling it Moneyball for Business. And the person looked right at me and said, well, if we have all this data, how come we're not any wiser?
And, you know, I think that Mike said it well. The second chapter is called think different, becoming data driven, and it's really about how becoming data driven requires a different mindset and a different way of thinking. Data is not a traditional asset that organizations are established to manage. In chapter 3, I talk about insight and knowledge, data science and facts, and just the importance of data, and how cherry picking and subjective qualifications can sometimes be brought to bear, so that it's important to view data in context. In chapter 4, I talk about the state of data in the corporate world today, and that's some of the data findings that I shared with you earlier. Chapter 6 is about the rise of the chief data officer. You know, this was a role that didn't exist a little more than a decade ago, and, as I mentioned, it's gone from 12% 10 years ago to roughly 65% of Fortune 1000 companies today, so it's really becoming an established role within the C-suite. At the same time, there's tremendous turnover in the role. I just wrote an article in Harvard Business Review in August with Tom Davenport where we gathered data and found that the average tenure of a chief data officer was about two and a half years. And from personal anecdotal experience, I can give you a long list of companies that are on their 3rd, 4th, 5th, 6th, and even 7th iteration of chief data officer.
And I was speaking with one of those companies that was on the 7th iteration just the other day, and they were saying, well, you know, I'm the 7th chief data officer. And I said, yes, I know, because I tell the story of companies on their 3rd, 4th, 5th, 6th, or 7th, and people will say, well, yeah, we can understand the 3rd or 4th, but we think you're engaging in hyperbole when you say the 7th. And this person says, well, hey, here I am, number 7. So it's a reality. Chapter 7 is on data responsibility and data ethics.
Chapter 8 is on data innovation and disruption. In other words, how data can be used to disrupt traditional industries and traditional businesses. Chapter 9 is a glimpse of the future, and that talks about data driven AI and this notion that, you know, 35 years ago when I started in the industry, yeah, organizations were engaging in AI, but they didn't have the large datasets, the computing power, the volume of data. So that's what has really, in recent years, significantly enabled AI. And then the concluding chapter is one company's odyssey, and it talks about the process being a journey, not a destination.
And I relate the story of one company, which happens to be American Express, who I've worked with and interviewed for articles over the course of the decade. And it talks about their trials and errors and their false starts and their progressions. But even this week, with American Express's latest chief data officer, who is number 3 there, participating in the roundtable discussion, you know, they're embarking on a whole bunch of initiatives because even though much has been accomplished over the past decade, there's still much that they would like to accomplish going forward.
[00:51:42] Unknown:
In your experience of working with these different companies over your career, what are some of the most interesting or innovative or unexpected ways that you've seen big data and the goal to be data driven applied in the business context?
[00:51:56] Unknown:
I often cite the example of Capital One, and I do that because I spent a lot of time in financial services, and that's not by accident. You know, financial services companies have huge, massive repositories of data. They've been working on managing and governing data for three or four decades. So they're used to these issues. What's interesting about Capital One is that, compared to all of the other major banks, they didn't exist 30 years ago. It was actually a couple of data and analytics people that basically brought data and analytics principles to the idea of credit card marketing. How could they deliver credit to underserved markets? So they basically brought analytics to an aspect of banking and financial services and then grew to create one of the top 10 banks in the United States.
So they're an interesting example because in the same way that Jeff Bezos brought data and analytics to retailing, the founders of Capital One brought data and analytics to banking and financial services.
[00:53:05] Unknown:
In terms of working on the book, what were the most interesting or unexpected or challenging aspects of putting it together and building a coherent narrative and figuring out sort of what the main goals and target audience would be.
[00:53:20] Unknown:
Clarification in terms of the major themes and the audience, that came pretty quickly. You know, I will relate an anecdote. When I signed the contract to write the book, the publisher said, do you think you can get us a full manuscript within 6 months and a half manuscript within 3 months? And I said, sure, why not? But not fully really knowing what it takes. My mindset has always been, when I write an article, people say, oh, that must have taken days or weeks, and I say, no, it takes one hour. And they say, what do you mean? And I say, well, I spend a lot of time wandering the streets thinking about what I might be writing. Then when the idea comes to me in my head, I basically sit down and dump it all out on paper, at a keyboard.
And then I print it, and I put it in a drawer for about a week, and then I take it out, and I make some edits, move sentences around, restructure it so it reads well, and it has, like, a good opening and a good closing and a narrative flow and that type of thing. And then I hit the key and off it goes to the publisher or self publishing in certain instances. And I really adopted that same mindset around the book. So I started it on Thanksgiving of 2020 because we couldn't really travel because of COVID or have the large family gatherings, and I finished it on Christmas day, and I sent a complete manuscript the first week in January, which was basically four and a half months before the full manuscript was due and about 6 or 7 weeks before a half manuscript was due.
And they pretty much accepted it as is. And so, you know, different people approach writing a book in different ways. But once I had fixed on the idea, I knew where I wanted to go, and it was like breaking the sausage into parts and then executing on it. You know, the biggest learning experience has been the release of the book. So, for example, I have targeted different audiences. I have targeted large corporations, saying, you know, you should purchase this in bulk so that you can develop a common understanding and narrow that gap between the different audiences and understand the business value that's important to derive from data initiatives. And I've also gone to the data companies, and they've purchased bulk orders to put in the hands of their customers so they could do that education and evangelizing. I'm also doing some work with the universities as part of their curriculum. So it's all been a learning experience in terms of how you take a book to market.
And as a sideline, something that I do outside of work is I'm actually cochair of an international writers in residence program where we bring together writers from across the world. And these are people that have won the National Book Award, the Pulitzer Prize, the Booker Prize in the UK, and actually the 2020 Nobel Prize in Literature. And so I've been surrounded by these people that have spent their entire career writing, and so it, in part, inspired me to find what it is that I had a perspective on and could bring a perspective to and write about that, tell that story.
[00:56:35] Unknown:
As you continue to work in the industry and work with these businesses, what are some of the predictions that you have for the next decade and beyond for big data and AI and some of the impacts that it will have both for business and broader society?
[00:56:50] Unknown:
Yeah. Well, I think it's great news for the data profession. I think it's gonna be in demand for the next several decades, and that also has significant impact for universities and where they focus their education, their programs. But I also think that there's gonna be more and more discussion about data ethics in the context of data and debates about how data is presented and how do you establish objective structures around data so that, you know, we talk about data science and facts. Well, I believe it was Kellyanne Conway who said, well, there's facts and then there's alternative facts.
So, you know, this is something that we have to come to terms with as a society and as a nation in terms of how we look at data science and facts and establishing trust and credibility so that there'll be hopefully some level of consensus and understanding. And I believe I saw on the news the other day, Lindsey Graham in South Carolina up before an audience, and he said, well, I've been vaccinated, and the crowd booed. And he said, well, 91% of the people in South Carolina who are in hospitals are unvaccinated, and people shouted, lies, lies, not true.
So, you know, that's the challenge that we all face in terms of not only being able to take the data and present the data, but do it in a way where there's some level of hopefully trust and credibility and people don't say, yeah, well, you produced this data, but we think it's fake data.
[00:58:31] Unknown:
Absolutely. Are there any other aspects of the work that you've done on the book or the goals that you have for it or the material that it contains and your overall experience in the industry that we didn't discuss yet that you'd like to share before we close out the show?
[00:58:49] Unknown:
No. I would just reiterate that I think data permeates all aspects of our society now. When I go into organizations and they say, well, you know, we need to think about doing a data project, or we need to think about establishing this data organization over here so it can help us understand more about our company, our customers, and our markets, I argue for the centrality of data and that data flows through everything and that, you know, if you're a business, if you're a sports team, if you're in the entertainment world, if you're in the health care world, data needs to be central and first and foremost at the heart of basically all decision making.
Medical decisions, you know, you need to look at what the test results are and what they say and how that dictates the treatment.
[00:59:37] Unknown:
Well, for anybody who wants to get in touch with you and follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And as a final question, I'd like to get your perspective on what you see as being the biggest gap in the tooling or technology that's available for data management today.
[00:59:55] Unknown:
Yeah. I think there's a lot of great tools out there. People share with me the capabilities that they've created regularly, but I think that there's a gap between an understanding of the business that can be derived and the capabilities that exist. So I think that more work needs to be done in terms of speaking the language of the business users, because until the business users really understand the value and the benefit to them and overcome, you know, natural resistance to language they don't speak or things that they don't understand, there's still gonna be a gap between the capabilities that exist and the ability of organizations to take advantage of those capabilities.
[01:00:40] Unknown:
Well, thank you very much for taking the time today to join me and share the work that you've been doing at NewVantage Partners and on the book that you just wrote. It's definitely a very interesting subject area, and I enjoyed reading the book. So thank you for all the time and effort you've put into that. I'll definitely add a link in the show notes for anybody who wants to find it and read it for themselves. So thank you again for your time, and I hope you enjoy the rest of your day.
My pleasure. It's been a pleasure speaking with you, Tobias, and thank you for your insightful questions.
Thank you for listening. Don't forget to check out our other show, Podcast.__init__ at pythonpodcast.com to learn about the Python language, its community, and the innovative ways it is being used. And visit the site at dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show, then tell us about it. Email hosts@dataengineeringpodcast.com with your story. And to help other people find the show, please leave a review on iTunes and tell your friends and coworkers.
Introduction and Sponsor Messages
Interview with Randy Bean Begins
Randy Bean's Background and Career
Overview of Randy's Book and Its Focus
Importance of Context in Data
Urgency of Becoming Data-Driven
Challenges in Becoming Data-Driven
Survey Data on Data-Driven Organizations
Bridging the Gap Between Data Teams and Business Leaders
Evolution of Data Initiatives
Opportunities and Risks in Data Usage
Ethical Use of Data
Writing the Book and Its Impact
Innovative Uses of Big Data
Challenges in Writing the Book
Future Predictions for Big Data and AI
Final Thoughts and Closing Remarks