Summary
The term "data platform" gets thrown around a lot, but have you stopped to think about what it actually means for you and your organization? In this episode Lior Gavish, Lior Solomon, and Atul Gupte share their view of what it means to have a data platform, discuss their experiences building them at various companies, and provide advice on how to treat them like a software product. This is a valuable conversation about how to approach the work of selecting the tools that you use to power your data systems and considerations for how they can be woven together for a unified experience across your various stakeholders.
Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
- Are you bored with writing scripts to move data into SaaS tools like Salesforce, Marketo, or Facebook Ads? Hightouch is the easiest way to sync data into the platforms that your business teams rely on. The data you’re looking for is already in your data warehouse and BI tools. Connect your warehouse to Hightouch, paste a SQL query, and use their visual mapper to specify how data should appear in your SaaS systems. No more scripts, just SQL. Supercharge your business teams with customer data using Hightouch for Reverse ETL today. Get started for free at dataengineeringpodcast.com/hightouch.
- Atlan is a collaborative workspace for data-driven teams, like GitHub for engineering or Figma for design teams. By acting as a virtual hub for data assets ranging from tables and dashboards to SQL snippets & code, Atlan enables teams to create a single source of truth for all their data assets, and collaborate across the modern data stack through deep integrations with tools like Snowflake, Slack, Looker, and more. Go to dataengineeringpodcast.com/atlan today and sign up for a free trial. If you’re a data engineering podcast listener, you get credits worth $3,000 on an annual subscription.
- Your host is Tobias Macey and today I’m interviewing Lior Gavish, Lior Solomon, and Atul Gupte about the technical, social, and architectural aspects of building your data platform as a product for your internal customers
Interview
- Introduction
- How did you get involved in the area of data management? – all
- Can we start by establishing a definition of "data platform" for the purpose of this conversation?
- Who are the stakeholders in a data platform?
- Where does the responsibility lie for creating and maintaining ("owning") the platform?
- What are some of the technical and organizational constraints that are likely to factor into the design and execution of the platform?
- What are the minimum set of requirements necessary to qualify as a platform? (as opposed to a collection of discrete components)
- What are the additional capabilities that should be in place to simplify the use and maintenance of the platform?
- How are data platforms managed? Are they managed by technical teams, product managers, etc.? What is the profile for a data product manager? – Atul G.
- How do you set SLIs / SLOs with your data platform team when you don’t have clear metrics you’re tracking? – Lior S.
- There has been a lot of conversation recently about different interpretations of the "modern data stack". For a team who is just starting to build out their platform, how much credence should they be giving to those debates?
- What are the first steps that you recommend for those practitioners?
- If an organization already has infrastructure in place for data/analytics, how might they think about building or buying their way toward a well integrated platform?
- Once a platform is established, what are some challenges that teams should anticipate in scaling the platform?
- Which axes of scale have you found to be most difficult to manage? (scale of infrastructure capacity, scale of organizational/technical complexity, scale of usage, etc.)
- Do we think the "data platform" is a skill set? How do we split up the role of the platform? Is there one for real-time? Is there one for ETLs?
- How do you handle the quality and reliability of the data powering your solution?
- What are helpful techniques that you have used for collecting, prioritizing, and managing feature requests?
- How do you justify the budget and resources for your data platform?
- How do you measure the success of a data platform?
- What is the relationship between a data platform and data products?
- Are there any other companies you admire when it comes to building robust, scalable data architecture?
- What are the most interesting, innovative, or unexpected ways that you have seen data platforms used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while building and operating a data platform?
- When is a data platform the wrong choice? (as opposed to buying an integrated solution, etc.)
- What are the industry trends that you are monitoring/excited for in the space of data platforms?
Contact Info
- Lior Gavish
- Lior Solomon
- @liorsolomon on Twitter
- Atul Gupte
- @atulgupte on Twitter
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
Links
- Monte Carlo
- Vimeo
- Uber
- Zynga
- Great Expectations
- Airflow
- Fivetran
- dbt
- Snowflake
- Looker
- Modern Data Stack Podcast Episode
- Stitch
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to the Data Engineering Podcast, the show about modern data management. Have you ever woken up to a crisis because a number on a dashboard is broken and no one knows why? Or sent out frustrating Slack messages trying to find the right dataset? Or tried to understand what a column name means? Our friends at Atlan started out as a data team themselves and faced all this collaboration chaos. They started building Atlan as an internal tool for themselves. Atlan is a collaborative workspace for data driven teams, like GitHub for engineering or Figma for design teams. By acting as a virtual hub for data assets ranging from tables and dashboards to SQL snippets and code, Atlan enables teams to create a single source of truth for all of their data assets and collaborate across the modern data stack through deep integrations with tools like Snowflake, Slack, Looker, and more.
Go to dataengineeringpodcast.com/atlan today. That's a-t-l-a-n, and sign up for a free trial. If you're a data engineering podcast listener, you get credits worth $3,000 on an annual subscription. When you're ready to build your next pipeline and want to test out the projects you hear about on the show, you'll need somewhere to deploy it. So check out our friends over at Linode. With their managed Kubernetes platform, it's now even easier to deploy and scale your workflows or try out the latest Helm charts from tools like Pulsar, Pachyderm, and Dagster. With simple pricing, fast networking, object storage, and worldwide data centers, you've got everything you need to run a bulletproof data platform.
Go to dataengineeringpodcast.com/linode today. That's l-i-n-o-d-e, and get a $100 credit to try out a Kubernetes cluster of your own. Your host is Tobias Macey, and today I'm interviewing Lior Gavish, Lior Solomon, and Atul Gupte about the technical, social, and architectural aspects of building your data platform as a product for your internal customers. So starting with you, Lior Gavish, can you start by introducing yourself?
[00:02:11] Unknown:
Hi, everyone. I'm Lior. I'm the co-founder and CTO of Monte Carlo, the data observability company. Before that, I used to run engineering at a company called Barracuda. I built machine learning applications for
[00:02:24] Unknown:
fraud detection. Very excited to be on the show today. And Lior Solomon. How about you? Hey, guys. Thank you for having me. My name is Lior Solomon, VP of Data Engineering at Vimeo.
[00:02:33] Unknown:
I've spent the past 10 or 15 years as a VP of Engineering at different startups. And, Atul, how about you? Hey, everyone. My name is Atul. Thanks for having me, Tobias. I am currently a product manager at Facebook. But prior to this, I helped build and manage several data products at Uber. This included products in the data science space, machine learning,
[00:02:52] Unknown:
analytics, as well as data cataloging and metadata management. Excited to be here today. And going back to you, Lior Gavish. You've been on the show before, but for anybody who hasn't listened to the previous episodes you were on, if you wanna just share a bit about how you got involved in data management.
[00:03:07] Unknown:
So I got involved in data management basically as part of my work trying to build products in the cybersecurity space. You all know this, but security is data heavy. And some of the most interesting developments in recent years have been around how to better leverage data in order to detect threats and block them in real time and understand their impact. And so building out those data intensive, machine learning based systems really required building out a data platform: a lot of capabilities around collecting different types of datasets and kind of synthesizing them and putting them together in different ways to both solve some analytical use cases and also solve some real time, mission critical, machine learning based use cases. And so that's how I got really excited about building data platforms and data management in particular, and I've been specializing in it ever since I started Monte Carlo
[00:04:07] Unknown:
2 years ago. So very passionate about the space. And, Lior Solomon, do you remember how you got involved in data management? Yeah. Absolutely. For me, it's from 2 different angles. Mostly, you know, I started focusing on, like, the aspect of collecting user behavior data and turning it into insights and basically, you know, helping to improve the user experience. So there was a lot of interest on the, you know, data collection kind of product perspective throughout my previous role at a video ad tech company named LiveU, which was, you know, my experience when it comes to high volume data collection, you know, third party cookies, and all that fun, with a huge scale of billions of events coming in every day.
And then, you know, from the ad space into Vimeo, and kind of combining those 2 worlds of, like, high volume at scale and also, you know, improving a very much consumer facing app. That was my journey and what created my curiosity and passion around, you know, the data and then the usage of data. And, Atul, how about you? Yeah. So my journey was sort of similar to Lior S's. It was more on the gaming side. So I started my career at Zynga,
[00:05:17] Unknown:
and there, I witnessed firsthand, you know, collecting large amounts of user behavior data, right, based on events that we were collecting. And I saw how product managers were using that to determine what features to build, what things to optimize. And I saw that there was a gap there. Right? There were definitely things that could be done to make this easier for not just product managers, but others in the org to truly understand how to build better games. That's where my interest started, and I kind of went from there to Uber to build these platforms at large scale. That brings us to the overall topic of conversation
[00:05:49] Unknown:
about the overall idea of a data platform and treating it as a product. But before we get too far along, I'm wondering if we can just build some consensus about the definition of that term, data platform, and what that means for the organization and the technologies involved and the overall set of capabilities that we're discussing. I think one of the biggest
[00:06:10] Unknown:
challenges when we're trying to define a data platform team is, first of all, understanding what sort of challenges they're trying to solve and containing the scope of work on that data platform. Specifically in Vimeo, we define the data platform by its mission statement, which is to build real time, centralized data collection pipelines and consumption frameworks at scale. So as you can see just by the mission statement, it's mostly about the real time aspect of it. I guess we can talk, you know, later on about, like, is that necessarily the only definition of a data platform? Is it necessarily just real time? Could it also be in regards to, like, ETL and batch processing?
But, you know, from our perspective, at this point in Vimeo, we mostly solve that challenge by, like, helping product teams and engineering teams that are not necessarily familiar with the intricacies of, like, real time processing, helping them, first of all, by making sure that we provide that sort of service and frameworks to be able to release their products and use that kind of capability.
[00:07:11] Unknown:
I think my view is similar, but slightly more expansive. Right? So I would say a data platform seems like it's a collection of internal tools, systems, technologies, frameworks that are all interconnected somehow. Right? Either at the data layer or more commonly at the workflow layer. So if your workflow spans multiple tools, technologies, and they all rely on each other to derive an insight or to drive a business outcome, I'd say maybe that could be a data platform. It's slightly more expansive than I think what Lior just said, but in a similar vein. And I agree with everything that's been said, and I'd even add
[00:07:46] Unknown:
that the conventions and methodologies and people processes that exist around the infrastructure and tools are actually part of the platform and actually define how the platform is going to be used and leveraged. So it's a combination, actually, of people, process, and technology that every organization designs a little bit differently in order to accomplish the goals that have been set forth for data insights, for machine learning, for other kinds of data applications.
[00:08:19] Unknown:
The interesting aspect of the sort of platform part of the terminology is that it sort of implies a certain expansiveness and sort of holistic approach to the data. So I'm wondering, if somebody says, oh, I have a data platform, and really what they have is a Postgres database and a BI dashboard, does that count? You know, at what point can you say I have a data platform, and are there sort of limitations or requirements that you would place on being able to use that phrase in reference to the tools and processes that you're building on top of? That one's tricky.
[00:08:56] Unknown:
I think what you're essentially saying is, is there a critical point in time when what you have becomes a platform versus just a collection of tools, or maybe just a single tool or a single process? I think all data platforms start that way. I don't think you ever go and build this, like, giant monolith that has 10,000 different pieces. So I guess it's just a matter of semantics. You call it a data platform when you think there's a vision and you're executing towards it. Perhaps that's a good way of putting that. Yeah. I think that's a fair approach. And I was kind of trying to throw some contrary perspective in there to see
[00:09:30] Unknown:
how I could take you off guard. And so I agree that, yeah, nobody says, oh, I just built this data platform in a day and you have, you know, all of the bells and whistles. It always starts out as I need to answer a question, this is the, you know, first step to be able to get there, and then you kind of iterate on that. And so in terms of the actual platform and the evolution of that, who do you view as being the main stakeholders in the platform and its use and its sort of design and care and feeding?
[00:09:59] Unknown:
It's mostly, in our case, product managers, you know, all the way from the moment they have a hypothesis and they want to start, like, working with the engineering team; supporting them to actually use the right data collection framework that we provide, making sure that they have the best practices to understand in which cases they should use one framework versus the other. We have the actual engineering teams that use the data, you know, to process it or make sense out of it or just, like, stream it towards other workflows. And, obviously, analysts, you know, consuming the data into dashboards or just, like, you know, using it downstream.
In our case, specifically, also going back to my definition of a data platform, you realize there's, like, different downstream teams or data engineering teams that are consuming the data from real time. We have a BI engineering team that focuses on the data warehouse aspect and the ETLs and the like. So they're, like, another team that consumes data from the data platform team. One other observation I wanna make is that the data platform team in our case is actually mostly focused on the infrastructure and on the frameworks that you will use in order to stream the data. They try to minimize their engagement with, you know, like, certain metrics and having the understanding of the business metrics or any discussions at the level of, like, an analyst or a PM trying to make sense of data. They focus solely on the engineering and the scale and making sure the SLAs are met. And, therefore, we have other data engineering teams which are a little bit closer to the subject matter and can help, you know, debug certain data discrepancies or anomalies, or help with building aggregations or making data more accessible. I wanted to add just
[00:11:45] Unknown:
maybe 2 things there. So I think, Lior, you covered sort of these 2 categories. Right? You're saying the people who actually rely on the insights derived from the platform, most likely that's PMs or, like, someone in the business. And then you also mentioned the people who build using the platforms. Right? So the data scientists, data engineers, analysts, depending on the company. I think there's 2 other categories. One is the people who build the actual platforms themselves, you know, the engineers, the PMs who design it, etcetera. And the other category, which I think sometimes is overlooked but tends to be important, these are the people who actually administer or fund the platform. So this is, like, you know, the VPs, the CIOs. I like to call them the people whose names show up in the New York Times article when there's a breach. Right? But I think they're very important stakeholders because they sort of determine the funding levels, the technology choices sometimes, and also help evangelize it within the organization. Because more often than not, a data platform starts out as this little thing that then grows, and so you really need that evangelization. So I think those are the only 2 things that I would add. The data product managers that build the platform,
[00:12:47] Unknown:
I think. We mentioned the data engineers building the platform, and then there's, of course, the data product managers, like Atul, who define the requirements and really help develop the platform in a direction that makes sense for the company.
[00:13:01] Unknown:
To your point of having sort of product managers involved and iterating in a particular direction, this goes back again to the question of what's the point at which you can say that I have a data platform, and a plan for how those tools fit together to be able to interrelate, versus I have, you know, system A over here where I can answer some questions, but then when I need to be able to iterate on something there, I go over to system B, and then I need to go to system C to make other changes. And so it's this discrete sort of choppy workflow where you don't have one continuous flow from an idea through to delivery.
You need to be able to kind of context shift and jump around a bunch of different places. I think that that's really the core element of what makes something a platform versus
[00:13:53] Unknown:
a bunch of technology that you try to throw at the problem. Yeah. I think that's the point. Almost by definition, it's the point where things can be built on top of the platform. Right? Like, that's the definition of a platform. The point where, if you're trying to build a new use case or answer a new question, you don't start from scratch and build a completely new solution, but rather you build on top of the tools that have already been put in place and have already been integrated together in order to help you accomplish that goal. And sometimes maybe a Postgres database and a BI tool is all you need to get to that point. So I don't think it's necessarily about how sophisticated or fancy the technology is, but more about at what point it becomes usable to address, you know, a broad range of use cases that the team has in a way that's effective, that's fast, that's reliable, that's secure. I think those are some of the things that a good platform will help you do. Just wanted to add to that. I think a number one challenge that, you know, we're seeing is providing the best practices
[00:14:58] Unknown:
and just, like, you know, having that clear guidance of, like, which tool or which data collection framework, and in what cases you would use one versus the other. Some of the challenges we're seeing are really from a lineage perspective. It's really hard to anticipate who's gonna use a framework. Like, you can hand off a framework to a certain stakeholder and they can start working on it. But in a larger engineering organization, you end up with someone, you know, like, downstream, a team that uses a tool that you provided for a completely different purpose, and you realize it just after the fact. I think that's, like, another challenge that, you know, we're looking into platforms like Monte Carlo to be solving, to help you actually identify when someone is, like, picking up a certain data stream and shifting it completely to a different use case, and sometimes, you know, against the guidance and best practices the data platform team provides. I think that's another challenge. So my point is that I think part of the definition of a data platform is a lot of documentation and a lot of, like, you know, training and kind of, like, distributing that concept and vision to the rest of the teams. Otherwise, you know, as the company scales and the engineering team scales, it's really hard to kind of, like, control that. And in terms of actually
[00:16:17] Unknown:
establishing and driving the overall vision of the data platform, who takes that responsibility, and what are some of the technical and organizational constraints that are likely to factor into
[00:16:30] Unknown:
how that design takes shape and the overall goals of what the platform is intended to achieve? Maybe I can get started on this one, but I would love to hear the thoughts of the Liors. As a PM, I can talk about what I've observed. Right? I think a large part of it does depend on those stakeholders that I mentioned. You know, the people who sort of green-light these efforts and fund these efforts. Right? They need to help act as evangelists, because you might see a smattering of, sorry, tools or systems that, you know, you as a PM say, yeah, clearly there's a platform here that's waiting to be formed, or, like, there's guardrails that need to be put in place, whatever. But I think a large part of the challenge that you see is adoption. Right? How do you convince different teams to sign up for this vision and to maybe stop using some of their cobbled together tools and instead use your cobbled together tool? Right? How do you make sure that they're, you know, signing up to pay that productivity tax, as it were, right, for not using something that they're so used to using? And this could be sort of an external platform. It could be, like, an external tool versus using something that's homegrown, or something completely different in the workplace that they're used to. So in terms of who owns the responsibility, I don't think there's sort of one person or discipline that I could point to, but I think it definitely is a partnership between the team that decides to build and own this platform, more likely sort of the product leadership, the product manager leadership there, and some sort of sponsor higher up in the organization that can help drive adoption and evangelize at that level as well. But that's kinda what I've observed. Curious what the others may have there. I think what I've seen is this thing evolves and matures. Right? The people that actually build the first
[00:18:05] Unknown:
versions of a data platform might actually be the single data engineer or data analyst that is starting the data function in a company. Right? And they own the data platform. They set it up in a way that allows them to work effectively. And then as the team grows, the data platform evolves and becomes more powerful. And at some point, you start seeing a dedicated platform team, kind of like Lior S. described, where you start seeing pieces or parts of the system being owned by one particular individual or by a team that adopts it. And then eventually, maybe one day, if you're lucky enough to grow to the Uber scale, you have a set of product leaders and engineering leaders that own it together. And I think it's the natural evolution of how these things get built out, and that needs to adapt to what the company needs. Right? Like, the more use cases you have, the more investment in data you have, the more scale, organizational and data volume scale, that you have, the more you need to assign clear ownership and formulate a clear vision for the platform and invest the resources in building it out and making it useful for all the stakeholders. Right? It's not easy. That's why I'm sure you have a lot of horror stories. It's not easy to get a lot of different stakeholders that have their own ways and own tools to adopt something. And in order to accomplish that, you need both an investment in a platform that is robust and adds real value, and you need buy-in and evangelism from the top leaders of the company. And I think it's just a process.
In terms of the constraints, I think the organizational constraints that we've all talked about matter a lot. From a technical perspective, I think some of the things that really impact the way the platform evolves are the skill sets of the people that are building on top of it. Right? Like, you build quite a different platform if you're trying to help, you know, analysts that love SQL build dashboards, versus if you're helping machine learning engineers bring real time machine learning use cases to production, versus you're handling real time analytics. Right? So you really need to think about who the people that are building on top of the platform are and what sort of technologies they're used to using. Do they like SQL? Do they like to write Python notebooks? Do they like to write Spark jobs? And really adapt the platform to what the intended use case is.
And, of course, you need to think about the data sources you're going to consume. Where are they coming from? How are they structured? What level of control do you have over that over time? As well as, you know, real time versus batch, machine learning versus analytics, and all those different aspects of what the data platform needs to deliver in order to be valuable.
[00:20:50] Unknown:
And on your point of the machine learning and real time versus batch and analytical, what do you see as being kind of the breaking points at which a single platform is not able to deliver on all the different use cases and you need to build out separate platforms for specific purposes? Or do you see that as sort of the overarching goal of 1 platform to be able to deliver on all the possible applications of data within the organization?
[00:21:18] Unknown:
I mean, I'll invite Lior S. and Atul to comment on it as well. But my sense is you get the most leverage if you're able to create a consistent platform. Let's think about real time data. Right? Real time data is valuable, but even if you have that use case, it's very likely that you're going to have some batch processes that are using the same data. Right? And so you kinda need to support both in the same platform. If you start splitting it into different platforms, you're losing some of the economies of scale and some of the consistency and some of the argument to even use the platform in the first place. And the same goes for machine learning. Right? If you're building models on top of something, it's very likely that you also need analytics on top of it. And so splitting that out into a completely different platform kind of loses some of the value, at least in my opinion.
Not to mention some of the kind of horizontal functions of the platform. If we're talking about observability and discoverability, security, and access control, all of these things, you know, are much more valuable when they're consistent across multiple different use cases, as opposed to kind of being part of separate systems where you need to kind of rebuild all of these building blocks
[00:22:34] Unknown:
fresh. I don't know if you guys would agree. I'll invite you to challenge that. Actually, I would say plus one to that. That's exactly what we observed at Uber, and that's kind of what I've observed at other companies that I would partner with for insights, like, to understand how they build a particular product or space. I'll give you an example of the real time space at Uber itself. Right? We used to have a standalone system and product to handle these real time use cases. And exactly what you said, Lior, happened. Teams wanted observability. Teams wanted to combine that real time data with batch processes. They wanted to use some of our metadata management systems with these real time systems. And instead of having to build all of those things separately, by integrating it with our existing querying systems, our existing data cataloging systems, it actually made usage much easier. It increased usage, and we actually got much more value out of those real time systems than we would have if they were standalone. Right? And I think that argument applies to pretty much all of the examples you gave, whether it's data science, whether it's machine learning. Because a lot of the platform components, right, like the security, the access controls, observability, the ability to debug something, you get those for free by partnering with the platform. Right? When I say free, well, you see what I mean. And customers are increasingly demanding that. So instead of having each team kind of build this out themselves and exist in a silo that doesn't work with any other silo at the company, I think there's much more value in, you know, paying that initial friction to combine and integrate.
And, again, it goes back to sort of the definition of data platforms that we all spoke about earlier. Right? Just what is a platform? If there is some workflow that can be orchestrated across everything, if it's relying on the same sort of underlying tools, technologies, or frameworks, I would argue that that kind of fits the platform definition, and so it makes sense to leverage the strengths there. So, TL;DR, that was just a plus one from me.
[00:24:23] Unknown:
I'm really curious about, like, how you guys think about the fact that some of those data platform tools, you know, they actually call for a different data engineering skill set, or it's a completely different stack. You know, like, for example, you would have data platform tools that are focused on, in our case, Kafka consumption and building consumers, versus, you know, having tools that support, say, Great Expectations, maybe test cases, or providing, you know, like, any sort of process around Airflow. I'm wondering, from your experience, do you envision, in your world, all those disciplines and experiences actually happening in one team, or are they spread across different skill sets or different data platform teams? I can maybe venture an answer because we've dealt with some of these challenges at Uber. I think when we started out, it was a whole bunch of, like, separate disparate
[00:25:12] Unknown:
tools. But what we found much more success with was to provide different levels of views for different disciplines and skill sets. Right? At Uber, we have this, like, unique problem. I'm not sure how widespread it is everywhere, but I definitely feel like it's a little unique that we had a lot of different people from a lot of different skill sets trying to access the same data or drive the same insights. And so what ended up happening was we created these different levels of views that, you know, you could use based on your skill set. So if you were an amateur SQL programmer or, like, you didn't understand too much but you really wanted to use those insights, you could leverage some prebuilt components that were there to sort of plug and play, and, like, build, you know, whether it's interactive dashboards or sort of solutions where you can just use the insight and run with it. But if you were a data scientist that really knew what you were doing and truly understood which query you wanted to run or which DAG you wanted to manipulate, you had that access as well. So yeah. I mean, I know it's really high level, and we can definitely go deeper. But I think that was the value. Right? So by getting everyone on the same platform, we were able to foster this sort of collaboration, right, where data scientists could build something that's complex, share it with an analyst who understood a little less but wanted that insight, who then shared it with the PM, who only cared about whether the number went up or down. Right? But they were all operating on that same dataset, the same dashboard, the same sort of underlying asset. Right? This made managing these things much easier. It made troubleshooting much easier. And now there was, like, a single source of truth for this data. Right? Like, all 3 disciplines were now operating on the same ground reality.
So I think that was very valuable. It was not an easy task. It took us a while, but we found that the dividends it paid were tremendous.
[00:26:52] Unknown:
Yeah. I wholeheartedly agree with this. I think there will always be different people using different technologies to manipulate and access and derive insight from data, and that's perfectly fine even within a single platform. I think the role of the platform is to make sure they're all looking at the same data and are working consistently, are able to find that data consistently. Right? So for example, as Lior has mentioned, there are Kafka streams and there is some batch processing, but that's perfectly fine. And there's likely different people using those different technologies.
But, for example, something that the data platform can make sure happens is that if there's a certain dataset that's flowing through Kafka, it also ends up, for example, in the data lake or the warehouse so that it can be queried in batch jobs by people with different skill sets, and that there's also enough metadata to be able to determine that that particular Kafka topic is also related to this particular table on the data lake or the data warehouse. It's a very kind of tactical example, but I think these are some of the things that a good data platform can make sure happen at the foundational layer so that the people using that data don't have to work really hard to have the same source of truth, the same information, and work consistently across the company.
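To make that tactical example a little more concrete, here is a minimal sketch of the kind of foundational glue being described: a routine that mirrors records from a Kafka topic into a warehouse table and registers metadata linking the two, so SQL-oriented users can find and query the same dataset. The topic name, the table name, and the tiny in-memory "warehouse" and "catalog" are hypothetical stand-ins for the example, not anyone's actual stack.

```python
import sqlite3

# Hypothetical names; a real platform would pull these from its own config.
TOPIC = "video.play_events"
WAREHOUSE_TABLE = "play_events"

# Stand-in for messages consumed from the Kafka topic.
topic_messages = [
    {"user_id": 1, "video_id": 42, "event": "play", "ts": "2021-08-01T12:00:00Z"},
    {"user_id": 2, "video_id": 99, "event": "play", "ts": "2021-08-01T12:00:05Z"},
]

# Stand-in for the data warehouse / data lake, plus a tiny metadata catalog.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE play_events (user_id INT, video_id INT, event TEXT, ts TEXT)")
warehouse.execute("CREATE TABLE catalog_lineage (source TEXT, target TEXT)")


def sink_topic_to_table(messages, table):
    """Mirror streaming records into a batch-queryable table."""
    rows = [(m["user_id"], m["video_id"], m["event"], m["ts"]) for m in messages]
    warehouse.executemany(f"INSERT INTO {table} VALUES (?, ?, ?, ?)", rows)


def register_lineage(source, target):
    """Record that the topic and the table carry the same dataset."""
    warehouse.execute("INSERT INTO catalog_lineage VALUES (?, ?)", (source, target))


sink_topic_to_table(topic_messages, WAREHOUSE_TABLE)
register_lineage(f"kafka://{TOPIC}", f"warehouse://{WAREHOUSE_TABLE}")

# An analyst with only SQL skills can now query the mirrored data and see
# where it came from, without touching the streaming stack.
print(warehouse.execute("SELECT COUNT(*) FROM play_events").fetchone()[0])
print(warehouse.execute("SELECT * FROM catalog_lineage").fetchall())
```

In a real platform the sink would be a managed connector and the catalog a dedicated metadata service; the point is simply that both steps happen once, at the platform layer, instead of separately in every consuming team.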
[00:28:10] Unknown:
In terms of the overall aspect of the technologies that are used for being able to build out a data platform, you know, as we discussed at the beginning, a lot of times you start with one small use case, you know, a small subset of the tools that you're going to end up with, and you need to iterate on the stack and sort of have this overall vision for how it's all going to tie together. And one of the conversations that's been happening a lot recently is the idea of this so-called modern data stack, where a lot of times people will say it consists of things like Fivetran with Snowflake and dbt and Looker as sort of the happy path that a lot of startups are going down. And for teams who are early in their journey of building out a data platform and establishing the overall product vision and goals, I guess, how much credence do you think we should be giving to these conversations about what constitutes the so-called modern data stack, and how much variance do you think we should bring into that conversation to fit different needs, versus trying to push everybody into this one sort of uniform strategy of how to build out the foundational layers of their platform?
[00:29:18] Unknown:
It's a controversial question, Tobias. At least, my two cents. Right? I think we have to be careful about mandating one particular technology as part of the stack. All vendors, including Monte Carlo, would love people to do that, but I'm not sure there's, like, a single answer to all data problems. I think some of the concepts behind the modern data stack are very, very valuable. Right? I think using cloud services that are managed and scalable is super, super important, and that's going to radically accelerate the organization's ability to innovate with data. Right? Anything you can do to choose a solution that lets the team focus on building insights and doing kind of core business work, rather than building infrastructure and kind of, quote, unquote, dumb pipes, is very, very valuable. Right? And the companies that you mentioned, Fivetran and Snowflake and Looker, all help you do that. There are also other vendors that will help you do that, and you need to think about it in the context of your business requirements, your technical requirements, and where the rest of the company runs. Like, for example, if you're running all of your applications on GCP, then it might make sense to use BigQuery and not Snowflake, just because it'll be much more operationally efficient to do so, and there's a lot of knowledge in the company around it. Right? So I don't think there's, like, one solution. I think the concepts make a lot of sense, and I'd definitely go for a kind of cloud based, managed, scalable stack every single time. That's a no brainer.
[00:30:49] Unknown:
But choose the vendor that makes the most sense for what you're trying to solve and the environment that you operate in. It really depends on the scale, like, how big of a company we're talking about and how much engineering resources you have to solve this problem. You know, I would argue that for smaller companies, first of all, when it comes to serving the different business departments, you know, like finance or CRM or marketing, trying to move their deliverables, try and provide whatever tools off the shelf to let them kind of, like, drive their insights and drive their deliverables as autonomously as possible, you know, to be free and not dependent on data engineering. So definitely, you know, we're evaluating different managed ELT tools, you know, like Fivetran or Stitch. Definitely, they can help there, and kind of try and contain the data engineering efforts mostly on the bigger problems where scale is a challenge and where, you know, we need to support more of the data science or machine learning engineers to actually drive their deliverables. They need a little bit more attention than, like, the business metric aspect, where it can be just, like, Salesforce to Snowflake. And when it comes to data observability, definitely thinking about, like, tools that can help you identify and monitor any sort of, like, you know, freshness issues and anomalies, having something there in place makes sense.
And also, from a data collection perspective, having a clear framework of, like, what the organization is gonna be using in order to collect data and also to classify the data we collect. I don't think there is even, like, one tool that serves it all. It depends. In my mind, I break it down into, like, you know, transactional kind of data could have one type of pipeline; user behavior data could have a different type of pipeline. So, you know, they have different SLAs in my mind. And, yeah, you know, I personally think spending the time trying to engineer towards the tougher problems makes sense, versus trying to reinvent the wheel where there's, like, some tools that are off the shelf. You can just, like, use them, go for that, and keep your engineering force for the tougher problems of the organization.
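As a concrete illustration of the freshness monitoring mentioned above, here is a minimal sketch of the kind of check an observability tool might run; the table, the `updated_at` column, and the one-hour threshold are assumptions made up for the example, and a real tool would typically learn thresholds from history rather than hard-coding them.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Hypothetical warehouse table; in practice this would be Snowflake, BigQuery, etc.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INT, updated_at TEXT)")
conn.execute(
    "INSERT INTO orders VALUES (1, ?)",
    ((datetime.now(timezone.utc) - timedelta(hours=3)).isoformat(),),
)

FRESHNESS_THRESHOLD = timedelta(hours=1)  # assumed expectation: at most 1 hour stale


def check_freshness(table, ts_column):
    """Alert if the newest row in `table` is older than the freshness threshold."""
    newest = conn.execute(f"SELECT MAX({ts_column}) FROM {table}").fetchone()[0]
    age = datetime.now(timezone.utc) - datetime.fromisoformat(newest)
    if age > FRESHNESS_THRESHOLD:
        print(f"ALERT: {table} has not been updated for {age}")
    else:
        print(f"OK: {table} was updated {age} ago")


check_freshness("orders", "updated_at")
```

In practice a check like this would run on a schedule (for example from Airflow, mentioned earlier) and route its alert to the owning team rather than printing it.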
[00:32:59] Unknown:
And so for a new team who is thinking about building out their platform, and maybe they already have an existing set of tools that they've been leaning on to be able to answer questions in the early days, or, you know, maybe it's an older company and they're just working on trying to tie together all the disparate systems that they're dealing with, what are some useful strategies that you have found for being able to either build or buy their way towards a more well integrated platform that fits this sort of product vision that we've been discussing?
[00:33:28] Unknown:
So as you're thinking about whether you wanna expand what you have, whether you wanna acquire new solutions, you know, from new vendors, etcetera, I think you do need to think very carefully about the inherent costs and benefits of each approach. Right? It might appeal to purchase a lot of, like, standalone separate platforms, thinking that, hey, these platforms have already done all of the hard work, because often the fit and finish of these platforms is gonna be far superior to what you could stand up internally with a short amount of time or resources. Right? But there is a price that you need to pay for sort of integration, making sure that everything is compatible with everything else, making sure that things flow well, making sure they appear in the right places.
And the other thing is that, you know, when you do purchase a platform or an external solution, it may not always meet very niche use cases that you may have. It truly is important for you to understand, like, what is your use case? How niche is it? Is this something that needs to be this niche? And I think that will help you answer the question of what should I pursue. Should I just expand what I have because it's working well enough and I just wanna make it better, or, you know, sort of cut what you have and acquire a solution that exists? I think there's value in doing both, because using a solution that's prebuilt by sort of seasoned practitioners has a lot of value. They've spent a lot of time. The fit and finish is gonna be great. You're gonna have sort of a dedicated support function and, you know, use cases and all of that documentation. Everything is gonna be available, but it may not fit, like, this crazy small niche that you have. And so, yeah, I guess no easy answer. It just requires some careful thinking here.
[00:35:02] Unknown:
I think migrating from an existing stack to a new stack is always a very, very difficult decision. It can have a lot of benefits. I talked to a lot of companies that have, you know, for example, made the transition from an on prem data warehouse to kind of a modern stack, if you will. I think, by and large, you hear about dramatic acceleration of data initiatives and dramatic acceleration of the ability to really use data. So there's definitely a lot of value there. Having said that, those are complex and long projects that can take many months, sometimes years, to accomplish.
And so you really need to think carefully about how much rope you're gonna get as a data platform team. Right? Like, for how long are you going to get the funding and the focus and the attention on doing these migrations and rebuilding your platform, essentially? And if the answer is yes, I have the backing, I have the support from leadership, from other teams to really invest in this, then you can take that leap. But if the answer is no, like, in a few months people will be frustrated and the project will be dropped and resources will be taken away, then it's probably not worth it. Right? So it's always a complicated question that also ties to higher-level business objectives and
[00:36:19] Unknown:
leadership perspective. I think it sounds like it's really important to define, like, if you do go and pick up a certain tool, a new infrastructure, or a new kind of, like, platform, whether it's, you know, buying versus building it, it's really important to understand what's the first use case or business use case you're gonna be using it for in order to show the impact, understand and set up the expectations of, you know, what are you exactly trying to improve, try to contain it to that specific, you know, group, and be very attentive to how effective it is. Because as this becomes a success story, it's really easy to, like, celebrate it to the organization and get more teams to be onboarded.
And it also actually helps to set up the expectation with leadership in regards to, like, it's worth investing in that direction. We already have a success story. Here's how we've actually moved the needle, and, you know, that's how you get the buy-in for the rest of the teams. And also, in some cases, you know, like, it's always a little painful to move kind of, like, a certain payload or a certain, like, workflow from one pipeline or one infrastructure to another. One thing that worked for me before is actually, you know, introducing a carrot at the end, where, like, if you start using this infrastructure, you get out of the box a set of features that does not exist somewhere else, and slowly kind of, like, motivating the teams doing that shift to be your voice within the organization and actually motivate the rest of the teams to follow that pattern.
[00:37:46] Unknown:
Are you bored with writing scripts to move data into SaaS tools like Salesforce, Marketo, or Facebook Ads? Hightouch is the easiest way to sync data into the platforms that your business teams rely on. The data you're looking for is already in your data warehouse and BI tools. Connect your warehouse to Hightouch, paste a SQL query, and use their visual mapper to specify how data should appear in your SaaS systems. No more scripts, just SQL. Supercharge your business teams with customer data using Hightouch for reverse ETL today. Get started for free at dataengineeringpodcast.com/hightouch. Going back to the question of scaling the platform, there are a number of different ways that we can think about scale, whether it's the scale of the underlying technology for being able to deal with volumes of data. It could be organizational scale of being able to support a wide number of users. It could be scale in terms of the range of features that it supports. I'm wondering if you can just talk about some of the ways that data platform teams should be thinking about scale as they're building out their solutions and selecting the underlying technologies and trying to promote the platform within their organization.
[00:38:55] Unknown:
I think there's a point in time where some deliverables from the data platform turn into a real data product. And in that case, you know, it makes sense to kind of offload the whole, kind of, like, onboarding to that data product to a different team. Because otherwise, you know, you just spread the data platform team in so many different directions. There's, like, more and more, kind of, like, features or tools they're introducing to the platform. At some point, you know, they'll be stretched so thin that it'll be really hard to anticipate what sort of volume of requests they're gonna get and how much adoption they're gonna get for each one of those, like, features or tools that they introduce. So I think it's really important to kind of identify adoption, identify how many consumers we have for any given feature or tool that the data platform team is providing.
At a certain point, ask yourself, like, are we in a position where this is, like, substantial? Is this, like, a product that we wanna try to, you know, like, offload to a different team? And in my mind, from my experience, one indication that it's about time to do so is when the discussion becomes much more about the business impact and the data that's streaming through those data tools, when the discussions are becoming a little bit more high level around, you know, more product or business initiatives,
[00:40:10] Unknown:
then I think it's a good time to actually offload it from the data platform team. Let them focus on the engineering aspect of it, the infrastructure aspect of it, and let a different team just, like, handle that. Absolutely agree. I think it's very similar to what I've observed and experienced at Uber as well, around, you know, how you start out a data platform. This is something that we saw when we built a hosted notebooks product. Sounds fairly pedestrian, vanilla, but the features and functionality that were added to it attracted an audience far greater than what we initially started out to serve. So, initially, we just wanted to make data scientists' lives simpler. We ended up serving a lot of customers because they were able to adapt the platform to their needs. And you suddenly ended up in a place where, you know, these very disparate audiences wanted very different things out of their platform. Some wanted things that were, like, plug and play, sort of a what-you-see-is-what-you-get editor kind of experience. Data scientists wanted access to complex infrastructure, GPU farms, and things like that. And I think that's where what Lior said made absolute sense. Like, you need to start, you know, figuring out, okay, which users do you really wanna focus on? Which parts of the product do you wanna hive off into other systems? Are there better solutions here? Can I buy something? Can I build something?
So I guess to answer your question, I think the scaling of technical complexity seems to be the harder problem. I would argue that some of the other things, whether it's infra capacity or, like, usage, I think those are things that can be managed in other ways. But I think technical complexity, that's the one that requires the hardest conversations.
[00:41:37] Unknown:
I'll add to that. I agree with all of this. It's certainly a challenge to manage all the different use cases that emerge and figure out what you're going to support and what you're not going to support, or how you're going to support the things that you don't wanna focus on. I'll also add a couple of other things that we've seen emerge as data platforms scale. It's just the sheer complexity of the number of sources and the datasets that are produced from them, and the number of people involved, that creates the kind of horizontal challenges. And we touched on them in the past, but observability and discoverability.
Right? Like, so, you know, as long as there are, like, one or two pipelines or ten pipelines, it's pretty obvious how you'd make sure that they work and how you guarantee the SLAs and the reliability around them. It's overall pretty easy to find what datasets are produced from them and which ones should be used for different purposes. As the platform scales, it becomes really challenging because any change in any piece of the puzzle, whether it's a source that's changing in some unexpected way or a pipeline that's been modified and has some unintended consequences, all of these things can create downstream reliability issues. Right? And that's kinda where observability comes in. That's why we built Monte Carlo. And then, similarly, every new piece of data that's created creates discoverability issues. So, like, how do you make sure that that data is available to all the people that could make use of it? How do you make sure that they know about it and don't create their own version of it, which in and of itself creates complexity and confusion in the organization? So all of these things probably are not super important in the early days of a data platform, but then become much more important
[00:43:20] Unknown:
at the later stages as you scale. And so going back to the question of treating the data platform as a product. We've discussed some of the organizational aspects, what it is to even have a data platform, some of the ways that the components fit together in this cohesive whole, and some of the sort of stakeholders and responsibilities involved in making this happen. But when it comes to actually treating it as a product above and beyond just having a data platform, what does that entail, and sort of what is the meaningful distinction between a data platform versus a data platform as a product? When you get to the point where
[00:43:58] Unknown:
the discussions are around, like, how we will be using that data, what sort of datasets we're gonna be enriching, what sort of decisions we're gonna be making out of that product, and which use cases we're gonna actually be using it for, I think that's, in my mind, when it becomes, like, more of a data product. I have one example where we built a data collection framework we call internally Big Picture, where we're letting PMs create schemas that define the type of data or data points they wanna collect throughout the apps. You know, to do that, there are discussions about, like, how should we model the data? Like, what events should we have and how should we design them? And it also influences the data model downstream in the consumption layer.
At that point, I think, you know, that's the point where you wanna start understanding the problem and providing the guidance, which is beyond, like, how that scales, beyond what infrastructure we should use in order to scale the volume of events that we're gonna be collecting. In a nutshell, in my mind, it goes back to the question of, like, the essence of the data. So there's one thing, building a pipeline and making it scale and making it performant, versus the data that streams within that pipeline. When you start handling and discussing what's the data that streams into it and to what
[00:45:13] Unknown:
sort of workflow or what sort of product you're using it for, I think that's 1 way to stop thinking about it as a product. I agree with what you said, Lior. And for me, the distinction is a data product is something that you build and gets used by other people. Right? Like, I could go anytime and create a notebook and, you know, write a prototype of my fanciest ideas on how to do machine learning or datasets and exploring the data in various ways. And as long as it's my notebook that I've been playing around with and that I'm learning from, it's not a product. And, therefore, there aren't any kind of stringent requirements on it other than I should probably not destroy the warehouse or the cluster while doing so. Right?
The moment I take that thing and kind of, you know, wrap it so that other people in the company can use, maybe I'm exposing it as a dashboard that other people are going to use every day or as a model that is going to, you know, make predictions for person application or even if it's just a dataset that other people are going to use to build their own products, at that point, it becomes a product. And when it becomes a product, there's some responsibility that goes with that. Right? Like, when we build, you know, software products or consumer products, right, you expect the thing to have certain SLAs. You expect it to have in the data. You expect it to have certain SLAs. You expect it to be documented. Right? To To be well understood. Like, where is it coming from? What does it do? What does it not do? What it is and what it is not? You expect it to be findable and discoverable.
Right? You expect it to have the proper permissions and the proper governance. Right? Like, you don't want to expose, you know, all of your users' PII to everyone in the company just because it's part of your new product. Right? So for me, that's an inflection point. If you're going to expose it to others, it needs to hit a certain checklist of expectations. And what exactly those are depends on the use case, the company, and a lot of other things, but you need to have that checklist and make sure that the boxes are ticked. It's no longer, you know, Lior's personal experiment. It's something that's going to be used widely and therefore has to meet certain criteria.
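As a minimal sketch of the kind of checklist described here, the example below models readiness checks for exposing a dataset, dashboard, or model to others; the specific checks are illustrative assumptions, since each organization defines its own.

```python
# Hypothetical sketch of a "data product readiness" checklist; the specific
# checks are illustrative, not a standard -- each company defines its own.
from dataclasses import dataclass


@dataclass
class DataProductChecklist:
    has_owner: bool              # someone is accountable for it
    has_sla: bool                # freshness/availability expectations are defined
    is_documented: bool          # what it is, what it is not, where it comes from
    is_discoverable: bool        # registered in a catalog so others can find it
    pii_access_controlled: bool  # permissions and governance have been reviewed

    def ready_to_expose(self) -> bool:
        # Only expose the notebook/dashboard/dataset once every box is ticked.
        return all(vars(self).values())


experiment = DataProductChecklist(
    has_owner=True, has_sla=False, is_documented=True,
    is_discoverable=False, pii_access_controlled=True,
)
print(experiment.ready_to_expose())  # False -- still a personal experiment
```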
[00:47:29] Unknown:
As you mentioned, there are a lot of other requirements that come in when you start to treat it as a product versus just a technical capability. To your point about SLAs and the reliability of the system, what are some of the ways that you determine the level of success of the data platform and its product-oriented nature, and some of the ways that you justify the time and expense that goes into building and maintaining this system for the purposes of the organization?
[00:48:01] Unknown:
I can kick this off because I've tried a bunch of different approaches in the past, and there are some that have had some success. So I think 1 is you truly need to understand what that data platform is trying to do. Right? Is it improving productivity for some cohort of people? Is it helping with cost? It's gotta be helping with something. Right? Otherwise, you're not gonna be building, managing, and maintaining it. And so there were sort of 2 approaches that I've tried, some that have failed and some that have worked. 1 is to try and peg it to the success metrics of whatever you're helping solve. So if you're building a machine learning platform that is designed to, you know, reduce fraudulent behavior, your success could come from the fraudulent behavior that you've prevented from happening. Right? Like, the cost savings, stopping bad things from happening, essentially. Like, how have you done that, or how well have you done that? That could be 1 way of doing it. Another approach that I've seen work well is to understand what the unit of business is for the organization. Right? So let's say you are a food delivery company. Maybe the unit of business is the number of deliveries you make or the amount of money that you make on each delivery. And then sort of breaking that down to say, okay, what can the data platform do to make this more efficient? Right? Can it make deliveries faster? Can it make deliveries cheaper? Can it make customers happier?
Whatever that is, demonstrate how the data platform helps drive it. So something that we've done was to identify this unit of business and see how much it costs to run the data platform to deliver that unit of business. And as the business grows, as you deliver more units of business, if the cost of your data platform per unit goes down while you're still delivering the same or more value, then that is a measure of success, or that could be a measure of success. So, yeah, it's a little harder to do with the data platform because you're so far below the actual business. Right? It's hard to demonstrate that impact, but there are ways. Some could be a little more convoluted, but there are definitely avenues here.
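As a simple illustration of that cost-per-unit framing, here is a hypothetical calculation; the dollar amounts and delivery counts are made up for the example.

```python
# Illustrative sketch of the "platform cost per unit of business" measure
# described above; the numbers and field names are assumptions for the example.
def platform_cost_per_unit(platform_cost: float, units_delivered: int) -> float:
    """e.g. total data platform spend for a quarter divided by deliveries made."""
    return platform_cost / units_delivered


q1 = platform_cost_per_unit(platform_cost=120_000, units_delivered=1_500_000)
q2 = platform_cost_per_unit(platform_cost=135_000, units_delivered=2_200_000)
# If cost per delivery falls (here roughly $0.080 -> $0.061) while the business
# grows, that is one defensible signal that the platform is paying for itself.
print(f"Q1: ${q1:.3f}/delivery, Q2: ${q2:.3f}/delivery")
```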
[00:49:52] Unknown:
I'm curious if that resonated at all with the others as I explained it. Yeah. I totally agree. I think it's, like, making sure that, you know, there's a clear understanding of the type of problems you're trying to solve and trying to tie them as much as possible to business objectives. For example, if the data platform was focused, let's say, for next year just on real time analytics or real time streaming: what are the availability metrics? What's the ingestion latency or consumption latency that we could commit to? And then continuously track those metrics. It will also help you a lot to see when the team is actually turning in the wrong direction, to slow down on committing to new features and actually make sure that you're paying down that technical debt, versus, you know, committing to more features and more fancy tools on the data platform. So having those SLIs is a really big deal. On the flip side, I would say that it's really tough, since those data tools are so many different components, to have a list of them all that kind of provides some kind of a north star to say when we should slow down or when we can be more ambitious about more features. That's a little tricky.
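As a rough illustration of the kind of streaming SLIs mentioned here, the sketch below compares measured values against committed targets; the metric names and thresholds are placeholder assumptions, not recommendations.

```python
# Rough sketch of committing to and checking streaming SLIs; the targets
# and measured values here are placeholders, not recommendations.
SLO_TARGETS = {
    "ingestion_latency_p95_seconds": 5.0,     # event produced -> landed in the stream
    "consumption_latency_p95_seconds": 30.0,  # landed -> available to consumers
    "pipeline_availability": 0.995,
}


def slo_report(measured: dict) -> dict:
    """Compare measured SLIs against targets; a miss is a signal to slow down
    on new features and pay down technical debt instead."""
    report = {}
    for name, target in SLO_TARGETS.items():
        value = measured[name]
        ok = value >= target if name == "pipeline_availability" else value <= target
        report[name] = {"target": target, "measured": value, "met": ok}
    return report


print(slo_report({
    "ingestion_latency_p95_seconds": 3.2,
    "consumption_latency_p95_seconds": 44.0,  # missed -> invest here first
    "pipeline_availability": 0.997,
}))
```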
But, you know, I really believe that just the intent to continuously look at those, to build those metrics,
[00:51:06] Unknown:
that's part of the mentality, or, you know, it's an exercise that's really important to be doing. There are definitely a lot of other directions that we could take this conversation, and perhaps we'll have to reconvene for a deeper dive on some of these subtopics. But in each of your experiences
[00:51:22] Unknown:
of working with and building out data platforms, and building out products oriented toward data teams and end users of data, what are some of the most interesting or unexpected or challenging lessons that you've each learned in the process? And, Lior S., why don't we start with you? I think it goes to my previous point about, first of all, just keeping track of who's using which of the data platform tools or features that we provide, and making sure they're equipped with the best practices and they're sharing the ownership and accountability of those metrics that they, you know, follow. Because, you know, you turn your back and you suddenly realize there's a terrible incident caused by a team that uses something a different team created, and then, you know, that ties back to the data platform team. So make sure that whoever is picking up any of the tools, any of the features from the data platform, has, you know, some sort of cadence or shared accountability, where they meet together and they look into the metrics. They provide feedback as users of the platform, and we can continuously iterate and improve on it. And make sure that whomever they're handing that data to downstream is continuously using it following those best practices.
[00:52:31] Unknown:
How about you, Atul? As far as your experience of building and managing and evolving these data platforms, what are some of the most interesting or unexpected or challenging lessons that you've learned? I think as we designed, built, and managed these data platforms at Uber, we saw how they could truly democratize data and insights. I know that word is overused, but I really saw how that worked. We built numerous internal tools and solutions, and we found that even though we targeted these towards, you know, some of the more technical users in our audience, right, the data scientists, the data analysts, maybe even the engineers, we saw a lot of less technical people start to adopt these tools because they discovered the power that they had to derive insights from the underlying data, because those insights were what helped them drive their function forward. Right? That function could be marketing. It could be, in Uber's case, working with our drivers and other partners.
But we saw that these tools could really improve their efficiency, their productivity, and sort of the quality of results that they could drive. Basically, these products really empowered this generation of users that weren't really, you know, the focus of these products. And so we kind of made these people more data literate. We infused more data thinking into processes that may not necessarily have had it. And I think it was a big plus for the organization. Right? It truly demonstrated the value of a well-built, well-designed data platform. It got us a lot more funding, a lot more visibility, and proved that, you know, data platforms are here to stay. I think for me, that was truly transformational, like, seeing how that can truly help the business move. Lior Gavish, how about you? For me, 1 of my biggest learnings was that data platforms are complicated
[00:54:03] Unknown:
and can go in many different directions. And so 1 of my biggest learnings was that if you are investing in your data platform, it's worthwhile hiring people like Atul and like Lior S. They really have, you know, a profound understanding of how the platform is used and can really narrow down the scope to the things that really matter and focus priorities there. I think that's how you get the most out of the resources that you're investing in the data platform, and it's also how you get to some of the outcomes that Atul just described, where the data platform actually becomes a force multiplier for the entire organization. But to do that, you need to be very focused and really understand how the system gets used. As far as
[00:54:48] Unknown:
the overall trends in the industry and the organizations that you've worked with, what are some of the elements that you are keeping an eye on, specifically for the purposes of helping your organizations and your customers build out meaningful data platforms that can help to power the organizations that they're intended to support?
[00:55:10] Unknown:
I think there's something that Lior has touched upon that is very relevant, which is insights about data in motion. A lot has been done around data at rest, data processing and things like that. I think it's fairly mature, although there are always new innovations coming out. But for data in motion, it's very new. I think organizations are still trying to understand how they can use these signals, how they can leverage them, how we can, you know, build standardized tools that help extract these insights and this value from that data. For me, that's the most exciting space. And, of course, second is machine learning based solutions. I think the word ML is thrown around quite a bit, with a lot of different solutions in a lot of different contexts. There is true value to machine learning when used and applied in the right way to the right problems. And I'm seeing that more and more organizations are trying to leverage this as a tool or as a solution. So for me, these are the 2 most exciting things: basically, monitoring and extracting insights from data in motion, and applying the power of machine learning to truly make a difference, a positive difference, hopefully.
[00:56:10] Unknown:
And are there any other aspects of this topic of data platforms and building them out as a product and helping organizations be able to realize the potential value of their data that we didn't discuss yet that you'd like to cover before we close out the show? I'd probably echo what Lior said. Data platforms are complex beasts,
[00:56:27] Unknown:
and they can be as simple or as complex as you want them to be. There's truly value if they're well designed, well put together, and they empower the right people in the right way. So it's definitely a topic that you can go really deep into. Each subsection of a data platform has its own sort of backstory and value chain. I'd just say that. Alright. Well, for anybody who wants to get in touch with any of you and follow along with the work that you're doing, I'll have you each add your preferred contact information to the show notes. And so now, for the final question, what do you see as being the biggest gap in the tooling or technology that's available for data management today? It's really interesting: right now, we're really trying to look into the concept of
[00:57:06] Unknown:
data enrichment in real time. And when you think about those concepts, suddenly a lot of the business logic that used to live downstream, in the world of ETLs and the data warehouse, moves towards the real time pipeline. And I'm curious, from a data observability and anomaly detection perspective, how that's gonna work in the real time world. There's plenty of tooling when it comes to data at rest.
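As a toy illustration of that shift, the sketch below applies enrichment logic per event in flight rather than in a downstream warehouse job; the event shape and lookup table are assumptions made for the example.

```python
# Toy sketch of enriching events in flight instead of in a downstream
# warehouse ETL job; the event shape and reference data are assumptions.
REFERENCE = {"u123": {"segment": "power_user", "region": "us-east"}}


def enrich(event: dict) -> dict:
    """Business logic that used to live in batch ETL now runs per event,
    which is part of what makes observability and anomaly detection harder."""
    attrs = REFERENCE.get(event["user_id"], {})
    return {**event, **attrs}


stream = [
    {"user_id": "u123", "action": "checkout"},
    {"user_id": "u999", "action": "signup"},
]
for enriched in map(enrich, stream):
    print(enriched)
```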
[00:57:31] Unknown:
I'm more curious about what sort of standardization is gonna emerge around the real time world. I'd say that there are a lot of different tools and solutions out there. And I think 1 of the challenges that I see is that they're all built off of very separate bases. And so, you know, it isn't very easy to build interconnections at the platform level unless the systems themselves offer these integrations and sort of plug-ins or extensions. I find that could be a tricky bet, and I think that's the challenge that I see in the space. And, Lior, how about you? Observability, which is something I spend most of my days on. I think
[00:58:05] Unknown:
1 of the challenges that I see: I speak to a lot of companies that have some amount of legacy in place, and I think to really get the benefits of a data platform, you kinda need to start centralizing things and making them part of the platform. And so 1 of the biggest gaps is, like, how do you really bridge the 2? How do you efficiently get from running some of your analytics on, you know, SAP and Salesforce to a more modern stack that still allows everyone in the company to benefit from data? I'm not sure how to solve this, but I know solving it will be useful. Alright. Well, thank you for taking the time today to join me and share the work that you've each been doing on building out data platforms and helping people be able to gain value from the organizational data that they're collecting. It's definitely a very
[00:58:58] Unknown:
complex and pertinent problem that all of us are facing. So thank you for taking the time to share your insights on that, and I hope you each enjoy the rest of your day. Thanks for having us, Tobias. Thanks,
[00:59:09] Unknown:
Tobias.
[00:59:13] Unknown:
Thank you for listening. Don't forget to check out our other show, Podcast.__init__ at pythonpodcast.com, to learn about the Python language, its community, and the innovative ways it is being used. And visit the site at dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show, then tell us about it. Email hosts@dataengineeringpodcast.com with your story. And to help other people find the show, please leave a review on iTunes and tell your friends and coworkers.
Introduction and Sponsor Messages
Guest Introductions
Defining a Data Platform
Stakeholders and Responsibilities
Establishing and Driving the Vision
Supporting Multiple Use Cases
Modern Data Stack
Strategies for Building or Buying a Platform
Scaling the Platform
Data Platform as a Product
Measuring Success
Lessons Learned
Trends and Future Directions
Final Thoughts and Closing Remarks