Data Engineering Podcast


This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.


469 Episodes

Build Better Tests For Your dbt Projects With Datafold And data-diff - E378

Summary Data engineering is all about building workflows, pipelines, systems, and interfaces to provide stable and reliable data. Your data can be stable and wrong, but then it isn't reliable. Confidence in your data is achieved through constant validation and testing. Datafold has invested a lot of time into integrating with the workflow of dbt…

11 June 2023 | 00:48:22


Reduce The Overhead In Your Pipelines With Agile Data Engine's DataOps Service - E377

Summary A significant portion of the time spent by data engineering teams is on managing the workflows and operations of their pipelines. DataOps has arisen as a parallel set of practices to that of DevOps teams as a means of reducing wasted effort. Agile Data Engine is a platform designed to handle the infrastructure side of the DataOps equation,…

04 June 2023 | 00:54:06


A Roadmap To Bootstrapping The Data Team At Your Startup - E376

Summary Building a data team is hard in any circumstance, but at a startup it can be even more challenging. The requirements are fluid, you probably don't have a lot of existing data talent to manage the hiring and onboarding, and there is a need to move fast. Ghalib Suleiman has been on both sides of this equation and joins the show to share his…

29 May 2023 | 00:42:32


Keep Your Data Lake Fresh With Real Time Streams Using Estuary - E375

Summary Batch vs. streaming is a long-running debate in the world of data integration and transformation. Proponents of the streaming paradigm argue that stream processing engines can easily handle batched workloads, but the reverse isn't true. The batch world has been the default for years because of the complexities of running a reliable…

21 May 2023 | 00:55:51


What Happens When The Abstractions Leak On Your Data - E374

Summary All of the advancements in our technology are based around the principles of abstraction. These are valuable until they break down, which is an inevitable occurrence. In this episode the host Tobias Macey shares his reflections on recent experiences where the abstractions leaked and some observations on how to deal with that situation in a…

15 May 2023 | 00:26:42


Use Consistent And Up To Date Customer Profiles To Power Your Business With Segment Unify - E373

Summary Every business has customers, and a critical element of success is understanding who they are and how they are using the company's products or services. The challenge is that most companies have a multitude of systems that contain fragments of the customer's interactions, and stitching that together is complex and time consuming. Segment…

07 May 2023 | 00:54:35


Realtime Data Applications Made Easier With Meroxa - E372

Summary Real-time capabilities have quickly become an expectation for consumers. The complexity of providing those capabilities is still high, however, making it more difficult for small teams to compete. Meroxa was created to enable teams of all sizes to deliver real-time data applications. In this episode DeVaris Brown discusses the types of…

24 April 2023 | 00:45:26


Building Self Serve Business Intelligence With AI And Semantic Modeling At Zenlytic - E371

Summary Business intelligence has been chasing the promise of self-serve data for decades. As the capabilities of these systems have improved and become more accessible, the target of what self-serve means changes. With the availability of AI powered by large language models combined with the evolution of semantic layers, the team at Zenlytic have…

16 April 2023 | 00:49:19


An Exploration Of The Composable Customer Data Platform - E370

Summary The customer data platform is a category of services that was developed early in the evolution of the current era of cloud services for data processing. When it was difficult to wire together the event collection, data modeling, reporting, and activation, it made sense to buy monolithic products that handled every stage of the customer data…

10 April 2023 | 01:11:42


Mapping The Data Infrastructure Landscape As A Venture Capitalist - E369

Summary The data ecosystem has been building momentum for several years now. As a venture capital investor, Matt Turck has been trying to keep track of the main trends and has compiled his findings into the MAD (ML, AI, and Data) landscape reports each year. In this episode he shares his experiences building those reports and the perspective he has…

03 April 2023 | 01:01:57