Data Engineering Podcast


This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.





481 Episodes

Duck Lake: Simplifying the Lakehouse Ecosystem - E480

Summary In this episode of the Data Engineering Podcast Hannes Mühleisen and Mark Raasveldt, the creators of DuckDB, share their work on Duck Lake, a new entrant in the open lakehouse ecosystem. They discuss how Duck Lake focuses on simplicity and flexibility and offers a unified catalog and table format compared to other lakehouse formats like…


10 September 2025 | 01:10:41


Aligning Business and Data: The Essential Role of Data Modeling - E479

Summary In this episode of the Data Engineering Podcast Serge Gershkovich, head of product at SQL DBM, talks about the socio-technical aspects of data modeling. Serge shares his background in data modeling and highlights its importance as a collaborative process between business stakeholders and data teams. He debunks common misconceptions that…


01 September 2025 | 01:06:51


From Academia to Industry: Bridging Data Engineering Challenges - E478

Summary In this episode of the Data Engineering Podcast Professor Paul Groth, from the University of Amsterdam, talks about his research on knowledge graphs and data engineering. Paul shares his background in AI and data management, discussing the evolution of data provenance and lineage, as well as the challenges of data integration. He explores…


26 August 2025 | 00:50:54


High Performance And Low Overhead Graphs With KuzuDB - E477

Summary In this episode of the Data Engineering Podcast Prashanth Rao, an AI engineer at KuzuDB, talks about their embeddable graph database. Prashanth explains how KuzuDB addresses performance shortcomings in existing solutions through columnar storage and novel join algorithms. He discusses the usability and scalability of KuzuDB, emphasizing its…


18 August 2025 | 01:01:29


Bridging Data and Decision-Making: AI's Role in Modern Analytics - E476

Summary In this episode of the Data Engineering Podcast Lucas Thelosen and Drew Gilson from Gravity talk about their development of Orion, an autonomous data analyst that bridges the gap between data availability and business decision-making. Lucas and Drew share their backgrounds in data analytics and how their experiences have shaped their…


12 August 2025 | 01:10:44


From Bits to Tables: The Evolution of S3 Storage - E475

Summary In this episode of the Data Engineering Podcast Andy Warfield talks about the innovative functionalities of S3 Tables and Vectors and their integration into modern data stacks. Andy shares his journey through the tech industry and his role at Amazon, where he collaborates to enhance storage capabilities, discussing the evolution of S3 from…


05 August 2025 | 00:50:08


Revolutionizing Python Notebooks with Marimo - E474

Summary In this episode of the Data Engineering Podcast Akshay Agrawal from Marimo discusses the innovative new Python notebook environment, which offers a reactive execution model, full Python integration, and built-in UI elements to enhance the interactive computing experience. He discusses the challenges of traditional Jupyter notebooks, such as…


28 July 2025 | 00:51:56


Warehouse Native Incremental Data Processing With Dynamic Tables And Delayed View Semantics - E473

Summary In this episode of the Data Engineering Podcast Dan Sotolongo from Snowflake talks about the complexities of incremental data processing in warehouse environments. Dan discusses the challenges of handling continuously evolving datasets and the importance of incremental data processing for optimized resource use and reduced latency. He…


21 July 2025 | 00:55:07


Streamlining Data Pipelines with MCP Servers and Vector Engines - E472

Summary In this episode of the Data Engineering Podcast Kacper Łukawski from Qdrant talks about integrating MCP servers with vector databases to process unstructured data. Kacper shares his experience in data engineering, from building big data pipelines in the automotive industry to leveraging large language models (LLMs) for transforming unstructured…


15 July 2025 | 00:52:04


Foundational Data Engineering At Two Sigma - E471

Summary In this episode of the Data Engineering Podcast Effie Baram, a leader in foundational data engineering at Two Sigma, talks about the complexities and innovations in data engineering within the finance sector. She discusses the critical role of data at Two Sigma, balancing data quality with delivery speed, and the socio-technical challenges…


06 July 2025 | 00:55:05