Data Engineering Podcast

This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.

Logical First, Physical Second: A Pragmatic Path to Trusted Data - E498

Summary In this episode of the Data Engineering Podcast, Jamie Knowles, Product Director for ER/Studio, talks about data architecture and its importance in driving business meaning. He discusses how data architecture should start with business meaning, not just physical schemas, and explores the pitfalls of jumping straight to physical designs.…

25 January 2026 | 00:40:50

Your Data, Your Lake: How Observe Uses Iceberg and Streaming ETL for Observability - E497

Summary In this episode Jacob Leverich, cofounder and CTO of Observe, talks about applying lakehouse architectures to observability workloads. Jacob discusses Observe’s decision to leverage cloud-native warehousing and open table formats for scale and cost efficiency. He digs into the core pain points teams face with fragmented tools, soaring…

18 January 2026 | 01:12:21

Semantic Operators Meet Dataframes: Building Context for Agents with FENIC - E496

Summary In this episode Kostas Pardalis talks about Fenic - an open-source, PySpark-inspired dataframe engine designed to bring LLM-powered semantics into reliable data engineering workflows. Kostas shares why today’s data infrastructure assumptions (BI-first, expert-operated, CPU-bound) fall short for AI-era tasks that are increasingly inference-…

12 January 2026 | 00:56:42

Beyond Dashboards: How Data Teams Earn a Seat at the Table - E495

Summary In this episode Goutham Budati talks about his Data–Perspective–Action framework and how it empowers data teams to become true business partners. Goutham traces his path from automating Excel reports to leading high‑impact data organizations, then breaks down why technical excellence alone isn’t enough: teams must pair reliable data systems with…

05 January 2026 | 00:49:21

Unfreezing The Data Lake: The Future-Proof File Format - E494

Summary In this episode PhD researcher Xinyu Zeng talks about F3, the “future-proof file format” designed to address today’s hardware realities and evolving workloads. He digs into the limitations of Parquet and ORC - especially CPU-bound decoding, metadata overhead for wide-table projections, and poor random-access behavior for ML training and…

29 December 2025 | 00:59:24

From Context to Semantics: How Metadata Powers Agentic AI - E493

Summary In this episode Suresh Srinivas and Sriharsha Chintalapani explore how metadata platforms are evolving from human-centric catalogs into the foundational context layer for AI and agentic systems. They discuss the origins and growth of OpenMetadata and Collate, why “context” is necessary but “semantics” is critical for precise AI outcomes,…

21 December 2025 | 01:06:17

From Data Engineering to AI Engineering: Where the Lines Blur - E492

Summary In this solo episode of the Data Engineering Podcast, host Tobias Macey reflects on how AI has transformed the practice and pace of data engineering over time. Starting from its origins in the Hadoop and cloud warehouse era, he explores the discipline's evolution through ML engineering and MLOps to today's blended boundaries between data,…

14 December 2025 | 00:26:59

Malloy: Hierarchical Data, Semantic Models, and the Future of Analytics - E491

Summary In this episode Michael Toy, co-creator of Malloy, talks about rethinking how we work with data beyond SQL. Michael shares the origins of Malloy from his and Lloyd Tabb’s experience at Looker, why SQL’s mental model often fights human problem solving, and how Malloy aims to be a composable, maintainable language that treats SQL as the…

08 December 2025 | 00:58:48

Blurring Lines: Data, AI, and the New Playbook for Team Velocity - E490

Summary In this crossover episode, Max Beauchemin explores how multiplayer, multi‑agent engineering is transforming the way individuals and teams build data and AI systems. He digs into the shifting boundary between data and AI engineering, the rise of “context as code,” and how just‑in‑time retrieval via MCP and CLIs lets agents gather what they…

24 November 2025 | 01:00:57

State, Scale, and Signals: Rethinking Orchestration with Durable Execution - E489

Summary In this episode Preeti Somal, EVP of Engineering at Temporal, talks about the durable execution model and how it reshapes the way teams build reliable, stateful systems for data and AI. She explores Temporal’s code‑first programming model (workflows, activities, task queues, and replay) and how it eliminates hand‑rolled retry, checkpoint,…

16 November 2025 | 00:51:46