Data Engineering Podcast


This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.



485 Episodes

Open Source Reverse ETL For Everyone With Grouparoo - E254

Summary

Reverse ETL is a product category that evolved from the landscape of customer data platforms with a number of companies offering their own implementation of it. While struggling with the work of automating data integration workflows with marketing, sales, and support tools Brian Leonard…

08 January 2022 | 00:44:57


Data Observability Out Of The Box With Metaplane - E253

Summary

Data observability is a set of technical and organizational capabilities related to understanding how your data is being processed and used so that you can proactively identify and fix errors in your workflows. In this episode Metaplane founder Kevin Hu shares his working definition of the…

08 January 2022 | 00:50:48


Creating Shared Context For Your Data Warehouse With A Controlled Vocabulary - E252

Summary

Communication and shared context are the hardest part of any data system. In recent years the focus has been on data catalogs as the means for documenting data assets, but those introduce a secondary system of record in order to find the necessary information. In this episode Emily Riederer…

02 January 2022 | 01:00:35


A Reflection On The Data Ecosystem For The Year 2021 - E251

Summary

This has been an active year for the data ecosystem, with a number of new product categories and substantial growth in existing areas. In an attempt to capture the zeitgeist Maura Church, David Wallace, Benn Stancil, and Gleb Mezhanskiy join the show to reflect on the past year and share…

02 January 2022 | 01:03:29


Exploring The Evolving Role Of Data Engineers - E249

Summary

Data Engineering is still a relatively new field that is going through a continued evolution as new technologies are introduced and new requirements are understood. In this episode Maxime Beauchemin returns to revisit what it means to be a data engineer and how the role has changed over the…

27 December 2021 | 00:57:42


Revisiting The Technical And Social Benefits Of The Data Mesh - E250

Summary

The data mesh is a thesis that was presented to address the technical and organizational challenges that businesses face in managing their analytical workflows at scale. Zhamak Dehghani introduced the concepts behind this architectural pattern in 2019, and since then it has been gaining…

27 December 2021 | 01:10:53


Fast And Flexible Headless Data Analytics With Cube.JS - E248

Summary

One of the perennial challenges of data analytics is having a consistent set of definitions, along with a flexible and performant API endpoint for querying them. In this episode Artom Keydunov and Pavel Tiunov share their work on Cube.js and the various ways that it is being used in the…

21 December 2021 | 00:54:43


Building A System Of Record For Your Organization's Data Ecosystem At Metaphor - E247

Summary

Building a well managed data ecosystem for your organization requires a holistic view of all of the producers, consumers, and processors of information. The team at Metaphor are building a fully connected metadata layer to provide both technical and social intelligence about your data. In…

20 December 2021 | 01:05:34


Building Auditable Spark Pipelines At Capital One - E246

Summary

Spark is a powerful and battle-tested framework for building highly scalable data pipelines. Because of its proven ability to handle large volumes of data, Capital One has invested in it for their business needs. In this episode Gokul Prabagaren shares his use for it in calculating your…

13 December 2021 | 00:42:10


Deliver Personal Experiences In Your Applications With The Unomi Open Source Customer Data Platform - E245

Summary

The core to providing your users with excellent service is to understand them and provide a personalized experience. Unfortunately many sites and applications take that to the extreme and collect too much information. In order to make it easier for developers to build customer profiles in a…

12 December 2021 | 00:57:34