Data Engineering Podcast

This show goes behind the scenes of the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.

454 Episodes

Optimize Your Machine Learning Development And Serving With The Open Source Vector Database Milvus - E313


The optimal format for storage and retrieval of data is dependent on how it is going to be used. For analytical systems there are decades of investment in data warehouses and various modeling techniques. For machine learning applications relational models require additional processing to be…

06 August 2022 | 00:58:52

Interactive Exploratory Data Analysis On Petabyte Scale Data Sets With Arkouda - E312


Exploratory data analysis works best when the feedback loop is fast and iterative. This is easy to achieve when you are working on small datasets, but as they scale up beyond what can fit on a single machine those short iterations quickly become long and tedious. The Arkouda project is a…

31 July 2022 | 00:40:37

What "Data Lineage Done Right" Looks Like And How They're Doing It At Manta - E311


Data lineage is the roadmap for your data platform, providing visibility into all of the dependencies for any report, machine learning model, or data warehouse table that you are working with. Because of its centrality to your data systems, it is valuable for debugging, governance,…

31 July 2022 | 01:05:18

Re-Bundling The Data Stack With Data Orchestration And Software Defined Assets Using Dagster - E310


The current stage of evolution in the data management ecosystem has resulted in domain and use case specific orchestration capabilities being incorporated into various tools. This complicates the work involved in making end-to-end workflows visible and integrated. Dagster has invested in…

24 July 2022 | 00:58:14

Writing The Book That Offers A Single Reference For The Fundamentals Of Data Engineering - E309


Data engineering is a difficult job, requiring a large number of skills that often don’t overlap. Any effort to understand how to start a career in the role has required stitching together information from a multitude of resources that might not all agree with each other. In order to…

24 July 2022 | 01:01:02

Making The Total Cost Of Ownership For External Data Manageable With Crux - E307


There are extensive and valuable data sets that are available outside the bounds of your organization. Whether that data is public, paid, or scraped, it requires investment and upkeep to acquire and integrate it with your systems. Crux was built to reduce the total cost of acquisition and…

17 July 2022 | 01:07:12

Joe Reis Flips The Script And Interviews Tobias Macey About The Data Engineering Podcast - E308


Data engineering is a large and growing subject, with new technologies, specializations, and "best practices" emerging at an accelerating pace. This podcast does its best to explore this fractal ecosystem, and has been at it for the past 5+ years. In this episode Joe Reis, founder…

17 July 2022 | 00:56:39

Charting the Path of Riskified's Data Platform Journey - E306


Building a data platform is a journey, not a destination. Beyond the work of assembling a set of technologies and building integrations across them, there is also the work of growing and organizing a team that can support and benefit from that platform. In this episode Inbar Yogev and Lior…

10 July 2022 | 00:39:57

Maintain Your Data Engineers' Sanity By Embracing Automation - E305


Building and maintaining reliable data assets is the prime directive for data engineers. While it is easy to say, it is endlessly complex to implement, requiring data professionals to be experts in a wide range of disparate topics while designing and implementing complex topologies of…

10 July 2022 | 01:05:08

Be Confident In Your Data Integration By Quickly Validating Matching Records With data-diff - E304


The perennial challenge of data engineers is ensuring that information is integrated reliably. While it is straightforward to know whether a synchronization process succeeded, it is not always clear whether every record was copied correctly. In order to quickly identify if and how two data…

03 July 2022 | 01:10:57