Data Engineering Podcast


This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.

454 Episodes

Optimize Your Machine Learning Development And Serving With The Open Source Vector Database Milvus - E313

Summary

The optimal format for storage and retrieval of data is dependent on how it is going to be used. For analytical systems there are decades of investment in data warehouses and various modeling techniques. For machine learning applications relational models require additional processing to be…

06 August 2022 | 00:58:52


Interactive Exploratory Data Analysis On Petabyte Scale Data Sets With Arkouda - E312

Summary

Exploratory data analysis works best when the feedback loop is fast and iterative. This is easy to achieve when you are working on small datasets, but as they scale up beyond what can fit on a single machine those short iterations quickly become long and tedious. The Arkouda project is a…

31 July 2022 | 00:40:37


What "Data Lineage Done Right" Looks Like And How They're Doing It At Manta - E311

Summary

Data lineage is the roadmap for your data platform, providing visibility into all of the dependencies for any report, machine learning model, or data warehouse table that you are working with. Because of its centrality to your data systems it is valuable for debugging, governance,…

31 July 2022 | 01:05:18


Re-Bundling The Data Stack With Data Orchestration And Software Defined Assets Using Dagster - E310

Summary

The current stage of evolution in the data management ecosystem has resulted in domain and use case specific orchestration capabilities being incorporated into various tools. This complicates the work involved in making end-to-end workflows visible and integrated. Dagster has invested in…

24 July 2022 | 00:58:14


Writing The Book That Offers A Single Reference For The Fundamentals Of Data Engineering - E309

Summary

Data engineering is a difficult job, requiring a large number of skills that often don’t overlap. Any effort to understand how to start a career in the role has required stitching together information from a multitude of resources that might not all agree with each other. In order to…

24 July 2022 | 01:01:02


Making The Total Cost Of Ownership For External Data Manageable With Crux - E307

Summary

There are extensive and valuable data sets that are available outside the bounds of your organization. Whether that data is public, paid, or scraped it requires investment and upkeep to acquire and integrate it with your systems. Crux was built to reduce the total cost of acquisition and…

17 July 2022 | 01:07:12


Joe Reis Flips The Script And Interviews Tobias Macey About The Data Engineering Podcast - E308

Summary

Data engineering is a large and growing subject, with new technologies, specializations, and "best practices" emerging at an accelerating pace. This podcast does its best to explore this fractal ecosystem, and has been at it for the past 5+ years. In this episode Joe Reis, founder…

17 July 2022 | 00:56:39


Charting the Path of Riskified's Data Platform Journey - E306

Summary

Building a data platform is a journey, not a destination. Beyond the work of assembling a set of technologies and building integrations across them, there is also the work of growing and organizing a team that can support and benefit from that platform. In this episode Inbar Yogev and Lior…

10 July 2022 | 00:39:57


Maintain Your Data Engineers' Sanity By Embracing Automation - E305

Summary

Building and maintaining reliable data assets is the prime directive for data engineers. While it is easy to say, it is endlessly complex to implement, requiring data professionals to be experts in a wide range of disparate topics while designing and implementing complex topologies of…

10 July 2022 | 01:05:08


Be Confident In Your Data Integration By Quickly Validating Matching Records With data-diff - E304

Summary

The perennial challenge of data engineers is ensuring that information is integrated reliably. While it is straightforward to know whether a synchronization process succeeded, it is not always clear whether every record was copied correctly. In order to quickly identify if and how two data…

03 July 2022 | 01:10:57