Data Engineering Podcast


This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.




469 Episodes

An Exploration Of The Expectations, Ecosystem, and Realities Of Real-Time Data Applications - E317

Summary

Data has permeated every aspect of our lives and the products that we interact with. As a result, end users and customers have come to expect interactions and updates with services and analytics to be fast and up to date. In this episode Shruti Bhat gives her view on the state of the…

22 August 2022 | 01:06:20


Understanding The Role Of The Chief Data Officer - E318

Summary

The position of Chief Data Officer (CDO) is relatively new in the business world and has not been universally adopted. As a result, not everyone understands what the responsibilities of the role are, when you need one, and how to hire for it. In this episode Tracy Daniels, CDO of Truist,…

22 August 2022 | 00:47:11


Bringing Automation To Data Labeling For Machine Learning With Watchful - E316

Summary

Data engineers have typically left the process of data labeling to data scientists or other roles because of its nature as a manual and process-heavy undertaking, focusing instead on building automation and repeatable systems. Watchful is a platform to make labeling a repeatable and…

14 August 2022 | 01:20:29


Collecting And Retaining Contextual Metadata For Powerful And Effective Data Discovery - E315

Summary

Data is useless if it isn’t being used, and you can’t use it if you don’t know where it is. Data catalogs were the first solution to this problem, but they are only helpful if you know what you are looking for. In this episode Shinji Kim discusses the challenges of data…

14 August 2022 | 00:53:24


Useful Lessons And Repeatable Patterns Learned From Data Mesh Implementations At AgileLab - E314

Summary

Data mesh is a frequent topic of conversation in the data community, with many debates about how and when to employ this architectural pattern. The team at AgileLab have first-hand experience helping large enterprise organizations evaluate and implement their own data mesh strategies. In…

06 August 2022 | 00:48:31


Optimize Your Machine Learning Development And Serving With The Open Source Vector Database Milvus - E313

Summary

The optimal format for storage and retrieval of data is dependent on how it is going to be used. For analytical systems there are decades of investment in data warehouses and various modeling techniques. For machine learning applications relational models require additional processing to be…

06 August 2022 | 00:58:52


Interactive Exploratory Data Analysis On Petabyte Scale Data Sets With Arkouda - E312

Summary

Exploratory data analysis works best when the feedback loop is fast and iterative. This is easy to achieve when you are working on small datasets, but as they scale up beyond what can fit on a single machine those short iterations quickly become long and tedious. The Arkouda project is a…

31 July 2022 | 00:40:37


What "Data Lineage Done Right" Looks Like And How They're Doing It At Manta - E311

Summary

Data lineage is the roadmap for your data platform, providing visibility into all of the dependencies for any report, machine learning model, or data warehouse table that you are working with. Because of its centrality to your data systems it is valuable for debugging, governance,…

31 July 2022 | 01:05:18


Re-Bundling The Data Stack With Data Orchestration And Software Defined Assets Using Dagster - E310

Summary

The current stage of evolution in the data management ecosystem has resulted in domain- and use-case-specific orchestration capabilities being incorporated into various tools. This complicates the work involved in making end-to-end workflows visible and integrated. Dagster has invested in…

24 July 2022 | 00:58:14


Writing The Book That Offers A Single Reference For The Fundamentals Of Data Engineering - E309

Summary

Data engineering is a difficult job, requiring a large number of skills that often don’t overlap. Any effort to understand how to start a career in the role has required stitching together information from a multitude of resources that might not all agree with each other. In order to…

24 July 2022 | 01:01:02