Data Engineering Podcast


This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.


469 Episodes

Data Quality Management For The Whole Team With Soda Data - E178

Summary

Data quality is on the top of everyone’s mind recently, but getting it right is as challenging as ever. One of the contributing factors is the number of people who are involved in the process and the potential impact on the business if something goes wrong. In this episode Maarten…

30 March 2021 | 00:58:00


Real World Change Data Capture At Datacoral - E177

Summary

The world of business is becoming increasingly dependent on information that is accurate up to the minute. For analytical systems, the only way to provide this reliably is by implementing change data capture (CDC). Unfortunately, this is a non-trivial undertaking, particularly for teams…

23 March 2021 | 00:49:58


Managing The DoorDash Data Platform - E176

Summary

The team at DoorDash has a complex set of optimization challenges to deal with using data that they collect from a multi-sided marketplace. In order to handle the volume and variety of information that they use to run and improve the business, the data team has to build a platform that…

16 March 2021 | 00:46:05


Leave Your Data Where It Is And Automate Feature Extraction With Molecula - E175

Summary

A majority of the time spent in data engineering is copying data between systems to make the information available for different purposes. This introduces challenges such as keeping information synchronized, managing schema evolution, building transformations to match the expectations of…

09 March 2021 | 00:51:40


Bridging The Gap Between Machine Learning And Operations At Iguazio - E174

Summary

The process of building and deploying machine learning projects requires a staggering number of systems and stakeholders to work in concert. In this episode Yaron Haviv, co-founder of Iguazio, discusses the complexities inherent to the process, as well as how he has worked to democratize…

02 March 2021 | 01:06:28


Self Service Open Source Data Integration With AirByte - E173

Summary

Data integration is a critical piece of every data pipeline, yet it is still far from being a solved problem. There are a number of managed platforms available, but the list of options for an open source system that supports a large variety of sources and destinations is still embarrassingly…

23 February 2021 | 00:52:15


Building The Foundations For Data Driven Businesses at 5xData - E172

Summary

Every business aims to be data driven, but not all of them succeed in that effort. In order to be able to truly derive insights from the data that an organization collects, there are certain foundational capabilities that they need to have capacity for. In order to help more businesses…

16 February 2021 | 00:52:16


How Shopify Is Building Their Production Data Warehouse Using DBT - E171

Summary

With all of the tools and services available for building a data platform it can be difficult to separate the signal from the noise. One of the best ways to get a true understanding of how a technology works in practice is to hear from people who are running it in production. In this…

09 February 2021 | 00:46:31


System Observability For The Cloud Native Era With Chronosphere - E170

Summary

Collecting and processing metrics for monitoring use cases is an interesting data problem. It is eminently possible to generate millions or billions of data points per second, and the information needs to be propagated to a central location, processed, and analyzed in timeframes on the order of…

02 February 2021 | 01:04:50


Making It Easier To Stick B2B Data Integration Pipelines Together With Hotglue - E169

Summary

Businesses often need to be able to ingest data from their customers in order to power the services that they provide. Each new source that they need to integrate with brings another custom set of ETL tasks that they need to maintain. In order to reduce the friction involved in…

26 January 2021 | 00:34:05