Data Engineering Podcast


This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.

444 Episodes

Exploring The Design And Benefits Of The Modern Data Stack - E203

Summary

We have been building platforms and workflows to store, process, and analyze data since the earliest days of computing. Over that time there have been countless architectures, patterns, and "best practices" to make that task manageable. With the growing popularity of cloud…

13 July 2021 | 00:49:02


Democratize Data Cleaning Across Your Organization With Trifacta - E202

Summary

Every data project, whether it’s analytics, machine learning, or AI, starts with the work of data cleaning. This is a critical step and benefits from being accessible to the domain experts. Trifacta is a platform for managing your data engineering workflow to make curating, cleaning,…

09 July 2021 | 01:07:13


Stick All Of Your Systems And Data Together With SaaSGlue As Your Workflow Manager - E201

Summary

At the core of every data pipeline is a workflow manager (or several). Deploying, managing, and scaling that orchestration can consume a large fraction of a data team’s energy, so it is important to pick something that provides the power and flexibility that you need. SaaSGlue is a…

05 July 2021 | 00:55:31


Leveling Up Open Source Data Integration With Meltano Hub And The Singer SDK - E200

Summary

Data integration in the form of extract and load is the critical first step of every data project. There are a large number of commercial and open source projects that offer that capability, but it is still far from being a solved problem. One of the most promising community efforts is that…

03 July 2021 | 01:05:24


A Candid Exploration Of Timeseries Data Analysis With InfluxDB - E199

Summary

While the overall concept of timeseries data is uniform, its usage and applications are far from it. One of the most demanding applications of timeseries data is for application and server monitoring due to the problem of high cardinality. In his quest to build a generalized platform for…

29 June 2021 | 01:06:03


Lessons Learned From The Pipeline Data Engineering Academy - E198

Summary

Data Engineering is a broad and constantly evolving topic, which makes it difficult to teach in a concise and effective manner. Despite that, Daniel Molnar and Peter Fabian started the Pipeline Academy to do exactly that. In this episode they reflect on the lessons that they learned while…

26 June 2021 | 01:11:04


Make Database Performance Optimization A Playful Experience With OtterTune - E197

Summary

The database is the core of any system because it holds the data that drives your entire experience. We spend countless hours designing the data model, updating engine versions, and tuning performance. But how confident are you that you have configured it to be as performant as possible,…

23 June 2021 | 00:58:28


Bring Order To The Chaos Of Your Unstructured Data Assets With Unstruk - E196

Summary

Working with unstructured data has typically been a motivation for a data lake. The challenge is imposing enough order on the platform to make it useful. Kirk Marple has spent years working with data systems and the media industry, which inspired him to build a platform for automatically…

18 June 2021 | 00:40:48


Accelerating ML Training And Delivery With In-Database Machine Learning - E195

Summary

When you build a machine learning model, the first step is always to load your data. Typically this means downloading files from object storage, or querying a database. To speed up the process, why not build the model inside the database so that you don’t have to move the information?…

15 June 2021 | 01:05:33


Taking A Tour Of The Google Cloud Platform For Data And Analytics - E194

Summary

Google pioneered an impressive number of the architectural underpinnings of the broader big data ecosystem. Now they offer the technologies that they run internally to external users of their cloud platform. In this episode Lak Lakshmanan enumerates the variety of services that are…

12 June 2021 | 00:53:17