Data Engineering Podcast


This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.

469 Episodes

Lessons Learned From The Pipeline Data Engineering Academy - E198

Summary

Data Engineering is a broad and constantly evolving topic, which makes it difficult to teach in a concise and effective manner. Despite that, Daniel Molnar and Peter Fabian started the Pipeline Academy to do exactly that. In this episode they reflect on the lessons that they learned while…

26 June 2021 | 01:11:04


Make Database Performance Optimization A Playful Experience With OtterTune - E197

Summary

The database is the core of any system because it holds the data that drives your entire experience. We spend countless hours designing the data model, updating engine versions, and tuning performance. But how confident are you that you have configured it to be as performant as possible,…

23 June 2021 | 00:58:28


Bring Order To The Chaos Of Your Unstructured Data Assets With Unstruk - E196

Summary

Working with unstructured data has typically been a motivation for a data lake. The challenge is imposing enough order on the platform to make it useful. Kirk Marple has spent years working with data systems and the media industry, which inspired him to build a platform for automatically…

18 June 2021 | 00:40:48


Accelerating ML Training And Delivery With In-Database Machine Learning - E195

Summary

When you build a machine learning model, the first step is always to load your data. Typically this means downloading files from object storage, or querying a database. To speed up the process, why not build the model inside the database so that you don’t have to move the information?…

15 June 2021 | 01:05:33


Taking A Tour Of The Google Cloud Platform For Data And Analytics - E194

Summary

Google pioneered an impressive number of the architectural underpinnings of the broader big data ecosystem. Now they offer the technologies that they run internally to external users of their cloud platform. In this episode Lak Lakshmanan enumerates the variety of services that are…

12 June 2021 | 00:53:17


Make Sure Your Records Are Reliable With The BookKeeper Distributed Storage Layer - E193

Summary

The way to build maintainable software and systems is through composition of individual pieces. By making those pieces high quality and flexible they can be used in surprising ways that the original creators couldn’t have imagined. One such component that has gone above and beyond its…

09 June 2021 | 00:42:01


Build Your Analytics With A Collaborative And Expressive SQL IDE Using Querybook - E192

Summary

SQL is the most widely used language for working with data, and yet the tools available for writing and collaborating on it are still clunky and inefficient. Frustrated with the lack of a modern IDE and collaborative workflow for managing the SQL queries and analysis of their big data…

03 June 2021 | 00:52:36


Making Data Pipelines Self-Serve For Everyone With Shipyard - E191

Summary

Every part of the business relies on data, yet only a small team has the context and expertise to build and maintain workflows and data pipelines to transform, clean, and integrate it. In order for the true value of your data to be realized without burning out your engineers you need a way…

02 June 2021 | 00:51:23


Paving The Road For Fast Analytics On Distributed Clouds With The Yellowbrick Data Warehouse - E190

Summary

The data warehouse has become the focal point of the modern data platform. With increased usage of data across businesses, and a diversity of locations and environments where data needs to be managed, the warehouse engine needs to be fast and easy to manage. Yellowbrick is a data warehouse…

28 May 2021 | 00:52:41


Easily Build Advanced Similarity Search With The Pinecone Vector Database - E189

Summary

Machine learning models use vectors as the natural mechanism for representing their internal state. The problem is that in order for the models to integrate with external systems their internal state has to be translated into a lower dimension. To eliminate this impedance mismatch Edo…

25 May 2021 | 00:46:48