Data Engineering Podcast


This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.


454 Episodes

Make Sure Your Records Are Reliable With The BookKeeper Distributed Storage Layer - E193

Summary

The way to build maintainable software and systems is through composition of individual pieces. By making those pieces high quality and flexible they can be used in surprising ways that the original creators couldn’t have imagined. One such component that has gone above and beyond its…

09 June 2021 | 00:42:01


Build Your Analytics With A Collaborative And Expressive SQL IDE Using Querybook - E192

Summary

SQL is the most widely used language for working with data, and yet the tools available for writing and collaborating on it are still clunky and inefficient. Frustrated with the lack of a modern IDE and collaborative workflow for managing the SQL queries and analysis of their big data…

03 June 2021 | 00:52:36


Making Data Pipelines Self-Serve For Everyone With Shipyard - E191

Summary

Every part of the business relies on data, yet only a small team has the context and expertise to build and maintain workflows and data pipelines to transform, clean, and integrate it. In order for the true value of your data to be realized without burning out your engineers you need a way…

02 June 2021 | 00:51:23


Paving The Road For Fast Analytics On Distributed Clouds With The Yellowbrick Data Warehouse - E190

Summary

The data warehouse has become the focal point of the modern data platform. With increased usage of data across businesses, and a diversity of locations and environments where data needs to be managed, the warehouse engine needs to be fast and easy to manage. Yellowbrick is a data warehouse…

28 May 2021 | 00:52:41


Easily Build Advanced Similarity Search With The Pinecone Vector Database - E189

Summary

Machine learning models use vectors as the natural mechanism for representing their internal state. The problem is that in order for the models to integrate with external systems their internal state has to be translated into a lower dimension. To eliminate this impedance mismatch Edo…

25 May 2021 | 00:46:48


A Holistic Approach To Data Governance Through Self Reflection At Collibra - E188

Summary

Data governance is a phrase that means many different things to many different people. This is because it is actually a concept that encompasses the entire lifecycle of data, across all of the people in an organization who interact with it. Stijn Christiaens co-founded Collibra with the…

21 May 2021 | 00:55:53


Unlocking The Power of Data Lineage In Your Platform with OpenLineage - E187

Summary

Data lineage is the common thread that ties together all of your data pipelines, workflows, and systems. In order to get a holistic understanding of your data quality, where errors are occurring, or how a report was constructed you need to track the lineage of the data from beginning to…

18 May 2021 | 00:57:39


Building Your Data Warehouse On Top Of PostgreSQL - E186

Summary

There is a lot of attention on the database market and cloud data warehouses. While they provide a measure of convenience, they also require you to sacrifice a certain amount of control over your data. If you want to build a warehouse that gives you both control and flexibility then you…

14 May 2021 | 01:15:07


Making Analytical APIs Fast With Tinybird - E185

Summary

Building an API for real-time data is a challenging project. Making it robust, scalable, and fast is a full-time job. The team at Tinybird wants to make it easy to turn a continuous stream of data into a production-ready API or data product. In this episode CEO Jorge Sancha explains how…

11 May 2021 | 00:54:24


Making Spark Cloud Native At Data Mechanics - E184

Summary

Spark is one of the most well-known frameworks for data processing, whether for batch or streaming, ETL or ML, and at any scale. Because of its popularity it has been deployed on every kind of platform you can think of. In this episode Jean-Yves Stephan shares the work that he is doing at…

07 May 2021 | 00:40:16