Data Engineering Podcast


This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.



444 Episodes

Powering Vector Search With Real Time And Incremental Vector Indexes - E393

Summary The rapid growth of machine learning, especially large language models, has led to a commensurate growth in the need to store and compare vectors. In this episode Louis Brandy discusses the applications for vector search capabilities both in and outside of AI, as well as the challenges of maintaining real-time indexes of vector…

25 September 2023 | 00:59:16


Building Linked Data Products With JSON-LD - E392

Summary A significant amount of time in data engineering is dedicated to building connections and semantic meaning around pieces of information. Linked data technologies provide a means of tightly coupling metadata with raw information. In this episode Brian Platz explains how JSON-LD can be used as a shared representation of linked data for…

17 September 2023 | 01:01:31


An Overview Of The State Of Data Orchestration In An Increasingly Complex Data Ecosystem - E391

Summary Data systems are inherently complex and often require integration of multiple technologies. Orchestrators are centralized utilities that control the execution and sequencing of interdependent operations. This offers a single location for managing visibility and error handling so that data platform engineers can manage complexity. In this…

10 September 2023 | 01:01:26


Eliminate The Overhead In Your Data Integration With The Open Source dlt Library - E390

Summary Cloud data warehouses and the introduction of the ELT paradigm have led to the creation of multiple options for flexible data integration, with a roughly equal distribution of commercial and open source options. The challenge is that most of those options are complex to operate and exist in their own silo. The dlt project was created to…

04 September 2023 | 00:42:13


Building An Internal Database As A Service Platform At Cloudflare - E389

Summary Data persistence is one of the most challenging aspects of computer systems. In the era of the cloud most developers rely on hosted services to manage their databases, but what if you are a cloud service? In this episode Vignesh Ravichandran explains how his team at Cloudflare provides PostgreSQL as a service to their developers for low…

28 August 2023 | 01:01:10


Harnessing Generative AI For Creating Educational Content With Illumidesk - E388

Summary Generative AI has unlocked a massive opportunity for content creation. There is also an unfulfilled need for experts to be able to share their knowledge and build communities. Illumidesk was built to take advantage of this intersection. In this episode Greg Werner explains how they are using generative AI as an assistive tool for creating…

20 August 2023 | 00:54:52


Unpacking The Seven Principles Of Modern Data Pipelines - E387

Summary Data pipelines are the core of every data product, ML model, and business intelligence dashboard. If you're not careful you will end up spending all of your time on maintenance and fire-fighting. The folks at Rivery distilled the seven principles of modern data pipelines that will help you stay out of trouble and be productive with your…

14 August 2023 | 00:47:03


Quantifying The Return On Investment For Your Data Team - E386

Summary As businesses increasingly invest in technology and talent focused on data engineering and analytics, they want to know whether they are benefiting. So how do you calculate the return on investment for data? In this episode Barr Moses and Anna Filippova explore that question and provide useful exercises to start answering that in your…

06 August 2023 | 01:01:53


Strategies For A Successful Data Platform Migration - E385

Summary All software systems are in a constant state of evolution. This makes it impossible to select a truly future-proof technology stack for your data platform, making an eventual migration inevitable. In this episode Gleb Mezhanskiy and Rob Goretsky share their experiences leading various data platform migrations, and the hard-won lessons that…

31 July 2023 | 01:09:53


Build Real Time Applications With Operational Simplicity Using Dozer - E384

Summary Real-time data processing has steadily been gaining adoption due to advances in the accessibility of the technologies involved. Despite that, it is still a complex set of capabilities. To bring streaming data within reach of application engineers, Matteo Pelati helped to create Dozer. In this episode he explains how investing in high performance…

24 July 2023 | 00:40:43