Data Engineering Podcast


This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.

469 Episodes

Surveying The Market Of Database Products - E398

Summary Databases are the core of most applications, whether transactional or analytical. In recent years the selection of database products has exploded, making the critical decision of which engine(s) to use even more difficult. In this episode Tanya Bragin shares her experiences as a product manager for two major vendors and the lessons that she…

30 October 2023 | 00:47:12


Defining A Strategy For Your Data Products - E397

Summary The primary application of data has moved beyond analytics. With the broader audience comes the need to present data in a more approachable format. This has led to the broad adoption of data products as the delivery mechanism for information. In this episode Ranjith Raghunath shares his thoughts on how to build a strategy for the…

23 October 2023 | 01:03:50


Reducing The Barrier To Entry For Building Stream Processing Applications With Decodable - E396

Summary Building streaming applications has gotten substantially easier over the past several years. Despite this, it is still operationally challenging to deploy and maintain your own stream processing infrastructure. Decodable was built with a mission of eliminating all of the painful aspects of developing and deploying stream processing systems…

15 October 2023 | 01:08:29


Using Data To Illuminate The Intentionally Opaque Insurance Industry - E395

Summary The insurance industry is notoriously opaque and hard to navigate. Max Cho found that fact frustrating enough that he decided to build a business of making policy selection more navigable. In this episode he shares his journey of data collection and analysis and the challenges of automating an intentionally manual…

09 October 2023 | 00:51:58


Building ETL Pipelines With Generative AI - E394

Summary Artificial intelligence applications require substantial high-quality data, which is provided through ETL pipelines. Now that AI has reached the level of sophistication seen in the various generative models, it is being used to build new ETL workflows. In this episode Jay Mishra shares his experiences and insights building ETL pipelines with…

01 October 2023 | 00:51:37


Powering Vector Search With Real Time And Incremental Vector Indexes - E393

Summary The rapid growth of machine learning, especially large language models, has led to a commensurate growth in the need to store and compare vectors. In this episode Louis Brandy discusses the applications for vector search capabilities both in and outside of AI, as well as the challenges of maintaining real-time indexes of vector…

25 September 2023 | 00:59:16


Building Linked Data Products With JSON-LD - E392

Summary A significant amount of time in data engineering is dedicated to building connections and semantic meaning around pieces of information. Linked data technologies provide a means of tightly coupling metadata with raw information. In this episode Brian Platz explains how JSON-LD can be used as a shared representation of linked data for…

17 September 2023 | 01:01:31


An Overview Of The State Of Data Orchestration In An Increasingly Complex Data Ecosystem - E391

Summary Data systems are inherently complex and often require integration of multiple technologies. Orchestrators are centralized utilities that control the execution and sequencing of interdependent operations. This offers a single location for managing visibility and error handling so that data platform engineers can manage complexity. In this…

10 September 2023 | 01:01:26


Eliminate The Overhead In Your Data Integration With The Open Source dlt Library - E390

Summary Cloud data warehouses and the introduction of the ELT paradigm have led to the creation of multiple options for flexible data integration, with a roughly equal distribution of commercial and open source options. The challenge is that most of those options are complex to operate and exist in their own silo. The dlt project was created to…

04 September 2023 | 00:42:13


Building An Internal Database As A Service Platform At Cloudflare - E389

Summary Data persistence is one of the most challenging aspects of computer systems. In the era of the cloud most developers rely on hosted services to manage their databases, but what if you are a cloud service? In this episode Vignesh Ravichandran explains how his team at Cloudflare provides PostgreSQL as a service to their developers for low…

28 August 2023 | 01:01:10