Data Engineering Podcast


This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.

444 Episodes

Build Your Second Brain One Piece At A Time - E423

Summary: Generative AI promises to accelerate the productivity of human collaborators. Currently, the primary way of working with these tools is through a conversational prompt, which is often cumbersome and unwieldy. In order to simplify the integration of AI capabilities into developer workflows, Tsavo Knott helped create Pieces, a powerful…

28 April 2024 | 00:50:10


Making Email Better With AI At Shortwave - E422

Summary: Generative AI has rapidly transformed everything in the technology sector. When Andrew Lee started work on Shortwave, he was focused on making email more productive. When AI started gaining adoption, he realized that he had even more potential for a transformative experience. In this episode he shares the technical challenges that he and his…

21 April 2024 | 00:53:43


Designing A Non-Relational Database Engine - E421

Summary: Databases come in a variety of formats for different use cases. The default association with the term "database" is relational engines, but non-relational engines are also used quite widely. In this episode Oren Eini, CEO and creator of RavenDB, explores the nuances of relational vs. non-relational engines, and the strategies for designing…

14 April 2024 | 01:16:02


Establish A Single Source Of Truth For Your Data Consumers With A Semantic Layer - E420

Summary: Maintaining a single source of truth for your data is one of the biggest challenges in data engineering. Different roles and tasks in the business need their own ways to access and analyze the data in the organization. In order to enable this use case, while maintaining a single point of access, the semantic layer has evolved as a technological…

07 April 2024 | 00:56:23


Adding Anomaly Detection And Observability To Your dbt Projects Is Elementary - E419

Summary: Working with data is a complicated process, with numerous chances for something to go wrong. Identifying and accounting for those errors is a critical piece of building trust in the organization that your data is accurate and up to date. While there are numerous products available to provide that visibility, they all have different…

31 March 2024 | 00:50:44


Ship Smarter Not Harder With Declarative And Collaborative Data Orchestration On Dagster+ - E418

Summary: A core differentiator of Dagster in the ecosystem of data orchestration is its focus on software-defined assets as a means of building declarative workflows. With the launch of Dagster+ as the redesigned commercial companion to the open source project, the team is investing in that capability with a suite of new features. In this episode…

24 March 2024 | 00:55:40


Reconciling The Data In Your Databases With Datafold - E417

Summary: A significant portion of data workflows involves storing and processing information in database engines. Validating that the information is stored and processed correctly can be complex and time-consuming, especially when the source and destination speak different dialects of SQL. In this episode Gleb Mezhanskiy, founder and CEO of Datafold,…

17 March 2024 | 00:58:14


Version Your Data Lakehouse Like Your Software With Nessie - E416

Summary: Data lakehouse architectures are gaining popularity due to the flexibility and cost-effectiveness that they offer. The link that bridges the gap between data lake and warehouse capabilities is the catalog. The primary purpose of the catalog is to inform the query engine of what data exists and where, but the Nessie project aims to go beyond…

10 March 2024 | 00:40:55


When And How To Conduct An AI Program - E415

Summary: Artificial intelligence technologies promise to revolutionize business and produce new sources of value. Making those promises a reality requires a substantial amount of strategy and investment. Colleen Tartow has worked across all stages of the data lifecycle, and in this episode she shares her hard-earned wisdom about…

03 March 2024 | 00:46:25


Find Out About The Technology Behind The Latest PFAD In Analytical Database Development - E414

Summary: Building a database engine requires a substantial amount of engineering effort and time investment. Over decades of research and development into building these software systems, a number of common components have come to be shared across implementations. When Paul Dix decided to rewrite the InfluxDB engine, he found the Apache Arrow…

25 February 2024 | 00:56:01