Data Engineering Podcast


This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.

454 Episodes

Building The DataDog Platform For Processing Timeseries Data At Massive Scale - E113

Summary

DataDog is one of the most successful companies in the space of metrics and monitoring for servers and cloud infrastructure. In order to support their customers, they need to capture, process, and analyze massive amounts of timeseries data with a high degree of uptime and reliability. Vadim…

30 December 2019 | 00:45:55


Building The Materialize Engine For Interactive Streaming Analytics In SQL - E112

Summary

Transactional databases used in applications are optimized for fast reads and writes with relatively simple queries on a small number of records. Data warehouses are optimized for batched writes and complex analytical queries. Between those use cases, there are varying levels of support for…

23 December 2019 | 00:48:07


Solving Data Lineage Tracking And Data Discovery At WeWork - E111

Summary

Building clean datasets with reliable and reproducible ingestion pipelines is completely useless if it’s not possible to find them and understand their provenance. The solution to discoverability and tracking of data lineage is to incorporate a metadata repository into your data…

16 December 2019 | 01:01:53


SnowflakeDB: The Data Warehouse Built For The Cloud - E110

Summary

Data warehouses have gone through many transformations, from standard relational databases on powerful hardware, to column-oriented storage engines, to the current generation of cloud-native analytical engines. SnowflakeDB has been leading the charge to take advantage of cloud services that…

09 December 2019 | 00:58:57


Organizing And Empowering Data Engineers At Citadel - E109

Summary

The financial industry has long been driven by data, requiring a mature and robust capacity for discovering and integrating valuable sources of information. Citadel is no exception, and in this episode Michael Watson and Robert Krzyzanowski share their experiences managing and leading the…

03 December 2019 | 00:45:50


Building A Real Time Event Data Warehouse For Sentry - E108

Summary

The team at Sentry has built a platform for anyone in the world to send software errors and events. As they scaled the volume of customers and data, they began running into the limitations of their initial architecture. To address the needs of their business and continue to improve their…

26 November 2019 | 01:01:15


Escaping Analysis Paralysis For Your Data Platform With Data Virtualization - E107

Summary

With the constant evolution of technology for data management it can seem impossible to make an informed decision about whether to build a data warehouse, or a data lake, or just leave your data wherever it currently rests. What’s worse is that any time you have to migrate to a new…

18 November 2019 | 00:55:42


Designing For Data Protection - E106

Summary

The practice of data management is one that requires technical acumen, but there are also many policy and regulatory issues that inform and influence the design of our systems. With the introduction of legal frameworks such as the EU GDPR and California’s CCPA it is necessary to…

11 November 2019 | 00:51:24


Automating Your Production Dataflows On Spark - E105

Summary

As data engineers, we make the health of our pipelines our highest priority. Unfortunately, there are countless ways that our dataflows can break or degrade that have nothing to do with the business logic or data transformations that we write and maintain. Sean Knapp founded Ascend to address the…

04 November 2019 | 00:48:51


Build Maintainable And Testable Data Applications With Dagster - E104

Summary

Despite the fact that businesses have relied on useful and accurate data to succeed for decades now, the state of the art for obtaining and maintaining that information still leaves much to be desired. In an effort to create a better abstraction for building data applications, Nick Schrock…

28 October 2019 | 01:07:49