Data Engineering Podcast


This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.

463 Episodes

Wallaroo with Sean T. Allen - Episode 12 - E12

Summary

Data-oriented applications that need to operate on large, fast-moving streams of information can be difficult to build and scale due to the need to manage their state. In this episode Sean T. Allen, VP of engineering for Wallaroo Labs, explains how Wallaroo was designed and built to reduce the cognitive overhead of building this…

25 December 2017 | 00:59:13


SiriDB: Scalable Open Source Timeseries Database with Jeroen van der Heijden - Episode 11 - E11

Summary

Time series databases have long been the cornerstone of a robust metrics system, but the existing options are often difficult to manage in production. In this episode Jeroen van der Heijden explains his motivation for writing a new database, SiriDB, the challenges that he faced in doing so, and how it works under the…

18 December 2017 | 00:33:52


Confluent Schema Registry with Ewen Cheslack-Postava - Episode 10 - E10

Summary

To process your data you need to know what shape it has, which is why schemas are important. When you are processing that data in multiple systems it can be difficult to ensure that they all have an accurate representation of that schema, which is why Confluent has built a schema registry that plugs into Kafka. In this episode…

10 December 2017 | 00:49:22


data.world with Bryon Jacob - Episode 9 - E9

Summary

We have tools and platforms for collaborating on software projects and linking them together; wouldn’t it be nice to have the same capabilities for data? The team at data.world are working on building a platform to host and share data sets for public and private use that can be linked together to build a semantic web of…

03 December 2017 | 00:46:24


Data Serialization Formats with Doug Cutting and Julien Le Dem - Episode 8 - E8

Summary

With the wealth of formats for sending and storing data it can be difficult to determine which one to use. In this episode Doug Cutting, creator of Avro, and Julien Le Dem, creator of Parquet, dig into the different classes of serialization formats, what their strengths are, and how to choose one for your workload. They…

22 November 2017 | 00:51:43


Buzzfeed Data Infrastructure with Walter Menendez - Episode 7 - E7

Summary

Buzzfeed needs to be able to understand how its users are interacting with the myriad articles, videos, etc. that they are posting. This lets them produce new content that will continue to be well-received. To surface the insights that they need to grow their business they need a robust data infrastructure to reliably capture…

14 November 2017 | 00:43:40


Astronomer with Ry Walker - Episode 6 - E6

Summary

Building a data pipeline that is reliable and flexible is a difficult task, especially when you have a small team. Astronomer is a platform that lets you skip straight to processing your valuable business data. Ry Walker, the CEO of Astronomer, explains how the company got started, how the platform works, and their commitment to…

06 August 2017 | 00:42:50


Rebuilding Yelp's Data Pipeline with Justin Cunningham - Episode 5 - E5

Summary

Yelp needs to be able to consume and process all of the user interactions that happen on their platform in as close to real-time as possible. To achieve that goal they embarked on a journey to refactor their monolithic architecture to be more modular and modern, and then they open sourced it! In this episode Justin Cunningham…

18 June 2017 | 00:42:28


ScyllaDB with Eyal Gutkind - Episode 4 - E4

Summary

If you like the features of Cassandra DB but wish it ran faster with fewer resources then ScyllaDB is the answer you have been looking for. In this episode Eyal Gutkind explains how Scylla was created and how it differentiates itself in the crowded database market.

Preamble

  • Hello and welcome to the Data…

18 March 2017 | 00:35:07


Defining Data Engineering with Maxime Beauchemin - Episode 3 - E3

Summary

What exactly is data engineering? How has it evolved in recent years and where is it going? How do you get started in the field? In this episode, Maxime Beauchemin joins me to discuss these questions and more.

05 March 2017 | 00:45:21