Data Engineering Podcast


This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.

463 Episodes

Speed Up And Simplify Your Streaming Data Workloads With Red Panda - E152

Summary

Kafka has become a de facto standard interface for building decoupled systems and working with streaming data. Despite its widespread popularity, there are numerous accounts of the difficulty that operators face in keeping it reliable and performant, or trying to scale an installation. To…

29 September 2020 | 00:59:41


Cutting Through The Noise And Focusing On The Fundamentals Of Data Engineering With The Data Janitor - E151

Summary

Data engineering is a constantly growing and evolving discipline. There are always new tools, systems, and design patterns to learn, which leads to a great deal of confusion for newcomers. Daniel Molnar has dedicated his time to helping data professionals get back to basics through…

22 September 2020 | 00:47:40


Distributed In Memory Processing And Streaming With Hazelcast - E150

Summary

In-memory computing provides significant performance benefits, but it also brings challenges for managing failures and scaling up. Hazelcast is a platform for managing stateful in-memory storage and computation across a distributed cluster of commodity hardware. On top of this foundation, the…

15 September 2020 | 00:44:07


Simplify Your Data Architecture With The Presto Distributed SQL Engine - E149

Summary

Databases are limited in scope to the information that they directly contain. For analytical use cases you often want to combine data across multiple sources and storage locations. This frequently requires cumbersome and time-consuming data integration. To address this problem Martin…

07 September 2020 | 00:53:59


Building A Better Data Warehouse For The Cloud At Firebolt - E148

Summary

Data warehouse technology has been around for decades and has gone through several generational shifts in that time. The current trends in data warehousing are oriented around cloud native architectures that take advantage of dynamic scaling and the separation of compute and storage.…

01 September 2020 | 01:05:51


Metadata Management And Integration At LinkedIn With DataHub - E147

Summary

In order to scale the use of data across an organization there are a number of challenges related to discovery, governance, and integration that need to be solved. The key to those solutions is a robust and flexible metadata management system. LinkedIn has gone through several iterations on…

25 August 2020 | 00:51:04


Exploring The TileDB Universal Data Engine - E146

Summary

Most databases are designed to work with textual data, with some special-purpose engines that support domain-specific formats. TileDB is a data engine that was built to support every type of data by using multi-dimensional arrays as the foundational primitive. In this episode the creator…

17 August 2020 | 01:05:44


Closing The Loop On Event Data Collection With Iteratively - E145

Summary

Event-based data is a rich source of information for analytics, unless none of the event structures are consistent. The team at Iteratively is building a platform to manage the end-to-end flow of collaboration around what events are needed, how to structure the attributes, and how they are…

10 August 2020 | 00:59:17


A Practical Introduction To Graph Data Applications - E144

Summary

Finding connections between data and the entities that they represent is a complex problem. Graph data models and the applications built on top of them are perfect for representing relationships and finding emergent structures in your information. In this episode Denise Gosnell and Matthias…

04 August 2020 | 01:00:43


Build More Reliable Distributed Systems By Breaking Them With Jepsen - E143

Summary

A majority of the scalable data processing platforms that we rely on are built as distributed systems. This brings with it a vast number of subtle ways that errors can creep in. Kyle Kingsbury created the Jepsen framework for testing the guarantees of distributed data processing systems and…

28 July 2020 | 00:49:38