Data Engineering Podcast


This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.

469 Episodes

Building A Better Data Warehouse For The Cloud At Firebolt - E148

Summary

Data warehouse technology has been around for decades and has gone through several generational shifts in that time. The current trends in data warehousing are oriented around cloud-native architectures that take advantage of dynamic scaling and the separation of compute and storage.…

01 September 2020 | 01:05:51


Metadata Management And Integration At LinkedIn With DataHub - E147

Summary

To scale the use of data across an organization, there are a number of challenges related to discovery, governance, and integration that need to be solved. The key to those solutions is a robust and flexible metadata management system. LinkedIn has gone through several iterations on…

25 August 2020 | 00:51:04


Exploring The TileDB Universal Data Engine - E146

Summary

Most databases are designed to work with textual data, with some special-purpose engines that support domain-specific formats. TileDB is a data engine that was built to support every type of data by using multi-dimensional arrays as the foundational primitive. In this episode the creator…

17 August 2020 | 01:05:44


Closing The Loop On Event Data Collection With Iteratively - E145

Summary

Event-based data is a rich source of information for analytics, unless none of the event structures are consistent. The team at Iteratively are building a platform to manage the end-to-end flow of collaboration around what events are needed, how to structure the attributes, and how they are…

10 August 2020 | 00:59:17


A Practical Introduction To Graph Data Applications - E144

Summary

Finding connections between data and the entities that they represent is a complex problem. Graph data models and the applications built on top of them are perfect for representing relationships and finding emergent structures in your information. In this episode Denise Gosnell and Matthias…

04 August 2020 | 01:00:43


Build More Reliable Distributed Systems By Breaking Them With Jepsen - E143

Summary

A majority of the scalable data processing platforms that we rely on are built as distributed systems. This brings with it a vast number of subtle ways that errors can creep in. Kyle Kingsbury created the Jepsen framework for testing the guarantees of distributed data processing systems and…

28 July 2020 | 00:49:38


Making Wind Energy More Efficient With Data At Turbit Systems - E142

Summary

Wind energy is an important component of an ecologically friendly power system, but there are a number of variables that can affect the overall efficiency of the turbines. Michael Tegtmeier founded Turbit Systems to help operators of wind farms identify and correct problems that contribute…

21 July 2020 | 00:40:48


Open Source Production Grade Data Integration With Meltano - E141

Summary

The first stage of every data pipeline is extracting the information from source systems. There are a number of platforms for managing data integration, but there is a notable lack of a robust and easy-to-use open source option. The Meltano project is aiming to provide a solution to that…

13 July 2020 | 01:05:19


DataOps For Streaming Systems With Lenses.io - E140

Summary

There are an increasing number of use cases for real-time data, and the systems to power them are becoming more mature. Once you have a streaming platform up and running, you need a way to keep an eye on it, including observability, discovery, and governance of your data. That’s what…

06 July 2020 | 00:45:36


Data Collection And Management To Power Sound Recognition At Audio Analytic - E139

Summary

We have machines that can listen to and process human speech in a variety of languages, but dealing with unstructured sounds in our environment is a much greater challenge. The team at Audio Analytic are working to impart a sense of hearing to our myriad devices with their sound recognition…

30 June 2020 | 00:57:29