Data Engineering Podcast


This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.

454 Episodes

Evolving An ETL Pipeline For Better Productivity - E83

Summary

Building an ETL pipeline can be a significant undertaking, and sometimes it needs to be rebuilt when a better option becomes available. In this episode Aaron Gibralter, director of engineering at Greenhouse, joins Raghu Murthy, founder and CEO of DataCoral, to discuss the journey that he…

04 June 2019 | 01:02:22


Data Lineage For Your Pipelines - E82

Summary

Some problems in data are well defined and benefit from a ready-made set of tools. For everything else, there’s Pachyderm, the platform for data science that is built to scale. In this episode Joe Doliner, CEO and co-founder, explains how Pachyderm started as an attempt to make data…

27 May 2019 | 00:49:01


Build Your Data Analytics Like An Engineer With DBT - E81

Summary

In recent years the traditional approach to building data warehouses has shifted from transforming records before loading to transforming them afterwards. As a result, the tooling for those transformations needs to be reimagined. The data build tool (dbt) is designed to bring battle-tested…

20 May 2019 | 00:56:46


Using FoundationDB As The Bedrock For Your Distributed Systems - E80

Summary

The database market continues to expand, offering systems that are suited to virtually every use case. But what happens if you need something customized to your application? FoundationDB is a distributed key-value store that provides the primitives that you need to build a custom database…

07 May 2019 | 01:06:02


Running Your Database On Kubernetes With KubeDB - E79

Summary

Kubernetes is a driving force in the renaissance around deploying and running applications. However, managing the database layer is still a separate concern. The KubeDB project was created as a way of providing a simple mechanism for running your storage system in the same platform as your…

29 April 2019 | 00:50:55


Unpacking Fauna: A Global Scale Cloud Native Database - E78

Summary

One of the biggest challenges for any business trying to grow and reach customers globally is how to scale their data storage. FaunaDB is a cloud native database built by the engineers behind Twitter’s infrastructure and designed to serve the needs of modern systems. Evan Weaver is…

22 April 2019 | 00:53:51


Index Your Big Data With Pilosa For Faster Analytics - E77

Summary

Database indexes are critical to ensure fast lookups of your data, but they are inherently tied to the database engine. Pilosa is rewriting that equation by providing a flexible, scalable, performant engine for building an index of your data to enable high-speed aggregate analysis. In this…

15 April 2019 | 00:43:42


Serverless Data Pipelines On DataCoral - E76

Summary

How much time do you spend maintaining your data pipeline? How much end user value does that provide? Raghu Murthy founded DataCoral as a way to abstract the low level details of ETL so that you can focus on the actual problem that you are trying to solve. In this episode he explains his…

08 April 2019 | 00:53:42


Why Analytics Projects Fail And What To Do About It - E75

Summary

Analytics projects fail all the time, resulting in lost opportunities and wasted resources. There are a number of factors that contribute to that failure, and not all of them are under our control. However, many of them are, and as data engineers we can help to keep our projects on the path…

01 April 2019 | 00:36:30


Building An Enterprise Data Fabric At CluedIn - E74

Summary

Data integration is one of the most challenging aspects of any data platform, especially as the variety of data sources and formats grows. Enterprise organizations feel this acutely due to the silos that occur naturally across business units. The CluedIn team experienced this issue…

25 March 2019 | 00:57:50