Data Engineering Podcast


This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.

454 Episodes

The Alooma Data Pipeline With CTO Yair Weinberger - Episode 33 - E33

Summary

Building an ETL pipeline is a common need across businesses and industries. It’s easy to get one started but difficult to manage as new requirements are added and greater scalability becomes necessary. Rather than duplicating the efforts of other engineers, it might be best to use a hosted service to handle the plumbing so…

28 May 2018 | 00:47:50


PrestoDB and Starburst Data with Kamil Bajda-Pawlikowski - Episode 32 - E32

Summary

Most businesses end up with data in a myriad of places with varying levels of structure. This makes it difficult to gain insights from across departments, projects, or people. Presto is a distributed SQL engine that allows you to tie all of your information together without having to first aggregate it all into a data warehouse.…

21 May 2018 | 00:42:08


Brief Conversations From The Open Data Science Conference: Part 2 - Episode 31 - E31

Summary

The Open Data Science Conference brings together a variety of data professionals each year in Boston. This week’s episode consists of a pair of brief interviews conducted on-site at the conference. First up you’ll hear from Andy Eschbacher of Carto. He describes some of the complexities inherent to working with…

14 May 2018 | 00:26:06


Brief Conversations From The Open Data Science Conference: Part 1 - Episode 30 - E30

Summary

The Open Data Science Conference brings together a variety of data professionals each year in Boston. This week’s episode consists of a pair of brief interviews conducted on-site at the conference. First up you’ll hear from Alan Anders, the CTO of Applecart, about their challenges with getting Spark to scale for…

07 May 2018 | 00:32:39


Metabase Self Service Business Intelligence with Sameer Al-Sakran - Episode 29 - E29

Summary

Business Intelligence software is often cumbersome and requires specialized knowledge of the tools and data to be able to ask and answer questions about the state of the organization. Metabase is a tool built with the goal of making the act of discovering information and asking questions of an organization’s data easy and…

30 April 2018 | 00:44:46


Octopai: Metadata Management for Better Business Intelligence with Amnon Drori - Episode 28 - E28

Summary

The information about how data is acquired and processed is often as important as the data itself. For this reason metadata management systems are built to track the journey of your business data to aid in analysis, presentation, and compliance. These systems are frequently cumbersome and difficult to maintain, so Octopai was…

23 April 2018 | 00:39:53


Data Engineering Weekly with Joe Crobak - Episode 27 - E27

Summary

The rate of change in the data engineering industry is alternately exciting and exhausting. Joe Crobak found his way into the work of data management by accident as so many of us do. After being engrossed with researching the details of distributed systems and big data management for his work he began sharing his findings with…

15 April 2018 | 00:43:32


Defining DataOps with Chris Bergh - Episode 26 - E26

Summary

Managing an analytics project can be difficult due to the number of systems involved and the need to ensure that new information can be delivered quickly and reliably. That challenge can be met by adopting practices and principles from lean manufacturing and agile software development, and the cross-functional collaboration,…

08 April 2018 | 00:54:31


ThreatStack: Data Driven Cloud Security with Pete Cheslock and Patrick Cable - Episode 25 - E25

Summary

Cloud computing and ubiquitous virtualization have changed the ways that our applications are built and deployed. This new environment requires a new way of tracking and addressing the security of our systems. ThreatStack is a platform that collects all of the data that your servers generate and monitors for unexpected anomalies…

01 April 2018 | 00:51:52


MarketStore: Managing Timeseries Financial Data with Hitoshi Harada and Christopher Ryan - Episode 24 - E24

Summary

The data that is used in financial markets is time oriented and multidimensional, which makes it difficult to manage in either relational or timeseries databases. To make this information more manageable the team at Alpaca built a new data store specifically for retrieving and analyzing data generated by trading markets. In…

25 March 2018 | 00:33:28