Data Engineering Podcast


This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.

469 Episodes

Leveraging Human Intelligence For Better AI At Alegion With Cheryl Martin - Episode 38 - E38

Summary

Data is often messy or incomplete, requiring human intervention to make sense of it before being usable as input to machine learning projects. This is problematic when the volume scales beyond a handful of records. In this episode Dr. Cheryl Martin, Chief Data Scientist for Alegion, discusses the importance of properly labeled…

02 July 2018 | 00:46:14


Package Management And Distribution For Your Data Using Quilt with Kevin Moore - Episode 37 - E37

Summary

Collaboration, distribution, and installation of software projects is largely a solved problem, but the same cannot be said of data. Every data team has a bespoke means of sharing data sets, versioning them, tracking related metadata and changes, and publishing them for use in the software systems that rely on them. The CEO and…

25 June 2018 | 00:41:43


User Analytics In Depth At Heap with Dan Robinson - Episode 36 - E36

Summary

Web and mobile analytics are an important part of any business, and difficult to get right. The most frustrating part is when you realize that you haven’t been tracking a key interaction, having to write custom logic to add that event, and then waiting to collect data. Heap is a platform that automatically tracks every…

17 June 2018 | 00:45:27


CockroachDB In Depth with Peter Mattis - Episode 35 - E35

Summary

With the increased ease of gaining access to servers in data centers across the world has come the need for supporting globally distributed data storage. With the first wave of cloud era databases the ability to replicate information geographically came at the expense of transactions and familiar query languages. To address…

11 June 2018 | 00:43:41


ArangoDB: Fast, Scalable, and Multi-Model Data Storage with Jan Steeman and Jan Stücke - Episode 34 - E34

Summary

Using a multi-model database in your applications can greatly reduce the amount of infrastructure and complexity required. ArangoDB is a storage engine that supports documents, key/value, and graph data formats, as well as being fast and scalable. In this episode Jan Steeman and Jan Stücke explain where Arango fits in the…

04 June 2018 | 00:40:05


The Alooma Data Pipeline With CTO Yair Weinberger - Episode 33 - E33

Summary

Building an ETL pipeline is a common need across businesses and industries. It’s easy to get one started but difficult to manage as new requirements are added and greater scalability becomes necessary. Rather than duplicating the efforts of other engineers it might be best to use a hosted service to handle the plumbing so…

28 May 2018 | 00:47:50


PrestoDB and Starburst Data with Kamil Bajda-Pawlikowski - Episode 32 - E32

Summary

Most businesses end up with data in a myriad of places with varying levels of structure. This makes it difficult to gain insights from across departments, projects, or people. Presto is a distributed SQL engine that allows you to tie all of your information together without having to first aggregate it all into a data warehouse.…

21 May 2018 | 00:42:08


Brief Conversations From The Open Data Science Conference: Part 2 - Episode 31 - E31

Summary

The Open Data Science Conference brings together a variety of data professionals each year in Boston. This week’s episode consists of a pair of brief interviews conducted on-site at the conference. First up you’ll hear from Andy Eschbacher of Carto. He describes some of the complexities inherent to working with…

14 May 2018 | 00:26:06


Brief Conversations From The Open Data Science Conference: Part 1 - Episode 30 - E30

Summary

The Open Data Science Conference brings together a variety of data professionals each year in Boston. This week’s episode consists of a pair of brief interviews conducted on-site at the conference. First up you’ll hear from Alan Anders, the CTO of Applecart, about their challenges with getting Spark to scale for…

07 May 2018 | 00:32:39


Metabase Self Service Business Intelligence with Sameer Al-Sakran - Episode 29 - E29

Summary

Business Intelligence software is often cumbersome and requires specialized knowledge of the tools and data to be able to ask and answer questions about the state of the organization. Metabase is a tool built with the goal of making the act of discovering information and asking questions of an organization’s data easy and…

30 April 2018 | 00:44:46