Data Engineering Podcast


This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.





463 Episodes

Keeping Your Data Warehouse In Order With DataForm - E102

Summary

Managing a data warehouse can be challenging, especially when trying to maintain a common set of patterns. Dataform is a platform that helps you apply engineering principles to your data transformations and table definitions, including unit testing SQL scripts, defining repeatable…

15 October 2019 | 00:47:04


Fast Analytics On Semi-Structured And Structured Data In The Cloud - E101

Summary

The process of exposing your data through a SQL interface has many possible pathways, each with its own complications and tradeoffs. One of the recent options is Rockset, a serverless platform for fast SQL analytics on semi-structured and structured data. In this episode, CEO Venkat…

08 October 2019 | 00:54:39


Ship Faster With An Opinionated Data Pipeline Framework - E100

Summary

Building an end-to-end data pipeline for your machine learning projects is a complex task, made more difficult by the variety of ways that you can structure it. Kedro is a framework that provides an opinionated workflow that lets you focus on the parts that matter, so that you don’t…

01 October 2019 | 00:35:09


Open Source Object Storage For All Of Your Data - E99

Summary

Object storage is quickly becoming the unifying layer for data-intensive applications and analytics. Modern, cloud-oriented data warehouses and data lakes both rely on the durability and ease of use that it provides. S3 from Amazon has quickly become the de facto API for interacting with…

23 September 2019 | 01:08:20


Navigating Boundless Data Streams With The Swim Kernel - E98

Summary

The conventional approach to analytics involves collecting large amounts of data that can be cleaned, followed by a separate step for analysis and interpretation. Unfortunately, this strategy is not viable for handling real-time, real-world use cases such as traffic management or supply…

18 September 2019 | 00:57:56


Building A Reliable And Performant Router For Observability Data - E97

Summary

The first stage in every data project is collecting information and routing it to a storage system for later analysis. For operational data this typically means collecting log messages and system metrics. Often a different tool is used for each class of data, increasing the overall…

10 September 2019 | 00:55:20


Building A Community For Data Professionals at Data Council - E96

Summary

Data professionals are working in a domain that is rapidly evolving. In order to stay current we need access to deeply technical presentations that aren’t burdened by extraneous marketing. To fulfill that need Pete Soderling and his team have been running the Data Council series of…

02 September 2019 | 00:52:46


Building Tools And Platforms For Data Analytics - E95

Summary

Data engineers are responsible for building tools and platforms to power the workflows of other members of the business. Each group of users has their own set of requirements for the way that they access and interact with those platforms depending on the insights they are trying to gather.…

26 August 2019 | 00:48:07


A High Performance Platform For The Full Big Data Lifecycle - E94

Summary

Managing big data projects at scale is a perennial problem, with a wide variety of solutions that have evolved over the past 20 years. One of the early entrants that predates Hadoop and has since been open sourced is the HPCC (High Performance Computing Cluster) system. Designed as a fully…

19 August 2019 | 01:13:46


Digging Into Data Replication At Fivetran - E93

Summary

The extract and load pattern of data replication is the most commonly needed process in data engineering workflows. Because of the myriad sources and destinations that are available, it is also among the most difficult tasks that we encounter. Fivetran is a platform that does the hard work…

12 August 2019 | 00:44:41