Data Engineering Podcast


This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.

459 Episodes

Building A Real Time Event Data Warehouse For Sentry - E108

Summary

The team at Sentry has built a platform for anyone in the world to send software errors and events. As they scaled the volume of customers and data, they began running into the limitations of their initial architecture. To address the needs of their business and continue to improve their…

26 November 2019 | 01:01:15


Escaping Analysis Paralysis For Your Data Platform With Data Virtualization - E107

Summary

With the constant evolution of technology for data management, it can seem impossible to make an informed decision about whether to build a data warehouse, or a data lake, or just leave your data wherever it currently rests. What’s worse is that any time you have to migrate to a new…

18 November 2019 | 00:55:42


Designing For Data Protection - E106

Summary

The practice of data management is one that requires technical acumen, but there are also many policy and regulatory issues that inform and influence the design of our systems. With the introduction of legal frameworks such as the EU GDPR and California’s CCPA, it is necessary to…

11 November 2019 | 00:51:24


Automating Your Production Dataflows On Spark - E105

Summary

As data engineers, we treat the health of our pipelines as our highest priority. Unfortunately, there are countless ways that our dataflows can break or degrade that have nothing to do with the business logic or data transformations that we write and maintain. Sean Knapp founded Ascend to address the…

04 November 2019 | 00:48:51


Build Maintainable And Testable Data Applications With Dagster - E104

Summary

Despite the fact that businesses have relied on useful and accurate data to succeed for decades now, the state of the art for obtaining and maintaining that information still leaves much to be desired. In an effort to create a better abstraction for building data applications, Nick Schrock…

28 October 2019 | 01:07:49


Data Orchestration For Hybrid Cloud Analytics - E103

Summary

The scale and complexity of the systems that we build to satisfy business requirements are increasing as the available tools become more sophisticated. In order to bridge the gap between legacy infrastructure and evolving use cases, it is necessary to create a unifying set of components. In…

22 October 2019 | 00:42:51


Keeping Your Data Warehouse In Order With DataForm - E102

Summary

Managing a data warehouse can be challenging, especially when trying to maintain a common set of patterns. Dataform is a platform that helps you apply engineering principles to your data transformations and table definitions, including unit testing SQL scripts, defining repeatable…

15 October 2019 | 00:47:04


Fast Analytics On Semi-Structured And Structured Data In The Cloud - E101

Summary

The process of exposing your data through a SQL interface has many possible pathways, each with its own complications and tradeoffs. One of the recent options is Rockset, a serverless platform for fast SQL analytics on semi-structured and structured data. In this episode, CEO Venkat…

08 October 2019 | 00:54:39


Ship Faster With An Opinionated Data Pipeline Framework - E100

Summary

Building an end-to-end data pipeline for your machine learning projects is a complex task, made more difficult by the variety of ways that you can structure it. Kedro is a framework that provides an opinionated workflow that lets you focus on the parts that matter, so that you don’t…

01 October 2019 | 00:35:09


Open Source Object Storage For All Of Your Data - E99

Summary

Object storage is quickly becoming the unifying layer for data-intensive applications and analytics. Modern, cloud-oriented data warehouses and data lakes both rely on the durability and ease of use that it provides. S3 from Amazon has quickly become the de facto API for interacting with…

23 September 2019 | 01:08:20