Data Engineering Podcast


This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.


463 Episodes

Easier Stream Processing On Kafka With ksqlDB - E122

Summary

Building applications on top of unbounded event streams is a complex endeavor, requiring careful integration of multiple disparate systems that were engineered in isolation. The ksqlDB project was created to address this state of affairs by building a unified layer on top of the Kafka…

02 March 2020 | 00:43:36


Shining A Light on Shadow IT In Data And Analytics - E121

Summary

Misaligned priorities across business units can lead to tensions that drive members of the organization to build data and analytics projects without the guidance or support of engineering or IT staff. The availability of cloud platforms and managed services makes this a viable option, but…

25 February 2020 | 00:46:09


Data Infrastructure Automation For Private SaaS At Snowplow - E120

Summary

One of the biggest challenges in building reliable platforms for processing event pipelines is managing the underlying infrastructure. At Snowplow Analytics the complexity is compounded by the need to manage multiple instances of their platform across customer environments. In this episode…

18 February 2020 | 00:49:01


Data Modeling That Evolves With Your Business Using Data Vault - E119

Summary

Designing the structure for your data warehouse is a complex and challenging process. As businesses deal with a growing number of sources and types of information that they need to integrate, they need a data modeling strategy that provides them with flexibility and speed. Data Vault is an…

09 February 2020 | 01:06:22


The Benefits And Challenges Of Building A Data Trust - E118

Summary

Every business collects data in some fashion, but sometimes the true value of the collected information only comes when it is combined with other data sources. Data trusts are a legal framework for allowing businesses to collaboratively pool their data. This allows the members of the trust…

03 February 2020 | 00:56:53


Pay Down Technical Debt In Your Data Pipeline With Great Expectations - E117

Summary

Data pipelines are complicated and business-critical pieces of technical infrastructure. Unfortunately, they are also complex and difficult to test, leading to a significant amount of technical debt which contributes to slower iteration cycles. In this episode James Campbell describes how he…

27 January 2020 | 00:46:31


Replatforming Production Dataflows - E116

Summary

Building a reliable data platform is a never-ending task. Even if you have a process that works for you and your business, there can be unexpected events that require a change in your platform architecture. In this episode the head of data for Mayvenn shares their experience migrating an…

20 January 2020 | 00:39:00


Planet Scale SQL For The New Generation Of Applications With YugabyteDB - E115

Summary

The modern era of software development is identified by ubiquitous access to elastic infrastructure for computation and easy automation of deployment. This has led to a class of applications that can quickly scale to serve users worldwide. This requires a new class of data storage which can accommodate that demand without having to…

13 January 2020 | 01:01:17


Change Data Capture For All Of Your Databases With Debezium - E114

Summary

Databases are useful for inspecting the current state of your application, but inspecting the history of that data can get messy without a way to track changes as they happen. Debezium is an open source platform for reliable change data capture that you can use to build supplemental systems…

06 January 2020 | 00:53:01


Building The DataDog Platform For Processing Timeseries Data At Massive Scale - E113

Summary

DataDog is one of the most successful companies in the space of metrics and monitoring for servers and cloud infrastructure. In order to support their customers, they need to capture, process, and analyze massive amounts of timeseries data with a high degree of uptime and reliability. Vadim…

30 December 2019 | 00:45:55