Data Engineering Podcast


This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.


459 Episodes

Making Data Collection In Your Code Easy With Rookout - E128

Summary

The software applications that we build for our businesses are a rich source of data, but accessing and extracting that data is often a slow and error-prone process. Rookout has built a platform to separate the data collection process from the lifecycle of your code. In this episode, CTO…

14 April 2020 | 00:26:00


Building A Knowledge Graph Of Commercial Real Estate At Cherre - E127

Summary

Knowledge graphs are a data resource that can answer questions beyond the scope of traditional data analytics. By organizing and storing data to emphasize the relationship between entities, we can discover the complex connections between multiple sources of information. In this episode John…

07 April 2020 | 00:45:20


The Life Of A Non-Profit Data Professional - E126

Summary

Building and maintaining a system that integrates and analyzes all of the data for your organization is a complex endeavor. Operating on a shoestring budget makes it even more challenging. In this episode Tyler Colby shares his experiences working as a data professional in the non-profit…

30 March 2020 | 00:44:36


Behind The Scenes Of The Linode Object Storage Service - E125

Summary

There are a number of platforms available for object storage, including self-managed open source projects. But what goes on behind the scenes of the companies that run these systems at scale so you don’t have to? In this episode Will Smith shares the journey that he and his team at…

23 March 2020 | 00:35:53


Building A New Foundation For CouchDB - E124

Summary

CouchDB is a distributed document database built for scale and ease of operation. With a built-in synchronization protocol and an HTTP interface, it has become popular as a backend for web and mobile applications. Created 15 years ago, it has accrued some technical debt which is being…

17 March 2020 | 00:55:25


Scaling Data Governance For Global Businesses With A Data Hub Architecture - E123

Summary

Data governance is a complex endeavor, but scaling it to meet the needs of a complex or globally distributed organization requires a well-considered and coherent strategy. In this episode Tim Ward describes an architecture that he has used successfully with multiple organizations to scale…

09 March 2020 | 00:54:08


Easier Stream Processing On Kafka With ksqlDB - E122

Summary

Building applications on top of unbounded event streams is a complex endeavor, requiring careful integration of multiple disparate systems that were engineered in isolation. The ksqlDB project was created to address this state of affairs by building a unified layer on top of the Kafka…

02 March 2020 | 00:43:36


Shining A Light on Shadow IT In Data And Analytics - E121

Summary

Misaligned priorities across business units can lead to tensions that drive members of the organization to build data and analytics projects without the guidance or support of engineering or IT staff. The availability of cloud platforms and managed services makes this a viable option, but…

25 February 2020 | 00:46:09


Data Infrastructure Automation For Private SaaS At Snowplow - E120

Summary

One of the biggest challenges in building reliable platforms for processing event pipelines is managing the underlying infrastructure. At Snowplow Analytics the complexity is compounded by the need to manage multiple instances of their platform across customer environments. In this episode…

18 February 2020 | 00:49:01


Data Modeling That Evolves With Your Business Using Data Vault - E119

Summary

Designing the structure for your data warehouse is a complex and challenging process. As businesses deal with a growing number of sources and types of information that they need to integrate, they need a data modeling strategy that provides them with flexibility and speed. Data Vault is an…

09 February 2020 | 01:06:22