Data Engineering Podcast


This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.

478 Episodes

Manage Your Unstructured Data Assets Across Cloud And Hybrid Environments With Komprise - E268

Summary

There is a wealth of options for managing structured and textual data, but unstructured binary data assets are not as well supported across the ecosystem. As organizations start to adopt cloud technologies, they need a way to manage the distribution, discovery, and collaboration of data…

28 February 2022 | 00:54:47


Build Your Python Data Processing Your Way And Run It Anywhere With Fugue - E266

Summary

Python has grown to be one of the top languages used for all aspects of data, from collection and cleaning, to analysis and machine learning. Along with that growth has come an explosion of tools and engines that help power these workflows, which introduces a great deal of complexity when…

21 February 2022 | 01:01:08


Understanding The Immune System With Data At ImmunAI - E265

Summary

The life sciences as an industry has seen incredible growth in scale and sophistication, along with the advances in data technology that make it possible to analyze massive amounts of genomic information. In this episode Guy Yachdav, director of software engineering for ImmunAI, shares the…

21 February 2022 | 00:43:07


Bring Your Code To Your Streaming And Static Data Without Effort With The Deephaven Real Time Query Engine - E264

Summary

Streaming data sources are becoming more widely available as tools to handle their storage and distribution mature. However, it is still a challenge to analyze this data as it arrives while supporting integration with static data in a unified syntax. Deephaven is a project that was designed…

14 February 2022 | 01:02:05


Build Your Own End To End Customer Data Platform With Rudderstack - E263

Summary

Collecting, integrating, and activating data are all challenging activities. When that data pertains to your customers, it can become even more complex. To simplify the work of managing the full flow of your customer data and keep you in full control, the team at Rudderstack created their…

14 February 2022 | 00:47:35


Scale Your Spatial Analysis By Building It In SQL With Syntax Extensions - E262

Summary

Along with the globalization of our societies comes the need to analyze the geospatial and geotemporal data required to manage the growth in commerce, communications, and other activities. In order to make geospatial analytics more maintainable and scalable, there has been an increase in…

07 February 2022 | 00:59:54


Scalable Strategies For Protecting Data Privacy In Your Shared Data Sets - E261

Summary

There are many dimensions to the work of protecting the privacy of users in our data. When you need to share a data set with other teams, departments, or businesses then it is of utmost importance that you eliminate or obfuscate personal information. In this episode Will Thompson explores…

06 February 2022 | 01:00:06


A Reflection On Learning A Lot More Than 97 Things Every Data Engineer Should Know - E260

Summary

The Data Engineering Podcast has been going for five years now and has included conversations and interviews with a huge number of guests, covering a broad range of topics. In addition to that, the host curated the essays contained in the book "97 Things Every Data Engineer Should…

31 January 2022 | 00:41:36


Effective Pandas Patterns For Data Engineering - E259

Summary

Pandas is a powerful tool for cleaning, transforming, manipulating, or enriching data, among many other potential uses. As a result it has become a standard tool for data engineers for a wide range of applications. Matt Harrison is a Python expert with a long history of working with data…

31 January 2022 | 01:00:22


The Importance Of Data Contracts As The Interface For Data Integration With Abhi Sivasailam - E258

Summary

Data platforms are exemplified by a complex set of connections that are subject to a set of constantly evolving requirements. In order to make this a tractable problem it is necessary to define boundaries for communication between concerns, which brings with it the need to establish…

23 January 2022 | 00:56:00