This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.
Support the show!Listen in your favorite app:
FountainHere are shows you might like
There are a wealth of options for managing structured and textual data, but unstructured binary data assets are not as well supported across the ecosystem. As organizations start to adopt cloud technologies they need a way to manage the distribution, discovery, and collaboration of data…
There are a wealth of options for…
28 February 2022 | 00:54:47
Python has grown to be one of the top languages used for all aspects of data, from collection and cleaning, to analysis and machine learning. Along with that growth has come an explosion of tools and engines that help power these workflows, which introduces a great deal of complexity when…
Python has grown to be one of the top…
21 February 2022 | 01:01:08
The life sciences as an industry has seen incredible growth in scale and sophistication, along with the advances in data technology that make it possible to analyze massive amounts of genomic information. In this episode Guy Yachdav, director of software engineering for ImmunAI, shares the…
The life sciences as an industry has seen…
21 February 2022 | 00:43:07
Streaming data sources are becoming more widely available as tools to handle their storage and distribution mature. However it is still a challenge to analyze this data as it arrives, while supporting integration with static data in a unified syntax. Deephaven is a project that was designed…
Streaming data sources are becoming more…
14 February 2022 | 01:02:05
Collecting, integrating, and activating data are all challenging activities. When that data pertains to your customers it can become even more complex. To simplify the work of managing the full flow of your customer data and keep you in full control the team at Rudderstack created their…
Collecting, integrating, and activating…
14 February 2022 | 00:47:35
Along with globalization of our societies comes the need to analyze the geospatial and geotemporal data that is needed to manage the growth in commerce, communications, and other activities. In order to make geospatial analytics more maintainable and scalable there has been an increase in…
Along with globalization of our societies…
07 February 2022 | 00:59:54
There are many dimensions to the work of protecting the privacy of users in our data. When you need to share a data set with other teams, departments, or businesses then it is of utmost importance that you eliminate or obfuscate personal information. In this episode Will Thompson explores…
There are many dimensions to the work of…
06 February 2022 | 01:00:06
The Data Engineering Podcast has been going for five years now and has included conversations and interviews with a huge number of guests, covering a broad range of topics. In addition to that, the host curated the essays contained in the book "97 Things Every Data Engineer Should…
The Data Engineering Podcast has been…
31 January 2022 | 00:41:36
Pandas is a powerful tool for cleaning, transforming, manipulating, or enriching data, among many other potential uses. As a result it has become a standard tool for data engineers for a wide range of applications. Matt Harrison is a Python expert with a long history of working with data…
Pandas is a powerful tool for cleaning,…
31 January 2022 | 01:00:22
Data platforms are exemplified by a complex set of connections that are subject to a set of constantly evolving requirements. In order to make this a tractable problem it is necessary to define boundaries for communication between concerns, which brings with it the need to establish…
Data platforms are exemplified by a…
23 January 2022 | 00:56:00