Data Engineering Podcast


This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.

459 Episodes

Designing And Building Data Platforms As A Product - E218

Summary

The term "data platform" gets thrown around a lot, but have you stopped to think about what it actually means for you and your organization? In this episode Lior Gavish, Lior Solomon, and Atul Gupte share their view of what it means to have a data platform, discuss their…

04 September 2021 | 01:00:00


Presto Powered Cloud Data Lakes At Speed Made Easy With Ahana - E217

Summary

The Presto project has become the de facto option for building scalable open source analytics in SQL for the data lake. In recent months the community has focused their efforts on making it the fastest possible option for running your analytics in the cloud. In this episode Dipti Borkar…

02 September 2021 | 01:00:31


Do Away With Data Integration Through A Dataware Architecture With Cinchy - E216

Summary

So much time and energy is spent on data integration because of how our applications are designed. When the software owns the data that it generates, we have to go through the trouble of extracting the information before it can be used elsewhere. The team at…

28 August 2021 | 00:51:27


Decoupling Data Operations From Data Infrastructure Using Nexla - E215

Summary

The technological and social ecosystem of data engineering and data management has been reaching a stage of maturity recently. As part of this stage in our collective journey the focus has been shifting toward operation and automation of the infrastructure and workflows that power our…

25 August 2021 | 00:57:48


Let Your Analysts Build A Data Lakehouse With Cuelake - E214

Summary

Data lakes have been gaining popularity alongside an increase in their sophistication and usability. Despite improvements in performance and data architecture they still require significant knowledge and experience to deploy and manage. In this episode Vikrant Dubey discusses his work on…

21 August 2021 | 00:27:38


Migrate And Modify Your Data Platform Confidently With Compilerworks - E213

Summary

A major concern that comes up when selecting a vendor or technology for storing and managing your data is vendor lock-in. What happens if the vendor fails? What if the technology can’t do what I need it to? Compilerworks set out to reduce the pain and complexity of migrating between…

18 August 2021 | 01:06:09


Prepare Your Unstructured Data For Machine Learning And Computer Vision Without The Toil Using Activeloop - E212

Summary

The vast majority of data tools and platforms that you hear about are designed for working with structured, text-based data. What do you do when you need to manage unstructured information, or build a computer vision model? Activeloop was created for exactly that purpose. In this episode…

15 August 2021 | 00:48:39


Build Trust In Your Data By Understanding Where It Comes From And How It Is Used With Stemma - E211

Summary

All of the fancy data platform tools and shiny dashboards that you use are pointless if the consumers of your analysis don’t have trust in the answers. Stemma helps you establish and maintain that trust by giving visibility into who is using what data, annotating the reports with…

10 August 2021 | 00:52:37


Data Discovery From Dashboards To Databases With Castor - E210

Summary

Every organization needs to be able to use data to answer questions about their business. The trouble is that the data is usually spread across a wide and shifting array of systems, from databases to dashboards. The other challenge is that even if you do find the information you…

07 August 2021 | 00:52:47


Charting A Path For Streaming Data To Fill Your Data Lake With Hudi - E209

Summary

Data lake architectures have largely been biased toward batch processing workflows due to the volume of data that they are designed for. With more real-time requirements and the increasing use of streaming data there has been a struggle to merge fast, incremental updates with large,…

03 August 2021 | 01:09:37