Data Engineering Podcast


This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.

486 Episodes

When And How To Conduct An AI Program - E415

Summary Artificial intelligence technologies promise to revolutionize business and produce new sources of value. In order to make those promises a reality there is a substantial amount of strategy and investment required. Colleen Tartow has worked across all stages of the data lifecycle, and in this episode she shares her hard-earned wisdom about…

03 March 2024 | 00:46:25


Find Out About The Technology Behind The Latest PFAD In Analytical Database Development - E414

Summary Building a database engine requires a substantial amount of engineering effort and time investment. Over the decades of research and development into building these software systems there are a number of common components that are shared across implementations. When Paul Dix decided to re-write the InfluxDB engine he found the Apache Arrow…

25 February 2024 | 00:56:01


Using Trino And Iceberg As The Foundation Of Your Data Lakehouse - E413

Summary A data lakehouse is intended to combine the benefits of data lakes (cost-effective, scalable storage and compute) and data warehouses (user-friendly SQL interface). Multiple open source projects and vendors have been working together to make this vision a reality. In this episode Dain Sundstrom, CTO of Starburst, explains how the…

18 February 2024 | 00:58:46


Data Sharing Across Business And Platform Boundaries - E412

Summary Sharing data is a simple concept, but complicated to implement well. There are numerous business rules and regulatory concerns that need to be applied. There are also numerous technical considerations to be made, particularly if the producer and consumer of the data aren't using the same platforms. In this episode Andrew Jefferson explains…

11 February 2024 | 00:59:56


Tackling Real Time Streaming Data With SQL Using RisingWave - E411

Summary Stream processing systems have long been built with a code-first design, adding SQL as a layer on top of the existing framework. RisingWave is a database engine that was created specifically for stream processing, with S3 as the storage layer. In this episode Yingjun Wu explains how it is architected to power analytical workflows on…

04 February 2024 | 00:56:55


Build A Data Lake For Your Security Logs With Scanner - E410

Summary Monitoring and auditing IT systems for security events requires the ability to quickly analyze massive volumes of unstructured log data. The majority of products that are available either require too much effort to structure the logs, or aren't fast enough for interactive use cases. Cliff Crosland co-founded Scanner to provide fast querying…

29 January 2024 | 01:02:38


Modern Customer Data Platform Principles - E409

Summary Databases and analytics architectures have gone through several generational shifts. A substantial amount of the data that is being managed in these systems is related to customers and their interactions with an organization. In this episode Tasso Argyros, CEO of ActionIQ, gives a summary of the major epochs in database technologies and how…

22 January 2024 | 01:01:33


Pushing The Limits Of Scalability And User Experience For Data Processing With Jignesh Patel - E408

Summary Data processing technologies have dramatically improved in their sophistication and raw throughput. Unfortunately, the volumes of data that are being generated continue to double, requiring further advancements in the platform capabilities to keep up. As the sophistication increases, so does the complexity, leading to challenges for user…

07 January 2024 | 00:50:26


Designing Data Platforms For Fintech Companies - E407

Summary Working with financial data requires a high degree of rigor due to the numerous regulations and the risks involved in security breaches. In this episode Andrey Korchack, CTO of fintech startup Monite, discusses the complexities of designing and implementing a data platform in that sector. Announcements Hello and welcome to the Data…

01 January 2024 | 00:47:57


Troubleshooting Kafka In Production - E406

Summary Kafka has become a ubiquitous technology, offering a simple method for coordinating events and data across different systems. Operating it at scale, however, is notoriously challenging. Elad Eldor has experienced these challenges first-hand, leading to his work writing the book "Kafka Troubleshooting in Production". In this episode he…

24 December 2023 | 01:14:44