Data Engineering Podcast
Episode Archive
419 episodes of Data Engineering Podcast since the first episode, which aired on January 7th, 2017.
-
Navigating Boundless Data Streams With The Swim Kernel
September 18th, 2019 | 57 mins 55 secs
An interview about using stateful computation on data streams with the SwimOS kernel to improve your analytics
-
Building A Reliable And Performant Router For Observability Data
September 9th, 2019 | 55 mins 19 secs
An interview about building the Vector project to unify delivery of logs and metrics for better system observability
-
Building A Community For Data Professionals at Data Council
September 2nd, 2019 | 52 mins 46 secs
An interview with Pete Soderling about building and growing the Data Council events and helping engineers build businesses
-
Building Tools And Platforms For Data Analytics
August 26th, 2019 | 48 mins 6 secs
An interview on what data engineers need to know about building tools and platforms for data analytics
-
A High Performance Platform For The Full Big Data Lifecycle
August 19th, 2019 | 1 hr 13 mins
An interview about the HPCC Systems platform, its journey to open source, and how it handles the full lifecycle of big data for enterprise-scale analytics
-
Digging Into Data Replication At Fivetran
August 12th, 2019 | 44 mins 40 secs
An interview about how the Fivetran platform is designed to handle data replication as a service
-
Solving Data Discovery At Lyft
August 5th, 2019 | 51 mins 48 secs
An interview about the open source Amundsen platform for data discovery and how Lyft is using it to improve their analytics workflow
-
Simplifying Data Integration Through Eventual Connectivity
July 28th, 2019 | 53 mins 47 secs
An interview about a new pattern for data integration that reduces the amount of effort required to find connections in numerous data sets
-
Straining Your Data Lake Through A Data Mesh
July 22nd, 2019 | 1 hr 4 mins
An interview about how the data mesh architectural and organizational pattern can lead to a more maintainable data platform
-
Data Labeling That You Can Feel Good About With CloudFactory
July 14th, 2019 | 57 mins 50 secs
An interview about the CloudFactory platform for data labeling and social good in developing nations
-
Scale Your Analytics On The Clickhouse Data Warehouse
July 8th, 2019 | 1 hr 11 mins
An interview about Clickhouse, an open source, columnar data warehouse built for massive scale and speed to enable interactive analytics
-
Stress Testing Kafka And Cassandra For Real-Time Anomaly Detection
July 1st, 2019 | 38 mins 2 secs
An interview about testing the limits of scaling Kafka and Cassandra for real-time anomaly detection at Instaclustr
-
The Workflow Engine For Data Engineers And Data Scientists
June 24th, 2019 | 1 hr 8 mins
An interview about how the Prefect workflow engine unifies the needs of data engineers and data scientists with a pure Python API
-
Maintaining Your Data Lake At Scale With Spark
June 16th, 2019 | 50 mins 50 secs
A conversation with the architect of Delta Lake on the challenges of building a sustainable data lake at scale
-
Managing The Machine Learning Lifecycle
June 9th, 2019 | 1 hr 2 mins
An interview about how the open source Hydrosphere platform simplifies management of the full machine learning lifecycle
-
Evolving An ETL Pipeline For Better Productivity
June 4th, 2019 | 1 hr 2 mins
An interview about how and why Greenhouse migrated their homegrown ETL pipeline onto DataCoral