Data Engineering Podcast


This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.

444 Episodes

Simple And Scalable Encryption Of Data In Use For Analytics And Machine Learning With Opaque Systems - E353

Summary

Encryption and security are critical elements in data analytics and machine learning applications. We have well-developed protocols and practices around data that is at rest and in motion, but security around data in use is still severely lacking. Recognizing this shortcoming and the capabilities that could be unlocked by a robust solution…

26 December 2022 | 01:08:25


Making Sense Of The Technical And Organizational Considerations Of Data Contracts - E352

Summary

One of the reasons that data work is so challenging is that no single person or team owns the entire process. This introduces friction in the process of collecting, processing, and using data. In order to reduce the potential for broken pipelines, some teams have started to adopt the idea of data contracts. In this episode Abe Gong brings…

19 December 2022 | 00:47:01


Revisit The Fundamental Principles Of Working With Data To Avoid Getting Caught In The Hype Cycle - E351

Summary

The data ecosystem has seen a constant flurry of activity for the past several years, and it shows no signs of slowing down. With all of the products, techniques, and buzzwords being discussed, it can be easy to be overcome by the hype. In this episode Juan Sequeda and Tim Gasper from data.world share their views on the core principles that…

19 December 2022 | 01:05:29


Convert Your Unstructured Data To Embedding Vectors For More Efficient Machine Learning With Towhee - E350

Preamble

This is a cross-over episode from our new show

12 December 2022 | 00:53:46


Run Your Applications Worldwide Without Worrying About The Database With Planetscale - E349

Summary

One of the most critical aspects of a software project is managing its data. Managing the operational concerns for your database can be complex and expensive, especially if you need to scale to large volumes of data, high traffic, or geographically distributed usage. Planetscale is a…

12 December 2022 | 00:49:41


Business Intelligence In The Palm Of Your Hand With Zing Data - E348

Summary

Business intelligence is the foremost application of data in organizations of all sizes. The typical conception of how it is accessed is through a web or desktop application running on a powerful laptop. Zing Data is building a mobile native platform for business intelligence. This opens…

05 December 2022 | 00:46:47


Adopting Real-Time Data At Organizations Of Every Size - E347

Summary

The term "real-time data" brings with it a combination of excitement, uncertainty, and skepticism. The promise of insights that are always accurate and up to date is appealing to organizations, but the technical realities to make it possible have been complex and expensive. In…

05 December 2022 | 00:50:25


Supporting And Expanding The Arrow Ecosystem For Fast And Efficient Data Processing At Voltron Data - E346

Summary

The data ecosystem has been growing rapidly, with new communities joining and bringing their preferred programming languages to the mix. This has led to inefficiencies in how data is stored, accessed, and shared across process and system boundaries. The Arrow project is designed to…

28 November 2022 | 00:50:25


Analyze Massive Data At Interactive Speeds With The Power Of Bitmaps Using FeatureBase - E345

Summary

The most expensive part of working with massive data sets is the work of retrieving and processing the files that contain the raw information. FeatureBase (formerly Pilosa) avoids that overhead by converting the data into bitmaps. In this episode Matt Jaffee explains how to model your data…

28 November 2022 | 00:59:25


A Look At The Data Systems Behind The Gameplay For League Of Legends - E344

Summary

The majority of blog posts and presentations about data engineering and analytics assume that the consumers of those efforts are internal business users accessing an environment controlled by the business. In this episode Ian Schweer shares his experiences at Riot Games supporting…

21 November 2022 | 01:01:29