Data Engineering Podcast


This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.



469 Episodes

Unpacking Fauna: A Global Scale Cloud Native Database - E78

Summary

One of the biggest challenges for any business trying to grow and reach customers globally is how to scale their data storage. FaunaDB is a cloud native database built by the engineers behind Twitter’s infrastructure and designed to serve the needs of modern systems. Evan Weaver is…

22 April 2019 | 00:53:51


Index Your Big Data With Pilosa For Faster Analytics - E77

Summary

Database indexes are critical to ensure fast lookups of your data, but they are inherently tied to the database engine. Pilosa is rewriting that equation by providing a flexible, scalable, performant engine for building an index of your data to enable high-speed aggregate analysis. In this…

15 April 2019 | 00:43:42


Serverless Data Pipelines On DataCoral - E76

Summary

How much time do you spend maintaining your data pipeline? How much end user value does that provide? Raghu Murthy founded DataCoral as a way to abstract the low level details of ETL so that you can focus on the actual problem that you are trying to solve. In this episode he explains his…

08 April 2019 | 00:53:42


Why Analytics Projects Fail And What To Do About It - E75

Summary

Analytics projects fail all the time, resulting in lost opportunities and wasted resources. There are a number of factors that contribute to that failure and not all of them are under our control. However, many of them are, and as data engineers we can help to keep our projects on the path…

01 April 2019 | 00:36:30


Building An Enterprise Data Fabric At CluedIn - E74

Summary

Data integration is one of the most challenging aspects of any data platform, especially as the variety of data sources and formats grows. Enterprise organizations feel this acutely due to the silos that occur naturally across business units. The CluedIn team experienced this issue…

25 March 2019 | 00:57:50


A DataOps vs DevOps Cookoff In The Data Kitchen - E73

Summary

Delivering a data analytics project on time and with accurate information is critical to the success of any business. DataOps is a set of practices to increase the probability of success by creating value early and often, and using feedback loops to keep your project on course. In this…

18 March 2019 | 00:54:31


Customer Analytics At Scale With Segment - E72

Summary

Customer analytics is a problem domain that has given rise to its own industry. In order to gain a full understanding of what your users are doing and how best to serve them, you may need to send data to multiple services, each with their own tracking code or APIs. To simplify this process…

04 March 2019 | 00:47:47


Deep Learning For Data Engineers - E71

Summary

Deep learning is the latest class of technology that is gaining widespread interest. As data engineers we are responsible for building and managing the platforms that power these models. To help us understand what is involved, we are joined this week by Thomas Henson. In this episode he…

25 February 2019 | 00:42:46


Speed Up Your Analytics With The Alluxio Distributed Storage System - E70

Summary

Distributed storage systems are the foundational layer of any big data stack. There are a variety of implementations which support different specialized use cases and come with associated tradeoffs. Alluxio is a distributed virtual filesystem which integrates with multiple persistent…

19 February 2019 | 00:59:44


Machine Learning In The Enterprise - E69

Summary

Machine learning is a class of technologies that promise to revolutionize business. Unfortunately, it can be difficult to identify and execute on ways that it can be used in large companies. Kevin Dewalt founded Prolego to help Fortune 500 companies build, launch, and maintain their first…

11 February 2019 | 00:48:19