Data Engineering Podcast

This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.

Take Control Of Your Web Analytics Using Snowplow With Alexander Dean - Episode 48 - E48

Summary

Every business with a website needs some way to keep track of how much traffic they are getting, where it is coming from, and which actions are being taken. The default in most cases is Google Analytics, but this can be limiting when you wish to perform detailed analysis of the captured data. To address this problem, Alex Dean…

17 September 2018 | 00:47:49

Keep Your Data And Query It Too Using Chaos Search with Thomas Hazel and Pete Cheslock - Episode 47 - E47

Summary

Elasticsearch is a powerful tool for storing and analyzing data, but when using it for logs and other time-oriented information, it can become problematic to keep all of your history. Chaos Search was started to make it easy for you to keep all of your data and make it usable in S3, so that you can have the best of both worlds.…

10 September 2018 | 00:48:09

An Agile Approach To Master Data Management with Mark Marinelli - Episode 46 - E46

Summary

With the proliferation of data sources to give a more comprehensive view of the information critical to your business, it is even more important to have a canonical view of the entities that you care about. Is customer number 342 in your ERP the same as Bob Smith on Twitter? Using master data management to build a data catalog…

03 September 2018 | 00:47:16

Protecting Your Data In Use At Enveil with Ellison Anne Williams - Episode 45 - E45

Summary

There are myriad reasons why data should be protected, and just as many ways to enforce it in transit or at rest. Unfortunately, there is still a weak point where attackers can gain access to your unencrypted information. In this episode Ellison Anne Williams, CEO of Enveil, describes how her company uses homomorphic encryption…

27 August 2018 | 00:24:42

Graph Databases In Production At Scale Using DGraph with Manish Jain - Episode 44 - E44

Summary

The way that you store your data can have a huge impact on the ways that it can be practically used. For a substantial number of use cases, the optimal format for storing and querying that information is as a graph. However, databases architected around that use case have historically been difficult to use at scale or for serving…

20 August 2018 | 00:42:40

Putting Airflow Into Production With James Meickle - Episode 43 - E43

Summary

The theory behind how a tool is supposed to work and the realities of putting it into practice are often at odds with each other. Learning the pitfalls and best practices from someone who has gained that knowledge the hard way can save you from wasted time and frustration. In this episode James Meickle discusses his recent…

13 August 2018 | 00:48:06

Taking A Tour Of PostgreSQL with Jonathan Katz - Episode 42 - E42

Summary

One of the longest running and most popular open source database projects is PostgreSQL. Because of its extensibility and a community focus on stability, it has stayed relevant as the ecosystem of development environments and data requirements has changed and evolved over its lifetime. It is difficult to capture any single facet…

06 August 2018 | 00:56:22

Mobile Data Collection And Analysis Using Ona And Canopy With Peter Lubell-Doughtie - Episode 41 - E41

Summary

With the attention being paid to the systems that power large volumes of high-velocity data, it is easy to forget about the value of data collection at human scales. Ona is a company that is building technologies to support mobile data collection, analysis of the aggregated information, and user-friendly presentations. In this…

30 July 2018 | 00:29:14

Ceph: A Reliable And Scalable Distributed Filesystem with Sage Weil - Episode 40 - E40

Summary

When working with large volumes of data that you need to access in parallel across multiple instances, you need a distributed filesystem that will scale with your workload. Even better is when that same system provides multiple paradigms for interacting with the underlying storage. Ceph is a highly available, highly scalable, and…

16 July 2018 | 00:48:31

Building Data Flows In Apache NiFi With Kevin Doran and Andy LoPresto - Episode 39 - E39

Summary

Data integration and routing is a constantly evolving problem and one that is fraught with edge cases and complicated requirements. The Apache NiFi project models this problem as a collection of data flows that are created through a self-service graphical interface. This framework provides a flexible platform for building a wide…

08 July 2018 | 01:04:16