Data Engineering Podcast

This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.

Of Checklists, Ethics, and Data with Emily Miller and Peter Bull (Cross Post from Podcast.__init__) - Episode 53 - E53

Summary

As data science becomes more widespread and has a bigger impact on the lives of people, it is important that those projects and products are built with a conscious consideration of ethics. Keeping ethical principles in mind throughout the lifecycle of a data project helps to reduce the overall effort of preventing negative…

22 October 2018 | 00:45:32

Improving The Performance Of Cloud-Native Big Data At Netflix Using The Iceberg Table Format with Ryan Blue - Episode 52 - E52

Summary

With the growth of the Hadoop ecosystem came a proliferation of implementations for the Hive table format. Unfortunately, with no formal specification, each project works slightly differently, which increases the difficulty of integration across systems. The Hive format is also built with the assumptions of a local filesystem which…

15 October 2018 | 00:53:46

Combining Transactional And Analytical Workloads On MemSQL with Nikita Shamgunov - Episode 51 - E51

Summary

One of the most complex aspects of managing data for analytical workloads is moving it from a transactional database into the data warehouse. What if you didn’t have to do that at all? MemSQL is a distributed database built to support concurrent use by transactional, application-oriented workloads and analytical, high-volume workloads on the same…

09 October 2018 | 00:56:55

Building A Knowledge Graph From Public Data At Enigma With Chris Groskopf - Episode 50 - E50

Summary

There are countless sources of data that are publicly available for use. Unfortunately, combining those sources and making them useful in aggregate is a time-consuming and challenging process. The team at Enigma builds a knowledge graph for use in your own data projects. In this episode Chris Groskopf explains the platform they…

01 October 2018 | 00:52:53

A Primer On Enterprise Data Curation with Todd Walter - Episode 49 - E49

Summary

As your data needs scale across an organization, the need for a carefully considered approach to collection, storage, organization, and access becomes increasingly critical. In this episode Todd Walter shares his considerable experience in data curation to clarify the many aspects that are necessary for a successful platform…

24 September 2018 | 00:49:35

Take Control Of Your Web Analytics Using Snowplow With Alexander Dean - Episode 48 - E48

Summary

Every business with a website needs some way to keep track of how much traffic it is getting, where that traffic is coming from, and which actions are being taken. The default in most cases is Google Analytics, but this can be limiting when you wish to perform detailed analysis of the captured data. To address this problem, Alex Dean…

17 September 2018 | 00:47:49

Keep Your Data And Query It Too Using Chaos Search with Thomas Hazel and Pete Cheslock - Episode 47 - E47

Summary

Elasticsearch is a powerful tool for storing and analyzing data, but when using it for logs and other time-oriented information it can become problematic to keep all of your history. Chaos Search was started to make it easy for you to keep all of your data and make it usable in S3, so that you can have the best of both worlds.…

10 September 2018 | 00:48:09

An Agile Approach To Master Data Management with Mark Marinelli - Episode 46 - E46

Summary

With the proliferation of data sources to give a more comprehensive view of the information critical to your business, it is even more important to have a canonical view of the entities that you care about. Is customer number 342 in your ERP the same as Bob Smith on Twitter? Using master data management to build a data catalog…

03 September 2018 | 00:47:16

Protecting Your Data In Use At Enveil with Ellison Anne Williams - Episode 45 - E45

Summary

There are myriad reasons why data should be protected, and just as many ways to enforce it in transit or at rest. Unfortunately, there is still a weak point where attackers can gain access to your unencrypted information. In this episode Ellison Anne Williams, CEO of Enveil, describes how her company uses homomorphic encryption…

27 August 2018 | 00:24:42

Graph Databases In Production At Scale Using DGraph with Manish Jain - Episode 44 - E44

Summary

The way that you store your data can have a huge impact on the ways that it can be practically used. For a substantial number of use cases, the optimal format for storing and querying that information is as a graph; however, databases architected around that use case have historically been difficult to use at scale or for serving…

20 August 2018 | 00:42:40