Data Engineering Podcast


This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.


469 Episodes

Adding Context And Comprehension To Your Analytics Through Data Discovery With SelectStar - E208

Summary

Companies of all sizes and industries are trying to use the data that they and their customers generate to survive and thrive in the modern economy. As a result, they are relying on a constantly growing number of data sources being accessed by an increasingly varied set of users. In order…

31 July 2021 | 00:51:23


Building a Multi-Tenant Managed Platform For Streaming Data With Pulsar at Datastax - E207

Summary

Everyone expects data to be transmitted, processed, and updated instantly as more and more products integrate streaming data. The technology to make that possible has been around for a number of years, but the barriers to adoption have still been high due to the level of technical…

28 July 2021 | 01:00:13


Bringing The Metrics Layer To The Masses With Transform - E206

Summary

Collecting and cleaning data is only useful if someone can make sense of it afterward. The latest evolution in the data ecosystem is the introduction of a dedicated metrics layer to help address the challenge of adding context and semantics to raw information. In this episode Nick Handel…

23 July 2021 | 01:01:17


Strategies For Proactive Data Quality Management - E205

Summary

Data quality is a concern that has been gaining attention alongside the rising importance of analytics for business success. Many solutions rely on hand-coded rules for catching known bugs, or statistical analysis of records to detect anomalies retroactively. While those are useful tools,…

20 July 2021 | 01:01:07


Low Code And High Quality Data Engineering For The Whole Organization With Prophecy - E204

Summary

There is a wealth of tools and systems available for processing data, but the user experience of integrating them and building workflows is still lacking. This is particularly important in large and complex organizations where domain knowledge and context are paramount and there may not be…

16 July 2021 | 01:12:35


Exploring The Design And Benefits Of The Modern Data Stack - E203

Summary

We have been building platforms and workflows to store, process, and analyze data since the earliest days of computing. Over that time there have been countless architectures, patterns, and "best practices" to make that task manageable. With the growing popularity of cloud…

13 July 2021 | 00:49:02


Democratize Data Cleaning Across Your Organization With Trifacta - E202

Summary

Every data project, whether it’s analytics, machine learning, or AI, starts with the work of data cleaning. This is a critical step and benefits from being accessible to the domain experts. Trifacta is a platform for managing your data engineering workflow to make curating, cleaning,…

09 July 2021 | 01:07:13


Stick All Of Your Systems And Data Together With SaaSGlue As Your Workflow Manager - E201

Summary

At the core of every data pipeline is a workflow manager (or several). Deploying, managing, and scaling that orchestration can consume a large fraction of a data team’s energy, so it is important to pick something that provides the power and flexibility that you need. SaaSGlue is a…

05 July 2021 | 00:55:31


Leveling Up Open Source Data Integration With Meltano Hub And The Singer SDK - E200

Summary

Data integration in the form of extract and load is the critical first step of every data project. There are a large number of commercial and open source projects that offer that capability but it is still far from being a solved problem. One of the most promising community efforts is that…

03 July 2021 | 01:05:24


A Candid Exploration Of Timeseries Data Analysis With InfluxDB - E199

Summary

While the overall concept of timeseries data is uniform, its usage and applications are far from uniform. One of the most demanding applications of timeseries data is application and server monitoring, due to the problem of high cardinality. In his quest to build a generalized platform for…

29 June 2021 | 01:06:03