Data Engineering Podcast


This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.



471 Episodes

Keeping A Bigeye On The Data Quality Market - E160

Summary

One of the oldest aphorisms about data is "garbage in, garbage out", which is why the current boom in data quality solutions is no surprise. With the growth in projects, platforms, and services that aim to help you establish and maintain control of the health and reliability of…

23 November 2020 | 00:49:26


Self Service Data Management From Ingest To Insights With Isima - E159

Summary

The core mission of data engineers is to provide the business with a way to ask and answer questions of their data. This often takes the form of business intelligence dashboards, machine learning models, or APIs on top of a cleaned and curated data set. Despite the rapid progression of…

17 November 2020 | 00:44:03


Building A Cost Effective Data Catalog With Tree Schema - E158

Summary

A data catalog is a critical piece of infrastructure for any organization that wants to build analytics products, whether internal or external. While there are a number of platforms available for building that catalog, many of them are either difficult to deploy and integrate, or expensive…

10 November 2020 | 00:51:53


Add Version Control To Your Data Lake With LakeFS - E157

Summary

Data lakes are gaining popularity due to their flexibility and reduced cost of storage. Along with the benefits there are some additional complexities to consider, including how to safely integrate new data sources or test out changes to existing pipelines. In order to address these…

03 November 2020 | 00:50:15


Cloud Native Data Security As Code With Cyral - E156

Summary

One of the most challenging aspects of building a data platform has nothing to do with pipelines and transformations. If you are putting your workflows into production, then you need to consider how you are going to implement data security, including access controls and auditing. Different…

26 October 2020 | 00:48:33


Better Data Quality Through Observability With Monte Carlo - E155

Summary

In order for analytics and machine learning projects to be useful, they require a high degree of data quality. To ensure that your pipelines are healthy you need a way to make them observable. In this episode Barr Moses and Lior Gavish, co-founders of Monte Carlo, share the leading causes…

19 October 2020 | 00:55:53


Rapid Delivery Of Business Intelligence Using Power BI - E154

Summary

Business intelligence efforts are only as useful as the outcomes that they inform. Power BI aims to reduce the time and effort required to go from information to action by providing an interface that encourages rapid iteration. In this episode Rob Collie shares his enthusiasm for the Power…

12 October 2020 | 01:02:55


Self Service Real Time Data Integration Without The Headaches With Meroxa - E153

Summary

Analytical workloads require a well-engineered and well-maintained data integration process to ensure that your information is reliable and up to date. Building a real-time pipeline for your data lakes and data warehouses is a non-trivial effort, requiring a substantial investment of time…

05 October 2020 | 01:00:56


Speed Up And Simplify Your Streaming Data Workloads With Red Panda - E152

Summary

Kafka has become a de facto standard interface for building decoupled systems and working with streaming data. Despite its widespread popularity, there are numerous accounts of the difficulty that operators face in keeping it reliable and performant, or trying to scale an installation. To…

29 September 2020 | 00:59:41


Cutting Through The Noise And Focusing On The Fundamentals Of Data Engineering With The Data Janitor - E151

Summary

Data engineering is a constantly growing and evolving discipline. There are always new tools, systems, and design patterns to learn, which leads to a great deal of confusion for newcomers. Daniel Molnar has dedicated his time to helping data professionals get back to basics through…

22 September 2020 | 00:47:40