Data Engineering Podcast


This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.


459 Episodes

Make Your Business Metrics Reusable With Open Source Headless BI Using Metriql - E228

Summary

The key to making data valuable to business users is the ability to calculate meaningful metrics and explore them along useful dimensions. Business intelligence tools have provided this capability for years, but they don’t offer a means of exposing those metrics to other systems.…

08 October 2021 | 00:43:37


Adding Support For Distributed Transactions To The Redpanda Streaming Engine - E227

Summary

Transactions are a necessary feature for ensuring that a set of actions are all performed as a single unit of work. In streaming systems this is necessary to ensure that a set of messages or transformations are all executed together across different queues. In this episode Denis Rystsov…

06 October 2021 | 00:45:59


Building Real-Time Data Platforms For Large Volumes Of Information With Aerospike - E226

Summary

Aerospike is a database engine that is designed to provide millisecond response times for queries across terabytes or petabytes. In this episode Chief Strategy Officer, Lenley Hensarling, explains how the ability to process these large volumes of information in real-time allows businesses…

02 October 2021 | 01:07:38


Delivering Your Personal Data Cloud With Prifina - E225

Summary

The promise of online services is that they will make your life easier in exchange for collecting data about you. The reality is that they use more information than you realize for purposes that are not what you intended. There have been many attempts to harness all of the data that you…

30 September 2021 | 01:12:11


Digging Into Data Reliability Engineering - E224

Summary

The accuracy and availability of data have become critically important to the day-to-day operation of businesses. Similar to the practice of site reliability engineering as a means of ensuring consistent uptime of web services, there has been a new trend of building data reliability…

26 September 2021 | 00:58:07


Massively Parallel Data Processing In Python Without The Effort Using Bodo - E223

Summary

Python has become the de facto language for working with data. That has brought with it a number of challenges having to do with the speed and scalability of working with large volumes of information. There have been many projects and strategies for overcoming these challenges, each with…

25 September 2021 | 01:04:17


Declarative Machine Learning Without The Operational Overhead Using Continual - E222

Summary

Building, scaling, and maintaining the operational components of a machine learning workflow are all hard problems. Add the work of creating the model itself, and it’s not surprising that a majority of companies that could greatly benefit from machine learning have yet to either put…

19 September 2021 | 01:11:52


An Exploration Of The Data Engineering Requirements For Bioinformatics - E221

Summary

Biology has been gaining a lot of attention in recent years, even before the pandemic. As an outgrowth of that popularity, a new field has grown up that pairs statistics and computational analysis with scientific research, namely bioinformatics. This brings with it a unique set of challenges…

19 September 2021 | 00:55:10


Setting The Stage For The Next Chapter Of The Cassandra Database - E220

Summary

The Cassandra database is one of the first open source options for globally scalable storage systems. Since its introduction in 2008 it has been powering systems at every scale. The community recently released a new major version that marks a milestone in its maturity and stability as a…

12 September 2021 | 00:59:29


A View From The Round Table Of Gartner's Cool Vendors - E219

Summary

Gartner analysts are tasked with identifying promising companies each year that are making an impact in their respective categories. For businesses that are working in the data management and analytics space, they recognized the efforts of Timbr.ai, Soda Data, Nexla, and Tada. In this…

09 September 2021 | 01:04:16