Latest Episodes

Building a Multi-Tenant Managed Platform For Streaming Data With Pulsar at Datastax - Episode 207

Everyone expects data to be transmitted, processed, and updated instantly as more and more products integrate streaming data. The technology to make that possible has been around for a number of years, but the barriers to adoption have still been high due to the level of technical understanding and operational capacity that have been required to run at scale. Datastax has recently introduced a new managed offering for Pulsar workloads in the form of Astra Streaming that lowers those barriers and make stremaing workloads accessible to a wider audience....

Play Episode

Bringing The Metrics Layer To The Masses With Transform - Episode 206

Collecting and cleaning data is only useful if someone can make sense of it afterward. The latest evolution in the data ecosystem is the introduction of a dedicated metrics layer to help address the challenge of adding context and semantics to raw information. In this episode Nick Handel shares the story behind Transform, a new platform that provides a managed metrics layer for your data platform. He explains the challenges that occur when metrics are maintained across a variety of systems, the benefits of unifying them in a common...

Play Episode

Strategies For Proactive Data Quality Management - Episode 205

Data quality is a concern that has been gaining attention alongside the rising importance of analytics for business success. Many solutions rely on hand-coded rules for catching known bugs, or statistical analysis of records to detect anomalies retroactively. While those are useful tools, it is far better to prevent data errors before they become an outsized issue. In this episode Gleb Mezhanskiy shares some strategies for adding quality checks at every stage of your development and deployment workflow to identify and fix problematic changes to your data before they...

Play Episode

Low Code And High Quality Data Engineering For The Whole Organization With Prophecy - Episode 204

There is a wealth of tools and systems available for processing data, but the user experience of integrating them and building workflows is still lacking. This is particularly important in large and complex organizations where domain knowledge and context is paramount and there may not be access to engineers for codifying that expertise. Raj Bains founded Prophecy to address this need by creating a UI first platform for building and executing data engineering workflows that orchestrates Airflow and Spark. Rather than locking your business logic into a proprietary storage...

Play Episode

Exploring The Design And Benefits Of The Modern Data Stack - Episode 203

We have been building platforms and workflows to store, process, and analyze data since the earliest days of computing. Over that time there have been countless architectures, patterns, and "best practices" to make that task manageable. With the growing popularity of cloud services a new pattern has emerged and been dubbed the "Modern Data Stack". In this episode members of the GoDataDriven team, Guillermo Sanchez, Bram Ochsendorf, and Juan Perafan, explain the combinations of services that comprise this architecture, share their experiences working with clients to employ the stack,...

Play Episode

Democratize Data Cleaning Across Your Organization With Trifacta - Episode 202

Every data project, whether it's analytics, machine learning, or AI, starts with the work of data cleaning. This is a critical step and benefits from being accessible to the domain experts. Trifacta is a platform for managing your data engineering workflow to make curating, cleaning, and preparing your information more approachable for everyone in the business. In this episode CEO Adam Wilson shares the story behind the business, discusses the myriad ways that data wrangling is performed across the business, and how the platform is architected to adapt to...

Play Episode

Stick All Of Your Systems And Data Together With SaaSGlue As Your Workflow Manager - Episode 201

At the core of every data workflow is an orchestration engine (or several). Deploying, managing, and scaling that orchestration can consume a large fraction of a data team's energy so it is important to pick something that provides the power and flexibility that you need. SaaSGlue is a managed service that lets you connect all of your systems, across clouds and physical infrastructure, and spanning all of your programming languages. In this episode Bart and Rich Wood explain how SaaSGlue is architected to allow for a high degree of...

Play Episode

Leveling Up Open Source Data Integration With Meltano Hub And The Singer SDK - Episode 200

Data integration in the form of extract and load is the critical first step of every data project. There are a large number of commercial and open source projects that offer that capability but it is still far from being a solved problem. One of the most promising community efforts is that of the Singer ecosystem, but it has been plagued by inconsistent quality and design of plugins. In this episode the members of the Meltano project share the work they are doing to improve the discovery, quality, and...

Play Episode

A Candid Exploration Of Timeseries Data Analysis With InfluxDB - Episode 199

While the overall concept of timeseries data is uniform, its usage and applications are far from it. One of the most demanding applications of timeseries data is for application and server monitoring due to the problem of high cardinality. In his quest to build a generalized platform for managing timeseries Paul Dix keeps getting pulled back into the monitoring arena. In this episode he shares the history of the InfluxDB project, the business that he has helped to build around it, and the architectural aspects of the engine that...

Play Episode

Lessons Learned From The Pipeline Data Engineering Academy - Episode 198

Data Engineering is a broad and constantly evolving topic, which makes it difficult to teach in a concise and effective manner. Despite that, Daniel Molnar and Peter Fabian started the Pipeline Academy to do exactly that. In this episode they reflect on the lessons that they learned while teaching the first cohort of their bootcamp how to be effective data engineers. By focusing on the fundamentals, and making everyone write code, they were able to build confidence and impart the importance of context for their students.

Play Episode

Join The Mailing List