Data pipelines are complicated and business critical pieces of technical infrastructure. Unfortunately they are also complex and difficult to test, leading to a significant amount of technical debt which contributes to slower iteration cycles. In this episode James Campbell describes how he helped create the Great Expectations framework to help you gain control and confidence in your data delivery workflows, the challenges of validating and monitoring the quality and accuracy of your data, and how you can use it in your own environments to improve your ability to move fast.
Pay Down Technical Debt In Your Data Pipeline With Great Expectations - Episode 117January 27, 2020 Comments Off on Pay Down Technical Debt In Your Data Pipeline With Great Expectations - Episode 117
Building a reliable data platform is a neverending task. Even if you have a process that works for you and your business there can be unexpected events that require a change in your platform architecture. In this episode the head of data for Mayvenn shares their experience migrating an existing set of streaming workflows onto the Ascend platform after their previous vendor was acquired and changed their offering. This is an interesting discussion about the ongoing maintenance and decision making required to keep your business data up to date and accurate.
The modern era of software development is identified by ubiquitous access to elastic infrastructure for computation and easy automation of deployment. This has led to a class of applications that can quickly scale to serve users worldwide. This requires a new class of data storage which can accomodate that demand without having to rearchitect your system at each level of growth. YugabyteDB is an open source database designed to support planet scale workloads with high data density and full ACID compliance. In this episode Karthik Ranganathan explains how Yugabyte is architected, their motivations for being fully open source, and how they simplify the process of scaling your application from greenfield to global.
Databases are useful for inspecting the current state of your application, but inspecting the history of that data can get messy without a way to track changes as they happen. Debezium is an open source platform for reliable change data capture that you can use to build supplemental systems for everything from maintaining audit trails to real-time updates of your data warehouse. In this episode Gunnar Morling and Randall Hauch explain why it got started, how it works, and some of the myriad ways that you can use it. If you have ever struggled with implementing your own change data capture pipeline, or understanding when it would be useful then this episode is for you.
DataDog is one of the most successful companies in the space of metrics and monitoring for servers and cloud infrastructure. In order to support their customers, they need to capture, process, and analyze massive amounts of timeseries data with a high degree of uptime and reliability. Vadim Semenov works on their data engineering team and joins the podcast in this episode to discuss the challenges that he works through, the systems that DataDog has built to power their business, and how their teams are organized to allow for rapid growth and massive scale. Getting an inside look at the companies behind the services we use is always useful, and this conversation was no exception.