Linode

Exploring The Expanding Landscape Of Data Professions with Josh Benamram of Databand - Episode 180

“Business as usual” is changing, with more companies investing in data as a first class concern. As a result, the data team is growing and introducing more specialized roles. In this episode Josh Benamram, CEO and co-founder of Databand, describes the motivations for these emerging roles, how these positions affect the team dynamics, and the types of visibility that they need into the data platform to do their jobs effectively. He also talks about how his experience working with these teams informs his work at Databand. If you are wondering how to apply your talents and interests to working with data then this episode is a must listen.

Read More

Data Quality Management For The Whole Team With Soda Data - Episode 178

Data quality is on the top of everyone’s mind recently, but getting it right is as challenging as ever. One of the contributing factors is the number of people who are involved in the process and the potential impact on the business if something goes wrong. In this episode Maarten Masschelein and Tom Baeyens share the work they are doing at Soda to bring everyone on board to make your data clean and reliable. They explain how they started down the path of building a solution for managing data quality, their philosophy of how to empower data engineers with well engineered open source tools that integrate with the rest of the platform, and how to bring all of the stakeholders onto the same page to make your data great. There are many aspects of data quality management and it’s always a treat to learn from people who are dedicating their time and energy to solving it for everyone.

Read More

Real World Change Data Capture At Datacoral - Episode 177

The world of business is becoming increasingly dependent on information that is accurate up to the minute. For analytical systems, the only way to provide this reliably is by implementing change data capture (CDC). Unfortunately, this is a non-trivial undertaking, particularly for teams that don’t have extensive experience working with streaming data and complex distributed systems. In this episode Raghu Murthy, founder and CEO of Datacoral, does a deep dive on how he and his team manage change data capture pipelines in production.

Read More

Managing The DoorDash Data Platform - Episode 176

The team at DoorDash has a complex set of optimization challenges to deal with using data that they collect from a multi-sided marketplace. In order to handle the volume and variety of information that they use to run and improve the business the data team has to build a platform that analysts and data scientists can use in a self-service manner. In this episode the head of data platform for DoorDash, Sudhir Tonse, discusses the technologies that they are using, the approach that they take to adding new systems, and how they think about priorities for what to support for the whole company vs what to leave as a specialized concern for a single team. This is a valuable look at how to manage a large and growing data platform with that supports a variety of teams with varied and evolving needs.

Read More

Leave Your Data Where It Is And Automate Feature Extraction With Molecula - Episode 175

A majority of the time spent in data engineering is copying data between systems to make the information available for different purposes. This introduces challenges such as keeping information synchronized, managing schema evolution, building transformations to match the expectations of the destination systems. H.O. Maycotte was faced with these same challenges but at a massive scale, leading him to question if there is a better way. After tasking some of his top engineers to consider the problem in a new light they created the Pilosa engine. In this episode H.O. explains how using Pilosa as the core he built the Molecula platform to eliminate the need to copy data between systems in able to make it accessible for analytical and machine learning purposes. He also discusses the challenges that he faces in helping potential users and customers understand the shift in thinking that this creates, and how the system is architected to make it possible. This is a fascinating conversation about what the future looks like when you revisit your assumptions about how systems are designed.

Read More

Bridging The Gap Between Machine Learning And Operations At Iguazio - Episode 174

The process of building and deploying machine learning projects requires a staggering number of systems and stakeholders to work in concert. In this episode Yaron Haviv, co-founder of Iguazio, discusses the complexities inherent to the process, as well as how he has worked to democratize the technologies necessary to make machine learning operations maintainable.

Read More

Self Service Open Source Data Integration With AirByte - Episode 173

Data integration is a critical piece of every data pipeline, yet it is still far from being a solved problem. There are a number of managed platforms available, but the list of options for an open source system that supports a large variety of sources and destinations is still embarrasingly short. The team at Airbyte is adding a new entry to that list with the goal of making robust and easy to use data integration more accessible to teams who want or need to maintain full control of their data. In this episode co-founders John Lafleur and Michel Tricot share the story of how and why they created Airbyte, discuss the project’s design and architecture, and explain their vision of what an open soure data integration platform should offer. If you are struggling to maintain your extract and load pipelines or spending time on integrating with a new system when you would prefer to be working on other projects then this is definitely a conversation worth listening to.

Read More

Building The Foundations For Data Driven Businesses at 5xData - Episode 172

Every business aims to be data driven, but not all of them succeed in that effort. In order to be able to truly derive insights from the data that an organization collects, there are certain foundational capabilities that they need to have capacity for. In order to help more businesses build those foundations, Tarush Aggarwal created 5xData, offering collaborative workshops to assist in setting up the technical and organizational systems that are necessary to succeed. In this episode he shares his thoughts on the core elements that are necessary for every business to be data driven, how he is helping companies incorporate those capabilities into their structure, and the ongoing support that he is providing through a network of mastermind groups. This is a great conversation about the initial steps that every group should be thinking of as they start down the road to making data informed decisions.

Read More

How Shopify Is Building Their Production Data Warehouse Using DBT - Episode 171

With all of the tools and services available for building a data platform it can be difficult to separate the signal from the noise. One of the best ways to get a true understanding of how a technology works in practice is to hear from people who are running it in production. In this episode Zeeshan Qureshi and Michelle Ark share their experiences using DBT to manage the data warehouse for Shopify. They explain how the structured the project to allow for multiple teams to collaborate in a scalable manner, the additional tooling that they added to address the edge cases that they have run into, and the optimizations that they baked into their continuous integration process to provide fast feedback and reduce costs. This is a great conversation about the lessons learned from real world use of a specific technology and how well it lives up to its promises.

Read More