Latest Episodes

A Practical Introduction To Graph Data Applications - Episode 144

Finding connections between data and the entities that they represent is a complex problem. Graph data models and the applications built on top of them are perfect for representing relationships and finding emergent structures in your information. In this episode Denise Gosnell and Matthias Broecheler discuss their recent book, the Practitioner's Guide To Graph Data, including the fundamental principles that you need to know about graph structures, the current state of graph support in database engines, tooling, and query languages, as well as useful tips on potential pitfalls when...

Play Episode

Build More Reliable Distributed Systems By Breaking Them With Jepsen - Episode 143

A majority of the scalable data processing platforms that we rely on are built as distributed systems. This brings with it a vast number of subtle ways that errors can creep in. Kyle Kingsbury created the Jepsen framework for testing the guarantees of distributed data processing systems and identifying when and why they break. In this episode he shares his approach to testing complex systems, the common challenges that are faced by engineers who build them, and why it is important to understand their limitations. This was a great...

Play Episode

Making Wind Energy More Efficient With Data At Turbit Systems - Episode 142

Wind energy is an important component of an ecologically friendly power system, but there are a number of variables that can affect the overall efficiency of the turbines. Michael Tegtmeier founded Turbit Systems to help operators of wind farms identify and correct problems that contribute to suboptimal power outputs. In this episode he shares the story of how he got started working with wind energy, the system that he has built to collect data from the individual turbines, and how he is using machine learning to provide valuable insights...

Play Episode

Open Source Production Grade Data Integration With Meltano - Episode 141

The first stage of every data pipeline is extracting the information from source systems. There are a number of platforms for managing data integration, but there is a notable lack of a robust and easy to use open source option. The Meltano project is aiming to provide a solution to that situation. In this episode, project lead Douwe Maan shares the history of how Meltano got started, the motivation for the recent shift in focus, and how it is implemented. The Singer ecosystem has laid the groundwork for a...

Play Episode

DataOps For Streaming Systems With Lenses.io - Episode 140

There are an increasing number of use cases for real time data, and the systems to power them are becoming more mature. Once you have a streaming platform up and running you need a way to keep an eye on it, including observability, discovery, and governance of your data. That's what the Lenses.io DataOps platform is built for. In this episode CTO Andrew Stevenson discusses the challenges that arise from building decoupled systems, the benefits of using SQL as the common interface for your data, and the metrics that...

Play Episode

Data Collection And Management For Teaching Machines To Hear At Audio Analytic - Episode 139

We have machines that can listen to and process human speech in a variety of languages, but dealing with unstructured sounds in our environment is a much greater challenge. The team at Audio Analytic are working to impart a sense of hearing to our myriad devices with their sound recognition technology. In this episode Dr. Chris Mitchell and Dr. Thomas le Cornu describe the challenges that they are faced with in the collection and labelling of high quality data to make this possible, including the lack of a publicly...

Play Episode

Bringing Business Analytics To End Users With GoodData - Episode 138

The majority of analytics platforms are focused on use internal to an organization by business stakeholders. As the availability of data increases and overall literacy in how to interpret it and take action improves there is a growing need to bring business intelligence use cases to a broader audience. GoodData is a platform focused on simplifying the work of bringing data to employees and end users. In this episode Sheila Jung and Philip Farr discuss how the GoodData platform is being used, how it is architected to provide scalable...

Play Episode

Accelerate Your Machine Learning With The StreamSQL Feature Store - Episode 137

Machine learning is a process driven by iteration and experimentation which requires fast and easy access to relevant features of the data being processed. In order to reduce friction in the process of developing and delivering models there has been a recent trend toward building a dedicated feature. In this episode Simba Khadder discusses his work at StreamSQL building a feature store to make creation, discovery, and monitoring of features fast and easy to manage. He describes the architecture of the system, the benefits of streaming data for machine...

Play Episode

Data Management Trends From An Investor Perspective - Episode 136

The landscape of data management and processing is rapidly changing and evolving. There are certain foundational elements that have remained steady, but as the industry matures new trends emerge and gain prominence. In this episode Astasia Myers of Redpoint Ventures shares her perspective as an investor on which categories she is paying particular attention to for the near to medium term. She discusses the work being done to address challenges in the areas of data quality, observability, discovery, and streaming. This is a useful conversation to gain a macro...

Play Episode

Building A Data Lake For The Database Administrator At Upsolver - Episode 135

Data lakes offer a great deal of flexibility and the potential for reduced cost for your analytics, but they also introduce a great deal of complexity. What used to be entirely managed by the database engine is now a composition of multiple systems that need to be properly configured to work in concert. In order to bring the DBA into the new era of data management the team at Upsolver added a SQL interface to their data lake platform. In this episode Upsolver CEO Ori Rafael and CTO Yoni...

Play Episode

Join The Mailing List