Databases

A Candid Exploration Of Timeseries Data Analysis With InfluxDB - Episode 199

While the overall concept of timeseries data is uniform, its usage and applications are far from it. One of the most demanding applications of timeseries data is for application and server monitoring due to the problem of high cardinality. In his quest to build a generalized platform for managing timeseries Paul Dix keeps getting pulled back into the monitoring arena. In this episode he shares the history of the InfluxDB project, the business that he has helped to build around it, and the architectural aspects of the engine that allow for its flexibility in managing various forms of timeseries data. This is a fascinating exploration of the technical and organizational evolution of the Influx Data platform, with some promising glimpses of where they are headed in the near future.

Read More

Make Database Performance Optimization A Playful Experience With OtterTune - Episode 197

The database is the core of any system because it holds the data that drives your entire experience. We spend countless hours designing the data model, updating engine versions, and tuning performance. But how confident are you that you have configured it to be as performant as possible, given the dozens of parameters and how they interact with each other? Andy Pavlo researches autonomous database systems, and out of that research he created OtterTune to find the optimal set of parameters to use for your specific workload. In this episode he explains how the system works, the challenge of scaling it to work across different database engines, and his hopes for the future of database systems.

Read More

Accelerating Machine Learning Training And Delivery With In-Database ML - Episode 195

When you build a machine learning model, the first step is always to load your data. Typically this means downloading files from object storage, or querying a database. To speed up the process, why not build the model inside the database so that you don’t have to move the information? In this episode Paige Roberts explains the benefits of pushing the machine learning processing into the database layer and the approach that Vertica has taken for their implementation. If you are looking for a way to speed up your experimentation, or an easy way to apply AutoML then this conversation is for you.

Read More

Paving The Road For Fast Analytics On Distributed Clouds With The Yellowbrick Data Warehouse - Episode 190

The data warehouse has become the focal point of the modern data platform. With increased usage of data across businesses, and a diversity of locations and environments where data needs to be managed, the warehouse engine needs to be fast and easy to manage. Yellowbrick is a data warehouse platform that was built from the ground up for speed, and can work across clouds and all the way to the edge. In this episode CTO Mark Cusack explains how the engine is architected, the benefits that speed and predictable pricing has for the organization, and how you can simplify your platform by putting the warehouse close to the data, instead of the other way around.

Read More

Easily Build Advanced Similarity Search With The Pinecone Vector Database - Episode 189

Machine learning models use vectors as the natural mechanism for representing their internal state. The problem is that in order for the models to integrate with external systems their internal state has to be translated into a lower dimension. To eliminate this impedance mismatch Edo Liberty founded Pinecone to build database that works natively with vectors. In this episode he explains how this technology will allow teams to accelerate the speed of innovation, how vectors make it possible to build more advanced search functionality, and how Pinecone is architected. This is an interesting conversation about how reconsidering the architecture of your systems can unlock impressive new capabilities.

Read More

Building Your Data Warehouse On Top Of PostgreSQL - Episode 186

There is a lot of attention on the database market and cloud data warehouses. While they provide a measure of convenience, they also require you to sacrifice a certain amount of control over your data. If you want to build a warehouse that gives you both control and flexibility then you might consider building on top of the venerable PostgreSQL project. In this episode Thomas Richter and Joshua Drake share their advice on how to build a production ready data warehouse with Postgres.

Read More

Making Analytical APIs Fast With Tinybird - Episode 185

Building an API for real-time data is a challenging project. Making it robust, scalable, and fast is a full time job. The team at Tinybird wants to make it easy to turn a continuous stream of data into a production ready API or data product. In this episode CEO Jorge Sancha explains how they have architected their system to handle high data throughput and fast response times, and why they have invested heavily in Clickhouse as the core of their platform. This is a great conversation about the challenges of building a maintainable business from a technical and product perspective.

Read More

System Observability For The Cloud Native Era With Chronosphere - Episode 170

Collecting and processing metrics for monitoring use cases is an interesting data problem. It is eminently possible to generate millions or billions of data points per second, the information needs to be propagated to a central location, processed, and analyzed in timeframes on the order of milliseconds or single-digit seconds, and the consumers of the data need to be able to query the information quickly and flexibly. As the systems that we build continue to grow in scale and complexity the need for reliable and manageable monitoring platforms increases proportionately. In this episode Rob Skillington, CTO of Chronosphere, shares his experiences building metrics systems that provide observability to companies that are operating at extreme scale. He describes how the M3DB storage engine is designed to manage the pressures of a critical system component, the inherent complexities of working with telemetry data, and the motivating factors that are contributing to the growing need for flexibility in querying the collected metrics. This is a fascinating conversation about an area of data management that is often taken for granted.

Read More

Enabling Version Controlled Data Collaboration With TerminusDB - Episode 167

As data professionals we have a number of tools available for storing, processing, and analyzing data. We also have tools for collaborating on software and analysis, but collaborating on data is still an underserved capability. Gavin Mendel-Gleason encountered this problem first hand while working on the Sesshat databank, leading him to create TerminusDB and TerminusHub. In this episode he explains how the TerminusDB system is architected to provide a versioned graph storage engine that allows for branching and merging of data sets, how that opens up new possibilities for individuals and teams to work together on building new data repositories. This is a fascinating conversation on the technical challenges involved, the opportunities that such as system provides, and the complexities inherent to building a successful business on open source.

Read More

Simplify Your Data Architecture With The Presto Distributed SQL Engine - Episode 149

Databases are limited in scope to the information that they directly contain. For analytical use cases you often want to combine data across multiple sources and storage locations. This frequently requires cumbersome and time-consuming data integration. To address this problem Martin Traverso and his colleagues at Facebook built the Presto distributed query engine. In this episode he explains how it is designed to allow for querying and combining data where it resides, the use cases that such an architecture unlocks, and the innovative ways that it is being employed at companies across the world. If you need to work with data in your cloud data lake, your on-premise database, or a collection of flat files, then give this episode a listen and then try out Presto today.

Read More