Taking A Tour Of The Google Cloud Platform For Data And Analytics - Episode 194

Google pioneered an impressive number of the architectural underpinnings of the broader big data ecosystem. Now they offer the technologies that they run internally to external users of their cloud platform. In this episode Lak Lakshmanan enumerates the variety of services that are available for building data processing and analytical systems. He shares some of the common patterns for building pipelines to power business intelligence dashboards, machine learning applications, and data warehouses. If you've ever been overwhelmed or confused by the array of services available in the Google Cloud Platform, then this episode is for you.

Make Sure Your Records Are Reliable With The BookKeeper Distributed Storage Layer - Episode 193

The way to build maintainable software and systems is through the composition of individual pieces. When those pieces are high quality and flexible, they can be used in surprising ways that the original creators couldn't have imagined. One such component that has gone above and beyond its originally envisioned use case is BookKeeper, a distributed storage system that is optimized for durability and speed. In this episode Matteo Merli shares the story behind the creation of BookKeeper, the various ways that it is being used today, and the architectural aspects that make it such a strong building block for projects such as Pulsar. He also shares some of the other interesting systems that have been built on top of it...

Build Your Analytics With A Collaborative And Expressive SQL IDE Using Querybook - Episode 192

SQL is the most widely used language for working with data, and yet the tools available for writing and collaborating on it are still clunky and inefficient. Frustrated with the lack of a modern IDE and collaborative workflow for managing the SQL queries and analysis of their big data environments, the team at Pinterest created Querybook. In this episode Justin Mejorada-Pier and Charlie Gu share the story of how the initial prototype for a data catalog ended up as one of their most widely used interfaces to their analytical data. They also discuss the unique combination of features that it offers, how it is implemented, and the path to releasing it as open source. Querybook is an impressive and...

Making Data Pipelines Self-Serve For Everyone With Shipyard - Episode 191

Every part of the business relies on data, yet only a small team has the context and expertise to build workflows and pipelines to transform, clean, and integrate it. In order for the true value of your data to be realized without burning out your engineers, you need a way for everyone to get access to the information they care about. To help make that a more tractable problem, Blake Burch co-founded Shipyard. In this episode he explains the utility of a low-code solution that lets non-engineers create their own self-serve pipelines, how the Shipyard platform is designed to make that possible, and how it allows engineers to create reusable tasks to satisfy the specific needs of...

Paving The Road For Fast Analytics On Distributed Clouds With The Yellowbrick Data Warehouse - Episode 190

The data warehouse has become the focal point of the modern data platform. With increased usage of data across businesses, and a diversity of locations and environments where data needs to be managed, the warehouse engine needs to be fast and easy to manage. Yellowbrick is a data warehouse platform that was built from the ground up for speed, and it can work across clouds and all the way to the edge. In this episode CTO Mark Cusack explains how the engine is architected, the benefits that speed and predictable pricing have for the organization, and how you can simplify your platform by putting the warehouse close to the data, instead of the other way around.