Latest Episodes

Setting The Stage For The Next Chapter Of The Cassandra Database - Episode 220

The Cassandra database is one of the first open source options for globally scalable storage systems. Since its introduction in 2008 it has been powering systems at every scale. The community recently released a new major version that marks a milestone in its maturity and stability as a project and database. In this episode Ben Bromhead, CTO of Instaclustr, shares the challenges that the community has worked through, the work that went into the release, and how the stability and testing improvements are setting the stage for the future...

Play Episode

A View From The Round Table Of Gartner's Cool Vendors - Episode 219

Gartner analysts are tasked with identifying promising companies each year that are making an impact in their respective categories. For businesses that are working in the data management and analytics space they recognized the efforts of Timbr.ai, Soda Data, Nexla, and Tada. In this episode the founders and leaders of each of these organizations share their perspective on the current state of the market, and the challenges facing businesses and data professionals today.

Play Episode

Designing And Building Data Platforms As A Product - Episode 218

The term "data platform" gets thrown around a lot, but have you stopped to think about what it actually means for you and your organization? In this episode Lior Gavish, Lior Solomon, and Atul Gupte share their view of what it means to have a data platform, discuss their experiences building them at various companies, and provide advice on how to treat them like a software product. This is a valuable conversation about how to approach the work of selecting the tools that you use to power your data...

Play Episode

Presto Powered Cloud Data Lakes At Speed Made Easy With Ahana - Episode 217

The Presto project has become the de facto option for building scalable open source analytics in SQL for the data lake. In recent months the community has focused their efforts on making it the fastest possible option for running your analytics in the cloud. In this episode Dipti Borkar discusses the work that she and her team are doing at Ahana to simplify the work of running your own PrestoDB environment in the cloud. She explains how they are optimizin the runtime to reduce latency and increase query throughput,...

Play Episode

Do Away With Data Integration Through A Dataware Architecture With Cinchy - Episode 216

The reason that so much time and energy is spent on data integration is because of how our applications are designed. By making the software be the owner of the data that it generates, we have to go through the trouble of extracting the information to then be used elsewhere. The team at Cinchy are working to bring about a new paradigm of software architecture that puts the data as the central element. In this episode Dan DeMers, Cinchy's CEO, explains how their concept of a "Dataware" platform eliminates...

Play Episode

Decoupling Data Operations From Data Infrastructure Using Nexla - Episode 215

The technological and social ecosystem of data engineering and data management has been reaching a stage of maturity recently. As part of this stage in our collective journey the focus has been shifting toward operation and automation of the infrastructure and workflows that power our analytical workloads. It is an encouraging sign for the industry, but it is still a complex and challenging undertaking. In order to make this world of DataOps more accessible and manageable the team at Nexla has built a platform that decouples the logical unit...

Play Episode

Let Your Analysts Build A Data Lakehouse With Cuelake - Episode 214

Data lakes have been gaining popularity alongside an increase in their sophistication and usability. Despite improvements in performance and data architecture they still require significant knowledge and experience to deploy and manage. In this episode Vikrant Dubey discusses his work on the Cuelake project which allows data analysts to build a lakehouse with SQL queries. By building on top of Zeppelin, Spark, and Iceberg he and his team at Cuebook have built an autoscaled cloud native system that abstracts the underlying complexity.

Play Episode

Migrate And Modify Your Data Platform Confidently With Compilerworks - Episode 213

A major concern that comes up when selecting a vendor or technology for storing and managing your data is vendor lock-in. What happens if the vendor fails? What if the technology can't do what I need it to? Compilerworks set out to reduce the pain and complexity of migrating between platforms, and in the process added an advanced lineage tracking capability. In this episode Shevek, CTO of Compilerworks, takes us on an interesting journey through the many technical and social complexities that are involved in evolving your data platform...

Play Episode

Prepare Your Unstructured Data For Machine Learning And Computer Vision Without The Toil Using Activeloop - Episode 212

The vast majority of data tools and platforms that you hear about are designed for working with structured, text-based data. What do you do when you need to manage unstructured information, or build a computer vision model? Activeloop was created for exactly that purpose. In this episode Davit Buniatyan, founder and CEO of Activeloop, explains why he is spending his time and energy on building a platform to simplify the work of getting your unstructured data ready for machine learning. He discusses the inefficiencies that teams run into from...

Play Episode

Join The Mailing List