08 October 2019

Fast Analytics On Semi-Structured And Structured Data In The Cloud - E101

There are many ways to expose your data through a SQL interface, each with its own complications and tradeoffs. One of the more recent options is Rockset, a serverless platform for fast SQL analytics on semi-structured and structured data. In this episode CEO Venkat Venkataramani and SVP of Product Shruti Bhat explain the origins of Rockset, how it is architected to allow for fast and flexible SQL analytics on your data, and how their serverless platform can save you the time and effort of implementing portions of your own infrastructure.


  • Hello and welcome to the Data Engineering Podcast, the show about modern data management
  • Your host is Tobias Macey and today I’m interviewing Shruti Bhat and Venkat Venkataramani about Rockset, a serverless platform for enabling fast SQL queries across all of your data


  • Introduction
  • How did you get involved in the area of data management?
  • Can you start by describing what Rockset is and your motivation for creating it?
    • What are some of the use cases that it enables which would otherwise be impractical or intractable?
  • How does Rockset fit into the infrastructure and workflow of data teams and what portions of a typical stack does it replace?
  • Can you describe how the Rockset platform is architected and how it has evolved as you onboard more customers?
  • Can you describe the flow of a piece of data as it traverses the full lifecycle in Rockset?
  • How is your storage backend implemented to allow for speed and flexibility in the query layer?
    • How does it manage distribution, balancing, and durability of the data?
    • What are your strategies for handling node and region failure in the cloud?
  • You have a whitepaper describing your architecture as being oriented around microservices on Kubernetes in order to be cloud agnostic. How do you handle the case where customers have data sources that span multiple cloud providers or regions and the latency that can result?
  • How is the query engine structured to allow for optimizing so many different query types (e.g. search, graph, time series)?
  • With Rockset handling a large portion of the underlying infrastructure work that data engineers might otherwise be involved with, how have you seen them use the time that they have gained, and how has that benefited the organizations that they work for?
  • What are some of the most interesting/unexpected/innovative ways that you have seen Rockset used?
  • When is Rockset the wrong choice for a given project?
  • What have you found to be the most challenging and the most exciting aspects of building the Rockset platform and company?
  • What do you have planned for the future of Rockset?

Contact Info

Parting Question

  • From your perspective, what is the biggest gap in the tooling or technology for data management today?

