Malloy: Hierarchical Data, Semantic Models, and the Future of Analytics

This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.

Support the show!

08 December 2025

Malloy: Hierarchical Data, Semantic Models, and the Future of Analytics - E491

0:00/0:00

Share on social media:

Description
Transcript
Chapters

Summary
In this episode Michael Toy, co-creator of Malloy, talks about rethinking how we work with data beyond SQL. Michael shares the origins of Malloy from his and Lloyd Tabb’s experience at Looker, why SQL’s mental model often fights human problem solving, and how Malloy aims to be a composable, maintainable language that treats SQL as the assembly layer rather than something humans should write. He explores Malloy’s core ideas — semantic modeling tightly coupled with a query language, hierarchical data as the default mental model, and preserving context so analysis stays interactive and open-ended. He also digs into the developer experience and ecosystem: Malloy’s TypeScript implementation, VS Code integration, CLI, emerging notebook support, and how Malloy can sit alongside or replace parts of existing transformation workflows. Michael discusses practical trade-offs in language design, the surprising fit for LLM-generated queries, and near-term roadmap areas like dimensional filtering, better aggregation strategies across levels, and closing gaps that still require escaping to SQL. He closes with an invitation to contribute to the open-source project and help shape its evolution.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management
Data teams everywhere face the same problem: they're forcing ML models, streaming data, and real-time processing through orchestration tools built for simple ETL. The result? Inflexible infrastructure that can't adapt to different workloads. That's why Cash App and Cisco rely on Prefect. Cash App's fraud detection team got what they needed - flexible compute options, isolated environments for custom packages, and seamless data exchange between workflows. Each model runs on the right infrastructure, whether that's high-memory machines or distributed compute. Orchestration is the foundation that determines whether your data team ships or struggles. ETL, ML model training, AI Engineering, Streaming - Prefect runs it all from ingestion to activation in one platform. Whoop and 1Password also trust Prefect for their data operations. If these industry leaders use Prefect for critical workflows, see what it can do for you at dataengineeringpodcast.com/prefect.
Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.
Composable data infrastructure is great, until you spend all of your time gluing it together. Bruin is an open source framework, driven from the command line, that makes integration a breeze. Write Python and SQL to handle the business logic, and let Bruin handle the heavy lifting of data movement, lineage tracking, data quality monitoring, and governance enforcement. Bruin allows you to build end-to-end data workflows using AI, has connectors for hundreds of platforms, and helps data teams deliver faster. Teams that use Bruin need less engineering effort to process data and benefit from a fully integrated data platform. Go to dataengineeringpodcast.com/bruin today to get started. And for dbt Cloud customers, they'll give you $1,000 credit to migrate to Bruin Cloud.
You’re a developer who wants to innovate—instead, you’re stuck fixing bottlenecks and fighting legacy code. MongoDB can help. It’s a flexible, unified platform that’s built for developers, by developers. MongoDB is ACID compliant, Enterprise-ready, with the capabilities you need to ship AI apps—fast. That’s why so many of the Fortune 500 trust MongoDB with their most critical workloads. Ready to think outside rows and columns? Start building at MongoDB.com/Build
Your host is Tobias Macey and today I'm interviewing Michael Toy about Malloy, a modern language for building composable and maintainable analytics and data models on relational engines

Interview

Introduction
How did you get involved in the area of data management?
Can you describe what Malloy is and the story behind it?
- What is the core problem that you are trying to solve with Malloy?
There are countless projects that aim to reimagine/reinvent/replace SQL. What are the factors that make Malloy stand out in your mind?
Who are the target personas for the Malloy language?
One of the key success factors for any language is the ecosystem around it and the integrations available to it. How does Malloy fit in the toolchains and workflows for data engineers and analysts?
Can you describe the key design and syntax elements of Malloy?
- How have the scope and focus of the language evolved since you first started working on it?
How do the structure and semantics of Malloy change the ways that teams think about their data models?
SQL-focused tools have gained prominence as the means of building the tranfromation stage of data pipelines. How would you characterize the capabilities of Malloy as a tool for building translation pipelines?
What are the most interesting, innovative, or unexpected ways that you have seen Malloy used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on Malloy?
When is Malloy the wrong choice?
What do you have planned for the future of Malloy?

Contact Info

Website

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Introductions and Michael Toys path from Looker to Malloy