In this episode Michael Toy, co-creator of Malloy, talks about rethinking how we work with data beyond SQL. Michael shares the origins of Malloy from his and Lloyd Tabb’s experience at Looker, why SQL’s mental model often fights human problem solving, and how Malloy aims to be a composable, maintainable language that treats SQL as the assembly layer rather than something humans should write. He explores Malloy’s core ideas — semantic modeling tightly coupled with a query language, hierarchical data as the default mental model, and preserving context so analysis stays interactive and open-ended. He also digs into the developer experience and ecosystem: Malloy’s TypeScript implementation, VS Code integration, CLI, emerging notebook support, and how Malloy can sit alongside or replace parts of existing transformation workflows. Michael discusses practical trade-offs in language design, the surprising fit for LLM-generated queries, and near-term roadmap areas like dimensional filtering, better aggregation strategies across levels, and closing gaps that still require escaping to SQL. He closes with an invitation to contribute to the open-source project and help shape its evolution.
Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- Data teams everywhere face the same problem: they're forcing ML models, streaming data, and real-time processing through orchestration tools built for simple ETL. The result? Inflexible infrastructure that can't adapt to different workloads. That's why Cash App and Cisco rely on Prefect. Cash App's fraud detection team got what they needed - flexible compute options, isolated environments for custom packages, and seamless data exchange between workflows. Each model runs on the right infrastructure, whether that's high-memory machines or distributed compute. Orchestration is the foundation that determines whether your data team ships or struggles. ETL, ML model training, AI Engineering, Streaming - Prefect runs it all from ingestion to activation in one platform. Whoop and 1Password also trust Prefect for their data operations. If these industry leaders use Prefect for critical workflows, see what it can do for you at dataengineeringpodcast.com/prefect.
- Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.
- Composable data infrastructure is great, until you spend all of your time gluing it together. Bruin is an open source framework, driven from the command line, that makes integration a breeze. Write Python and SQL to handle the business logic, and let Bruin handle the heavy lifting of data movement, lineage tracking, data quality monitoring, and governance enforcement. Bruin allows you to build end-to-end data workflows using AI, has connectors for hundreds of platforms, and helps data teams deliver faster. Teams that use Bruin need less engineering effort to process data and benefit from a fully integrated data platform. Go to dataengineeringpodcast.com/bruin today to get started. And for dbt Cloud customers, they'll give you $1,000 credit to migrate to Bruin Cloud.
- You’re a developer who wants to innovate—instead, you’re stuck fixing bottlenecks and fighting legacy code. MongoDB can help. It’s a flexible, unified platform that’s built for developers, by developers. MongoDB is ACID compliant, Enterprise-ready, with the capabilities you need to ship AI apps—fast. That’s why so many of the Fortune 500 trust MongoDB with their most critical workloads. Ready to think outside rows and columns? Start building at MongoDB.com/Build
- Your host is Tobias Macey and today I'm interviewing Michael Toy about Malloy, a modern language for building composable and maintainable analytics and data models on relational engines
Interview
- Introduction
- How did you get involved in the area of data management?
- Can you describe what Malloy is and the story behind it?
- What is the core problem that you are trying to solve with Malloy?
- There are countless projects that aim to reimagine/reinvent/replace SQL. What are the factors that make Malloy stand out in your mind?
- Who are the target personas for the Malloy language?
- One of the key success factors for any language is the ecosystem around it and the integrations available to it. How does Malloy fit in the toolchains and workflows for data engineers and analysts?
- Can you describe the key design and syntax elements of Malloy?
- How have the scope and focus of the language evolved since you first started working on it?
- How do the structure and semantics of Malloy change the ways that teams think about their data models?
- SQL-focused tools have gained prominence as the means of building the transformation stage of data pipelines. How would you characterize the capabilities of Malloy as a tool for building transformation pipelines?
- What are the most interesting, innovative, or unexpected ways that you have seen Malloy used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on Malloy?
- When is Malloy the wrong choice?
- What do you have planned for the future of Malloy?
Contact Info
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
Links
- Malloy
- Lloyd Tabb
- SQL
- Looker
- LookML
- dbt
- Relational Algebra
- TypeScript
- Ruby
- Truffle
- Malloy VSCode Plugin
- Malloy CLI
- Malloy Pick Statement
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to the Data Engineering podcast, the show about modern data management. Data teams everywhere face the same problem. They're forcing ML models, streaming data, and real time processing through orchestration tools built for simple ETL. The result, inflexible infrastructure that can't adapt to different workloads. That's why Cash App and Cisco rely on Prefect. Cash App's fraud detection team got what they needed, flexible compute options, isolated environments for custom packages, and seamless data exchange between workflows. Each model runs on the right infrastructure, whether that's high memory machines or distributed compute.
Orchestration is the foundation that determines whether your data team ships or struggles. ETL, ML model training, AI engineering, streaming, Prefect runs it all from ingestion to activation in one platform. WHOOP and 1Password also trust Prefect for their data operations. If these industry leaders use Prefect for critical workloads, see what it can do for you at dataengineeringpodcast.com/prefect. Composable data infrastructure is great until you spend all of your time gluing it back together. Bruin is an open source framework driven from the command line that makes integration a breeze. Write Python and SQL to handle the business logic and let Bruin handle the heavy lifting of data movement, lineage tracking, data quality monitoring, and governance enforcement.
Bruin allows you to build end to end data workflows using AI, has connectors for hundreds of platforms, and helps data teams deliver faster. Teams that use Bruin need less engineering effort to process data and benefit from a fully integrated data platform. Go to dataengineeringpodcast.com/bruin today to get started. And for dbt Cloud customers, they'll give you a thousand dollar credit to migrate to Bruin Cloud. Your host is Tobias Macey, and today I'm interviewing Michael Toy about Malloy, a modern language for building composable and maintainable analytics and data models on relational engines. So, Michael, can you start by introducing yourself?
[00:02:10] Michael Toy:
Hi. I'm Michael Toy. I'm one of the creators of Malloy.
[00:02:15] Tobias Macey:
And do you remember how you first got started working in data?
[00:02:18] Michael Toy:
Yes. The way I got started working in data is that a friend dragged me into his company, and the friend is Lloyd Tabb, who's the other cocreator, codiscoverer of Malloy. He's someone who's been passionate about data his whole life. I'll probably be talking about Lloyd a lot, because his vision of what the data experience should be is a huge influence on what Malloy is. He started a company to build a data tool, and he said, hey, you wanna come start this company with me? I said, oh, that's stupid. Nobody pays money for tools. A year later, I didn't start the company with him, but I joined him, and that company became Looker, which was a big deal. I don't know, but we did pretty good. And so I worked a lot on making Looker, and that was my first experience being seriously involved in data, as opposed to, like, using data.
[00:03:11] Tobias Macey:
And so digging now into Malloy, can you give a bit of an overview about what it is and some of the story behind how it got started?
[00:03:24] Michael Toy:
Well, sure. It's kind of bound up in the history of Looker and of what problems exist in the world. There's a set of problems in trying to understand what the right user experience is to create for people who are trying to do work based on data, and Looker was an attempt to address some of those things. At the core of Looker, there's a little thing which we now call semantic modeling. I don't know what it was called before Looker, but now that Looker exists, everyone calls it semantic modeling. I don't wanna claim that we invented it, but we did it really well: you could write the SQL for your model like you always did, but you had to write it in a structured way, so that there was enough metadata about your computation that you could mix and match and compose queries based on what the user wanted to see, and then compose the SQL from there. The main selling point for Looker, when we walked in to people who'd never seen Looker before, was: look, it's just SQL. Everything that's getting written here is either SQL that we're generating or SQL that you wrote yourself, and look at the queries we're writing; they're just SQL. Looker was tremendously successful, but there were some places where SQL itself was kind of in the way, and it needed to not just be SQL, because the data surface of SQL doesn't really match the way that people like to think about data. But, you know, we made a multibillion dollar company on a pretty good idea, and that was great. As Lloyd and I stopped being so involved in the day-to-day operation of Looker, and then Google bought Looker, we had some time to ask, well, what should data look like? We had some time to take a fresh cut at thinking about how to interact with data, and our frustrations with Looker resulted in Malloy.
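To make the idea of "enough metadata about your computation" more concrete, here is a minimal TypeScript sketch, assuming a toy model in which each field is a SQL fragment tagged as either a dimension or a measure. The `Model` type, the `composeQuery` function, and the `orders` table are hypothetical illustrations, not Looker's or Malloy's actual implementation; the sketch only shows how structured definitions let a tool assemble different SQL queries from the same pieces.

```typescript
// Hypothetical toy "semantic model": each field is a SQL fragment written once,
// tagged as a dimension (grouping key) or a measure (aggregate expression).
type FieldKind = "dimension" | "measure";

interface Field {
  kind: FieldKind;
  sql: string;
}

interface Model {
  table: string;
  fields: Record<string, Field>;
}

// Assemble a GROUP BY query from whichever fields the user picked.
// Dimensions go into GROUP BY; measures are already aggregate expressions.
function composeQuery(model: Model, picked: string[]): string {
  const selectList: string[] = [];
  const groupBy: string[] = [];
  for (const name of picked) {
    const field = model.fields[name];
    if (!field) {
      throw new Error(`unknown field: ${name}`);
    }
    selectList.push(`${field.sql} AS ${name}`);
    if (field.kind === "dimension") {
      groupBy.push(field.sql);
    }
  }
  let sql = `SELECT ${selectList.join(", ")} FROM ${model.table}`;
  if (groupBy.length > 0) {
    sql += ` GROUP BY ${groupBy.join(", ")}`;
  }
  return sql;
}

const ordersModel: Model = {
  table: "orders",
  fields: {
    status: { kind: "dimension", sql: "status" },
    // Postgres/DuckDB-style date truncation; other SQL dialects spell this differently.
    order_month: { kind: "dimension", sql: "DATE_TRUNC('month', created_at)" },
    order_count: { kind: "measure", sql: "COUNT(*)" },
    total_revenue: { kind: "measure", sql: "SUM(amount)" },
  },
};

// Two different questions, answered from the same model definitions.
console.log(composeQuery(ordersModel, ["status", "total_revenue"]));
// SELECT status AS status, SUM(amount) AS total_revenue FROM orders GROUP BY status
console.log(composeQuery(ordersModel, ["order_month", "order_count"]));
```

A real semantic layer also has to handle joins, filters, and dialect differences; the point here is only that once computations carry metadata, the SQL can be generated rather than hand-written for every question.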
[00:04:59] Tobias Macey:
And so in terms of the Malloy language and project, as you mentioned, SQL can be very verbose. It can be very obtuse when you're trying to get to the thing that you're really trying to do, and if you're not careful, it verges into Perl territory as a write-once, read-never language. So in terms of Malloy in particular, I'm curious what the core problem is that you're trying to solve and who you're trying to solve it for.
[00:05:28] Michael Toy:
Yeah. Well, I'm gonna claim to be some things that I am not, so let's pretend that I'm standing in for all the people who did a thing. When I say I, I really mean we, because I don't wanna be weird about it. A lot of the stuff where I say I is me, and a lot of it is me and the people designing the language and implementing the language. And again, that's not entirely true; there's a bunch of other people who did a ton of this work, but let's pretend it's just me for the purposes of this conversation. And then Lloyd is the person who says, I have spent my entire life figuring out how to make great user experiences with data.
And every place I go, this is the job that I'm solving, and I know more about this than anyone in the world. He wouldn't say that, but I would say that, because I've watched him work. And I wish Lloyd were here to say great things about me, because I'm sure that would happen too, since he also thinks highly of me. So we come to this problem: what's the best way to think about data, and can it be expressed as a language? I'm super passionate about all the ways that computer languages work. A lot of the things you mentioned about SQL are true and shouldn't be held against SQL, because SQL is, I don't know, 45 years old. It's quite an old computer language, and we've learned a lot about computer languages since then. If someone were designing SQL today, they would write a different language that didn't have all those problems, because we've learned a lot about software engineering. I am passionate about all those things, and I've tried to fix absolutely all those problems in the design of Malloy. It's a computer language which is composable, so that you can build things out of smaller things and glue them together, and predictable, and readable by humans, and a bunch of things that modern languages are that SQL isn't. But the other impetus for Malloy isn't really about it being a language at all. I'm gonna try and channel Lloyd here. All interactions with data should be interactive and open ended. The traditional way that data worked is there's a big room with spinning disks in it, and then there's some people who write SQL to reduce that down to a smaller subset of data. And then there's some people who work on that smaller subset of data, who hand a curated version of it to decision makers.
And then all decisions are being made three generations away from the original data. So I've got a spreadsheet that only the CFO sees, and that's the ground truth for the company, but only one person can have it because it's so highly curated by the time it gets to that person. But saying that all data interactions should be iterative and open ended means that at any point, if I'm holding a piece of data, I need both the insight this piece of data reveals and all of the context that went into its generation, so that I can ask the next question. Anytime I produce a thing that's a dead end, I've limited your ability to do the right thing with data. So we didn't start by designing a language. We started by saying: in a world where there's no SQL (but there is SQL, because SQL tells us a lot about data that we shouldn't ignore), what are the objects and actions of humans who are on an iterative data journey? And once we know what that is, what's the natural language of someone on that journey? And if we describe the problem of SQL... first of all, I'm gonna say a lot of bad things about SQL, but SQL is great. There's so much software engineering in making these SQL engines able to do the things that they do. And Malloy generates SQL. We think of SQL as the assembly language of data: if you want to do things with data, the machine that should be doing your data work is SQL.
We just shouldn't ever ask humans to write SQL. That's just an unfair thing, in the same way that we don't ask humans to write assembly language. If there are a million programmers in the world, there are 10 who write assembly language. We would love to live in a world where, if there are a million people in the world doing data work, 10 of them have to write SQL, and everyone else says, I have no time for that, and I really don't need to, because I get my work done faster another way. So, anyway, what are the objects, and what's the natural language of someone on that journey? I almost lost my thread. I was busy praising SQL.
[00:09:31] Tobias Macey:
And I think that's a good transition to the numerous projects that have tried to be a layer above SQL that compiles down to SQL, or to completely obviate SQL. Some of them have partially succeeded; many of them have faded into the mists of time. Some of them are trying to be more pure to the original concepts of relational algebra. Some of them are just a way of doing some string smashing to generate SQL; I'm gonna call out dbt in that regard. And then there are numerous others at various points along that continuum. I'm wondering if you can give your stance on how you think about Malloy in that overall ecosystem, and some of the things that you've learned from your predecessors.
[00:10:20] Michael Toy:
Sure. I would put Looker in the string-smashing-to-generate-SQL universe, so we've certainly been in that world for a while. And again, the history of Malloy is that we wanna have a user-centric view of data interactions, and then we wanna build an ecosystem to support that. It turns out that the language you use to describe what's going on with data strongly influences how you think about the journey. One of the reasons that SQL is a problem is that it forces everybody to think about data the way SQL does. SQL thinks about data in a way which is super useful for generating efficient queries, but it doesn't necessarily think about data the way that people do. I'm talking to the wrong people here: anyone who's done any querying in SQL just knows that, like, I have a simple question, and now I have to figure out how to write a join statement that doesn't blow up the database and that answers my question, and everything is inverted from the way that I like to think about it. So SQL is a problem there. And SQL is also a problem because it's an ancient language. So there's an opportunity to replace SQL, which we took, but replacing SQL isn't the goal. Our goal is that all data interactions for all time should be done through the Malloy view of what data is, and then executed by SQL engines that answer the question. Oh, and what makes Malloy stand out? It's based on: what questions do we need to ask? What operations do we need to do in order to answer questions?
Not: what's relational algebra, and how does data sit on disks? Those aren't part of Malloy. Things that you do in Malloy result in those kinds of operations, but those aren't your nouns and verbs. Your nouns and verbs are: what's the information I have, what are the relationships between these different pieces of information, and what do I wish I knew? And that keeps you focused on the tasks that you want to do, like any good high-level language. Right?
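One concrete way a join "blows up" a simple question is fan-out: joining a one-to-many table repeats the parent rows, so a naive SUM over the joined result counts parent values more than once. The TypeScript sketch below simulates that with in-memory arrays; the `orderRows` and `itemRows` data are hypothetical, and de-duplicating on the order id is just one simple way to get the right answer for this toy case, not a description of Malloy's internals.

```typescript
// Hypothetical toy tables: each order has one amount and may have many line items.
interface OrderRow {
  id: number;
  amount: number;
}
interface ItemRow {
  orderId: number;
  sku: string;
}

const orderRows: OrderRow[] = [
  { id: 1, amount: 100 },
  { id: 2, amount: 50 },
];
const itemRows: ItemRow[] = [
  { orderId: 1, sku: "a" },
  { orderId: 1, sku: "b" },
  { orderId: 2, sku: "c" },
];

// In-memory equivalent of:
//   SELECT o.id, o.amount, i.sku FROM orders o JOIN items i ON i.order_id = o.id
const joined: { id: number; amount: number; sku: string }[] = [];
for (const o of orderRows) {
  for (const i of itemRows) {
    if (i.orderId === o.id) {
      joined.push({ id: o.id, amount: o.amount, sku: i.sku });
    }
  }
}

// Naive SUM(o.amount) over the joined rows: order 1 is counted once per item.
let naiveTotal = 0;
for (const row of joined) {
  naiveTotal += row.amount;
}

// Correct total: keep one amount per order id (its primary key), then sum.
const amountById: { [id: string]: number } = {};
for (const row of joined) {
  amountById[String(row.id)] = row.amount;
}
let correctTotal = 0;
for (const id in amountById) {
  correctTotal += amountById[id];
}

console.log(naiveTotal, correctTotal); // 250 150
```

The question ("total order revenue") is trivial; the wrong answer comes purely from the shape of the join, which is the kind of inversion Michael describes.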
[00:12:16] Tobias Macey:
I think it's also interesting, in the overall history of SQL and the patterns around it, that it's only in, I'll say, the past decade, maybe even more recently, that we've really started to treat SQL as a version-controlled language, rather than just a pile of SQL scripts that somebody has in a drive somewhere or stored in their DBMS as stored procedures. I'm wondering how that change in the approach to SQL, particularly in the analytics space, has changed the ways that people think about the job of SQL and the ways that they're using it, and in particular the complexity of the operations that they're trying to do, versus relying on ETL scripts where most of that logic is in a programming language that runs before the data even lands in the database.
[00:13:08] Michael Toy:
Yeah. I think, again, this is a place where Looker was kind of forward looking. You're crafting some sort of data experience. You have your users and your transactions and your business things that are happening, and you're trying to build dashboards and insights based on that. There's what that looked like yesterday, and then there's the better thing I'm trying to do today that's gonna be a new version of that. So Looker had a full IDE with branches and commits and all those things, and some way for you to roll stuff out, like real software engineering. Not because we're brilliant; it's that we know how this works in software, and this data interface that we're creating is just software, and we want the things that we know to be good in software to also be good in data engineering. To us, we had already decided that these curated data experiences are things which change over time, and are software, and should be treated like software. That was a no-brainer. The more interesting question to me is that there are still sort of two data universes that are not well integrated, and I sometimes wonder why that is. One is all of the things that we would call data: columns of numbers and names and timestamps and stuff. And then there's the other data, which is this curated set of data transformations. They sit in completely different places and have completely different rules for how you access them and how you change them.
If you say, well, we should put all of the Malloy code in the database, that really breaks, because databases are designed to have certain kinds of controls over who can do stuff and who cannot do stuff. It gets really clumsy: oh, I wanna add a new table; oh, gosh, you need to file a form with somebody to add a new table. Not because there's bureaucracy; there are reasons that there are controls like that on the data. We think of data as a thing which has a certain kind of controls, and we think of source code as things that have a different kind of controls. It continues to feel to me like if we just thought about this better, all data could live in a beautiful world where you could get version control and branches on data like you do with source code, and you could get sort of infinite shareability because everybody's looking at the same data. It feels like the two could live together. But I don't know how that works.
[00:15:16] Tobias Macey:
Yeah. It's definitely another whole problem that people are working on solving, just the idea of versioning data and the challenges of data gravity.
[00:15:27] Michael Toy:
Well, and we know that it's a great thing, because in this other world of data, we do that all the time, and it's super useful. And if you think about it, it's not that expensive. You just need to have a high enough level view of what it is to actually implement it.
[00:15:49] Tobias Macey:
I think another interesting element of what you're doing with Malloy is the audience. SQL as a language was originally conceived of as something that was close enough to English that business users would be able to write it and execute it, which never truly came to fruition. There are pockets where people who are somewhat removed from the engineering department do write SQL and are able to explore things and fulfill their needs, whereas Malloy is much more targeted at a software engineering audience. It's treated as a programming language. It has elements of composability and reuse, which SQL never really incorporated. I'm wondering how you think about the design of the language, given the original formulation of what SQL was supposed to be, versus where we are today.
[00:16:33] Michael Toy:
Of all the people in the Malloy project, I'm the perfect person to ask about this, because I've thought about it more than anybody. There's been a long-standing experiment in programming languages to make them like human speech, and none of those have been long-term successful. They all sort of bubble up and then go away. I have my theory about that, which I will now stand on my soapbox and speak of, because a lot of it goes into why Malloy is designed like it is. One of the most important activities with a long-lived, maintainable piece of software is: I open it up and throw it on a page, and then with my eyeballs I understand something about its structure. It's like I'm flying over a city, and I can see that there's the football stadium, and there's the high school over there, and there's the place where people live, and there's the thing that's obviously the industrial district. I can kind of see at 10,000 feet where those pieces are. And then, if I had an airplane that I could steer, I would go land in the place where I needed to land, and walk around and do the things that I needed to do. When you're scanning a piece of code that reads like a conversation, you have to read the whole conversation to know what the pieces are that you need to change. So having some textual markers, so that as your eyes scan across it you know which parts you don't have to read, is super important. A lot of that has gone into why Malloy is designed the way it is, both top to bottom and left to right, so that you can get to the part that you need to care about and not the parts that you don't care about. And I have tremendous respect for what I often call the poetry of a programming language: when you read a sentence, you know in your gut what it means. The words make sense to you.
By that measure, Perl would be the worst language in the universe. There's no poetry in Perl. I've written significant applications in Perl, and I don't wanna say bad things about it; it's been around a long time and did a lot of things. But being poetic is not one of the things that's great about Perl. Perl is only poetic after you're a Perl expert. So there's a human dimension: individual sentences need to read and scan so that you think, I don't even know the language, but I kind of know what you're doing there. That matches well to SQL, where you're saying, you know, sum of cost from table. Oh, I know what's happening. That whole thing is a sentence which scans to human beings. In the design of Malloy, we started with: we're just gonna sit inside of SQL. You could write a select statement, and whatever the Malloy version of a select statement is, which we originally called an explore, and there's a tool which turns the Malloy into SQL; it's just sort of like a macro. Sometimes I regret walking away from that, and the reasons that we did are interesting, but we eventually walked away from it. Our new position was that most of our users are experts in SQL, so we don't wanna be surprising to them in places where we don't have to be. But we do wanna be surprising to them in places where they actually need to think about data differently to get stuff done in Malloy. So we have a where statement and a having statement, and we use those words because that's what SQL users expect, and select, too. We try to use SQL words and SQL gestures wherever we can, not because we think that's the best way to talk about data, but because all the people who are really experts in data already know those things, and we wanna be useful to them.
[00:19:54] Tobias Macey:
I think the elements of the language are very important, and it's also interesting how you're constrained in terms of the keywords and semantics of the language when you have another target language that people are actively developing in. The closest parallel that we have is probably TypeScript and JavaScript, or the various languages that have transpiled to JavaScript over the past few years. I'm wondering how having that target representation as the intermediate state, and the existing corpus of experience and reference material, changes the ways that you think about what the language can and should do and how it can and should be structured.
[00:20:41] Michael Toy:
Okay. Well, there's a bunch of different questions there. First of all, I wanna say this about JavaScript. I wish I could remember the name of this guy; I think it's a gentleman at Microsoft who did a really interesting video, and his video was, let's look at the transition from C to C++ and the transition from JavaScript to TypeScript. At this point, people who are switching from C are switching to Rust instead of C++; C++ never really replaced C. And TypeScript replaced JavaScript: you turned around and suddenly everybody was programming in TypeScript. He was theorizing about why. What's the difference between those two things? Because maybe as language designers we could learn from that. He was trying to kill C++, so he wanted to learn that lesson so that he could, you know, kill it better. One of his theories was that the thing TypeScript did really well is that it sat in the toolchain.
Any place where you needed JavaScript, TypeScript sat perfectly, so it was not hard to decide to use TypeScript at any one time. In fact, you could write a line of TypeScript and then a line of JavaScript and then a line of TypeScript in one file if you wanted to, and it would work okay. In Malloy's relationship with SQL, because we just generate SQL, you can, with some limitations, write some Malloy and some SQL and some Malloy and some SQL, and build a computation or set of computations that use mixed expressions, choosing the expression that works best for you. But the initial reason that we did that is that initially Malloy didn't do very much. If you wanted to group by some things and add up some numbers, Malloy would be great. But as soon as you wanted to do something complicated, like a window function, you'd have to go to SQL. We wanted to be able to show Malloy doing interesting things, so we had a way to say: there's this little piece of SQL here in this corner, which we don't implement yet, but here's this computation, which is all in Malloy except for this little piece of SQL. The reasons that you would escape to SQL continue to exist, and we would like to make them all go away, but they still exist. Okay, so that was on the relationship between one language and another. We would love to be as easy a decision as, like, Java and JavaScript. Well, those are really two different languages; they have the same name for reasons. Actually, I was at Netscape when JavaScript was invented, and so was Lloyd, and we were both involved in that, so I know a lot about why JavaScript is called JavaScript.
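The "escape to SQL" idea can be sketched as an expression tree in which most node kinds are understood by the compiler and one node kind carries raw SQL through verbatim. This is a hypothetical TypeScript illustration assuming a tiny `Expr` type and a made-up `flights` table; it is not Malloy's syntax or compiler, but it shows how a young language can still express a window function by wrapping a piece of SQL it does not yet understand.

```typescript
// Hypothetical expression tree for a small query language that compiles to SQL.
// "column" and "sum" are understood by the compiler; "rawSql" is an escape hatch.
type Expr =
  | { kind: "column"; name: string }
  | { kind: "sum"; arg: Expr }
  | { kind: "rawSql"; text: string };

function toSql(e: Expr): string {
  switch (e.kind) {
    case "column":
      return e.name;
    case "sum":
      return `SUM(${toSql(e.arg)})`;
    case "rawSql":
      // Passed through verbatim: the compiler cannot type-check or rewrite it.
      return e.text;
  }
}

// A select list mixing compiled expressions with a window function that the
// toy language does not support yet, written directly in SQL.
const selectList: Expr[] = [
  { kind: "column", name: "carrier" },
  { kind: "sum", arg: { kind: "column", name: "distance" } },
  { kind: "rawSql", text: "RANK() OVER (ORDER BY SUM(distance) DESC)" },
];

const sql = `SELECT ${selectList.map(toSql).join(", ")} FROM flights GROUP BY carrier`;
console.log(sql);
// SELECT carrier, SUM(distance), RANK() OVER (ORDER BY SUM(distance) DESC) FROM flights GROUP BY carrier
```

The trade-off is visible in the `rawSql` case: the escape hatch makes the language useful early, but everything inside it is opaque to the compiler, which is one reason a language designer would want those escapes to go away over time.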
It was called LiveScript at one point. We thought that Java was gonna take over the universe, and we thought that calling it JavaScript was gonna be the thing that made people take it seriously, as if it were just as powerful as Java. But TypeScript and JavaScript is a great example. We would love to be as fluid and fluent between Malloy and SQL as possible, which is another reason why we would like, as much as possible, to not have cognitive dissonance between them, except where it's necessary. It kinda felt like there was a third point. Oh, the other thing that I wanted to say is that the most important verb in SQL is the join. Right? All the magic of computation comes from being able to write creative joins. And the people who are really good at writing SQL can do things with joins that blow your mind.
But I've been working in data for over a decade now, so I'm not a complete newbie, and I cannot tell you the number of times in the last year that I've had to sit down with an LLM and say, hey, let's talk about join types again. Explain them to me again so that I can make sure I'm thinking about this right, because really, I wanna do this kind of operation. But what do joins do, and how is that useful for the thing that I wanna do with data? It's just not the way that humans think about data. It's the way that a mathematician thinks about data, which is important, because the fact that joins are mathematical is super useful. Right? But you'd like to be able to express the pieces of joins that you need in a way that is human-understandable. So that's a place where looking at it like SQL, as opposed to looking at it like Malloy, makes a big difference. You need to do it differently and expose different, sometimes I call them nouns and verbs, different nouns and verbs in the universe of the objects that we're talking about than SQL does, in order to make the language better. Yeah. I said that.
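To make the escape hatch described earlier in this answer concrete (a query mostly expressed in the higher-level language, with a raw SQL fragment embedded for something the compiler doesn't support yet, such as a window function), here is a minimal TypeScript sketch of a toy SQL generator. `Expr`, `rawSql`, `Query`, and `toSql` are hypothetical names chosen for illustration; this is not Malloy's syntax or its compiler API.

```typescript
// Toy SQL generator with a raw-SQL escape hatch (hypothetical, not Malloy's API).
type Expr =
  | { kind: "column"; name: string }
  | { kind: "sum"; of: string }
  | { kind: "rawSql"; sql: string }; // for features the generator doesn't support yet

interface Query {
  table: string;
  groupBy: string[];
  select: { alias: string; expr: Expr }[];
}

function renderExpr(e: Expr): string {
  switch (e.kind) {
    case "column":
      return e.name;
    case "sum":
      return `SUM(${e.of})`;
    case "rawSql":
      // Emitted verbatim: the generator cannot check or optimize this fragment.
      return e.sql;
  }
}

function toSql(q: Query): string {
  const columns = [
    ...q.groupBy,
    ...q.select.map((s) => `${renderExpr(s.expr)} AS ${s.alias}`),
  ];
  const groupBy = q.groupBy.length > 0 ? ` GROUP BY ${q.groupBy.join(", ")}` : "";
  return `SELECT ${columns.join(", ")} FROM ${q.table}${groupBy}`;
}

// Revenue per region is expressible directly; the ranking needs a window
// function, so it escapes to SQL.
const sql = toSql({
  table: "orders",
  groupBy: ["region"],
  select: [
    { alias: "revenue", expr: { kind: "sum", of: "amount" } },
    {
      alias: "revenue_rank",
      expr: { kind: "rawSql", sql: "RANK() OVER (ORDER BY SUM(amount) DESC)" },
    },
  ],
});
console.log(sql);
// SELECT region, SUM(amount) AS revenue, RANK() OVER (ORDER BY SUM(amount) DESC) AS revenue_rank FROM orders GROUP BY region
```

The design trade-off the sketch exposes: the raw fragment is opaque to the generator, so it can't be validated or rewritten, which is one reason a language would want to shrink the set of cases that still need it.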
[00:24:58] Tobias Macey:
Are you tired of data migrations that drag on for months or even years? What if I told you there's a way to cut that timeline by up to a factor of six while guaranteeing accuracy? Datafold's migration agent is the only AI-powered solution that doesn't just translate your code; it validates every single data point to ensure perfect parity between your old and new systems. Whether you're moving from Oracle to Snowflake, migrating stored procedures to dbt, or handling complex multi-system migrations, they deliver production-ready code with a guaranteed timeline and fixed price. Stop burning budget on endless consulting hours. Visit dataengineeringpodcast.com/datafold to book a demo and see how they turn months-long migration nightmares into week-long success stories.
And then another aspect of building any sort of programming language is the ecosystem and the tooling around it and the capabilities of what it can integrate with. And I'm interested in how you're thinking about that as far as Malloy and the adoption of it and some of the, I guess, stumbling blocks or challenges of being a new language and, in particular, targeting a very prominent existing audience and existing set of workflows and tool chains?
[00:26:14] Michael Toy:
Yeah. You're really good at what you do, because these are great questions. So currently, Malloy is written entirely in TypeScript. If you have a TypeScript runtime, you can compile Malloy to SQL. You can run queries, and you can get the data back from your queries and process it into charts and things. And that's all sitting in one TypeScript runtime. There were a couple of reasons that we made that decision. One was that Looker was entirely written in Ruby, because it was the cool language a thousand years ago when we started Looker. So we understood it well, and I have a deep love for Ruby. There's something really poetic about that language, which I'll always cherish. But as a foundation for an application, we weren't entirely happy with everything about Ruby. So we picked TypeScript on purpose, because it felt to us like a language that was powerful enough that it wasn't gonna get in our way, and the type checking in TypeScript, and the extension of type checking to actual interfaces, as opposed to just functions and variables, was a super powerful tool that was gonna let us do things. And it looked like TypeScript runtimes were gonna be everywhere, so if we ran in TypeScript, we would be able to get to all the places that Malloy needed to go. And we have gotten tremendous benefits from that one decision. Today, if I could have a wish, I don't know, maybe I would wish that we had done it all in Python, because today, a lot of the really important and interesting work that's happening in data is centered around Python.
With Malloy, you can download a Python package and compile Malloy, but that Python package works by running a Node runtime, talking to it, and then getting the data out of the Node runtime and putting it into a Python thing, and, I'm not the marketing guy for Malloy, that's just a bad way to do stuff. It's a nice demo of capabilities, but it doesn't feel great to me, as a software engineer, as the thing that I would like to ship to people. But if we were in Python, we would kind of lose all the things that we can do right now because we're in TypeScript. Wasm-compiled Python works, right?
In the same way that using Malloy from Python works, but it's not pleasant. So, you know, in an infinitely great world, we would have two parallel implementations of Malloy, one in pure Python and one in pure TypeScript, and some way to keep them in sync. I don't know what the right answer to that is. I do have an LLM context where I just talk about this problem with the LLM, and I say, hey, what about Truffle? What if we did it like this, what would that do? And in the end, I say, yeah, but if we did that, this would be a problem. At the end, the LLM says, yeah, it looks like you have a big problem. I don't know if it's solvable, but I keep dreaming that there's some way that it is solvable, so that there's only one piece of core Malloy, and it's available everywhere to all people.
[00:29:12] Tobias Macey:
Seems like the natural next step would be to just rewrite it all in Rust, and then you can compile to a WASM binary and link it to Python.
[00:29:20] Michael Toy:
That was one of the LLM's suggestions, to rewrite everything in Rust. And at that time, I went, okay, I understand the size and scope of that project, and I can't afford to stop working on Malloy long enough to do that. But I hope that someday, somebody says it's time to do that. The day that somebody does that is the day that I know Malloy has won. So my goal is for that to be a real project. But you went right to the thing that it took an LLM and me days to come to, so that was quite insightful.
[00:29:57] Tobias Macey:
And the other aspect of language ecosystem and integration is the set of tool chains around it. Being in TypeScript, there are obviously a lot of possibilities for integration, particularly in front-end style applications. I know that you have the VS Code plug-in, which I'm sure was a very natural extension given the language that you're working with. But I'm also interested in things such as how it's being integrated into orchestration tools and data workflows, and whether it's largely being used in the context of interactive notebooks for people who are doing exploratory analysis and then generating data visualizations, or if it is a potential replacement or augmentation to something like dbt, serving as the actual transformation engine for a sequence of data workflows, etcetera.
[00:30:48] Michael Toy:
Yeah. Well, the marketing answer is all of those, but the real answer is this. We have a VS Code plug-in which works as a sort of IDE for Malloy. And then we have an extension to our extension which lets you open up notebooks. So I can have, you know, some text and a Malloy query and some more text and another Malloy query, and I can successively transform data in the course of a notebook. And there have been experiments, and I'm not up on all of them, integrating Python notebooks with Malloy.
So if Malloy is the right way to write a query, there's a way to use Malloy in a notebook-like situation. And if it doesn't work in your notebook today, it wouldn't be that hard, because the core runtime is already browser resident. Right? And that's a goodness. For transformation pipelines, because they operate on SQL, one of the units of transformation is a table. Right? And Malloy works natively with tables, because otherwise it wouldn't be interesting. So you can write any stage of a transformation pipeline in Malloy, and if your transformation is at all complicated, Malloy is a great language to write it in. We have pull quotes from people who write SQL for a living, and a very common pull quote is, if I could figure out how to never write SQL again and program in Malloy the rest of my life, I'd be the happiest guy ever. But the problem with saying that the table is the unit of transformation goes back to what I said at the beginning: data experiences are interactive and ongoing. As soon as I make it a table, all of the context I used to create that table is gone. All I have is the end result. So if I then want to ask a different question, one which the context that produced the table could have answered but the table can't, and all I have is the table, I can't answer the next question. So in Michael's imaginary future world, transformation pipelines are centered around Malloy and not around SQL, so that the context continues to move down the chain. Then you could make decisions like, what aggregation level do I wanna do different computations at? Because you have access to all the computations that produced this answer, and all the connections, and you can make efficient decisions about that.
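To illustrate the point about context being lost once a stage is materialized as a table, here is a minimal TypeScript sketch with made-up data; the `UserEvent` rows and the `distinctUsersByRegion` stage are hypothetical. A table of distinct users per region cannot answer the follow-up question "how many distinct users overall?", because summing per-region distinct counts double-counts anyone active in more than one region, while the event-level data still can.

```typescript
// Hypothetical event-level rows: the context a pipeline stage starts from.
interface UserEvent {
  userId: string;
  region: string;
}

const events: UserEvent[] = [
  { userId: "u1", region: "east" },
  { userId: "u1", region: "west" }, // the same user is active in two regions
  { userId: "u2", region: "east" },
  { userId: "u3", region: "west" },
];

// Stage 1 materializes a table: distinct users per region.
function distinctUsersByRegion(rows: UserEvent[]): Record<string, number> {
  const users: Record<string, Set<string>> = {};
  for (const r of rows) {
    if (!(r.region in users)) users[r.region] = new Set<string>();
    users[r.region].add(r.userId);
  }
  const counts: Record<string, number> = {};
  for (const region of Object.keys(users)) counts[region] = users[region].size;
  return counts;
}

const table = distinctUsersByRegion(events); // east: 2, west: 2

// Next question: how many distinct users are there overall?
// Summing the materialized table double-counts u1...
const fromTable = Object.keys(table).reduce((sum, k) => sum + table[k], 0); // 4
// ...while the event-level context still gives the right answer.
const allUsers = new Set<string>();
for (const e of events) allUsers.add(e.userId);
const fromContext = allUsers.size; // 3
```

Keeping the upstream definitions around, rather than only the materialized output, is what lets a later stage choose the aggregation level it actually needs.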
Today, because we do just read and write SQL, anytime you want to do a transformation from one kind of SQL to another, we have a command-line version of Malloy, and you could use it with dbt to do that. But the other thing that's happening with Malloy, and this is maybe a problem with our vision, is that it's not focused on a particular application, but on a way of thinking about data.
So in the places where Malloy exists, we've gone wide but not deep. There's a thing called the Malloy CLI, which lets you run Malloy queries from a Unix command prompt. But it's not really well integrated with dbt. Somebody really should just say, you know, if you wanna do dbt pipelines with Malloy, here's the plug-in and you're done. We didn't do that. We said, let's spend a day and a half, well, maybe four days, I don't know, some small period of time, making a CLI version of Malloy, stick it up on the repository, and move on. And there's a lot of that in Malloy, where we can sort of demo: here's what it would look like if Malloy sat in your universe. But there is no universe where Malloy is really sitting where it's fully polished. The most polished experience for Malloy is the VS Code extension, because that's where we sit.
[00:34:16] Tobias Macey:
I think another interesting aspect of that whole question, and where I think Malloy does have a lot of strengths, is the semantic layer, which has been a bolt-on to things like dbt, or a completely separate environment, or a whole bunch of YAML with some SQL if you wanna have a metrics layer. And then another angle of the overall problem that dbt, for instance, is solving is, I'm hesitant to say lineage tracking because there are a lot of gaps in dbt's out-of-the-box capabilities, but that ability to understand the sequencing of transformations and manage their orchestration.
And then to add another layer on this, there's the recent acquisition of SDF, a project that added a lot of very rich type details, where you could, for instance, flag a column as containing PII of a certain type and then have that propagate through the orchestration graph, so that at the end result table, five steps removed from the source system, you still have information about whether that is PII and can apply appropriate controls or various other type constraints, etcetera. And I'm just wondering how you're thinking about the potential for Malloy to address any or all of those capabilities.
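The column-level PII propagation described in this question can be sketched in a few lines. This is a hypothetical TypeScript illustration, not SDF's, dbt's, or Malloy's API: each column carries a set of tags, and any column derived from tagged inputs inherits the union of their tags, so a `pii` tag survives several transformation steps.

```typescript
// Hypothetical column-level tags that propagate through derivations.
interface Column {
  name: string;
  tags: Set<string>;
}

function sourceColumn(name: string, ...tags: string[]): Column {
  return { name, tags: new Set(tags) };
}

// A derived column inherits the union of its inputs' tags.
function derive(name: string, inputs: Column[], ...extraTags: string[]): Column {
  const tags = new Set<string>(extraTags);
  for (const input of inputs) input.tags.forEach((t) => tags.add(t));
  return { name, tags };
}

const email = sourceColumn("email", "pii");
const signupDate = sourceColumn("signup_date");

// Two transformation steps later, the PII tag is still attached.
const emailDomain = derive("email_domain", [email]);
const cohortKey = derive("cohort_key", [emailDomain, signupDate]);

console.log(cohortKey.tags.has("pii")); // true
console.log(derive("signup_month", [signupDate]).tags.has("pii")); // false
```

A real system would need more rules than this, for example deciding whether an aggregate such as a count of emails should still carry the `pii` tag; the sketch deliberately propagates tags unconditionally.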
[00:35:34] Michael Toy:
Yeah. Okay. So that's a really big thing, and I guarantee you I'm gonna forget what I'm doing on the way through this one. One of the interesting things about Malloy is that we can attach arbitrary metadata to the things that you query, at, let's call it, the table level (in Malloy, it's called a source) and at the column level as well. So there is a way to attach arbitrary metadata, so that you can attach business logic to any part of a query, and it carries along through the query. And this is just our experience from building Looker: we kind of know what capabilities you're going to need to build these data applications on. So we're creating these mechanisms, but there's nothing on top of the mechanism yet. There's just the mechanism. I'm pretty sure this mechanism is gonna scale to answer a whole bunch of problems. I've already lost the thread. Let's go right back to the question. There were a number of layers that you threw at me. So the first one, I think, was managing the sequencing of transformations?
[00:36:36] Tobias Macey:
So being able to, for instance, use the source element in Malloy to understand that I have to wait until this previous stage has executed before I can actually have that source present to execute the next stage. And then on top of that, being able to incorporate some of the rich type information and annotations of a particular column that don't exist within the data itself, so being able to attach richer metadata and have that flow through the overall execution graph.
[00:37:03] Michael Toy:
Alright. And the other thing was about this distinction we're trying to make now, where there's semantic modeling or metrics definitions. We're all wrestling with the same problem, which, again, I mentioned at the beginning: data experiences are interactive and ongoing. The other interesting thing is the way that data is written into a database. Right? It's, here are the pieces of data that I have. I need to put this on a disk right now because I have better things to do with my life. I don't know what anyone's gonna wanna compute from this, but I know that this piece of data, like what time I did this and how many molecules were in the air at that instant, needs to be written down. So there's the data, and then there's the information that we need, which is always some sort of computation and aggregation on that data. And we would love to be able to think about data as the information that we care about, and not as the numbers that were used to produce it. That's the idea of a semantic model: I can think about data in the way that's meaningful to me, for my business or whatever my activity is, and not care that that's not the way it's stored. Right? I can pretend that it's stored like the thing that I care about. Usually, that is a sort of bolt-on. It's like, oh, you have SQL, and then you have this other thing. But in Malloy, that's integrated. Anytime I think, oh, you know what? I would really like to be able to think about this as if this column existed, and it's actually a computation of these other things with this join, but let's just pretend it's a column. In Malloy, as soon as you write that, it essentially is a column. It acts like a column.
You don't know that it's not a column. Everyone in the universe can treat it like a column. And the way that this interacts with the idea that data is iterative is that, in asking a question, like, I'd like to know this thing, and it's essentially a column, I have now made the data world richer because I answered that question. Now my answer to that question is part of the landscape that I'm standing on to answer the next question. So it's not that the table is over here and the answers are over here; as we ask questions, the table gets better. One of the things we said early on in the development of Malloy is that a semantic model without a semantic query language is not interesting; the semantic model and the query language need to be integrated. We didn't really have all the insights about why we were saying that, but we were saying it a lot. And I think what I'm saying now, about data experiences being interactive and ongoing, is a better way to encapsulate that. That feels like the answer I wanted to give.
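A rough TypeScript sketch of the idea that a computed field "acts like a column": the hypothetical `Source` below holds stored rows plus named dimension functions, and `field` resolves a name to a stored column if one exists and otherwise to a computed dimension, so query code like `countBy` cannot tell the difference. One dimension is a Boolean flag over another column, and one reaches through a join to another table. The names and structure are assumptions for illustration, not how Malloy is implemented.

```typescript
// Hypothetical source model: stored rows plus computed dimensions that read
// like columns. Illustrative only; not how Malloy is implemented.
type Value = string | number | boolean | null;
type Row = Record<string, Value>;

interface Source {
  rows: Row[];
  dimensions: Record<string, (row: Row) => Value>;
}

// Resolve a field name: a stored column if present, else a computed dimension.
function field(src: Source, row: Row, name: string): Value {
  if (name in row) return row[name];
  const dim = src.dimensions[name];
  if (dim === undefined) throw new Error(`unknown field: ${name}`);
  return dim(row);
}

// Query code only ever calls `field`, so it can't tell stored from computed.
function countBy(src: Source, name: string): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const row of src.rows) {
    const key = String(field(src, row, name));
    counts[key] = (counts[key] ?? 0) + 1;
  }
  return counts;
}

const customers: Row[] = [
  { id: 1, country: "US" },
  { id: 2, country: "FR" },
];

const orders: Source = {
  rows: [
    { id: 10, customer_id: 1, shipped_at: null },
    { id: 11, customer_id: 2, shipped_at: "2024-05-01" },
    { id: 12, customer_id: 1, shipped_at: "2024-05-03" },
  ],
  dimensions: {
    // A Boolean flag computed from another column.
    is_unshipped: (r) => r["shipped_at"] === null,
    // A value reached through a join to `customers`.
    customer_country: (r) => {
      const c = customers.find((cust) => cust["id"] === r["customer_id"]);
      return c === undefined ? null : c["country"];
    },
  },
};

console.log(countBy(orders, "customer_country")); // US: 2 orders, FR: 1
console.log(countBy(orders, "is_unshipped")); // true: 1, false: 2
```

Once `is_unshipped` exists, any later question can group or filter on it exactly as if it were stored, which is the "table gets better as we ask questions" effect described above.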
[00:39:28] Tobias Macey:
I think circling back around to the construct of Malloy as a programming language and a language ecosystem, the other question then becomes that of packaging, where what is the unit of exchange for being able to say, I have a defined set of queries that will give you a useful view on this set of common data where maybe somebody has an export of their HubSpot or their Salesforce data. Here's the set of views and queries that will give you useful insights about that out of the box that you can then build on top of. How do I now take that, package it up, post it somewhere that somebody else can just download it and start using it on their own work?
[00:40:09] Michael Toy:
Okay. That's a great question, and I'm the right person to ask. We're trying not to reinvent the universe where we don't need to. So we have an import statement where you can say, hey, there's another collection of Malloy, and there are these objects in there that are interesting to me; please import them into this thing. I lost a fight inside the Malloy team about the word model, because model means anything. If you could read all the code from whatever you're interacting with all the way down to the bits, the meaning of model would change six times by the time you got down to the bits. So everyone uses the word model, and it means nothing. Right? So one of the things I said is, can we please not use the word model anywhere in Malloy? Let everyone else use the word model in any way that they want, but for the thing that we were calling a model internally, let's use a different word. I lost.
So in Malloy, a model is represented by, say, one file, and it's sort of like a JavaScript module, in that it has objects that are defined in it and it can import objects from other models into its scope in order to do the modeling and computation that you want. A model consists of, basically, queries and sources that answer a particular question. One of the interesting things about Malloy is this: let's pretend I had this brilliant insight about how to do computations on two-way markets, where buyers and sellers are selling to each other. You could define an abstract model that says, here's everything I need to know about two-way markets; I just don't know what your users are, what the transaction is, and what the value of the transaction is. If you write those three things, then you can use my two-way market logic, and you can get graphs and stuff. So there's a way in Malloy to have abstract business logic that's abstracted from any particular business, and you map your data onto it. So there's this composition, and the composition blocks are sources. The simplest source is a table: a set of columns and a schema. But instantly in Malloy, you start saying, well, if something has no ship date and this and that, then there's a Boolean flag that I would like to imagine as a column, because I'm gonna wanna filter on that Boolean flag. So I say, here's the Boolean expression for that flag, and that's now like a column in the database. Or here's the revenue computation, or here's the value of this transaction, which is some secret thing. And so you extend sources, and none of this is a query.
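The two-way market example can be approximated with generics. In this hypothetical TypeScript sketch, the analysis functions are written once against a small `TwoWayMarket<T>` interface (who the buyer is, who the seller is, and what a transaction is worth), and a concrete business supplies those mappings for its own row type. This mirrors the idea Michael describes, but it is our own illustration, not Malloy's syntax for abstract models.

```typescript
// Abstract business logic: written once against a small interface.
interface TwoWayMarket<T> {
  buyer: (t: T) => string;
  seller: (t: T) => string;
  value: (t: T) => number;
}

function grossValue<T>(m: TwoWayMarket<T>, txns: T[]): number {
  return txns.reduce((sum, t) => sum + m.value(t), 0);
}

// Counts distinct participants, treating buyer and seller roles separately.
function activeParticipants<T>(m: TwoWayMarket<T>, txns: T[]): number {
  const people = new Set<string>();
  for (const t of txns) {
    people.add(`buyer:${m.buyer(t)}`);
    people.add(`seller:${m.seller(t)}`);
  }
  return people.size;
}

// One concrete (hypothetical) business maps its own rows onto the interface.
interface Sale {
  purchaserId: string;
  vendorId: string;
  priceCents: number;
}

const salesMarket: TwoWayMarket<Sale> = {
  buyer: (s) => s.purchaserId,
  seller: (s) => s.vendorId,
  value: (s) => s.priceCents / 100,
};

const sales: Sale[] = [
  { purchaserId: "a", vendorId: "x", priceCents: 1250 },
  { purchaserId: "b", vendorId: "x", priceCents: 500 },
];

console.log(grossValue(salesMarket, sales)); // 17.5
console.log(activeParticipants(salesMarket, sales)); // 3
```

A second business with a different row shape would only need to write its own three mappings to reuse both analyses unchanged.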
Extending a source is just saying, here are different computations that are interesting, or here's an interesting aggregation. And the other thing that happens in a source is that, in Malloy, the user's view of data... this is really important; I should have said this an hour ago. All data is hierarchical. Right? It's just not stored in hierarchical form, for reasons.
For any particular kind of data, sometimes it's better to store it normalized, and sometimes it's better to store it, you know, as a graph. One of the problems with SQL is that those two ways of storing data change how you can interact with the data. In Malloy, all data is hierarchical. You treat it like it's a hierarchy; you have dotted paths to things. Whether it's a join or a repeated set of records doesn't matter in Malloy: the user's view of the data is a hierarchy. So one of the things happening in a source is that you're explaining what the graph is that's interesting. You know, I wanna join this in, and that join is a source, and it might also have other joins, and so on. So there are sources, which end up being these rich descriptions of how you'd like to think about the data. And then there are queries, which are, given that source, here's some interesting computation, or here are some dashboards and things. And the other interesting thing about Malloy is that a query is a source, because the output of a query is just a table, and that can be the starting point for a new piece of modeling. So there's that circularity. When you're doing a model, I hate that word, in Malloy, you start with some sources and then you write some queries. Maybe you're pulling in some of your sources, or they're just tables, or maybe they're sources that have been described in other models. Does that answer the question? I think so.
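To make the "everything is a hierarchy" idea concrete, here is a hypothetical TypeScript sketch in which orders store their line items as nested records but reference their customer by key (a join). The `asHierarchy` helper attaches the joined customer so both look like children of the order, and `path` resolves a dotted path such as `customer.city` or `items.sku`, fanning out over repeated fields. The helper names and data are assumptions; this illustrates the user-facing model, not how Malloy generates SQL.

```typescript
// Hypothetical hierarchical view over normalized and nested storage.
type Val = string | number | null | Rec | Rec[];
interface Rec {
  [field: string]: Val;
}

// Customers are stored separately and referenced by key (a join).
const customersById: { [id: string]: Rec } = {
  c1: { name: "Ada", city: "Paris" },
};

// Orders store their line items as nested, repeated records.
const orders: Rec[] = [
  {
    id: "o1",
    customer_id: "c1",
    items: [
      { sku: "pen", qty: 2 },
      { sku: "ink", qty: 1 },
    ],
  },
];

// Attach the joined customer so it looks like just another child of the order.
function asHierarchy(order: Rec): Rec {
  const key = String(order["customer_id"]);
  return { ...order, customer: customersById[key] ?? null };
}

// Resolve a dotted path, fanning out when a step reaches a repeated field.
function path(rec: Rec, dotted: string): Val[] {
  let current: Val[] = [rec];
  for (const step of dotted.split(".")) {
    const next: Val[] = [];
    for (const v of current) {
      if (v === null || typeof v !== "object" || Array.isArray(v)) continue;
      const child = v[step];
      if (child === undefined) continue; // unknown field
      if (Array.isArray(child)) next.push(...child);
      else next.push(child);
    }
    current = next;
  }
  return current;
}

const order = asHierarchy(orders[0]);
console.log(path(order, "customer.city")); // reached through the join: Paris
console.log(path(order, "items.sku")); // reached through nested records: pen, ink
```

From the caller's side, `customer.city` and `items.sku` look the same even though one comes from a join and the other from a repeated field, which is the property Michael describes.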
[00:44:14] Tobias Macey:
Yeah. And the other aspect of Malloy and particularly to your point of data being hierarchical and the restrictions of having these flat representations is that it has led to a proliferation of various opinions on how best to structure your data, particularly for analytical use cases where we have star schemas, snowflake schemas, data vault, anchor modeling, etcetera. And once you move to a workflow where you do have that hierarchical capability modeled into the language itself, how does that change the ways that teams think about the fundamental structure of their data at the storage layer?
[00:44:58] Michael Toy:
I'm really happy to say I'm the wrong person to answer that question. I am so deep in, you know, how do I translate this computer language that we've invented into this other computer language that exists in the world? How do we make sure that the people who write this computer language have good user-facing experiences, and how do we make sure that the code we generate is efficient and runs? I know everything about that. I actually don't have a lot of conversations with people who write Malloy. So I feel kind of lucky because I get to dodge the question, but I'm also being really honest: if I answered that question, I'd just be making crap up.
[00:45:34] Tobias Macey:
Fair enough. Well, from your experience of building this tool chain, working with the people who you have been exposed to who are actually building with it, what are some of the most interesting or innovative or unexpected ways that you've seen it used?
[00:45:50] Michael Toy:
There are more than I can count on two hands, but not enough to fill an interesting concert hall: people in the world who use Malloy for daily work to find answers to questions that they don't know the answer to. They know the answer is hidden somewhere in the data. And I am just thrilled every time another person pops up and says, oh my gosh, I could never have answered this question without Malloy; thank you for existing. But the most surprising thing, which was completely accidental (it just happened to be the right place at the right time), is that because Malloy is designed for humans to understand, it turns out to be a really good target for LLMs to generate. Like, I want to make a really good query experience for users, I want them to interact in natural language, and then I want to generate something which actually generates queries. There are a number of efforts out there of people saying Malloy is the correct semantic layer for us, because it's both a semantic layer and a query language. So once something is described in Malloy, it can also be queried in Malloy, and it's working really well in a number of places. That was a surprise to me, because I didn't know what an AI was at all when we started Malloy, however many years ago it was, even though that's sort of when AIs were starting to become smart and useful. So that's probably the biggest surprise: how useful it is in the AI world, because it was never designed for that. It's just an artifact of the fact that it's designed to work like a human brain, and LLMs are some kind of mapping of the human brain, so it turns out to be really useful for that.
[00:47:18] Tobias Macey:
You're a developer who wants to innovate. Instead, you're stuck fixing bottlenecks and fighting legacy code. MongoDB can help. It's a flexible, unified platform that's built for developers, by developers. MongoDB is ACID compliant and enterprise-ready, with the capabilities you need to ship AI apps fast. That's why so many of the Fortune 500 trust MongoDB with their most critical workloads. Ready to think outside rows and columns? Start building at mongodb.com/build today. And in your experience of working on this project, building this language, and exploring this problem space, what are some of the most interesting or unexpected or challenging lessons that you've learned personally?
[00:48:00] Michael Toy:
Oh, my gosh. I have written about this, so I'm just gonna say some things that I've said before. There's a great thing and a horrible thing about designing things for people to use. The great day when you're making tools is when you see people getting stuff done with the tool that you made, and you see them feeling like, oh, wow, I'm more productive, or I'm exactly as productive but I'm happier. Right? And you think, oh, I feel like I just made the world better, and I love that. But the problem is that people don't want to learn new things. Some people do. For some people, the best thing in the world is, yesterday I solved this problem in language A, today I'm gonna solve it in language B, and if I could solve it in language C tomorrow, I'd be happier, and I'll be sad when there are no more ways to solve this problem. Right? But other people, especially the people that I really care about, are thinking, I'm trying to get stuff done, and someone's paying me, and I wanna do that well. And I honor and respect the person who's paying me by not wasting a lot of time learning how to do this in 32 different languages, but by being really expert at doing it with the tools that I have. So when I ask that person to learn a new thing, their first question is, can you please not ask me to learn a new thing, but also help me? So it's about trying to figure out when something is important enough to say, hey, please learn a new thing right now, it's gonna make your life better, and when to say, I'm just gonna do it your way.
So there are lots of places where we've made that decision wrong, or could theoretically have made it wrong, in terms of asking SQL people to say the thing that they always say one way in another way. Let's just take, as an example, the CASE statement in SQL. There is not a CASE statement in Malloy. There's a thing called a pick statement. If you search for Michael Toy, you'll find that I did a HYTRADBOI presentation about Malloy, and I ended by waxing eloquent about how much I love the pick statement, because it's this beautiful thing that's kinda like a case statement, where you get to say, when data looks like this, the result should be like this. I think it's just incredibly elegant and beautiful, and it's a disservice to all SQL people that I don't just have a case statement. Right? It's because I'm so much in love with the pick statement that it would feel like desecrating my beautiful language to put the case statement in. But really, there should just be a case statement in Malloy. And sometimes when I've made that decision, I feel like I've made the right decision. Take the way that we do null comparisons in Malloy. If you write x != NULL in SQL, instead of x IS NOT NULL, the answer is always null. In Malloy, if you write x != null, the answer is true if x is not null and false if it is null. So we have an opinion about Booleans that is different from SQL's, which people are mostly appreciative of, because they've had to work around the fact that SQL has this weird thing with Booleans.
And I'm okay with the weird decision we made about Booleans, which is different from SQL, because there's a way to get the SQL behavior for the three people in the universe who want Booleans to be a three-state value. Right? And everyone else, who just wants Booleans to be Booleans, can treat them like Booleans, and it works. But it's really hard to know, and you often have to do it wrong before you do it right. There's just no way to know except to put it in front of users and go, oh, that was wrong. I was really passionate about that; in my own programming language, it would always work that way. But this isn't my programming language. This is a programming language for people who are engaged with data, and I need to listen to them when they tell me I was wrong.
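A small TypeScript sketch of the Boolean difference discussed in this answer. `sqlNotEquals` mimics SQL's three-valued logic, where any comparison involving NULL yields NULL; `twoValuedNotEquals` treats comparison with null as a null test and always returns a boolean. The function names are hypothetical, and the behavior when only the left side is null is our assumption, not documented Malloy semantics. Filtering shows the practical consequence: a SQL WHERE clause keeps a row only when the predicate is exactly true, so `status <> NULL` matches nothing.

```typescript
// SQL-style three-valued logic: NULL means "unknown", so any comparison
// involving NULL is itself NULL.
type SqlBool = boolean | null;

function sqlNotEquals<T>(a: T | null, b: T | null): SqlBool {
  if (a === null || b === null) return null;
  return a !== b;
}

// Two-valued alternative: comparing with null is a null test, and the result
// is always true or false.
function twoValuedNotEquals<T>(a: T | null, b: T | null): boolean {
  if (b === null) return a !== null; // like SQL's `a IS NOT NULL`
  if (a === null) return true; // assumption: a missing value differs from any real value
  return a !== b;
}

const statuses: (string | null)[] = ["open", null, "closed"];

// A SQL WHERE clause keeps a row only when the predicate is exactly TRUE,
// so `status <> NULL` filters out every row.
const sqlKept = statuses.filter((s) => sqlNotEquals(s, null) === true);
// The two-valued version keeps exactly the non-null rows.
const twoValuedKept = statuses.filter((s) => twoValuedNotEquals(s, null));

console.log(sqlKept); // []
console.log(twoValuedKept); // [ 'open', 'closed' ]
```

The three-state behavior is still expressible (via `sqlNotEquals` here), which matches the point that users who want SQL semantics can opt into them.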
[00:51:26] Tobias Macey:
And for people who are interested in not fighting with SQL all of the time, what are the cases where you would say that Malloy is the wrong choice?
[00:51:35] Michael Toy:
Yeah. Okay. So if I had my marketing hat on, I'd say, oh, Malloy is always the right choice. But the reality is that right now, unless you're a full-time data explorer, or unless you're really excited about the possibility of Malloy and want to help us refine the vision of what Malloy does and help us get there, it's the wrong choice. It's a super powerful language. It does a lot of things. It's wide but not deep, and it's full of possibility. But again, if you're a full-time data explorer and you have the ability to choose your own tools, then Malloy is 100% the right choice. Right now, Lloyd is off doing some consulting for another company, and they said, you know, help us understand what we're doing, because he has started billion-dollar companies and he's very valuable. And the first thing he does is say, okay, well, if I want to answer any of those questions, first I need to look at your data with Malloy. Right? Then I can start to answer real questions and build a space where, when you ask me the next question, I can answer it. So I hate that for most of the things you would do with data, Malloy is the wrong choice. But the vision of Malloy is that it would eventually be the right choice everywhere, except for the 10 people who are really good at writing optimized SQL, and you need those people somewhere. And, hopefully, two of those people are working on the Malloy compiler.
[00:52:49] Tobias Macey:
And so, given all of that potential, and given that it is, in your words, not yet the absolute right answer for everyone all the time, what are some of the things you have planned for the near to medium term, or any particular projects or problem areas that you're excited to explore?
[00:53:07] Michael Toy:
Well, right now there's an active discussion about dimensional filtering. This is something we learned at Looker, and we haven't really folded it into Malloy because we didn't fully understand what we'd learned. When you're doing a query, one of the interesting things about it is what filtering you're doing to reduce the amount of data. And it turns out that the filter itself is a really interesting piece of metadata, because, for example, when I join something, there might be an entity in that join that would like to have the same filter applied to it. And if I change how I'm looking at the data as a whole, I'd like that join to pick up the same change. That's just one example of a place where having enough metadata about your filtering lets it become part of the modeling of your data. In this case, what you're modeling is a user's view into the data, and the ways in which that view should change how the data looks. So I think we're really close to having nouns and verbs, to use my language, for dimensional filtering, and then surfacing that in the language so that, again, as users look at data, they are modeling: hey, this is an interesting way to look at data, and we should remember that and make it available to the next person who comes by. So that's one thing. Aggregation is super important. We've been working with big companies. We were sitting inside Google as a research project for a while, and then inside Meta as a research project for a while, so we had a chance to measure Malloy against some of the biggest datasets on the planet. And the way people deal with giant datasets is that they build successive levels of aggregation.
There's an experiment in Malloy called composite sources: if you have data that's aggregated at different levels, you can treat it as if it's not aggregated, and then, based on the query, Malloy selects the correct aggregation and runs the query in the fastest way possible. That experiment showed us what was brilliant and what wasn't brilliant about the idea. But because that's how large datasets are organized, Malloy needs to be better at working in that universe, and we have projects in mind for that. And then there are all the things SQL does that you currently have to escape to SQL for, which we don't handle very well. For example, databases have a giant list of functions you can call, and sometimes it's hard to call one because Malloy needs to know the data types of everything. So sometimes it's a little ugly when you need to call a function that Malloy doesn't want you to call. We'd like to fix all those little places where people are escaping to SQL, so that they don't have to.
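The selection step behind composite sources can be pictured with a small Python sketch. It is not Malloy's actual algorithm: `Rollup`, `pick_rollup`, the table names, and the row counts are all hypothetical. The assumption is that, among the pre-aggregated tables whose dimensions cover everything the query groups or filters by, the one with the fewest rows is the fastest to scan.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Rollup:
    """A hypothetical pre-aggregated table: its name, the dimensions it keeps,
    and an approximate row count used as a stand-in for query cost."""
    name: str
    dimensions: frozenset
    row_count: int


def pick_rollup(rollups: List[Rollup], query_dimensions: Iterable[str]) -> Rollup:
    """Return the cheapest rollup that still contains every dimension the
    query needs; raise if no rollup can answer the query."""
    needed = frozenset(query_dimensions)
    candidates = [r for r in rollups if needed <= r.dimensions]
    if not candidates:
        raise ValueError(f"no rollup covers {sorted(needed)}")
    return min(candidates, key=lambda r: r.row_count)


# Hypothetical tables at three levels of aggregation.
rollups = [
    Rollup("events_raw", frozenset({"day", "country", "user_id"}), 10_000_000_000),
    Rollup("daily_by_country", frozenset({"day", "country"}), 5_000_000),
    Rollup("daily_total", frozenset({"day"}), 3_650),
]

print(pick_rollup(rollups, {"day"}).name)             # daily_total
print(pick_rollup(rollups, {"day", "country"}).name)  # daily_by_country
print(pick_rollup(rollups, {"user_id"}).name)         # events_raw
```

This sketch only holds for measures that can be re-aggregated, such as sums and counts. A distinct count of `user_id`, for instance, cannot be recomputed from `daily_by_country`, which is one reason choosing between aggregation levels is harder than it first looks.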
The way we do window functions reflects a vision for how window functions should work, but it's not complete, and so some of the things you'd like to do aren't there yet. And custom aggregation is really important; we don't really support custom aggregation yet. I know. Well, we have a list, and it's long. We've been in data for a long time, and we understand these things are important. We feel like the list is sorted well, like we know what order we want to do things in, but, you know, we're never going to be done.
[00:56:07] Tobias Macey:
Well, for anybody who is interested in contributing, my understanding is that Malloy is an open source language, so I absolutely recommend helping to move this vision forward. I'm very excited for the toolchain that you're building. If there are any specific requests, feel free to throw them out there.
[00:56:23] Michael Toy:
Yeah. And I just want to say, about open source: we were open source from day one. One of the things that's unfortunate about LookML is that it's a secret. It's not a hard secret to clone, but we felt that if there were any kind of fear that this thing was proprietary, it would have just killed it. So even though it was developed inside Google and paid for by Google, from day one we were open source, because we felt that was the only way to go forward. If you're going to ask people to trust something, there needs to be some reason for them not to be afraid. So it's been open source.
Up until recently, most of the contributors to the Malloy effort have been people on the Malloy team. We're getting better at being an open source community, but we're still learning how to do that. So, with all apologies, please come and hang out with us. We're on GitHub and Slack, and we'll do our best to be helpful. Actually, you can show up on Slack and say, hey, how do I write this query in Malloy? I can't be bothered to read the documentation for this whole language. And we'll help you there too.
[00:57:25] Tobias Macey:
Are there any other aspects of the work that you're doing on Malloy, the overall vision for it, or the ecosystem around it that we didn't discuss yet that you'd like to cover before we close out the show?
[00:57:39] Michael Toy:
No. My brain is a dead sponge at this point. You sucked me dry. Thanks.
[00:57:41] Tobias Macey:
Alright. Well, thank you very much for taking the time today to join me, and thank you for all of the great work that you're putting into Malloy. I'm very excited to see the project continue to grow. I've been monitoring it for a long time now, so I appreciate all the time and energy that you and the rest of the team are putting into it, and I hope you enjoy the rest of your day. Oh, thanks for letting me ramble. This was fun. Thank you for listening, and don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used, and the AI Engineering Podcast is your guide to the fast-moving world of building AI systems.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email hosts@dataengineeringpodcast.com with your story. And to help other people find the show, please leave a review on Apple Podcasts and tell your friends and coworkers.
Introductions and Michael Toy's path from Looker to Malloy
Why Malloy: lessons from Looker and limits of SQL
Data work as iterative and open ended; SQL as assembly
Positioning Malloy among "above SQL" tools
Version control mindsets and the split between data and code
Designing a human readable query language for engineers
Targeting SQL as the runtime: TypeScript/JavaScript analogies
Ecosystem choices: TypeScript core, Python bindings, and Rust dreams
Tooling and workflows: VS Code, notebooks, pipelines, dbt
Semantic layer, lineage, and rich metadata propagation
Packaging and composition: models, sources, queries
Hierarchical data in Malloy vs. flat schemas
Real world usage and an unexpected AI fit
Hard lessons in language design and user adoption
When Malloy is not the right choice today
Roadmap: dimensional filtering, aggregations, and completeness
Open source, community, and how to get involved
Closing thoughts and sign off