Summary
In this episode of the Data Engineering Podcast, Lucas Thelosen and Drew Gilson from Gravity talk about their development of Orion, an autonomous data analyst that bridges the gap between data availability and business decision-making. Lucas and Drew share their backgrounds in data analytics and how their experiences have shaped their approach to leveraging AI for data analysis, emphasizing the potential of AI to democratize data insights and make sophisticated analysis accessible to companies of all sizes. They discuss the technical aspects of Orion, a multi-agent system designed to automate data analysis and provide actionable insights, highlighting the importance of integrating AI into existing workflows with accuracy and trustworthiness in mind. The conversation also explores how AI can free data analysts from routine tasks, enabling them to focus on strategic decision-making and stakeholder management, as they discuss the future of AI in data analytics and its transformative impact on businesses.
Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.
- Your host is Tobias Macey and today I'm interviewing Lucas Thelosen and Drew Gilson about the engineering and impact of building an autonomous data analyst
- Introduction
- How did you get involved in the area of data management?
- Can you describe what Orion is and the story behind it?
- How do you envision the role of an agentic analyst in an organizational context?
- There have been several attempts at building LLM-powered data analysis, many of which are essentially a text-to-SQL interface. How have the capabilities and architectural patterns grown in the past ~2 years to enable a more capable system?
- One of the key success factors for a data analyst is their ability to translate business questions into technical representations. How can an autonomous AI-powered system understand the complex nuance of the business to build effective analyses?
- Many agentic approaches to analytics require a substantial investment in data architecture, documentation, and semantic models to be effective. What are the gradations of effectiveness for autonomous analytics for companies who are at different points on their journey to technical maturity?
- Beyond raw capability, there is also a significant need to invest in user experience design for an agentic analyst to be useful. What are the key interaction patterns that you have found to be helpful as you have developed your system?
- How does the introduction of a system like Orion shift the workload for data teams?
- Can you describe the overall system design and technical architecture of Orion?
- How has that changed as you gained further experience and understanding of the problem space?
- What are the most interesting, innovative, or unexpected ways that you have seen Orion used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on Orion?
- When is Orion/agentic analytics the wrong choice?
- What do you have planned for the future of Orion?
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
- Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
- Orion
- Looker
- Gravity
- VBA == Visual Basic for Applications
- Text-To-SQL
- One-shot
- LookML
- Data Grain
- LLM As A Judge
- Google Large Time Series Model
[00:00:11]
Tobias Macey:
Hello, and welcome to the Data Engineering Podcast, the show about modern data management. Are you tired of data migrations that drag on for months or even years? What if I told you there's a way to cut that timeline by up to a factor of six while guaranteeing accuracy? Datafold's Migration Agent is the only AI-powered solution that doesn't just translate your code. It validates every single data point to ensure perfect parity between your old and new systems. Whether you're moving from Oracle to Snowflake, migrating stored procedures to dbt, or handling complex multisystem migrations, they deliver production-ready code with a guaranteed timeline and fixed price. Stop burning budget on endless consulting hours. Visit dataengineeringpodcast.com/datafold to book a demo and see how they turn months-long migration nightmares into week-long success stories. Your host is Tobias Macey, and today I'm interviewing Lucas Thelosen and Drew Gilson about the engineering and impact of building an autonomous data analyst. So, Lucas, can you start by introducing yourself?
[00:01:08] Lucas Thelosen:
Yeah. Absolutely. I live here in Boulder, Colorado, dad of four daughters, and have been in data analytics my whole career. I've been an analyst for a few different companies before I started building my first AI in 2010, during the mortgage crisis, and realized that I had to build a central foundation that all the models could run off of, along with the different dashboards and reporting use cases. So I came across a company called Looker back in 2013 and got that into everything in the company that I was running at the time. That then led me to build my first consulting business, and I got to set the same thing up for Uber and Walmart and Amazon, actually, which got a little awkward when we got acquired by Google. But some good exposure there across the US and Europe. And then, yeah, I got to run product for Google. That was my most recent gig, for data and AI, before we started Gravity.
[00:02:08] Tobias Macey:
And, Drew, how about yourself?
[00:02:10] Drew Gilson:
Sure. So Lucas and I met quite a long time ago, around the time that Lucas discovered Looker. I also discovered the product. So I live in Calgary. I'm Canadian, and I had been operating an ecommerce business in the outdoor category. So we shipped backpacks and tents and sleeping bags all over Canada. And I started a warehouse out of a garage and then moved it into a much larger fulfillment center, and then built all the software and the systems to make that business work, and we were generating data like crazy. And I wish I could remember how I discovered Looker. It's kind of the part of the story that I can never tell properly. But one day, I came across this tiny little startup out of Santa Cruz, and it allowed me to do the work of several data analysts, just all by myself with my small team. And so I loved it, and I ended up down in Santa Cruz for their first user conference, which was 22 people sitting around on beanbag chairs on an unfinished floor. And one of those people was Lucas. The founder of Looker organized that, and he pulled us together. And he just said, you know, what do you want the product to do next? What do you love about it? What do you not like about it? And so Lucas and I hit it off at that time. And then a few years later, after we had wrapped that business up, I ended up joining Looker a little bit later than Lucas. But so we've worked closely together now for over ten years. I joined Google as well through the acquisition.
I've had the privilege of staying in Canada, where I love to hike and ski and climb. So I haven't actually lived in the Valley, but I've worked closely at Looker and at Google, where I took a slightly different path than Lucas. I ended up spending a year in the cloud AI group at Google, where I was working on document extraction. So LLM-powered document extraction, which we call Document AI. And so in 2022, I really started to see the power of the large models, and we started to think, holy smokes, this is gonna be huge, at about the same time ChatGPT came out. And so Lucas and I made plans to leave to capitalize on that, and so we left Google in April 2024.
And now we're about, you know, twelve to fifteen months into this idea at Gravity, where we're building Orion.
[00:04:26] Tobias Macey:
And in terms of your backgrounds, I'm just wondering how you both got into data and what it is about the overall data ecosystem that has captured your interest for so long.
[00:04:38] Lucas Thelosen:
Yeah. That's a good question. So I'm German, and I love efficiency. And, you know, it's funny. I didn't quite realize this. Like, there are a lot of stereotypes Americans have about Germans, right? And I didn't know that. Nobody told me those stereotypes when I came to the US. You know, I came here to learn English. And then when I got my first job, I was, like, 22, and I was hired as an analyst, which is, for many people just like me at that time, just a random title for doing a lot of different things. And I just couldn't hold my, how do you say it, I couldn't hold my tongue. I couldn't stop saying it. So I was like, guys, this is an incredibly inefficient company. You make a lot of mistakes all over the place, and it's so vulnerable to things being messed up in these thousands of spreadsheets. And here I was, 22, like ten years younger than the next most senior employee. And they're like, well, if you think you know better, then why don't you suggest something? And I didn't pick up on the sarcasm.
So I literally wrote a proposal. I went and researched, and, like, a week later, I presented my proposal for a cloud migration, in 2007, to the CEO. And he was like, well, okay. I mean, what do we have to lose here? So why don't you just go for it? So that's how I found out about databases and Python and, yeah, SQL. I taught myself SQL. So it was a wonderful experience, and that just happened to be the thing I did over and over. So the next company was like, hey, we heard you did this thing over there. Could you do this with us too? And then that was the next thing. And then I taught myself, actually, VBA. I don't know if anyone still knows what that is, but I wrote something in VBA that did stuff in Excel and all these fun things. So I was not a data person by training. I was actually an FP&A guy. That's what I learned in school, how to do forecasting and financial analysis.
I am super passionate about it still, though. Like, deep down, I think the world would be a lot fairer place if people were more data driven and looked at the data more, versus just, you know, whoever has the loudest voice in the room and the best relationships. And, you know, a lot of people do wanna make decisions based on what their buddy thinks, and sometimes we just ignore the facts, or we ask for the analysis to be tweaked so it fits the narrative that we already want. And so I really leaned into that German stereotype. I was like, hey, but this is what the numbers say, and maybe we should do that.
[00:07:10] Drew Gilson:
So for me, one of my first jobs, this would have been about 1998 or 1999, was data entry for the Red Cross. And so I had to type in a whole bunch of address records, and many other people were also doing this. And, of course, the deduplication of that data entry was a significant and ongoing problem. And that was my first exposure to SQL. So, similar to Lucas, we're both very applied. Right? I think at many points in our careers, we've both sort of thought, this is silly. This takes too long. There's gotta be a better way. What are the tools available to solve this problem? And so that's led me kinda deeper and deeper into data and then eventually AI in my career. So, yeah, I had come up with a way to just do a part of that job much more efficiently when I was a kid, more or less. And that, through a somewhat long and circuitous journey, led me to work at an advertising agency in the early two thousands. And I remember the first, like, paper that I ever authored, kinda similar to Lucas. At the end of the day, you're younger, you think you know what it is that the organization you're working for needs to do. You scribble some manifesto. You try to convince your boss or your boss's boss. And in my case, I'll never forget it, it was like this meme, back even before we called them memes, of this World War One fighter pilot flying his plane with a blindfold on. And I said, operating this business without business intelligence is like flying blind. And I had written this long document, which, unfortunately, I lost. But at some point, maybe it'll turn up, and it probably won't be nearly as good as I remember it being. But I just remember thinking, there's so much that we know about how this agency operates in terms of resourcing and forecasting and, essentially, the timing of invoices and payables and all this stuff, and nobody's looking at it. Surely, if we put this on a dashboard and we got people to make decisions based on our resource load and based on the average profitability of the different types of engagements that we do, we could make this way more successful. And so, slowly and surely, we actually did, and it was very early to do that sort of thing. This was on, like, Crystal Reports and WebTrends. You know, long, long ago, there were all sorts of different ways that you could get data to run your business. But, anyways, then I ended up consulting on my own, often in the data realm, and then ended up operating this ecommerce company that I had mentioned. But data's just kind of been this through line. I've always gone back to, what kind of data exhaust are we creating, and how can we use that to create this iterative process of ongoing improvement, whether in the group that I had been working for or perhaps the entire business or my client.
And when the organization or the client is receptive to that kind of thinking, you can do some really, really awesome stuff, and it doesn't necessarily have to be that complicated. A lot of the time, it's simply just looking at some KPIs, doing that regularly, and then making better decisions as a result, as opposed to some of the really fancy stuff that we've started doing over the last few years, which is maybe hard to understand, more predictive and prescriptive, which I'm sure we can get into. But often, there's a ton of value just in the fundamentals. And what I'm trying to connect to here is that what we are doing with our product has a lot more to do with the fundamentals than you would think. It's a lot more, let's just do the simple stuff consistently and remind you to do it, than do the really fancy stuff that you might not necessarily understand or believe or trust enough to go action. So we can get into that, I'm sure, throughout the conversation.
[00:10:53] Tobias Macey:
And so now digging into what you're building with Orion and the overall focus there: obviously, business analytics and data analytics are well understood. They've been part of the core requirements for any business operating efficiently and at scale, to your points, for decades now. So why isn't it a solved problem? What is it that still needs to be discovered and improved upon, given the fact that so many companies have put so much time and money into it?
[00:11:26] Lucas Thelosen:
Yeah. So before Drew and I were on our respective product teams at Google, we actually ran the consulting arm. So we were in the consulting arm of Looker and then subsequently at Google Cloud. And the thing that we saw there over and over is that the vast majority of companies are under resourced when it comes to analytics. Data engineering, data analysis, they don't have the manpower to really support all that the business could do with the data that they have. And that is quite frustrating. Right? I have seen a couple of companies that are well resourced, or at least reasonably resourced, and they find really interesting things. Usually, they are in hypercompetitive markets where any kind of marginal improvement really matters a lot, to get the next round of funding or to be the one that IPOs. You know? I was really close to a lot of the food delivery businesses in 2017, for example. Hypercompetitive.
A bunch of money had been poured in. They all had big data teams. They all were trying to figure out how to be the best food delivery business. But the vast majority, let's say 95% of companies, do not resource their teams sufficiently, and so that means there are bandwidth constraints. And then there is a huge disconnect between the questions people in the business have and the people that understand the data. So there are these people that understand the data. They know what data is available, though they might not necessarily know all the ways it could be used. And on the other hand, you have all these people that have a great understanding of the business that they're in and the specific function they're serving, but they don't know how to connect it to the data. And so ideally, right, and I always suggested this on the consulting side, invite your analysts into your team meetings. Have them sit there with you. Have them be part of the conversation so they can make that connection for you. And, unfortunately, that doesn't necessarily happen, because the analyst bandwidth is not there, or, you know, they forgot to invite that person again. I don't know. All these different reasons. So what we thought we could do here is actually bridge that gap and give everybody a one to one analyst that is right there next to you, and actually go much further than waiting for you to have the question, and think of all the questions that you probably wanna ask. So if you are the head of marketing, if you are the head of procurement, right, or in any of these functions in a company, we can actually prepopulate what are the 100 questions in this job function that someone probably wants to ask. And with some additional context about your company, we can make that even more specific using AI technology, to say, before you even have to think about the questions you wanna ask, we will ask them for you. Orion asks them for you. It thinks about them for you. It becomes aware of the data available that the data engineering team has put together, the semantic layer that's there, the data dictionary that's set up. Orion can read that and then connect it to all the different business functions. So if we can explain the actions that should be taken to the actual business user, right, we can close that last mile that has been the gap. Like, when I look at the usage data in Looker, I can't talk about the specific numbers, but the rate at which customers come back is very low. There are a couple of power users in there that use it all the time, and this is not unique to Looker. It's any BI tool out there. You put out the dashboards. You put these nice pivot tables together.
A couple of people come back. The vast majority of the business does not. So why don't I bring it to them and explain it to them in the language that they like to speak, and actually go as far as: you know what, I went through multiple root cause analyses, I created a couple of cohorts, did some predictions, and here is what you probably wanna do on this Monday morning. Instead of having them go to the tool and figure out how to use it again, because they haven't been in there for a week, you know, all these different things. So that was our thesis, and that's what we have built here with Orion.
[00:15:21] Tobias Macey:
And that points back to a conversation that I came across, I don't even know how many years ago at this point, but definitely on the order of at least four or five years ago, of people making that same observation that having dashboards is all well and good, but it's ultimately useless. Because even though you might see, oh, this data is very interesting, it's telling me something about the business, then it's, okay, well, now what? Now I have to figure out how to actually turn that into some sort of effective action, or how am I supposed to take action from what I'm seeing, because now I need to be able to replicate that context. Which is a problem that we're seeing all over again in the case of AI, where you might be able to see something and understand something, but now how do you actually feed that back into a system that can do something about it? And so people are starting to look at this more active analytics and being able to actually turn that into outcomes rather than just retroactive analysis.
[00:16:17] Lucas Thelosen:
Yeah. I think the eye opening moment was when we were engaged with this multibillion dollar retail company, and they had already invested a lot in their data stack. And yet we came in, we had time because we were the external consultants, you know, we had no meetings on our calendar, and we were able to go through the data and just realize, you know what? Over here, you're just throwing a lot of fresh produce away. And it's because it's in the wrong location at the wrong time. So we used weather patterns, we used search signals, all these different data points, and we were able to predict where the produce should go, and after a couple of weeks of work, tell them exactly how to change their logistics so the produce was better allocated, in essence. And it ended up saving that specific company $10,000,000 a week in fresh produce waste. And, you know, that's a massive scale. It's not applicable to everybody, but you can scale it down and say, okay, well, what if everybody had three analysts with a fresh perspective that can come in, take a look, and ask the questions that maybe the entrenched team hasn't been asking, because they didn't have the bandwidth, they didn't have the time to go through all this and find these things that are maybe small, but they do add up.
[00:17:32] Drew Gilson:
And it's not just the bandwidth, though. Like, one of the things that made that possible was the consideration and inclusion of third party data. So from a data engineering perspective, actually, I think this is quite relevant to the audience going forward in this new world. Internal teams tend to focus on internal data. Now you have some special parts of the world, particularly in finance, where it's all about seeking alpha and you find that edge with third party data or proprietary data. But in most industries, most of the time, an under resourced analysis is not going to include third party data. And so once we have these large models that are aware of the entire internet and the data that can be obtained in both structured and unstructured form, you can start to layer that in a lot more quickly and easily. And I think that has profound implications for what we do as data engineers and analytics engineers, because sometimes there's only so much signal in internal data. And I would argue that, in a lot of cases, particularly in the SMB and mid market space, there could conceivably be more signal external to your business than inside. And that's missed by a lot of data engineers out there. If we spent as much time figuring out how to obtain those sources which are of value to the business, and making it possible to, maybe in the simplest case, just compare against them for benchmarking, right, just as a simple example, you could make a big difference. But not a whole lot of people think that way in this industry. There's a lot of time that's spent modeling an internal data source, and then, at the end of the day, again, particularly on the smaller end, there's only so much juice from the lemon that you can squeeze.
[00:19:18] Lucas Thelosen:
There's also, like, now we don't have to join them on a join key. Right? We don't have to join the datasets on any kind of specific key. We can actually have one agent talk about what they found in your data and one agent talk about what they found in the weather data, the demographic data, the search trends data, the news data. And it doesn't need to be joined. I mean, they talk about it, like, hey, on Thursday, this is what I saw. Is there anything outside of our own data related to this? The weather one might say, well, you know, in Florida, there was a certain event. Was this isolated to Florida? Well, let's check. You know? So they can have a conversation without having to do a specific join like we used to have to.
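As a rough illustration only, here is how that kind of join-free exchange between agents over separate datasets might be sketched in Python; the function, prompt wording, and `ask_model` stand-in are hypothetical and not a description of how Orion actually coordinates its agents.

```python
"""Tiny sketch: relate an internal finding to an external one without a join key.
ask_model is an illustrative stand-in for any chat-completion call."""
from typing import Callable


def relate_findings(internal_summary: str, external_summary: str,
                    ask_model: Callable[[str], str]) -> str:
    # No shared key required: the model reasons over two independent summaries,
    # e.g. a sales anomaly and a weather event, and proposes a follow-up check.
    prompt = (
        "Internal data finding:\n" + internal_summary +
        "\n\nExternal data finding:\n" + external_summary +
        "\n\nCould the external factor explain the internal change? "
        "Answer briefly and name one follow-up query that would confirm it."
    )
    return ask_model(prompt)
```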
[00:19:57] Tobias Macey:
And digging now into the capability of applying these models and agentic workflows to the problem of business analysis, this is something that has been sought after for a long time now. And with the introduction of these large language models, some of the early iterations were largely focused around, let's talk to your data by doing text to SQL, where you say, oh, this is what I want to do, and then the model will generate a SQL query, execute it, and maybe perform some summarization on the response. But one of the requirements that that typically has is that you need to have already invested in having a very clean and scrutable data model, and maybe some additional documentation and semantic metadata about it, for the model to be able to actually figure out what query to write given the input. And I'm wondering how you've seen the overall architectural patterns and domain understanding of how to effectively do that talk to your data use case evolve over the past couple of years, since that first round of text to SQL started to become in vogue.
[00:21:00] Lucas Thelosen:
Can I let Drew jump in here in a second? I think text to SQL misses a really important point. Well, not exactly; it's a tool for the right user. Right? Similar to how a dashboard or a good spreadsheet is the right tool for some people. If you know your data, you know the question to ask, and you know how to prompt it, text to SQL can be a good tool. And then a second point on what you were asking: I do think a good foundational data model is great. I think it's a really good investment to make as a company, to structure your data and explain your data in some way. Any AI use case will be much more powerful if you put that in place. There are some things we can do beyond that, and that's why I'm gonna hand it over to Drew.
[00:21:45] Drew Gilson:
Yes. Sure. So, I mean, this has evolved very rapidly over even the last six months. Right? One shot text to SQL prediction, that's not necessarily part of the product that we're building. We believe very strongly in the value of the semantic model. So we're always looking for all the context that we can get to help make sure that we shape your question. And we can get that context from, like, dbt model files. We can get it from LookML if you happen to use Looker, or perhaps metadata from a Power BI dataset. There are a bunch of places you can get it. But no matter what, if you get that context and try to do single shot prediction on it, you're probably going to lose fidelity at some point. And so there are a lot of alternatives now. Because tool calling has gotten quite a bit better, what we do, at a high level anyway, is decompose the problem into its constituent parts. Right? So we will first determine, forget the SQL, what is the data that we need to solve this problem? At what grain? And what are the conditions that we need to set in order to limit or slice the data in such a way that it will be valuable for answering the question that we think needs to be answered? And all of those steps happen independently.
And at each step, we validate and we verify and we check. So, for instance, if we're predicting which fields we need, after we get the prediction back, and that might be a tool call, we'll make sure they exist. Now I'm giving a very simple example just so that it's understandable. But in a naive text to SQL approach, you might predict a query that has, who knows, a variety of issues, including fields that simply don't exist. And that was what people were struggling with a few years ago. And, of course, that left a pretty bad taste if you had these large models just predicting queries that didn't have any fidelity and in some cases just didn't even seem to understand the underlying schema. Right? But if you can decompose the problem into its constituent parts and then validate and verify at each step, you can get much better outputs. You can essentially ensure that you're correct at every step. Now it's not cheap and it's not fast, and I expect both of those things to change over time. But it's quite a bit better than just taking in a natural language question and then doing, essentially, what is a transformation or mutation into SQL. I do wanna jump back up, though, and emphasize something that Lucas said, which is, for us, even if our product was unable to find the data on its own and you had to feed it a little bit of help, whether that's your complicated SQL query, right, like, maybe you have a notebook where you've been doing this analysis for five years.
It takes time. You know exactly how to do it. There's still a ton of value past that point. And so I would encourage other people who are doing this to get out of the weeds of concerns about SQL generation accuracy and ask, what is the downstream value of automating, through AI, all of the subsequent things that can happen? To give a specific example: if you know the report that you're supposed to be looking at, but you're a human and you probably don't look at it every single morning with your cup of coffee and scrutinize it in all of its detail, and I can assure you many people don't, the model will happily do that. Right? So there's benefit to the model simply looking at the results of a known good query, and not predicting the query in the first place. And a lot of our product actually serves those kinds of use cases. As much as you say you wanna be data driven and you wake up to data every day and make decisions with data, probably, if we had something that could look at the results of the known good queries that you've had in that notebook and then tell you when there's something that needs to be acted upon, or something that's particularly interesting, that might be valuable. Right? And I think there are a lot of use cases and a lot of teams out there that might benefit. And so the point that I'm making is, if you're still worried about the accuracy of text to SQL, I think that if you can get to high nineties accuracy for your use case, or even to 100% because you've kinda got it on rails, there's a tremendous amount of value that you can unlock with large models past that point. So I don't think it should be an excuse not to be experimenting with large models in data engineering and data analytics at this time.
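As a loose sketch of the decompose-and-validate pattern Drew describes, the following Python outlines one step of it: predict which fields a question needs, then check them against the real schema before any SQL is ever assembled. The schema, helper names (`ask_model`, `build_query`), and prompt wording are illustrative assumptions, not Orion's internals.

```python
"""Minimal sketch: decompose query generation into small, checkable steps."""
import json
from typing import Callable

# Hypothetical schema the validator checks against.
SCHEMA = {"orders": ["order_id", "customer_id", "order_date", "revenue", "region"]}


def predict_fields(question: str, ask_model: Callable[[str], str]) -> list[str]:
    # One small, independently verifiable step: which fields does the question need?
    prompt = (
        "Given this schema and question, return a JSON array of required field names.\n"
        f"Schema: {json.dumps(SCHEMA)}\nQuestion: {question}"
    )
    return json.loads(ask_model(prompt))


def validate_fields(fields: list[str]) -> list[str]:
    # Reject hallucinated fields before any SQL is assembled.
    known = {f for cols in SCHEMA.values() for f in cols}
    missing = [f for f in fields if f not in known]
    if missing:
        raise ValueError(f"model referenced nonexistent fields: {missing}")
    return fields


def build_query(question: str, ask_model: Callable[[str], str]) -> str:
    fields = validate_fields(predict_fields(question, ask_model))
    # Grain, filters, and aggregation would each get the same predict -> verify
    # treatment; here we only assemble a trivial projection to show the shape.
    return f"SELECT {', '.join(fields)} FROM orders"
```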
[00:26:13] Tobias Macey:
The other angle of the problem of data analytics in particular is, to your earlier point, that the analyst needs to have a lot of domain context and understanding of the business and its operations and its requirements to be able to build the appropriate answers and reports for the insights that the business needs to actually improve on its efficiencies, and that requires a lot of nuance. There is a lot of missing documentation. There's a lot of lack of digital representation of that knowledge, and I'm wondering how you think about bridging that divide in these agentic use cases. If you throw a model into a sea of data and say, this is what I want to know, it might be able to give you an answer, but how do you know that it's actually the right answer, or the answer that the person is actually looking for to make the decision that they need? And just some of those gaps of how to manage that contextual representation and that domain representation of the business that you're operating within.
[00:27:18] Lucas Thelosen:
When we get started with a new customer, one of the first things we do, or rather Orion does, is take a look at the metadata and understand, you know, what are the most common date fields, what are the most common measures and dimensions. So it gets an understanding of what the business usually looks at. It can look at the dashboards, those kinds of things, the reports that are being used. Some organizations have onboarding materials. They're usually incomplete, but we can feed them into Orion as well. And then we think of Orion like onboarding a new data analyst. But, you know, in the first three hours, it can already consume all the available information, so it's already really fast on that front. We actually have a customer right now who asked us if Orion can onboard the next analyst, because it now has such a thorough understanding of the business and can explain it with a lot of patience. I hadn't thought about that, that Orion could be used to onboard a new employee, which is really fun to see. There are some really interesting things that can be done beyond that. Right? Like, a lot of meetings are recorded now. They're transcribed. There are meeting notes attached to them. So you don't necessarily have to invite the analyst to the meeting anymore. You can just share the transcripts.
So there are other ways that you can also give Orion that understanding of what's going on. And then there's the LLM, and we added to it a vast library of understanding of how to do data analysis. Right? While every company is unique, I also think there are a lot of patterns that are similar. If you're running a sales team, we actually have a decent understanding; I've been in hundreds of engagements, worked with customers on analysis of inventory management, sales, marketing, cost optimization, whatever it might be. And there are patterns that we put into Orion that can be repeated. I don't wanna neglect that every company is unique and different, but there are also a lot of best practices, this is probably what you wanna look at, that we added to Orion and that Orion can suggest. So we take the understanding of the metadata, of the query history, all of that, add to it our library of how to do data analysis really well for team X, and add to it, in Orion, what the company is and what it does. You can add goals, any additional context information you would like. You build quite a powerful data analyst, in some ways even more informed than some of the human analysts, right, where it takes six months or a year to really get that institutional knowledge, and then it's really hard to move it to the new hire, in essence. And that's, I guess, why this company asked us if they can use Orion to onboard the new employee. Because finally, right, we have a vector database behind it. Orion stores some of the information so it can retrieve it and actually do a meta analysis too. Yeah. So there are a lot of things we can do to get Orion into the room, so to speak.
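A minimal sketch of what that metadata-profiling step could look like, assuming a query log and a column catalog with the table and column names shown below; the layout is invented for illustration and is not Orion's actual schema.

```python
"""Sketch: learn which date fields, measures, and dimensions a business leans on
by counting field references in an assumed query-history log."""
import sqlite3
from collections import Counter


def profile_usage(db: sqlite3.Connection, top_n: int = 10) -> dict[str, list]:
    # Assumed layout: query_log_fields has one row per field reference,
    # column_catalog carries each field's declared type.
    refs = db.execute(
        "SELECT f.field_name, c.data_type "
        "FROM query_log_fields f JOIN column_catalog c USING (field_name)"
    ).fetchall()
    counts = {"dates": Counter(), "measures": Counter(), "dimensions": Counter()}
    for name, dtype in refs:
        if dtype in ("date", "timestamp"):
            counts["dates"][name] += 1       # common time axes
        elif dtype in ("integer", "numeric", "float"):
            counts["measures"][name] += 1    # things the business aggregates
        else:
            counts["dimensions"][name] += 1  # things the business slices by
    return {kind: tally.most_common(top_n) for kind, tally in counts.items()}
```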
[00:30:13] Drew Gilson:
I wanna go back to a comment I made about the power and value of the simple things. So Orion can, for instance, look at your calendar. And if you are, say, a customer success manager and you have a few meetings throughout the week, after you've done that OAuth integration and it's able to see your calendar, then through the reasoning process that all large models have, and that we've tuned for this particular use case, we can pick off a meeting. We can go, looks like you're meeting with an important customer on Thursday morning. Here are the basic facts that you need to take into that meeting. And those facts, if you have a data engineering team that's doing a great job, should be in a flat fact table where there is absolutely no ambiguity or complexity in pulling them out. You know, we're not talking about some super complex 50 line SQL query with window functions and aggregations and lag and all that. I mean, literally, it's, go get the churn and the number of seats and the price and the contract renewal date and all of those things, and get them into the Slack messages of that individual so that they have them at hand in that meeting. And that's what data can do. And, unfortunately, I think in a lot of cases, we just still haven't gotten there. Right? Unless you have a very sophisticated data culture with truly self serve analytics and the change management in place to make sure that every CSM on the team is drilled to go get that for themselves, it's kinda rare that somebody goes into that meeting that prepped.
But if you have this thing now that kinda just watches your calendar, pulls from the fact table, and maybe even drops some bullet points on a slide, you suddenly can say, yeah, we're data driven. Like, we bring data to every customer meeting. And I just think that's really cool. And I think there's so much opportunity to do that, as opposed to maybe taking, I don't know, 28 different really complex features of your customers and then trying to figure out, well, what is the through line? What is the missing thing? Where can we go acquire more customers who are gonna have a greater LTV than others? And I'm not saying that's not valuable. I mean, that's extremely valuable, but there's a long tail of basic data culture ability, and just the way that you operate, that can still be turned on across the whole world in our industry. And so we really hope to help companies do that simple use case. And there are just a couple of other examples that I think we can give, because when people put data and AI together, I think they kinda go right to the far end of the analytical maturity curve. And I think that stuff's really exciting, but it's actually not the biggest opportunity.
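A small sketch of that meeting-prep flow, assuming a flat customer fact table and a Slack incoming webhook; the table, columns, and message shape are illustrative assumptions, not a description of Orion's implementation.

```python
"""Sketch: pull basic facts for an upcoming customer meeting from a flat fact
table and post a short briefing to Slack via an incoming webhook."""
import json
import sqlite3
import urllib.request

FACT_COLUMNS = ["seats", "churn_risk", "contract_value", "renewal_date"]


def customer_facts(db: sqlite3.Connection, customer_id: str) -> dict:
    # One row per customer, no joins or window functions needed.
    row = db.execute(
        f"SELECT {', '.join(FACT_COLUMNS)} FROM customer_facts WHERE customer_id = ?",
        (customer_id,),
    ).fetchone()
    return dict(zip(FACT_COLUMNS, row))


def post_briefing(webhook_url: str, meeting_title: str, facts: dict) -> None:
    # Slack incoming webhooks accept a simple JSON payload with a "text" field.
    bullets = "\n".join(f"- {name.replace('_', ' ')}: {value}" for name, value in facts.items())
    payload = {"text": f"Prep for {meeting_title}:\n{bullets}"}
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)
```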
[00:33:04] Lucas Thelosen:
I mean, that's one of the things that gets me really passionate too. Like, we worked at Google, obviously. We worked with Amazon. We worked with Walmart. Right? We got to work with companies that had pretty much unlimited amounts of money to spend on data analysis. And now you can get that sophistication, right, but also just the lowest hanging fruit of data analysis, to pretty much any company. You know, people get worried about AI and AI taking away jobs. Right? But Amazon, Walmart, Google, they had the resources to have these kinds of insights and to drive their businesses to be incredibly profitable and make billions of dollars. Most companies do not. And so now we have the opportunity to bring data analysis to pretty much every company and level the playing field. It's no longer reserved just for Walmart and Amazon. There are so many retailers that were squeezed out. There are so many tech companies that got squeezed out because they didn't have the resources. And I think that's an often overlooked, right now at least, effect of what AI can bring us, where we put our heads together, guys from Google, guys from McKinsey, right, on what really great analysis looks like, put it into Orion, and make it accessible at scale. And it gets me really excited.
[00:34:21] Tobias Macey:
Your point of being able to integrate with calendars, being able to integrate with Slack and send some bullet points, or generate slides and integrate with maybe your Google Workspace to create a presentation, introduces the question of the user interface to analytics. For a long time, that was the business intelligence dashboard, where you have to take the initiative to go look at a pane of glass and try to figure out what it's telling you, or maybe it was the job of somebody on your analytics team to prepare those reports for you. But more often than not, they're too busy just trying to keep that dashboard up to date. And I'm curious how you think about the user interface to analytics, and some of the ways that you think about the integration of that insight into the day to day workflow of the people who need that information to execute on their jobs, and how these agentic capabilities maybe change the paradigm around what analytics is actually for.
[00:35:22] Lucas Thelosen:
So, I know I just said I'm passionate about this other thing, but I'm also passionate about this one. Can we go to where people are, instead of forcing them to come to our proprietary user interface? Because I think we all have 10 tools to log in to already. So as much as I can, I want to get to where you already are, whether that is in Slack or in your email system, Gmail or Outlook, wherever you might be. Right now, the chat interface is, of course, super popular for interacting with AIs because it is conversational in nature. But the closer we can get to where people already are, so it becomes second nature to use it, I think that's a really good one. The problem I'm facing, and, you know, I enjoy the conversation here on this one, is that there's an incredible amount of compute power behind what we built. But if you put it into a chat interface, it seems just like another chatbot, and you're really devaluing all that is happening behind it. I mean, massive compute clusters are churning through this. Databases are being pulled from. Right? All this stuff is happening, and all of it becomes a little blurb in your chat. It doesn't quite sell the value. One of our coworkers was at McKinsey, and he was like, we would have charged $300,000 for this insight. And now it's just an email in your inbox that might be ignored. Right? So how do we get that right? How do we convey the value of what you just got here? We're working with this ecommerce company and found this really great insight about markets that they hadn't explored yet that match their ideal customer profile. Worth a lot of money. And yet, because it came in an email, they actually didn't quite read it. So we really gotta nail that experience. How can we make it as valuable as that session you booked with McKinsey that you spent thousands of dollars on?
[00:37:13] Drew Gilson:
Yeah. These are hard problems. The user experience of working with an AI agent, I think, is a fascinating UX design problem. We have so much value that we can drop, like, into your DMs. But, again, there's just so much noise in the workplace, and you might not realize just how much went into making sure that that was vetted and accurate, so it does become a challenge. We're figuring that out along with everybody else. But I really do think it's better than the prior attempt at this, which is super complex dashboards with a lot of filters that only a small subset of people used. And I think, almost without exception, it's better to just hit somebody with a message that says, our conversion rate dropped 2% last week, it seems to be due to different traffic from email, maybe because of the holiday weekend, we should probably reengage these users with an offer. And that maybe goes to the person in marketing proactively. Right? Now, you coulda gone, you coulda clicked, and you coulda sliced, and you coulda thought, okay, channel by channel, there seems to be a little dip here. Okay, Monday was a holiday. Maybe we should look at that segment. We should look at open rates. We should do this, we should do that and the other thing. But nobody does. You know? And so I think getting to the place where we've got trust in something that is autonomously doing that on our behalf, surfacing recommendations which are hopefully ranked in some sort of order such that we're not completely overwhelming you, because that's another side of this coin. If we continually send you stuff to do and you just lose the ability to frankly act on it, because it's just too much and you're not sure where the signal is and where the noise is, that would also be bad. But I do think that if we hit you with one actionable insight a week, we could have a profound effect on your success at work. Right? And that's without you logging into an application and trying to figure out how to use it and trying to understand whether you're doing it right and all that stuff. And there are opportunities everywhere. There are so many people who are not that close to the data teams today that, through the work of data engineers and analytics engineers, we could really empower. And I think that's really exciting. We've said it a couple of times now, but Lucas and I have a ton of empathy for that because we're both field guys. We're pro serve to the bone. We're not necessarily in the server room or deep in the guts of the data warehouse. We've been out with people, trying to help them use data in their jobs every day, for the last fifteen, twenty years. And so that's what we wanna do with AI. I think it's super cool.
[00:39:50] Tobias Macey:
You touched briefly on another important aspect of this challenge, particularly when you're bringing AI into the picture, which is accuracy and trustworthiness. You're not necessarily guaranteed that when you have a human doing the same work either, but you have a higher expectation that they will have done that work, since it's part of their job description. And I'm curious how you think about the validation and grounding of the AI workflow to make sure that you don't have any erroneous analyses, and that you have some scrutability as to how the agent reached a particular conclusion.
[00:40:30] Drew Gilson:
Yeah. So the answer to that is, in large part, that we spend a lot of money on compute. Right? There are a bunch of ways to make it more likely that you're gonna drive accuracy as high as possible. One that is brute force, but does work with these probabilistic systems, is simply doing things multiple times. Now that's not the only thing we do. There are a lot of other checks that we'll go through to ensure that the information that we're delivering is correct. But simply running the same kind of analysis a few times in parallel and then looking for consensus will get you pretty far in terms of ironing out potentially strange anomalies introduced by the probabilistic large model in the first place, because it's kinda unlikely, over a long time horizon and with enough attempts, that you're not going to converge on the thing that's true. Right? Now, these large models do hallucinate. There's no doubt about it. But you can minimize that by looking for consensus.
So that's one of the things. I think that's one of the key insights that anybody doing this stuff should be looking at. It's not cheap, at least today, but it's cheaper than it was a year ago, and it's gonna continue to get cheaper. But then there's also just a ton of guardrails and what we would call reflection, or LLM as a judge, right, where, at a given step in the process, given the output, given what we know about the business and the context for this assignment, and given the simply common sense things that an analyst would think about, like, maybe if something is an order of magnitude larger than something else, that's suspicious, we should flag that and investigate deeper. There are all sorts of things that we can spin off of the agentic process to go, that seems odd, we should double check that, we should make sure that that number is correct, and then we'll spin off another process to do that. And so with all those things in place, you can get pretty close. You can get extremely close to the performance of a human analyst.
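A compact sketch of those two ideas, consensus across repeated runs plus an LLM-as-a-judge reflection step; `run_analysis` and `ask_model` are stand-ins for whatever model calls are actually in use, and the prompt wording is an assumption for illustration.

```python
"""Sketch: self-consistency across repeated runs, plus a judge prompt that applies
analyst common sense (e.g. order-of-magnitude checks) to the final answer."""
from collections import Counter
from typing import Callable


def consensus_answer(run_analysis: Callable[[], str], attempts: int = 5) -> str:
    # Brute force but effective with probabilistic systems: independent runs
    # of the same analysis should converge on the same answer.
    results = [run_analysis() for _ in range(attempts)]
    answer, count = Counter(results).most_common(1)[0]
    if count < (attempts // 2) + 1:
        raise RuntimeError("no majority across runs; escalate for human review")
    return answer


def judge(answer: str, context: str, ask_model: Callable[[str], str]) -> bool:
    # LLM-as-a-judge reflection step: a second prompt checks plausibility
    # against the business context before anything is delivered.
    verdict = ask_model(
        "You are reviewing an analyst's finding. Given the business context, "
        "does anything look implausible (wrong sign, order of magnitude, etc.)? "
        f"Answer PASS or FAIL.\nContext: {context}\nFinding: {answer}"
    )
    return verdict.strip().upper().startswith("PASS")
```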
And with every day that goes by, I think we're getting better. It's really tough, though, because at the end of the day, if a self driving car, to take an example that's certainly relevant, hurts one individual, that's gonna set back the adoption of the self driving car really significantly, for reasons that are just human reasons. Right? And the same thing applies here, maybe not to the same severity, of course. You can spend a long time gaining trust, and then if you do one thing wrong, you can lose it really quickly. And so I think, from a user experience perspective and just from an AI engineering perspective, everything that we can do to help users understand the limitations of the systems that they're working with in the first place, and to continue to be accountable for the outputs and stay involved and engaged, is a good thing. We don't wanna take the agency away from the people in any way, shape, or form. That's super important. Right? So as we think about building these applications and the experiences within them, everything we can do to continue to bring people along and help them work with the AI, instead of just receiving the outputs, is really good. And everything that we can do, of course, to make sure that we're not handing them information that is, for any of the obvious reasons, simply wrong; we want to eliminate those cases. But, yeah, it's certainly a brave new world. And I think that over the next couple of years, we're gonna see all sorts of interesting ways that we can help people work with machines a little bit better.
[00:44:05] Lucas Thelosen:
I think the self driving car example is a good one. And, you know, it is very applicable to data analysis too. Because initially, when we started, we were like, hey, this is gonna be an assistant to the data team, right, and we expect people to review the output, because that's how LLMs are expected to work right now: a human will always review it. I have changed my opinion on that since then, very similar to the self driving car, where if there's one mistake made, we hold it to much higher standards. It's funny. Right? At the same time that we're criticizing AI for still being in its infancy, and, of course, nobody should just blindly trust an AI, on the flip side, it makes one mistake, and, well, John over here obviously made a mistake too as a data analyst, but, hey, it's John. You know? It's fine. He's gonna fix it. But when the AI makes it, I lose all trust. So we do have to, at least in our approach to data analysis, hold it to a much higher standard. It can't be wrong. Right? Even though we put disclaimers on it and everything, it has to be right. Because it's in our nature right now; AI is new to a lot of people's lives. We would be very quick to throw it out. And so we just have to hold it to a much higher standard than we would hold a human analyst. I can't count the number of times someone forgot to copy down a formula in a spreadsheet, you know, or something like that. At every single company I went to, I found something. It's like, hey, the way you calculate your profit margin over here, that's the wrong formula. You know? Like, really substantial things.
And that just can't happen, because even though it is a new tool, we are
[00:45:44] Drew Gilson:
super critical of it. That's just how it is. But my hope would be that, as we adopt these tools and more people use data to make better decisions over time, the net benefit to our organizations and our lives, certainly at this point, I do think is positive. The tools are good enough and the models are good enough that I believe that to be true. You know, as Lucas said, people make mistakes all the time. I do think that these systems actually probably make fewer mistakes; if we were to take a group of, oh, data analysis students, maybe, Lucas and I have been talking about doing this and sort of pitting them against Orion, and perhaps next year we'll do that and release the benchmark.
But, you know, people are inconsistent. They do things different ways each time they do them. This system that we're building, and the large models that power it, are consistently curious and diligent in their pursuit of the answers that they're looking for. And I think that counts for quite a bit over time. And even just that consistency is valuable, because you have more data to then consider whether the individual answer that you got that day is in line with all the other analyses that have been done at, like, 04:30 in the morning after the ETL jobs finish. And so even then, when you show it to a model and suddenly you have this bizarre, maybe, double counting, because a job ran twice or whatever it is, the model is gonna catch that much more often than a human who does that analysis inconsistently every month when their boss bugs them, if that makes sense.
[00:47:21] Lucas Thelosen:
That's a really interesting point. As an analyst, let's say you have two hours to do this thing. Do you really want to run an additional query? You just noticed something, and you think, I could investigate now, but what I have is probably good enough. I have a couple of green numbers, a couple of red numbers, and a couple of bullet points. And that's just a very different attitude than Orion has. It goes down every rabbit hole, and it's fine if one of them didn't work out and that analysis isn't relevant; it closes it out and discards it. But it's not afraid to run an additional query and check on something as well, even if that means additional work for itself.
And that's an interesting change. We do see it asking questions that haven't necessarily been asked before and looking at data that maybe hasn't been looked at for a while. Like, hey, the way we set the lifetime values for our customer cohorts, that's a year old now, and things have changed in our business. Maybe I should rerun that and figure out what our estimated lifetime value should be. That's going to be a thirty-hour project for someone, but Orion doesn't mind. So it's really interesting to see that.
[00:48:43] Tobias Macey:
Digging into the other side of that trust equation and some of the ways that we're talking about the work being done by data analysts, I'm curious how you're seeing the introduction of a system like Orion change the types of work, and the attitudes around that work, that the overall data team has, and the overall organizational impact that it can have as far as maybe shifting or concentrating the focus of those data experts onto more of the critical path and less on this busy work?
[00:49:03] Lucas Thelosen:
Yeah. The way I see it, there are two paths people's careers will go down. One is the architect. You're responsible for how things are set up: the data model, the data catalog, the dictionary. On the other side, stakeholder management. I can't overstate that. The example I brought up earlier of the guy from McKinsey working for us: he said, if I had this insight at my previous job, I would have charged a couple hundred thousand dollars for it. You can take what Orion puts out and bring it to the team meeting, and you can take all the credit. You don't have to emphasize that an AI did what would usually have taken eighty hours of work on your end. So take it and do the stakeholder management. Talk to the business. Explain it to them. They want to hear it from a person. I'm the CEO of a company, and I still have a human lawyer because I want someone's signature on the paper. I want someone I can look in the eyes. I know they use AI. I know they didn't write this contract for me. That's fine. But I want a human there that I can trust and work with. And so, stakeholder management. I also think: be not just an analyst but a data product manager. Talk about the roadmap of where data is going to go in your company, which third-party datasets you're going to add, how it's going to evolve and mature far beyond where you are right now, and really own it like a product manager, with release cycles on new things you're going to do and different parts of the business you're going to work with. Elevate your career that way. I think those are the two directions people are going to go, where you're not going to spend time writing 40 different queries anymore. Orion can do that for you. It can empower you to elevate your career.
[00:50:54] Drew Gilson:
One of our internal slogans is: promote the data hero. By that we mean, first of all, that this person, who we've always felt is a very important part of any organization, should be recognized and rewarded, but also, literally, promoted. Let's get you promoted. This tool should have a tremendous impact on your career, and I do believe that.
[00:51:17] Tobias Macey:
Digging a bit more into the technical implementation of Orion, I'm wondering if you can give a bit of an overview of the system design and how you think about the integration of these language models with the data infrastructure and data architecture and just some of the evolution of your understanding of the problem as you have gone from initial idea to where you are now?
[00:51:42] Drew Gilson:
Yeah. For sure. So at the high level, Orion is what we call a multi-agent system. It's essentially a series of AI agents that are orchestrated together and pass information between them. Over time, that's evolved to become quite sophisticated, and I think anybody who's building a multi-agent system has realized that the coordination and orchestration of that message passing is extremely important. How you control which agents see which messages, and how you move the flow, the conversation, or the control flow through the system, is actually quite hard. At the beginning it was, let's just say, very simple. When we first started experimenting to see what was possible, we would get a model to run some queries, we would show the data to the model in its context window, and then it would begin to produce insights. And over time, as this has become more sophisticated, we've realized that we don't really want the model to ever look at the data. It's a large language model. It's not designed to be particularly numerate or to do analysis in its context window. Ultimately, it's a storyteller. These are narrative, linguistic models. But what we can do is use code, which is also in some ways narrative, to do the analysis that we're looking for, and hide a lot of the actual inputs and outputs in terms of the data itself. That was the next step in the evolution of the design of this system. So you've gone from a model looking directly at data to a world where a model, or a series of agents, is making a plan and writing some code, in this case mostly Python, to achieve that plan broken down into multiple steps.
The code is then interacting with the data and doing things like looking at aggregates and summary statistics at various steps, or looking at the outputs of certain steps and perhaps just the first few rows, the last few rows, or a random sample depending on the task. Then it maintains a separate track, an artifact of the outputs of that code, which becomes the source of truth for the output, which we call an insight. And then we use the large language model again to explain, to tell a story about the data trends that have emerged based on the code that has been written by these models. Through that process there are a lot of checks and balances, as I've said, and a lot of different components. For instance, you'll have groups that just plan the analysis, groups that actually write code to perform the analysis, and agents that take the outputs and write them up in a certain way based on the specifications they have about your output constraints, your format, or your destination, like a slide deck instead of a memo. And then ultimately you get your output. In the future, we're going to begin looking at even more sophisticated approaches. For instance, Google just open sourced a new large model that's not a large language model; it's a large time series model. That's just one example of a lot of really interesting research that's come off the back of this explosion of possibility with large models and deep learning. It's easy to imagine, in a future iteration of Orion, that some of those groups have access to models that don't exist today, like a large model that does linear regression or more sophisticated time series forecasting in a relatively unexplainable way, even though the accuracy might be good on whatever benchmarks that model was trained on, and then incorporating the results of that analysis back into the pipeline. So you have the ability for these agents to do a bunch of different things, including writing code and then using yet more machine learning models that have been trained to do particular, perhaps even domain-specific, tasks. And then you assemble it all into the output that the user is looking for. So really, it is a chain, a long chain of agents passing messages to get the output that you're looking for. Just to wrap up, anybody who's trying to build this sort of thing eventually realizes how complicated it can get and just how much you have to learn along the way. For us, as we've put this together and gradually increased the capabilities and the quality of the system, boy, have we ever spent a lot on compute and tokens. It's taken us many iterations over the last fifteen, sixteen months to get the output consistent. And I think that's probably quite frustrating for a lot of people, because you can get so close with that simple iteration where the model is just looking at the data.
It looks right. You go, this thing's capable of doing this in one shot, in ten to thirty seconds. But, of course, that's not true. So although I would love to believe that there's a future where you can run a single model in a tool-calling loop and trust the outputs that it delivers to you, I do think in the medium term that's probably unlikely. And so the orchestration that's gone into building something like Orion is super important, and that's really, really hard. It's been one of the most fascinating technical journeys for me and my team.
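To make that control flow concrete, here is a minimal sketch of the pattern Drew describes: planner and coder agents produce Python, the generated code runs against the data, and only its printed summaries, never the raw rows, are passed back for the final narrative. The function and agent names are illustrative assumptions, not Orion's actual API, and `llm` stands in for any chat-completion call that returns text.

```python
# Illustrative sketch of the "model never looks at the raw data" loop.
# Names are hypothetical; `llm` is any callable that takes a prompt and returns text.
import json
import subprocess
import tempfile

def plan_analysis(llm, question: str, schema_doc: str) -> list[str]:
    """Planner agent: break the question into discrete steps (no data shown)."""
    prompt = f"Schema:\n{schema_doc}\n\nQuestion: {question}\nReturn a JSON list of analysis steps."
    return json.loads(llm(prompt))

def write_step_code(llm, step: str, schema_doc: str) -> str:
    """Coder agent: produce Python that prints only aggregates and small samples."""
    prompt = (
        f"Schema:\n{schema_doc}\n\nWrite Python that answers: {step}\n"
        "Print only summary statistics and a few sample rows, never full tables."
    )
    return llm(prompt)

def run_generated_code(code: str) -> str:
    """Execute the generated code in a subprocess and capture its printed summaries."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    result = subprocess.run(["python", path], capture_output=True, text=True, timeout=300)
    return result.stdout[-4000:]  # keep the artifact small enough for a context window

def narrate_insight(llm, question: str, artifacts: list[str]) -> str:
    """Storyteller agent: turn code outputs into a narrative insight."""
    return llm(f"Question: {question}\nFindings:\n" + "\n---\n".join(artifacts))

def autonomous_analysis(llm, question: str, schema_doc: str) -> str:
    artifacts = []
    for step in plan_analysis(llm, question, schema_doc):
        code = write_step_code(llm, step, schema_doc)
        artifacts.append(run_generated_code(code))  # the model sees summaries, not rows
    return narrate_insight(llm, question, artifacts)
```

In a production system the generated code would run in a sandbox with validation between every step; the point of the sketch is only the separation between code that touches data and the model that narrates its outputs.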
[00:56:59] Tobias Macey:
And as you have been building and iterating on Orion and working with some of your early customers to introduce them to the capabilities and the conceptual shifts as far as how they approach data in their organization, what are some of the most interesting or innovative or unexpected ways that you've seen that capability applied?
[00:57:20] Lucas Thelosen:
It's really great to see completely new use cases that we didn't even think about. One of our early customers, where Orion has access to all their data, saw one of their competitors publish a blog post. So they took the PDF of that blog post, gave it to Orion, and said, hey, can you look at our data and write something up on counterpoints, where they might be wrong and what we should maybe say to this? Okay, this is a data analysis tool, not a content creation tool, but it was very interesting to see Orion just do that. It looked at the data in the blog post, pulled the queries to see what their own data said, and then wrote up where they have contradictory data and where they have data supporting what was said in the blog post. That really helped our customer go out and write an opinion on what had just come out, with statistics behind it. Another use case we didn't quite anticipate: we thought Orion would be everybody's assistant, in essence everybody's personal analyst. The number one use case right now, though, is actually powering customer-facing reporting.
A lot of companies have lots of bigger, more important customers, but servicing each of them would be quite expensive. You can't hire a person to help every one of those customers. So they said, well, why don't we use Orion to create a custom insight for this particular customer: how things are going, where there's opportunity, and so on. It can do that 700 times, and each time it's different, because every customer is unique, their usage is different, and their relationship is different. So those are small-scale insights, so to speak. I would say that's probably the most common use case, and then you can turn around and do the same thing for account teams.
[00:59:07] Drew Gilson:
Well, for me, actually, it's the value of our product to our own engineering teams as we've built it, which is pretty neat. I've always been an advocate for dogfooding, and all the QA data that we generate is now analyzed by Orion. Our engineers interact with Orion every day to continue to improve it, which makes for some pretty interesting metacognitive reflections as Orion realizes, oh, this insight is about me. That's what I get a kick out of. Maybe that's just my nerd indulgence, but I just think it's awesome.
[00:59:44] Lucas Thelosen:
Yeah. I really like that analysis. Orion analyzed its own inefficiencies. It looked at our cloud bill, our cloud spend, and went through where we could actually save some money by running things differently. So we use Orion as a data analysis tool on itself, and I think the first month our cost went down by 30%, and the next month we had almost a similar 30% decrease. So now we're over 60% or so down from where we were two months ago in terms of the efficiency of running Orion, just by doing analysis on itself.
[01:00:19] Drew Gilson:
And these opportunities are common, by the way. Cloud FinOps is huge, as everybody knows, and everybody can probably go turn off some servers.
[01:00:29] Tobias Macey:
Yeah. That also brings up another use case. So, financial analysis for cloud spend, but another growing ecosystem is engineering analysis: the efficacy and throughput of your engineering teams, how to optimize that, and overall productivity. And so I think it'd be interesting as well to see the potential for some of that agentic workload and some of the insights that you can get on the overall effectiveness of your teams, as far as the work that they're doing, beyond just the organizational level.
[01:01:02] Drew Gilson:
Yeah, it's true. And Orion's not calling any of my engineers in the middle of the night yet. But we're getting into some interesting territory here, because there are so many different kinds of data, and a common one, of course, is individual performance metrics. I think that in a world where you have these systems optimizing continuously and maybe making somewhat impersonal decisions about the data they're exposed to, that's not the type of world that I want. I guess I just want to state that. We want to empower the human here. But if we can make somebody's job more efficient by pointing out something that they might not be doing that they should be doing, or maybe point out that, relative to their peers, their performance against a benchmark could be improved by calling a different subset of customers based on the data they have available, you know, there's opportunities to impact people's lives in a positive way.
[01:02:04] Lucas Thelosen:
I think it's more interesting if it's a personal coach, where you can be more vulnerable with it than you might be with a manager. You can say, I have a hard time with x, y, and z, and then Orion can see if there's something there that the data could support. The personal coach angle is probably more interesting to me. And I use it myself: what are the most important things I should be working on today to drive sales for the company? Orion can look at our data, or take the cloud cost spend example. Those are very tangible, helpful things that it can point out to me.
[01:02:41] Tobias Macey:
Yeah. I think it's just generally about applying more of that analytical clout to a broader set of problems than organizations have typically invested in, because of the high activation costs that have existed.
[01:02:55] Drew Gilson:
Yeah, absolutely. A lot of data is getting copied around every single night, and yet it's only a drop in the bucket relative to what could be happening for each and every person in that organization.
[01:03:08] Tobias Macey:
And as you have been building this system, working with some of your early customers, understanding its capabilities, what are some of the most interesting or unexpected or challenging lessons that you've each learned in the process?
[01:03:21] Lucas Thelosen:
Yeah. That's a fun story to share. In the early days, when we set up the multi-agent system, we realized, okay, we really have to put a structure in here. We have an agent that comes up with new ideas. Its job is to think outside the box and think of additional questions to ask. And early on, this is super early, it came up with the idea that this would be super helpful for the board meeting. And all of a sudden, all the agents started talking about the upcoming board meeting and how to support the board meeting further. There was no board meeting. The agent just voiced that it would be interesting for a board meeting, and then all the agents got sidetracked. So we realized that we had to put quite a good structure in place, to have, in essence, a manager that brings it back to the original task: this is what the task is about, and let's stay within the boundary of that task. It was just fascinating to see quite a human-like response in the system of agents as they're working together. There was another one, a couple months later, where one of our customers finally got full access to everything, and they put something in that was completely outside of the data, something you really shouldn't use Orion for; it was more like a question for a personal psychologist, maybe. Luckily, the system caught it. It realized it shouldn't be doing this, but the internal monologue, the internal conversations that went on, was quite funny to read. It wrote an obituary and terminated the process, but first it published the obituary to the internal log. It had a bit of an attitude.
[01:04:50] Drew Gilson:
Well, it's just so hard to orchestrate these things. I was going to share a similar lesson. If you have a group of agents and you tell them, look, you're connected to a semantic model that represents the truth in the business, and of course that's what we say about semantic models, we spend a lot of money making sure that the metrics encoded in LookML, for instance, are correct. So if you say to the agents, this is true, and then it isn't, or it produces something that's clearly not right, the contortions that you can get into in the conversation flow are, frankly, on one hand hilarious, and on the other hand, it's just, oh my gosh, how are we going to solve this one? Because, of course, you have a series of, say, sale prices coming out of a customer system, and in one case, for one particular region, somebody has inadvertently multiplied them all by several orders of magnitude, which happens. If the agents are instructed to believe that the semantic model is correct at all costs, well, it's just an existential crisis for those poor AI agents. And so we've had to figure out how to create the right amount of structure and guardrails, with a circuit breaker in some cases, or sort of an escape hatch, where an agent in one of these conversations has the ability to push the button: something isn't right here, we need to escalate this and figure it out. Or maybe it's just a process of inquiry, where you double click and go, well, this can't possibly be right because it's just so anomalous relative to all the other data. Designing those interventions for when that sort of thing happens actually took a lot of time in the early days, because if you don't do that, the models will catastrophize, or they'll just completely make up something that might conceivably make sense for the anomaly but is simply not true. The truth is somebody fat fingered some data, maybe. So those are some of the early-days lessons that I think anybody who's building with this stuff is going to encounter and will have to solve, and there's certainly been a few of those along the way.
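A minimal sketch of the kind of circuit breaker Drew alludes to, under the assumption that a cheap statistical check runs before any agent is asked to explain a metric: values wildly outside the historical range raise an escalation instead of being rationalized. The threshold, exception, and function names are illustrative, not Gravity's implementation.

```python
# Hypothetical circuit breaker: flag values that are absurdly inconsistent with
# history and escalate to a human instead of letting agents explain them away.
import statistics

class EscalationRequired(Exception):
    """Raised so the orchestrator can pause the agent conversation for review."""

def check_metric(history: list[float], new_value: float, max_z: float = 6.0) -> float:
    """Return the value if plausible; escalate if it is far outside the historical range."""
    mean = statistics.fmean(history)
    stdev = statistics.pstdev(history) or 1e-9  # avoid division by zero on flat history
    z = abs(new_value - mean) / stdev
    if z > max_z:
        raise EscalationRequired(
            f"Value {new_value} is {z:.1f} standard deviations from history; "
            "possible unit error or double-counted load. Escalating to a human."
        )
    return new_value

# Usage: one region's sale prices suddenly arrive multiplied by a thousand.
daily_revenue = [10_400.0, 9_950.0, 11_200.0, 10_800.0]
try:
    check_metric(daily_revenue, 10_600_000.0)
except EscalationRequired as exc:
    print(exc)  # routed to the data team, not to the narrative agent
```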
[01:07:02] Tobias Macey:
Alright. Are there any other aspects of the work that you're doing on Orion, or this overall space of agentic analytics and the impacts that it can have, that we didn't discuss yet that you'd like to cover before we close out the show?
[01:07:15] Drew Gilson:
You know, I guess this is a refrain that everybody is saying, but it's a really important one: this is the worst that it's ever going to be. Maybe it's been said enough that I don't have to say it, but I really don't think that's the case. This is getting better every quarter. And if we play that out, even if there is some fundamental cap, some limitation of the technology, even if we just froze what we had today in terms of the capabilities of the models, we could easily get a decade of value out of it. So I think that's maybe the point that I'd like to leave on. This is something that is tremendously exciting given the capability that we have today. There's still the occasional person out there who says this is just a random language generator and it's not going to be able to drive business value in a meaningful way. I don't think that's true. I think this is actually the worst that it'll ever be, and it's just increasing in capability every quarter.
And I'm super excited about that. I can't say enough how interesting this space is. Over a twenty-plus-year career in data and software, I'm having more fun than I've ever had before. It's awesome.
[01:08:31] Lucas Thelosen:
Yeah, I think that's exactly it. In the beginning, I said I did my first cloud migration in 2007. There are certain trends. I didn't get into blockchain; I didn't quit my job at Google for that. But, you know, I have four daughters at home, and I quit my job at Google because of this. This is a monumental change in how we're going to do things, and I would encourage anyone to embrace it, just like I embraced cloud in 2007. It was so beneficial to my career, and it will be so beneficial to your career to embrace AI tooling right now, because two years from now you'll have two years of AI experience. That is very real and very valuable to you and your career over the next decade. So I'm urging anyone: this is a real moment in technology, and those don't come along very often. It might take ten years again for the next big disruptive moment. So embrace it and see how you can use it to advance your own career right now.
[01:09:28] Tobias Macey:
Alright. Well, for anybody who wants to get in touch with the both of you and follow along with the work that you're doing, I'll have you each add your preferred contact information to the show notes. And I'd just like to thank you for taking the time today to join me and share the work that you're doing and your insights on this overall space of agentic analytics and some of the impacts and real-world capabilities that it can have. So I appreciate all of the time and energy you're putting into that, and I hope you enjoy the rest of your day. Thank you so much for having us. Thank you. Thank you for listening, and don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used, and the AI Engineering Podcast is your guide to the fast-moving world of building AI systems.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email hosts@dataengineeringpodcast.com with your story.
[01:10:28] Tobias Macey:
Just to help other people find the show, please leave a review on Apple Podcasts and tell your friends and coworkers.
Hello, and welcome to the Data Engineering podcast, the show about modern data management. Are you tired of data migrations that drag on for months or even years? What if I told you there's a way to cut that timeline by up to a factor of six while guaranteeing accuracy? Datafold's Migration Agent is the only AI-powered solution that doesn't just translate your code; it validates every single data point to ensure perfect parity between your old and new systems. Whether you're moving from Oracle to Snowflake, migrating stored procedures to dbt, or handling complex multisystem migrations, they deliver production-ready code with a guaranteed timeline and fixed price. Stop burning budget on endless consulting hours. Visit dataengineeringpodcast.com/datafold to book a demo and see how they turn months-long migration nightmares into week-long success stories. Your host is Tobias Macey, and today I'm interviewing Lucas Thelosen and Drew Gilson about the engineering and impact of building an autonomous data analyst. So, Lucas, can you start by introducing yourself?
[00:01:08] Lucas Thelosen:
Yeah, absolutely. I live here in Boulder, Colorado, dad of four daughters, and I have been in data analytics my whole career. I was an analyst for a few different companies before I started building my first AI in 2010, during the mortgage crisis, and realized that I had to build a central foundation that all the models could run off, along with the different dashboard and reporting use cases. So I came across a company called Looker back in 2013 and got it into everything in the company I was with at the time. That then led me to build my first consulting business, and I got to set the same thing up for Uber and Walmart and Amazon, actually, which got a little awkward when we got acquired by Google. But it was some good exposure across the US and Europe. And then, yeah, I got to run product for Google. That was my most recent gig for data and AI before we started Gravity.
[00:02:08] Tobias Macey:
And, Drew, how about yourself?
[00:02:10] Drew Gilson:
Sure. So Lucas and I met quite a long time ago, around the time that Lucas discovered Looker. I also discovered the product. I live in Calgary, I'm Canadian, and I had been operating an ecommerce business in the outdoor category. We shipped backpacks and tents and sleeping bags all over Canada. I started a warehouse out of a garage and then moved it into a much larger fulfillment center, built all the software and the systems to make that business work, and we were generating data like crazy. I wish I could remember how I discovered Looker; it's the part of the story that I can never tell properly. But one day I came across this tiny little startup out of Santa Cruz, and it allowed me to do the work of several data analysts, just by myself with my small team. And so I loved it, and I ended up down in Santa Cruz for their first user conference, which was 22 people sitting around on beanbag chairs on an unfinished floor. One of those people was Lucas. The founder of Looker organized it, and he pulled us together and just said, what do you want the product to do next? What do you love about it? What do you not like about it? Lucas and I hit it off at that time. And then a few years later, after we had wrapped that business up, I ended up joining Looker, a little bit later than Lucas. So we've worked closely together now for over ten years. I joined Google as well through the acquisition.
I've had the privilege of staying in Canada, where I love to hike and ski and climb, so I haven't actually lived in the Valley, but I've worked closely at Looker and at Google, where I took a slightly different path than Lucas. I ended up spending a year in the cloud AI group at Google, where I was working on document extraction: LLM-powered document extraction, which we call Document AI. And so in '22, I really started to see the power of the large models, and we started to think, holy smokes, this is going to be huge, at about the same time ChatGPT came out. So Lucas and I made plans to leave to capitalize on that, and we left Google in April of 2024.
And now we're about twelve to fifteen months into this idea at Gravity, building Orion.
[00:04:26] Tobias Macey:
And in terms of your backgrounds, I'm just wondering how you both got into data and what it is about the overall data ecosystem that has captured your interest for so long.
[00:04:38] Lucas Thelosen:
Yeah, that's a good question. So I'm German, and I love efficiency. And it's funny, I didn't quite realize this: there are a lot of stereotypes Americans have about Germans, and I didn't know them. Nobody told me those stereotypes when I came to the US. I came here to learn English. And then when I got my first job, I was, like, 22, and I was hired as an analyst, which, for many people just like me at that time, was just a random title for doing a lot of different things. And I just couldn't hold my, how do you say it, I couldn't hold my tongue. I was like, guys, this is an incredibly inefficient company. You make a lot of mistakes all over the place, and it's so vulnerable to things being messed up in these thousands of spreadsheets. And here I was, 22, ten years younger than the next most senior employee. And they said, well, if you think you know better, then why don't you suggest something? And I didn't pick up on the sarcasm.
So I literally wrote a proposal. I went and researched, and a week later I presented my proposal for a cloud migration in 2007 to the CEO. And he was like, well, okay, what do we have to lose here? Why don't you just go for it? So that's how I found out about databases and Python and SQL. I taught myself SQL. It was a wonderful experience, and that just happened to be the thing I did over and over. The next company was like, hey, we heard you did this thing over there, could you do this with us too? And then that was the next thing. And then I actually taught myself VBA. I don't know if anyone still knows what that is, but I wrote something in VBA that did stuff in Excel, all these fun things. So I was not a data person by training. I was actually an FP&A guy. That's what I learned in school: how to do forecasting and financial analysis.
I am still super passionate about it, though. Deep down, I think the world would be a lot fairer place if people were more data driven and looked at the data more, versus just whoever has the loudest voice in the room and the best relationship. Most people do want to make decisions based on what their buddy thinks, and sometimes we just ignore the facts, or we ask for the analysis to be tweaked so it fits the narrative that we already want. And so I always leaned into that German stereotype. I was like, hey, but this is what the numbers say, and maybe we should do that.
[00:07:10] Drew Gilson:
So for me, one of my first jobs, this would have been about 1998 or 1999, was data entry for the Red Cross. I had to type in a whole bunch of address records, and many other people were also doing this, and of course the deduplication of that data entry was a significant and ongoing problem. That was my first exposure to SQL. Similar to Lucas, we're both very applied. I think at many points in our careers, we've both thought, this is silly, this takes too long, there's got to be a better way, what are the tools available to solve this problem? And that's led me deeper and deeper into data, and then eventually AI, in my career. So, yeah, I had come up with a way to do part of that job much more efficiently when I was a kid, more or less. And that, through a somewhat long and circuitous journey, led me to work at an advertising agency in the early two thousands. And I remember the first paper that I ever authored, kind of similar to Lucas. At the end of the day, you're younger, you think you know what the organization you're working for needs to do, you scribble some manifesto, and you try to convince your boss or your boss's boss. In my case, I'll never forget it, it was like this meme, back before we even called them memes, of a World War One fighter pilot flying his plane with a blindfold on. And I said, operating this business without business intelligence is like flying blind. I had written this long document, which, unfortunately, I lost. At some point maybe it'll turn up, and it probably won't be nearly as good as I remember it being. But I just remember thinking, there's so much that we know about how this agency operates in terms of resourcing and forecasting and, essentially, the timing of invoices and payables and all this stuff, and nobody's looking at it. Surely, if we put this on a dashboard and got people to make decisions based on our resource load and the average profitability of the different types of engagements that we do, we could make this way more successful. And slowly but surely, we actually did, and it was very early to do that sort of thing. This was on Crystal Reports and WebTrends; long ago, there were all sorts of different ways that you could get data to run your business. But anyway, then I ended up consulting on my own, often in the data realm, and then ended up operating this ecommerce company that I had mentioned. Data's just kind of been this through line. I've always gone back to what kind of data exhaust we are creating and how we can use that to create this iterative process of ongoing improvement, whether in the group that I had been working for, or perhaps the entire business, or my client.
And when the organization or the client is receptive to that kind of thinking, you can do some really awesome stuff, and it doesn't necessarily have to be that complicated. A lot of the time, it's simply looking at some KPIs, doing that regularly, and then making better decisions as a result, as opposed to some of the really fancy stuff that we've started doing over the last few years, which is maybe hard to understand, more predictive and prescriptive, which I'm sure we can get into. But often there's a ton of value just in the fundamentals. And what I'm trying to connect to here is that what we are doing with our product has a lot more to do with the fundamentals than you would think. It's a lot more, let's just do the simple stuff consistently and remind you to do it, than do the really fancy stuff that you might not necessarily understand or believe or trust enough to go action. So we can get into that, I'm sure, throughout the conversation.
[00:10:53] Tobias Macey:
And so now digging into what you're building with Orion and the overall focus there: obviously, business analytics and data analytics are well understood. They've been part of the core requirements for any business operating efficiently and at scale, to your points, for decades now. So why isn't it a solved problem? What is it that still needs to be discovered and improved upon, given the fact that so many companies have put so much time and money into it?
[00:11:26] Lucas Thelosen:
Yeah. So before Drew and I ran our respective product teams at Google, we actually ran the consulting arm. We were in the consulting arm of Looker and then, subsequently, of Google Cloud. And the thing that we saw there over and over is that the vast majority of companies are under-resourced when it comes to analytics. Data engineering, data analysis: they don't have the manpower to really support all that the business could do with the data that they have. And that is quite frustrating. I have seen a couple of companies that are well resourced, or at least reasonably resourced, and they find really interesting things. Usually, they are in hypercompetitive markets where any kind of marginal improvement really matters a lot, to get the next round of funding or to be the one that IPOs. I was really close to a lot of the food delivery businesses in 2017, for example. Hypercompetitive.
A bunch of money had been poured in. They all had big data teams, and they were all trying to figure out how to be the best food delivery business. But the vast majority, let's say 95% of companies, do not resource their teams sufficiently, and so that means there are bandwidth constraints. And then there is a huge disconnect between the questions people in the business have and the people that understand the data. There are these people that understand the data; they know what data is available, though they might not necessarily know all the ways it could be used. And on the other hand, you have all these people that have a great understanding of the business that they're in and the specific function they're serving, but they don't know how to connect it to the data. Ideally, and I always suggested this on the consulting side, you invite your analysts into your team meetings. Have them sit there with you. Have them be part of the conversation so they can make that connection for you. Unfortunately, that doesn't necessarily happen, because the analyst bandwidth is not there, or they forgot to invite that person again, all these different reasons. So what we thought we could do here is actually bridge that gap and give everybody a one-to-one analyst that is right there next to you, and go even further than waiting for you to have the question: think of all the questions that you probably want to ask. If you are the head of marketing, the head of procurement, or in any of these functions in a company, we can prepopulate the 100 questions in this job function that someone probably wants to ask. And with some additional context about your company, we can make that even more specific, using AI technology, so that before you even have to think about the questions you want to ask, we ask them for you. Orion asks them for you. It thinks about them for you. It becomes aware of the data available that the data engineering team has put together, the semantic layer that's there, the data dictionary that's set up. Orion can read that and then connect it to all the different business functions. So if we can bridge that last mile and actually explain the actions that should be taken to the actual business user, we can close the gap that has been there. When I look at the usage data in Looker, and I can't talk about the specific numbers, the rate at which users come back is very low. There are a couple of power users in there that use it all the time, and this is not unique to Looker; it's any BI tool out there. You put out the dashboards, you put these nice pivot tables together.
A couple of people come back; the vast majority of the business does not. So why don't we bring it to them and explain it to them in the language that they like to speak, and actually go as far as: you know what, I went through multiple root cause analyses, I created a couple of cohorts, did some predictions, and here is what you probably want to do this Monday morning. That instead of having them go to the tool and figure out how to use it again because they haven't been there for a week, all these different things. So that was our thesis, and that's what we have built here with Orion.
[00:15:21] Tobias Macey:
And that points back to a conversation that I came across, I don't even know how many years ago at this point, but definitely on the order of at least four or five years ago of people making that same observation that having dashboards is all well and good, but it's ultimately useless because even though you might see, oh, this data is very interesting. It's telling me something about the business, but then it's okay. Well, now what? Now I have to figure out how to actually turn that into some sort of effective action or how am I supposed to actually take action from what I'm seeing because now I need to be able to replicate that context, which is a problem that we're seeing all over again in the case of AI where you might be able to actually see something and understand something, but now how do you actually feed that back into a system that can do something about it and starting to look at this more active analytics and being able to actually turn that into outcomes rather than just retroactive analysis.
[00:16:17] Lucas Thelosen:
Yeah. I think the eye-opening moment was when we were engaged with this multibillion-dollar retail company, and they had already invested a lot in their data stack. And yet we came in, we had time because we were the external consultants, we had no meetings on our calendar, and we were able to go through the data and realize: you know what, over here you're just throwing a lot of fresh produce away, and it's because it's in the wrong location at the wrong time. So we used weather patterns, search signals, all these different data points, and we were able to predict where the produce should be and, after a couple weeks of work, tell them exactly how to change their logistics so the produce was better allocated, in essence. It ended up saving that specific company $10,000,000 a week in fresh produce waste. That's a massive scale, and it's not applicable to everybody, but you can scale it down and say, okay, what if everybody had three analysts with a fresh perspective that could come in, take a look, and ask the questions that maybe the entrenched team hasn't been asking because they didn't have the bandwidth, they didn't have the time, to go through all this and find these things that are maybe small but do add up.
[00:17:32] Drew Gilson:
And it's not just the bandwidth, though. One of the things that made that possible was the consideration and inclusion of third-party data. From a data engineering perspective, I think this is quite relevant to the audience going forward in this new world. Internal teams tend to focus on internal data. Now, there are some special parts of the world, particularly in finance, where it's all about seeking alpha and you find that edge with third-party or proprietary data. But in most industries, most of the time, an under-resourced analysis is not going to include third-party data. And once we have these large models that are aware of the entire internet and the data that can be obtained in both structured and unstructured form, you can start to layer that in a lot more quickly and easily. I think that has profound implications for what we do as data engineers and analytics engineers, because sometimes there's only so much signal in internal data. And I would argue that in a lot of cases, particularly in the SMB and mid-market space, there could conceivably be more signal external to your business than inside. And that's missed by a lot of data engineers out there. If we spent as much time figuring out how to obtain the external sources that are of value to the business, and making it possible, in the simplest case, to just compare against them for benchmarking, you could make a big difference. But not a whole lot of people think that way in this industry. There's a lot of time spent modeling an internal data source, and at the end of the day, again, particularly on the smaller end, there's only so much juice you can squeeze from the lemon.
[00:19:18] Lucas Thelosen:
There's also the fact that now we don't have to join them on a join key. We don't have to join the datasets on anything specific. We can actually have one agent talk about what it found in your data and another agent talk about what it found in the weather data, the demographic data, the search trends data, the news data. And it doesn't need to be joined. They talk about it: hey, on Thursday, this is what I saw. Is there anything outside of our own data related to this? The weather agent might say, well, in Florida there was a certain event. Was this isolated to Florida? Well, let's check. So they can have a conversation without having to do a specific join like we used to have to.
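As a rough illustration of that idea, here is a hedged sketch in which each dataset gets its own summarizing agent and a coordinator is asked whether their independent findings are related, with no shared join key. The function names are hypothetical, and `llm` again stands in for any chat-completion call; in practice the summaries would come out of the code-execution loop sketched earlier rather than from raw rows.

```python
# Hypothetical sketch of cross-dataset reasoning without a join key: each agent
# summarizes its own dataset for the same period, and a coordinator model is
# asked whether the findings are plausibly related.
def summarize_sales(llm, sales_summary: str, day: str) -> str:
    return llm(f"Describe anything unusual in these sales aggregates for {day}: {sales_summary}")

def summarize_weather(llm, weather_summary: str, day: str) -> str:
    return llm(f"Describe notable weather events on {day}: {weather_summary}")

def correlate(llm, findings: list[str]) -> str:
    bullets = "\n- ".join(findings)
    return llm(
        "Two analysts report the following independent findings:\n- " + bullets + "\n"
        "Could the second plausibly explain the first? Answer cautiously and name "
        "one follow-up query that would confirm or rule out the connection."
    )

def cross_dataset_insight(llm, sales_summary: str, weather_summary: str, day: str) -> str:
    findings = [
        summarize_sales(llm, sales_summary, day),
        summarize_weather(llm, weather_summary, day),
    ]
    return correlate(llm, findings)
```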
[00:19:57] Tobias Macey:
And digging now into the capability of applying these models and agentic workflows to the problem of business analysis, this is something that has been sought after for a long time now. With the introduction of these large language models, some of the early iterations were largely focused around "talk to your data" by doing text-to-SQL, where you say, oh, this is what I want to do, and then the model will generate a SQL query, execute it, and maybe perform some summarization on the response. But one of the requirements that that typically has is that you need to have already invested in a very clean and scrutable data model, and maybe some additional documentation and semantic metadata about it, for the model to be able to figure out what query to write given the input. And I'm wondering how you've seen the overall architectural patterns, and the domain understanding of how to effectively do that talk-to-your-data use case, evolve over the past couple of years since that first round of text-to-SQL started to become en vogue.
[00:21:00] Lucas Thelosen:
Can I let Drew jump in here in a second? I think text-to-SQL misses a really important point. Well, not that it misses an important point; it's a tool for the right user, similar to how a dashboard or a good spreadsheet is the right tool for some people. If you know your data, you know the question to ask, and you know how to prompt it, text-to-SQL can be a good tool. And then a second point on what you were asking: I do think a good foundational data model is great. It's a really good investment to make as a company to structure your data and explain your data in some way. Any AI use case will be much more powerful if you put that in place. There are some things we can do beyond that, and that's why I'm going to hand it over to Drew.
[00:21:45] Drew Gilson:
Yes, sure. So this has evolved very rapidly over even the last six months. One-shot text-to-SQL prediction is not necessarily part of the product that we're building. We believe very strongly in the value of the semantic model, so we're always looking for all the context that we can get to help make sure that we shape your question correctly. We can get that context from dbt model files, from LookML if you happen to use Looker, or perhaps from the metadata of a Power BI dataset; there are a bunch of places you can get it. But no matter what, if you take that context and try to do single-shot prediction on it, you're probably going to lose fidelity at some point. And so there are a lot of alternatives now. Because tool calling has gotten quite a bit better, what we do, at the high level anyway, is decompose the problem into its constituent parts. We will first determine, forget the SQL, what is the data that we need to solve this problem? At what grain? And what are the conditions that we need to set in order to limit or slice the data in such a way that it will be valuable to answer the question that we think needs to be answered. And all of those steps happen independently.
And at each step, we validate, we verify, and we check. For instance, if we're predicting which fields we need, after we get the prediction back, and that might be a tool call, we'll make sure they exist. I'm giving a very simple example just so that it's understandable. But in a naive text-to-SQL approach, you might predict a query that has, who knows, a variety of issues, including fields that simply don't exist. That was what people were struggling with a few years ago, and of course it left a pretty bad taste if you had these large models predicting queries that didn't have any fidelity and in some cases didn't even seem to understand the underlying schema. But if you can decompose the problem into its constituent parts and then validate and verify at each step, you can get much better outputs. You can essentially ensure that you're correct at every step. Now, it's not cheap and it's not fast, and I expect both of those things to change over time, but it's quite a bit better than just taking in a natural language question and doing what is essentially a transformation or mutation into SQL. I do want to jump back up, though, and emphasize something that Lucas said, which is that, for us, even if our product were unable to find the data on its own and you had to feed it a little bit of help, whether that's your complicated SQL query, or maybe you have a notebook where you've been doing this analysis for five years.
It takes time, you know exactly how to do it, and there's still a ton of value past that point. So I would encourage other people who are doing this to get out of the weeds of concerns about SQL generation accuracy and ask, what is the downstream value of automating, through AI, all of the subsequent things that can happen? To give a specific example, if you know the report that you're supposed to be looking at, but you're a human and you probably don't look at it every single morning with your cup of coffee and scrutinize it in all of its detail, and I can assure you many people don't, the model will happily do that. So there's benefit to the model simply looking at the results of a known-good query and not predicting the query in the first place. A lot of our product actually serves those kinds of use cases. You're a human; as much as you say you want to be data driven and wake up to data every day and make decisions with data, if we had something that could look at the results of the known-good queries you've had in that notebook and then tell you when there's something that needs to be acted upon or something that's particularly interesting, that might be valuable. And I think there are a lot of use cases and a lot of teams out there that might benefit. So the point that I'm making is, if you're still worried about the accuracy of text-to-SQL, I think that if you can get to high-nineties accuracy for your use case, or even to 100% because you've got it on rails, there's a tremendous amount of value that you can unlock with large models past that point. So I don't think it should be an excuse to not be experimenting with large models in data engineering and data analytics at this time.
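A minimal sketch of the decomposed-and-validated flow Drew outlines, under the assumption of a toy semantic model: the model is asked for the pieces of a query as structured output, each piece is checked against the semantic model before any SQL is composed, and hallucinated fields are rejected up front. The table, fields, and helper names are illustrative, not the product's actual interface.

```python
# Hedged sketch: predict the parts of a query separately, validate each part
# against a semantic model, and only then compose SQL. Names are illustrative.
import json

SEMANTIC_MODEL = {
    "orders": {"fields": ["order_date", "region", "revenue", "customer_id"]},
}

def predict_plan(llm, question: str) -> dict:
    """Ask the model which table, fields, and filters the question needs."""
    prompt = (
        f"Semantic model: {json.dumps(SEMANTIC_MODEL)}\n"
        f"Question: {question}\n"
        'Return JSON like {"table": "...", "fields": [...], "filters": [...]}'
    )
    return json.loads(llm(prompt))

def validate_plan(plan: dict) -> dict:
    """Reject hallucinated tables or fields before any SQL is written."""
    table = plan["table"]
    if table not in SEMANTIC_MODEL:
        raise ValueError(f"Unknown table: {table}")
    known = set(SEMANTIC_MODEL[table]["fields"])
    missing = [f for f in plan["fields"] if f not in known]
    if missing:
        raise ValueError(f"Fields not in the semantic model: {missing}")
    return plan

def compose_sql(plan: dict) -> str:
    """Deterministic SQL assembly from a validated plan, no free-form generation."""
    cols = ", ".join(plan["fields"])
    where = " AND ".join(plan.get("filters", [])) or "1=1"
    return f"SELECT {cols} FROM {plan['table']} WHERE {where}"
```

Each of those steps can be retried or escalated independently, which is the practical difference from one-shot query generation.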
[00:26:13] Tobias Macey:
The other angle of the problem of data analytics in particular is to your earlier point that the analyst needs to have a lot of domain context and understanding of the business and its operations and its requirements to be able to build the appropriate answers and reports to the insights that the business needs to be able to actually improve on their efficiencies, and that requires a lot of nuance. There is a lot of missing documentation. There's a lot of lack of digital representation of that knowledge, and I'm wondering how you think about bridging that divide in terms of these agentic use cases of you throw a model into a sea of data and say, this is what I want to know. It might be able to give you an answer, but how do you know that it's actually the right answer or the answer that the person is actually looking for to be able to make a decision that it needs and just some of those gaps of how to manage that contextual representation and that domain representation of the business that you're operating within.
[00:27:18] Lucas Thelosen:
When we get started with a new customer, one of the first things we do, or that Orion does, is take a look at the metadata and understand what the most common date fields are, and what the most common measures and dimensions are. So it gets an understanding of what the business usually looks at. It can look at the dashboards and reports that are being used. Some organizations have onboarding materials; they're usually incomplete, but we can feed them into Orion as well. We think of it like onboarding a new data analyst, except that in the first three hours it can already consume all the available information, so it's really fast on that front. We actually have a customer right now who asked us if Orion can onboard their next analyst, because it now has such a thorough understanding of the business and can explain it with a lot of patience. I hadn't thought about that, that Orion could be used to onboard a new employee, which is really fun to see. There are some really interesting things that can be done beyond that. A lot of meetings are recorded now; they're transcribed, and there are meeting notes attached. So you don't necessarily have to invite the analyst to the meeting anymore; you can just share the transcripts.
So there are other ways that you can also add Orion to have that understanding of what's going on. And then, on top of the LLM, we added a vast library of understanding of how to do data analysis. Every company is unique, but I also think there are a lot of patterns that are similar. If you're running a sales team, we have a decent understanding of it. I've been in hundreds of engagements, worked with customers on analysis of inventory management, sales, marketing, cost optimization, whatever it might be. And there are patterns that we put into Orion that can be repeated. I don't want to neglect that every company is unique and different, but there are also a lot of best practices for what you probably want to look at that we added to Orion, and that Orion can then suggest. So we take the understanding of the metadata, the query history, all of that, and add to it our library of how to do data analysis really well for team X. We add to it, in Orion, what the company is and what it does. You can add goals and any additional context information you would like. You build quite a powerful data analyst, in some ways even more informed than maybe some of the human analysts, where it takes six months or a year to really get that institutional knowledge, and then it's really hard to move it to the new hire. And that's, I guess, why this company asked us if they can use Orion to onboard the new employee. We have a database behind it; Orion stores some of the information so it can retrieve it and actually do a meta-analysis too. So there are a lot of things we can do to get Orion into the room, so to speak.
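As a rough illustration of that first metadata pass, the sketch below profiles an information_schema-style catalog to surface the most common date fields, measures, and dimensions. The schema name and the type buckets are assumptions for the example, not a description of Orion's internals, and catalog layouts vary by warehouse.

```python
# Sketch: bootstrap a "what does this business usually look at" profile from
# warehouse metadata. Assumes an information_schema-style catalog (Postgres,
# Snowflake, BigQuery, etc.); schema and column names below are illustrative.
from collections import Counter

CATALOG_QUERY = """
    SELECT table_name, column_name, data_type
    FROM information_schema.columns
    WHERE table_schema = 'analytics'
"""

def profile_metadata(conn) -> dict:
    """conn is any DB-API connection to the warehouse."""
    cur = conn.cursor()
    cur.execute(CATALOG_QUERY)
    rows = cur.fetchall()

    date_fields, numeric_fields, dimensions = Counter(), Counter(), Counter()
    for _table, column, dtype in rows:
        dtype = dtype.lower()
        if "date" in dtype or "timestamp" in dtype:
            date_fields[column] += 1      # candidate reporting date fields
        elif dtype in ("numeric", "decimal", "float", "double precision",
                       "integer", "bigint"):
            numeric_fields[column] += 1   # candidate measures
        else:
            dimensions[column] += 1       # candidate dimensions

    return {
        "common_date_fields": date_fields.most_common(5),
        "common_measures": numeric_fields.most_common(10),
        "common_dimensions": dimensions.most_common(10),
    }
```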
[00:30:13] Drew Gilson:
I want to go back to a comment I made about the power and value of the simple things. Orion can, for instance, look at your calendar. Say you are a customer success manager and you have a few meetings throughout the week. After you've done that OAuth integration and it's able to see your calendar, then through the reasoning process that all large models have, and that we've tuned for this particular use case, we can pick off a meeting. We can go, looks like you're meeting with an important customer on Thursday morning; here are the basic facts that you need to take into that meeting. And those facts, if you have a data engineering team that's doing a great job, should be in a flat fact table where there is absolutely no ambiguity or complexity in pulling them out. We're not talking about some super complex fifty-line SQL query with window functions and aggregations and lag and all that. Literally, it's: go get the churn and the number of seats and the price and the contract renewal date, and get it into the Slack messages of that individual so that they have it at hand in that meeting. And that's what data can do. Unfortunately, I think in a lot of cases we still haven't gotten there. Unless you have a very sophisticated data culture with truly self-serve analytics, and the change management in place to make sure that every CSM on the team is drilled to go get that for themselves, it's kind of rare that somebody goes into that meeting that prepped.
But if you have this thing that just watches your calendar, pulls from the fact table, and maybe even drops some bullet points on a slide, you suddenly can say, yeah, we're data driven; we bring data to every customer meeting. And I just think that's really cool. There's so much opportunity to do that, as opposed to taking, oh, I don't know, 28 different really complex features of your customers and then trying to figure out, well, what is the through line? What is the missing thing? Where can we go acquire more customers who are going to have a greater LTV than others? I'm not saying that's not valuable; it's extremely valuable. But there's a long tail of basic data culture, of just the way you operate, that can still be turned on across the whole world in our industry. And so we really hope to help companies do that simple use case. When people put data and AI together, I think they go right to the far end of the analytical maturity curve. That stuff's really exciting, but it's actually not the biggest opportunity.
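For illustration, here is a minimal sketch of that workflow under stated assumptions: a flat customer_facts table and a Slack incoming webhook, both hypothetical names standing in for whatever the real pipeline uses.

```python
# Sketch: pull the basic facts for an upcoming customer meeting from a flat
# fact table and drop them into Slack. Table, columns, and webhook are assumptions.
import sqlite3
import requests

SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"  # placeholder

def meeting_brief(db_path: str, customer_id: str) -> str:
    """Assemble the handful of facts a CSM needs before the meeting."""
    with sqlite3.connect(db_path) as conn:
        cur = conn.cursor()
        cur.execute(
            """
            SELECT customer_name, seats, arr, churn_risk_score, renewal_date
            FROM customer_facts            -- hypothetical flat fact table
            WHERE customer_id = ?
            """,
            (customer_id,),
        )
        name, seats, arr, churn_risk, renewal = cur.fetchone()
    return (
        f"Meeting prep for {name}:\n"
        f"- Seats: {seats}\n- ARR: ${arr:,.0f}\n"
        f"- Churn risk: {churn_risk}\n- Renewal date: {renewal}"
    )

def send_to_slack(text: str) -> None:
    """Slack incoming webhooks accept a simple {'text': ...} payload."""
    requests.post(SLACK_WEBHOOK_URL, json={"text": text}, timeout=10)

if __name__ == "__main__":
    send_to_slack(meeting_brief("analytics.db", "cust_123"))
```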
[00:33:04] Lucas Thelosen:
That's one of the things that gets me really passionate too. We worked at Google, obviously. We worked with Amazon. We worked with Walmart. We got to work with companies that had practically unlimited amounts of money to spend on data analysis. And now you can bring that sophistication, but also just the lowest-hanging fruit of data analysis, to pretty much any company. People get worried about AI taking away jobs, but Amazon, Walmart, and Google had the resources to have these kinds of insights and to drive their businesses to be incredibly profitable and make billions of dollars, and most companies do not. So now we have the opportunity to bring data analysis to pretty much every company and level the playing field. It's no longer reserved for Walmart and Amazon. There are so many retailers that were squeezed out, so many tech companies that got squeezed out, because they didn't have the resources. And I think that's an often overlooked effect, right now at least, of what AI can bring us. We put our heads together here, folks from Google, folks from McKinsey, on what really great analysis looks like, put it into Orion, and can make it accessible at scale. That gets me really excited.
[00:34:21] Tobias Macey:
Your point about being able to integrate with calendars, integrate with Slack and send some bullet points, or generate slides and integrate with maybe your Google Workspace to create a presentation introduces the question of the user interface to analytics. For a long time, that was the business intelligence dashboard, where you have to take the initiative to go look at a pane of glass and try to figure out what it's telling you. Or maybe it was the job of somebody on your analytics team to prepare those reports for you, but more often than not, they're too busy just trying to keep that dashboard up to date. I'm curious how you think about the user interface to analytics, the ways that insight gets integrated into the day-to-day workflow of the people who need that information to execute on their jobs, and how these agentic capabilities maybe change the paradigm around what analytics is actually for.
[00:35:22] Lucas Thelosen:
So I'm very passionate about this one too. I know I just said I'm passionate about that other thing, but I'm also passionate about this: can we go to where people are, instead of forcing them to come to our proprietary user interface? We all have ten tools to log in to already. So as much as possible, I want to get to where you already are, whether that's Slack or your email system, Gmail or Outlook, wherever you might be. Right now, the chat interface is of course super popular for interacting with AIs because it's conversational in nature. But the closer we can get to where people already are, so it becomes second nature to use it, the better. The problem I'm facing, and I enjoy the conversation on this one, is that there's an incredible amount of compute behind what we built. But if you put it into a chat interface, it seems just like another chatbot, and you're really devaluing everything that is happening behind it. Massive compute clusters are spinning up, databases are being queried, all this stuff is happening, and all of it shows up as a little blurb in your chat. It doesn't quite convey the value. One of our coworkers was at McKinsey, and he said, we would have charged $300,000 for this insight, and now it's just an email in your inbox that might be ignored. How do we get that right? How do we convey the value of what you just got? We were working with this e-commerce company and found a really great insight about markets they hadn't explored yet that match their ideal customer profile. Worth a lot of money. And yet, because it came in an email, they didn't actually read it. So we really have to nail that experience. How can we make it feel as valuable as that session you booked with McKinsey that you spent thousands of dollars on?
[00:37:13] Drew Gilson:
Yeah. These are hard problems. The user experience of working with an AI agent, I think, is a fascinating UX design problem. We have so much value that we can drop into, again, your DMs. But there's just so much noise in the workplace, and you might not realize just how much went into making sure that insight was vetted and accurate, so it does become a challenge. We're figuring that out along with everybody else. But I really do think it's better than the prior attempt at this, which was super complex dashboards with a lot of filters that only a small subset of people used. In almost every case, it's better to just hit somebody with a message that says, our conversion rate dropped 2% last week. It seems to be due to different traffic from email, maybe because of the holiday weekend; we should probably re-engage these users with an offer. And that maybe goes to the person in marketing, proactively. Now, you could have gone and clicked and sliced and thought, okay, channel by channel, there seems to be a little dip here. Okay, Monday was a holiday. Maybe we should look at that segment, we should look at open rates, we should do this, we should do that and the other thing. But nobody does. So I think it's about getting to the place where we've got trust in something that is autonomously doing that on our behalf, surfacing recommendations, which are hopefully ranked in some sort of order so that we're not completely overwhelming you, because that's the other side of this coin. If we continually send you stuff to do and you lose the ability to act on it, because it's just too much and you're not sure where the signal is and where the noise is, that would also be bad. But I do think that if we hit you with one actionable insight a week, we could have a profound effect on your success at work. And that's without you logging into an application and trying to figure out how to use it and trying to understand whether you're doing it right and all that stuff. And there are opportunities everywhere. There are so many people who are not that close to the data teams today that, through the work of data engineers and analytics engineers, we could really empower. And I think that's really exciting. We've said it a couple of times now, but Lucas and I have a ton of empathy for that because we're both field guys. We're pro serve to the bone. We're not necessarily in the server room or deep in the guts of the data warehouse. We've been out with people, trying to help them use data in their jobs every day, for the last fifteen, twenty years. And so that's what we want to do with AI. I think it's super cool.
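As a toy version of that proactive nudge, the sketch below computes week-over-week conversion rate from a hypothetical daily metrics table and only produces a message when the change crosses a threshold; the table name and the 2% cutoff are assumptions for illustration, not Orion's logic.

```python
# Sketch: a proactive "conversion rate dropped last week" nudge. Compare last
# week to the prior week and only alert when the change is material.
import sqlite3

def weekly_conversion_rates(db_path: str) -> list[tuple]:
    query = """
        SELECT strftime('%Y-%W', event_date) AS week,
               1.0 * SUM(conversions) / SUM(sessions) AS conversion_rate
        FROM daily_channel_metrics          -- hypothetical table
        GROUP BY week
        ORDER BY week
    """
    with sqlite3.connect(db_path) as conn:
        return conn.execute(query).fetchall()

def build_alert(rates: list[tuple], threshold: float = 0.02):
    """Return a message only when the week-over-week change exceeds the threshold."""
    if len(rates) < 2:
        return None
    (_, previous), (week, current) = rates[-2], rates[-1]
    delta = current - previous
    if abs(delta) < threshold:
        return None  # nothing worth interrupting anyone about
    direction = "dropped" if delta < 0 else "rose"
    return (f"Conversion rate {direction} {abs(delta):.1%} in week {week} "
            f"({previous:.1%} -> {current:.1%}). Worth a look at channel mix.")
```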
[00:39:50] Tobias Macey:
You touched briefly on another important aspect of this challenge, particularly when you're bringing AI into the picture, which is accuracy and trustworthiness. You're not necessarily guaranteed that with a human doing the same work either, but you have a higher expectation that they will have done that work, since it's part of their job description. I'm curious how you think about the validation and grounding of the AI workflow to make sure that you don't have any erroneous analyses, and that you have some scrutability as to how the agent reached a particular conclusion.
[00:40:30] Drew Gilson:
Yeah. So a big part of the answer to that is that we spend a lot of money on compute. There are a bunch of ways to make it more likely that you drive accuracy as high as possible. One is brute force, but it does work with these probabilistic systems: simply doing things multiple times. Now, that's not the only thing we do. There are a lot of other checks that we go through to ensure that the information we're delivering is correct. But simply running the same kind of analysis a few times in parallel and then looking for consensus will get you pretty far in terms of ironing out potentially strange anomalies introduced by the probabilistic large model in the first place, because it's unlikely, over a long time horizon and with enough attempts, that you're not going to converge on the thing that's true. Now, these large models do hallucinate, there's no doubt about it, but you can minimize that by looking for consensus.
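A minimal sketch of that consensus idea, assuming a hypothetical run_analysis callable standing in for whatever agent pipeline produces a headline number: run it several times and only accept a value that a clear majority of runs agree on.

```python
# Sketch: brute-force consensus over a probabilistic pipeline. run_analysis is a
# stand-in for whatever agent chain produces a headline metric.
from collections import Counter
from typing import Callable, Optional

def consensus_value(
    run_analysis: Callable[[], float],
    attempts: int = 5,
    min_agreement: int = 3,
    rounding: int = 2,
) -> Optional[float]:
    """Run the pipeline several times and only accept a majority answer."""
    results = [round(run_analysis(), rounding) for _ in range(attempts)]
    value, count = Counter(results).most_common(1)[0]
    # No consensus: escalate for review instead of publishing a number.
    return value if count >= min_agreement else None
```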
So that's one of the things, and I think it's one of the key insights that anybody doing this stuff should be looking at. It's not cheap, at least today, but it's cheaper than it was a year ago, and it's going to continue to get cheaper. But then there's also just a ton of guardrails and what we would call reflection, or LLM as a judge, where at a given step in the process you look at the output, given what we know about the business and the context for this assignment, and apply the simply common-sense things that an analyst would think about. Like, if something is an order of magnitude larger than something else, that's suspicious; we should flag that and investigate deeper. There are all sorts of checks we can spin off of the agentic process that go, that seems odd, we should double-check that, we should make sure that number was correct, and then we'll spin off another process to do that. And with all those things in place, you can get extremely close to the performance of a human analyst.
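To illustrate one such common-sense check, here is a small sketch with illustrative thresholds: flag a value that is an order of magnitude out of line with recent history and route it to a re-verification pass instead of publishing. In the real system the reflection step would presumably involve another model call rather than a simple flag.

```python
# Sketch: a "common sense" reflection check. If a new value is an order of
# magnitude out of line with recent history, don't publish; flag it for a
# deeper verification pass. The 10x factor is an illustrative assumption.
from statistics import median

def sanity_check(new_value: float, history: list, factor: float = 10.0) -> dict:
    baseline = median(history) if history else None
    suspicious = (
        baseline is not None
        and baseline != 0
        and not (abs(baseline) / factor <= abs(new_value) <= abs(baseline) * factor)
    )
    return {
        "value": new_value,
        "baseline": baseline,
        "suspicious": suspicious,
        # In a real agentic system this flag would spin off another checking
        # process (or an LLM-as-judge pass) rather than just returning a label.
        "action": "re-verify before publishing" if suspicious else "ok",
    }
```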
And with every day that goes by, I think we're getting better. It's really tough, though, because at the end of the day, if a self-driving car, to take an example I think is certainly relevant, hurts one individual, that's going to set back the adoption of the self-driving car really significantly, for reasons that are just human reasons. And the same thing applies here, maybe not to the same severity, of course. You can spend a long time gaining trust, and if you do one thing wrong, you can lose it really quickly. So I think, from a user experience perspective and from an AI engineering perspective, everything that we can do to help users understand the limitations of the systems they're working with, and to keep them accountable for the outputs and involved and engaged, is a good thing. We don't want to take the agency away from the people in any way, shape, or form. That's super important. So as we think about building these applications and the experiences within them, everything we can do to keep bringing people along and help them work with the AI, instead of just receiving the outputs, is really good. And everything we can do, of course, to make sure we're not equipping them with information that is, for any of many reasons, obviously wrong; we want to eliminate those cases. But yeah, it's certainly a brave new world, and I think that over the next couple of years we're going to see all sorts of interesting ways that we can help people work with machines a little bit better.
[00:44:05] Lucas Thelosen:
I think the self-driving car example is a good one, and it's very applicable to data analysis too. Initially, when we started, we thought, hey, this is going to be an assistant to the data team, and we expect people to review its output, because that's how LLMs are expected to work right now: a human will always review it. I have changed my opinion on that since then, very similar to the self-driving car, where if one mistake is made, we hold it to much higher standards. It's funny: at the same time as we criticize AI for being in its infancy, and of course nobody should just blindly trust AI, on the flip side, if it makes one mistake we lose all trust. John over here obviously made a mistake too as a data analyst, but hey, it's John, it's fine, he's going to fix it. But when the AI makes a mistake, I lose all trust. So we do have to, at least in our approach to data analysis, hold it to a much higher standard. It can't be wrong. Even though we put disclaimers on it and everything, it has to be right, because that's just where we are right now. AI is new to a lot of people's lives, and we would be very quick to throw it out. So we have to hold it to an even higher standard than we would hold a human analyst. I can't count the number of times someone forgot to copy down a formula in a spreadsheet, or something like that. Every single company I went to, I found something. It's like, hey, the way you calculate your profit margin over here, that's the wrong formula. Really substantial things.
And that just can't happen, because even though it is a new tool, we are...
[00:45:44] Drew Gilson:
...super critical of it. That's just how it is. But my hope would be that, as we adopt these tools and more people use data to make better decisions over time, the net benefit to our organizations and our lives is positive, and certainly at this point I do think it is. The tools are good enough and the models are good enough that I believe that to be true. As Lucas said, people make mistakes all the time. I actually think these systems probably make fewer mistakes. If we were to take a group of, oh, data analysis students, maybe, and pit them against Orion, and Lucas and I have been talking about doing this, perhaps next year we'll do that and we'll release the benchmark.
But people are inconsistent. They do things in different ways each time they do something. This system that we're building, and the large models that power it, are consistently curious and diligent in their pursuit of the answers they're looking for. And I think that counts for quite a bit over time. Even just that consistency is valuable, because you have more data to consider whether the individual answer you got that day is in line with all the other analyses that have been done at, like, 4:30 in the morning after the ETL jobs finish. So even then, when you show it to a model and suddenly you have this bizarre double counting, maybe because a job ran twice or whatever it is, the model is going to catch that much more often than a human who does that analysis inconsistently every month when their boss bugs them, if that makes sense.
[00:47:21] Lucas Thelosen:
That's a really interesting point. As an analyst, let's say you have two hours to do this thing. Do you really want to run an additional query? You just noticed something, and it's like, you know, I could investigate now, but what I have is probably good enough. I have a couple of green numbers, a couple of red numbers, and a couple of bullet points. And that's just a very different attitude than Orion has. It goes down every rabbit hole, and it's fine if one doesn't work out; if that analysis is not relevant, it closes it, it discards it. But it's not afraid to run an additional query and check on this as well, even if that means additional work for itself.
And that's an interesting change. We do see it asking questions that haven't necessarily been asked before and looking at data that maybe hasn't been looked at for a while. Like, hey, maybe the way we set the lifetime values for our customer cohorts, that's a year old now, and things have changed in our business. Maybe I should rerun that and figure out what our estimated lifetime value should be. That would be a thirty-hour project for someone, but Orion doesn't mind. So it's really interesting to see that.
[00:48:43] Tobias Macey:
Digging into the other side of that trust equation, and some of the ways that we're talking about the work being done by data analysts, I'm curious how you're seeing the introduction of a system like Orion change the types of work, and the attitudes around that work, that the overall data team has, and the overall organizational impact it can have as far as shifting or concentrating the focus of those data experts onto more of the critical path and less on this busy work.
[00:49:03] Lucas Thelosen:
Yeah. The way I see it, there are two paths that people's careers will go down. One is the architect: you're responsible for how things are set up, the data model, the data catalog, the dictionary. The other is stakeholder management, and I can't overstate that. Another example I brought up earlier: this guy from McKinsey working for us said, if I had this insight at my previous job, I would have charged a couple hundred thousand dollars for it. You can take what Orion puts out and bring it to the team meeting, and you can take all the credit. You don't have to emphasize that an AI produced what would usually have taken eighty hours of work on your end. So take it and do the stakeholder management. Talk to the business. Explain it to them. They want to hear it from a person. I'm the CEO of a company, and I still have a human lawyer because I want someone's signature on the paper. I want someone I can look in the eyes. I know they use AI. I know they didn't write this contract themselves. That's fine, but I want a human there that I can trust and work with. So: stakeholder management. I also think you should be not just an analyst but a data product manager. Talk about the roadmap of where data is going in your company. Which third-party data sets are you going to add? How is it going to evolve and mature far beyond where you are right now? Really own it like a product manager, with release cycles for new things you're going to do and different parts of the business you're going to work with. Elevate your career that way. I think those are the two directions people are going to go. You're not going to spend time writing 40 different queries anymore; Orion can do that for you. It can empower you to elevate your career.
[00:50:54] Drew Gilson:
One of our internal slogans is: promote the data hero. By that we mean, first of all, that this person we've always felt is a very, very important part of any organization should be recognized and rewarded, but also, literally, promoted. Let's get you promoted. This tool should have a tremendous impact on your career, and I do believe that.
[00:51:17] Tobias Macey:
Digging a bit more into the technical implementation of Orion, I'm wondering if you can give a bit of an overview of the system design and how you think about the integration of these language models with the data infrastructure and data architecture and just some of the evolution of your understanding of the problem as you have gone from initial idea to where you are now?
[00:51:42] Drew Gilson:
Yeah, for sure. At a high level, Orion is what we call a multi-agent system. It's essentially a series of AI agents that are orchestrated together and pass information between them. Over time, that's evolved to become quite sophisticated, and I think anybody who's building a multi-agent system has realized that the coordination and orchestration of that message passing is extremely important. How you control which agents see which messages, and how you move the flow, the conversation, or the control flow through the system, is actually quite hard. At the beginning it was, let's just say, very simple. When we first started experimenting to see what was possible, we would get a model to run some queries, we would show the data to the model in its context window, and then it would begin to produce insights. Over time, as this has become more sophisticated, we've realized that we don't really want the model to ever look at the data. It's a large language model. It's not designed to be particularly numerate or to do analysis in its context window. Ultimately, it's a storyteller; these are narrative, linguistic models. But what we can do is use code, which is also in some ways narrative, to do the analysis that we're looking for, and hide a lot of the actual inputs and outputs in terms of the data itself. That was the next step in the evolution of the design of this system. So you've gone from a model looking directly at data to a world where a model, or a series of agents, is making a plan and writing some code, in this case mostly Python, to achieve that plan, broken down into multiple steps.
The code is then interacting with the data and doing things like looking at aggregates and summary statistics at various steps, or looking at the outputs of certain steps, perhaps just the first few rows, the last few rows, or a random sample depending on the task. Then we maintain a separate track, an artifact of the outputs of that code, that becomes the source of truth for the output, which we call an insight. And then we use the large language model again to explain, to tell a story about the data trends that have emerged, based on the code that has been written by these models. Through that process there are a lot of checks and balances, as I've said, and there are a lot of different components. For instance, you'll have groups that just plan the analysis. You'll have groups that actually write code to perform the analysis. You'll have agents that take the outputs and write them up in a certain way based on the specifications they have about your output constraints, your format, or your destination, like a slide deck instead of a memo. And then ultimately you get your output. In the future, we're going to begin looking at even more sophisticated approaches. For instance, Google just open sourced a new large model that's not a large language model; it's a large time series model. That's just one example of a lot of really interesting research that's come off the back of this explosion of possibility with large models and deep learning. It's easy to imagine that in a future iteration of Orion, some of those groups have access to models that don't exist today, like a large model that does linear regression or more sophisticated time series forecasting in a relatively unexplainable way, even if the accuracy is good on whatever benchmarks that model was trained against, and then incorporating the results of that analysis back into the pipeline. So these agents have the ability to do a bunch of different things, including writing code and then using yet more machine learning models that have been trained for particular, perhaps even domain-specific, tasks. And then you assemble the output of it all into the output the user is looking for. Really, it is a chain, a long chain of agents passing messages to get the output that you're looking for. Just to wrap up, anybody who's trying to build this sort of thing eventually realizes how complicated it can get and just how much you have to learn along the way. For us, as we've put this together and gradually increased the capabilities and the quality of the system, boy, have we ever spent a lot on compute and tokens. It's taken us many iterations over the last fifteen, sixteen months to get the output consistent. And I think that's probably quite frustrating for a lot of people, because you can get so close with that simple iteration where the model is just looking at the data.
It looks right. You go, this thing's capable of doing this in one shot, in ten to thirty seconds. But of course, that's not true. So although I would love to believe that there's a future where you can run a single model in a tool-calling loop and trust the outputs it delivers to you, I do think that in the medium term that's probably unlikely. The orchestration that's gone into building something like Orion is super important, and that's really, really hard. It's been one of the most fascinating technical journeys for me and my team.
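As a toy illustration of the pattern Drew describes, where deterministic code touches the data and the model only ever sees a plan, summary statistics, and a few sample rows, here is a minimal sketch. The dataframe columns and the narration_prompt helper are assumptions for the example, and the actual model call is left out.

```python
# Sketch of the "model never looks at the raw data" pattern. Analysis code runs
# against the full dataframe; the language model only sees summaries and samples.
import pandas as pd

def run_analysis_step(df: pd.DataFrame) -> dict:
    """Deterministic code does the arithmetic; nothing here involves a model."""
    by_region = df.groupby("region", as_index=False)["revenue"].sum()
    return {
        "result": by_region,
        # Only compact artifacts are ever shown to the model:
        "summary_stats": by_region["revenue"].describe().to_dict(),
        "sample_rows": by_region.head(5).to_dict(orient="records"),
    }

def narration_prompt(step_output: dict, assignment: str) -> str:
    """Build the prompt for the storytelling step; no raw rows included."""
    return (
        f"Assignment: {assignment}\n"
        f"Summary statistics: {step_output['summary_stats']}\n"
        f"Sample rows: {step_output['sample_rows']}\n"
        "Write a short insight for a business reader, citing only these figures."
    )

if __name__ == "__main__":
    df = pd.DataFrame(
        {"region": ["NA", "NA", "EU", "EU", "APAC"],
         "revenue": [120.0, 80.0, 95.0, 60.0, 40.0]}
    )
    out = run_analysis_step(df)
    print(narration_prompt(out, "Summarize revenue by region"))
```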
[00:56:59] Tobias Macey:
And as you have been building and iterating on Orion and working with some of your early customers to introduce them to the capabilities and the conceptual shifts in how they approach data in their organization, what are some of the most interesting or innovative or unexpected ways that you've seen that capability applied?
[00:57:20] Lucas Thelosen:
It's really great to see completely new use cases that we didn't even think about. One of our early customers, Orion has access to all their data, and one of their competitors published a blog post. So they took the PDF of that blog post and gave it to Orion and said, hey, can you look at our data and write up some counterpoints, where they might be wrong and what we should maybe say in response? Okay, this is a data analysis tool; it's not a content creation tool. But it was very interesting to see Orion just do that. It looked at the data in the blog post, it pulled the queries to see what their own data said, and then it wrote up where they have contradictory data and where they have supporting data for what was said in the blog post, which really helped our customer go out and write an opinion on what had just come out, backed by statistics. Then there's the use case we didn't quite anticipate either: we thought Orion would be everybody's assistant, in essence, everybody's personal analyst. The number one use case right now, though, is actually powering customer-facing reporting.
A lot of companies have bigger, more important customers, but servicing each of them would be quite expensive. You can't hire a person to help each one of those customers. So they said, well, why don't we use Orion to create a custom insight for this particular customer: how things are going, where there's opportunity, and so on. It can do that 700 times, and each time it's different, because every customer is unique and their usage is different and their relationship is different. So those are smaller-scale insights, so to speak. I would say that's probably the most common use case, and then you can turn around and do the same thing for account teams.
[00:59:07] Drew Gilson:
Yeah. For me, actually, it's the value of our product to our own engineering teams as we've built it, which is pretty neat. I've always been an advocate for dogfooding, and all the QA data that we generate is now analyzed by Orion. Our engineers interact with Orion every day to continue to improve it, which makes for some pretty interesting metacognitive reflections as Orion realizes that, oh, this insight is about me. That's what I get a kick out of. Maybe that's just my nerd indulgence, but I just think it's awesome.
[00:59:44] Lucas Thelosen:
Yeah, and I really like this analysis: Orion analyzed its own inefficiencies. It looked at our cloud bill, our cloud spend, and went through where we could actually save some money by running things differently. So we use Orion as a data analysis tool on itself. I think the first month our cost went down by 30%, and the next month we had almost a similar 30% decrease, so we're now over 60% or so down from where we were two months ago in terms of the cost of running Orion, just by doing analysis on itself.
[01:00:19] Drew Gilson:
And these opportunities are everywhere, by the way. Cloud FinOps is huge, as everybody knows, and everybody can probably go turn off some servers.
[01:00:29] Tobias Macey:
Yeah, and that also brings up another use case: financial analysis for cloud spend. Another growing ecosystem is engineering analysis, looking at the efficacy and throughput of your engineering teams, how to optimize that, and overall productivity. So I think it would be interesting as well to see the potential for some of that agentic workload, and some of the insights you can get on the overall effectiveness of your teams and the work that they're doing, beyond just the organizational level.
[01:01:02] Drew Gilson:
Yeah, it's true. And Orion's not calling any of my engineers in the middle of the night yet. But we're getting into some interesting territory here, because there are so many different kinds of data, and a common one, of course, is individual performance metrics. In a world where you have these systems optimizing continuously and maybe making somewhat impersonal decisions about the data they're exposed to, that's not the type of world that I want. I just want to state that. We want to empower the human here. But if we can make somebody's job more efficient by pointing out something that they might not be doing that they should be doing, or by pointing out that their performance relative to a benchmark could be improved by calling a different subset of customers based on the data available to them, there are opportunities...
[01:02:04] Lucas Thelosen:
...to impact people's lives in a positive way. I think it works if it's more of a personal coach, where you can be more vulnerable with it than you might be with a manager. You can say, I'm having a hard time with X, Y, and Z, and then Orion can see if there's something there that the data could support. The personal coach angle is more interesting to me. And I use it myself: what are the most important things I should be working on today to drive sales for the company? Orion can look at our data, or take the cloud cost example. Those are very tangible, helpful things that it can point out to me.
[01:02:41] Tobias Macey:
Yeah, I think it's just generally about applying more of that analytical clout to a broader set of problems than organizations have typically invested in, because of the high activation costs that have existed.
[01:02:55] Drew Gilson:
Yeah, absolutely. A lot of data is getting copied around every single night, and yet it's only a drop in the bucket relative to what could be happening for each and every person in that organization.
[01:03:08] Tobias Macey:
And as you have been building this system, working with some of your early customers, understanding its capabilities, what are some of the most interesting or unexpected or challenging lessons that you've each learned in the process?
[01:03:21] Lucas Thelosen:
Yeah, there's a fun story to share from the early days, when we set up the multi-agent system and realized we really had to put structure in place. We have an agent that comes up with new ideas; its job is to think outside the box and think of additional questions to ask. Very early on, it came up with the idea that this analysis would be super helpful for the board meeting. And all of a sudden, all the agents started talking about the upcoming board meeting and how to support it further. There was no board meeting. The agent just said it would be interesting for a board meeting, and then all the agents got sidetracked. So we realized we had to put quite a good structure in place, to have, in essence, a manager that brings it back to the original task: this is what the task is about, and let's stay within the boundary of that task. It was fascinating to see quite a human-like response in the system of agents as they worked together. There was another one a couple of months later. One of our customers finally got full access to everything, and they put in something that was completely outside of the data, something you really shouldn't use Orion for; it was more a question for a personal psychologist or something. Luckily, the system caught it. It realized that it shouldn't be doing this, but the internal conversations that went on were quite funny to read. It wrote an obituary and terminated the process, but first it published the obituary to the internal log. It had a bit of an attitude.
[01:04:50] Drew Gilson:
Well, it's just so hard to orchestrate these things. I was going to share a similar lesson. If you have a group of agents and you tell them, look, you're connected to a semantic model that represents the truth in the business, and of course that's what we say about semantic models; we spend a lot of money making sure that the metrics encoded in LookML, for instance, are correct. So if you tell the agents that this is true, and it isn't, or it produces something that's clearly not right, the contortions that you can get into in the conversation flow are, on one hand, hilarious, and on the other hand, just, oh my gosh, how are we going to solve this one? Because you have a series of, say, sale prices coming out of a customer system, and in one case, for one particular region, somebody has inadvertently multiplied them all by several orders of magnitude, which happens. If the agents are instructed to believe that the semantic model is correct at all costs, well, it's just an existential crisis for those poor AI agents. So we've had to figure out how to create the right amount of structure and guardrails, with just the right amount of, perhaps it's a circuit breaker in some cases, or an escape hatch, where an agent in one of these conversations has the ability to push the button: something isn't right here, we need to escalate this and figure it out. Or maybe it's a process of inquiry, where you double-click and go, well, this can't possibly be right because it's so anomalous relative to all the other data. Designing those interventions for when that sort of thing happens actually took a lot of time in the early days, because if you don't do that, the models will catastrophize, or they'll completely make up something that might conceivably explain the anomaly but is simply not true. The truth is, somebody fat-fingered some data, maybe. So those are some of the early-days lessons that I think anybody who's building with this stuff is going to encounter and will have to solve. And there have certainly been a few of those along the way.
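As a small sketch of the kind of escape hatch Drew describes, under the assumption of a per-region price check with an illustrative 100x threshold: escalate the anomaly to a human instead of letting the agents rationalize it. This is an illustration, not Orion's actual circuit breaker.

```python
# Sketch: a circuit breaker for semantic-model-vs-data conflicts. Rather than
# forcing an explanation, compare each region's typical sale price against the
# others and escalate anything wildly out of scale. Thresholds are illustrative.
from statistics import median

def check_regional_scale(prices_by_region: dict, factor: float = 100.0) -> dict:
    """prices_by_region maps region name to a list of observed sale prices."""
    medians = {region: median(values)
               for region, values in prices_by_region.items() if values}
    overall = median(medians.values())
    flagged = [region for region, m in medians.items()
               if m > overall * factor or m < overall / factor]
    if flagged:
        # Stop the agent conversation and hand off, instead of letting the
        # agents invent a story that reconciles bad data with the semantic model.
        return {
            "escalate": True,
            "message": (f"Sale prices in {flagged} are orders of magnitude out of "
                        "line with other regions; suspected data-entry or load error."),
        }
    return {"escalate": False, "message": "Regional scales look consistent."}
```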
[01:07:02] Tobias Macey:
Alright. Are there any other aspects of the work that you're doing on Orion, or this overall space of agentic analytics and the impacts that it can have, that we didn't discuss yet that you'd like to cover before we close out the show?
[01:07:15] Drew Gilson:
I guess this is a refrain that everybody is saying, but it's a really important one: this is the worst that it's ever going to be. Maybe it's been said enough that I don't have to say it, but I don't think that's actually the case. This is getting better every quarter. And if we play that out, even if there is some fundamental cap, some limitation of the technology, even if we just froze what we had today in terms of the capabilities of the models, we could easily get a decade of value out of it. So I think that's the point I'd like to leave on. This is something that is tremendously exciting given the capability we have today. There's still the occasional person out there who says this is just a random language generator and it's not going to be able to drive business value in a meaningful way. I don't think that's true. I think this is actually the worst that it'll ever be, and it's increasing in capability every quarter.
And I'm super, super excited about that. I can't say enough how interesting this space is. Over a twenty-plus-year career in data and software, I'm having more fun than I've ever had before. It's awesome.
[01:08:31] Lucas Thelosen:
Yeah, I think that's exactly it. In the beginning, I said I did my first cloud migration in 2007. There are certain trends. I didn't get into blockchain; I didn't quit my job at Google for that. But, you know, I have four daughters at home, and I quit my job at Google because of this. It's a monumental change in how we're going to do things, and I would encourage anyone to embrace it, just like I embraced cloud in 2007. It was so beneficial to my career, and it will be so beneficial to your career to embrace AI tooling right now, because two years from now you'll have two years of AI experience, and that is very real and very valuable to you and your career over the next decade. So I'm urging anyone: this is a real moment in technology, and those don't come along very often. It might take ten years again for the next big disruptive moment. So embrace it and see how you can use it to advance your own career right now.
[01:09:28] Tobias Macey:
Alright. Well, for anybody who wants to get in touch with the both of you and follow along with the work that you're doing, I'll have you each add your preferred contact information to the show notes. And I'd just like to thank you for taking the time today to join me and share the work that you're doing and your insights on this overall space of agentic analytics and some of the impacts and real-world capabilities that it can have. So I appreciate all of the time and energy you're putting into that, and I hope you enjoy the rest of your day. Thank you so much for having us. Thank you. Thank you for listening, and don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used, and the AI Engineering Podcast is your guide to the fast-moving world of building AI systems.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email hosts@dataengineeringpodcast.com
[01:10:28] Tobias Macey:
with your story. Just to help other people find the show, please leave a review on Apple Podcasts and tell your friends and coworkers.
Introduction and Episode Overview
Guest Introductions: Lucas Thelosen and Drew Gilson
Background and Interest in Data
Challenges in Business Analytics
Agentic Workflows and AI in Business Analysis
Domain Context and AI Integration
User Interface and Analytics Integration
Accuracy and Trust in AI Analytics
Impact on Data Teams and Organizational Change
Technical Implementation of Orion
Innovative Uses and Customer Applications
Lessons Learned and Challenges
Future of Agentic Analytics and Closing Thoughts