Summary
In this episode of the Data Engineering Podcast, host Tobias Macey welcomes back Nick Schrock, CTO and founder of Dagster Labs, to discuss Compass - a Slack-native, agentic analytics system designed to keep data teams connected with business stakeholders. Nick shares his journey from initial skepticism to embracing agentic AI as model and application advancements made it practical for governed workflows, and explores how Compass redefines the relationship between data teams and stakeholders by shifting analysts into steward roles, capturing and governing context, and integrating with Slack where collaboration already happens. The conversation covers organizational observability through Compass's conversational system of record, cost control strategies, and the implications of agentic collaboration on Conway's Law, as well as what's next for Compass and Nick's optimistic views on AI-accelerated software engineering.
Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- Data teams everywhere face the same problem: they're forcing ML models, streaming data, and real-time processing through orchestration tools built for simple ETL. The result? Inflexible infrastructure that can't adapt to different workloads. That's why Cash App and Cisco rely on Prefect. Cash App's fraud detection team got what they needed - flexible compute options, isolated environments for custom packages, and seamless data exchange between workflows. Each model runs on the right infrastructure, whether that's high-memory machines or distributed compute. Orchestration is the foundation that determines whether your data team ships or struggles. ETL, ML model training, AI Engineering, Streaming - Prefect runs it all from ingestion to activation in one platform. Whoop and 1Password also trust Prefect for their data operations. If these industry leaders use Prefect for critical workflows, see what it can do for you at dataengineeringpodcast.com/prefect.
- Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.
- Your host is Tobias Macey and today I'm interviewing Nick Schrock about building an AI analyst that keeps data teams in the loop
- Introduction
- How did you get involved in the area of data management?
- Can you describe what Compass is and the story behind it?
- context repository structure
- how to keep it relevant/avoid sprawl/duplication
- providing guardrails
- how does a tool like Compass help provide feedback/insights back to the data teams?
- preparing the data warehouse for effective introspection by the AI
- LLM selection
- cost management
- caching/materializing ad-hoc queries
- Why Slack and enterprise chat are important to b2b software
- How AI is changing stakeholder relationships
- How not to overpromise AI capabilities
- How does Compass relate to BI?
- How does Compass relate to Dagster and Data Infrastructure?
- What are the most interesting, innovative, or unexpected ways that you have seen Compass used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on Compass?
- When is Compass the wrong choice?
- What do you have planned for the future of Compass?
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
- Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
- Dagster
- Dagster Labs
- Dagster Plus
- Dagster Compass
- Chris Bergh DataOps Episode
- Rise of Medium Code blog post
- Context Engineering
- Data Steward
- Information Architecture
- Conway's Law
- Temporal durable execution framework
[00:00:11]
Tobias Macey:
Hello, and welcome to the Data Engineering podcast, the show about modern data management. Data teams everywhere face the same problem. They're forcing ML models, streaming data, and real time processing through orchestration tools built for simple ETL. The result, inflexible infrastructure that can't adapt to different workloads. That's why Cash App and Cisco rely on Prefect. Cash App's fraud detection team got what they needed, flexible compute options, isolated environments for custom packages, and seamless data exchange between workflows. Each model runs on the right infrastructure, whether that's high memory machines or distributed compute.
Orchestration is the foundation that determines whether your data team ships or struggles. ETL, ML model training, AI engineering, streaming, Prefect runs it all from ingestion to activation in one platform. Whoop and 1Password also trust Prefect for their data operations. If these industry leaders use Prefect for critical workloads, see what it can do for you at dataengineeringpodcast.com/prefect. Are you tired of data migrations that drag on for months or even years? What if I told you there's a way to cut that timeline by up to a factor of six while guaranteeing accuracy? DataFold's migration agent is the only AI powered solution that doesn't just translate your code. It validates every single data point to ensure a perfect parity between your old and new systems.
Whether you're moving from Oracle to Snowflake, migrating stored procedures to dbt, or handling complex multisystem migrations, they deliver production ready code with a guaranteed timeline and fixed price. Stop burning budget on endless consulting hours. Visit dataengineeringpodcast.com/datafold to book a demo and see how they turn months long migration nightmares into week long success stories. Your host is Tobias Macey, and today I'm welcoming back Nick Schrock to talk about building an AI analytical system that keeps data teams in the loop in the form of Compass. So, Nick, can you start by introducing yourself for people who haven't heard any of your past appearances?
[00:02:09] Nick Schrock:
Yeah. Sure. And thanks for having me, Tobias. It's always a pleasure being on. So, yeah, briefly, I'm Nick Schrock. I'm the CTO and founder of Dagster Labs, which is the company behind Dagster, an open source data orchestration platform, and Dagster Plus, which is our commercial hosted product on top of that, and now an additional product called Compass, which I'm super excited to talk about. Before that, I cut my teeth in Facebook engineering, and the thing I was best known for was being one of the co-creators of GraphQL. So that's kind of my story.
Founded Dagster in 2018, so a while ago now, but really got the company off the ground in 2019, hired my first employee then. And we've been working really hard for a long time and have an at-scale open source project and a really healthy commercial business, and I'm looking forward to many more years of success.
[00:03:06] Tobias Macey:
You've been running Dagster for almost as long as I've been running this podcast.
[00:03:11] Nick Schrock:
It's true. Your podcast is actually one of the major ways I got up to speed on the domain. In particular, the episode you did about DataOps with Chris Bergh was kind of a real unlock for me. So I feel like you and I have been on the journey together in some ways.
[00:03:27] Tobias Macey:
Absolutely. And it's it's been a crazy ride over the past, what, eight years now. So Yeah. So I guess the next stop in that ride is Agentic Systems. And so because you're working in the technology space, you're obligated to build an Agentic system. So I'm wondering if you could just give a bit of an overview about what your thoughts are on the application of agentic systems to data analysis and some of the ways that you thought about the approach to Compass that keeps data teams in the loop without just leaving them on the sidelines and letting the AI run rampant over all of their hard work.
[00:04:06] Nick Schrock:
Yeah. It's been a fascinating journey, actually. I think both me and Dagster Labs as a company have been fairly conservative when it comes to AI and agentic systems up until now. You know, last summer I wrote this piece, a blog post about what I called the rise of medium code and the properties a software system needs to have to be an amenable target for AI code generation. And it really focused on minimizing slop and having a limited technical blast radius so the AI can't do much damage to your system, and all that. So I've always thought about it in those terms, but I was always a little skeptical about how good the agents could get. And the progress has really exceeded my expectations in the last year. I didn't realize it at the time, but I think a huge release was in February, when Anthropic released Sonnet 3.7, I think it was, and Claude Code in the same release.
And those two things were a simultaneous innovation at the model layer but, also, I think maybe even more importantly, the application layer over that model layer. And that moment and the period right afterwards was a huge wake-up call for me: oh, these systems are super ready for prime time now if you apply the right tools and techniques. And that momentum has really been building. Actually, you know, in June, this term context engineering became part of the ether. Tobi Lütke popularized it in a post, and then it was canonized by Karpathy.
But it really described how, you know, it's kind of a rebrand of prompt engineering, but it describes how you can programmatically inject the right context in the right place at the right time to the right model. And that mentality really clicked with me in terms of, like, oh, me, the lowly product and infrastructure engineer who doesn't know the higher-order math necessary to build a foundation model, like, I can really participate in this in a super first-class way. So that's kind of the context of my journey. At this point, I'm very, you know, AI-pilled, I would say, in that I believe we are witnessing the disruption of multiple layers of the stack simultaneously in a way that we've never experienced before. So AI is revolutionizing the way we build software, the way we structure infrastructure, the way our stakeholder relationships work, and also the consumption layer. Right? ChatGPT has been, like, a dramatic change in the way that we interact with computer systems.
And that has not really reached the enterprise at all either, which is a very interesting topic of discussion. So I think we're at this massive wave. And you kind of alluded to it: okay, you're in software, so you have to be thinking about agentic AI. And, unfortunately, that is kind of true. And I think some people are more obnoxious about it. But ignoring agentic AI would be like ignoring the Internet in the nineties, you know, and not really thinking about how that's gonna impact your system. And the Internet completely revolutionized all domains of computing. Right? Even the ones where it wasn't, putatively. It kind of started with consumer, but then all our infrastructure changed too. So I think this is a similar wave. I think it's extremely exciting, and I'm just really enthusiastic about the future.
[00:07:52] Tobias Macey:
And to help frame the rest of the conversation, can you describe the scope and purpose of Compass and some of the problems that you're trying to solve with it?
[00:08:03] Nick Schrock:
Yeah. Without going into features or technologies, the problem we are trying to solve, in sort of human terms, is to completely restructure the relationship between a data platform team and their stakeholders. Meaning that right now, I think data teams feel like they are cogs in a machine, that they are cost centers, that they are there to do a job. Business stakeholders ask for data. Business stakeholders ask for dashboards. But then you're kind of disconnected from those business users because your work is intermediated by these tools, which are often not that pleasant to deal with, BI tools being an example. I often joke that the BI category feels like it was invented by Dostoevsky because all BI tools are terrible, but they're all terrible in their own ways. And so rather than think about it as complete self-serve (you mentioned the term self-serve), we wanna redefine the relationship so that the data team is collaborating with the business stakeholders in real time in a highly positive way where, instead of being viewed as a cost center, they are the face of the value. So they're collaboratively working with their stakeholders, and they're empowering way more of those stakeholders.
Now that is the problem we're trying to solve. And in the end, that means much more accessibility to data, and you can leverage your data platform to do more things in the organization, thereby increasing its value. Okay, so that was a whole lot of stuff that I just talked about. But we're redefining the stakeholder relationships such that the perceived and real value of a data platform is higher in the organization. Now how do we do that? You know, Compass looks fairly innocent at first blush.
It is a Slack-native experience where you can interact with your data in natural language. It's processed by AI, which sort of acts like a junior analyst, and it can interrogate your data warehouse and do interesting analyses. That is the user experience. But it ends up being fairly, I would say, transformative, dare I say revolutionary, and it has been internally, because you have the stakeholders interacting with this agentic tool, the analyst. But then, because there's an agent analyst, the data team members that are in that Slack thread with the business stakeholders are no longer analysts.
They act much more like data stewards. They're, like, guiding the user to do the correct analysis, and then they manage the context. And they manage the context store so they can govern the AI in a very scalable fashion. And the business stakeholders never leave Slack. There's data vis right there. The stakeholders can make data requests. They can request context corrections. Often, the AI just figures out how to do that for them. So they plug in all sorts of workflows. They can schedule these analyses on a regular basis. And in our demos, we almost brag.
Right? Like, you're not gonna see a web UI during this entire demo, and that is super deliberate. Because who wants to learn another web UI? You get bounced to a web UI. You have to auth to it. You have to learn a completely new information architecture. Right? You have to learn completely new concepts, and it bounces you out of your collaborative zone. You can no longer, like, at-mention people, etcetera, etcetera. So, yeah, that's kind of the approach. It's a Slack-native natural language analytics experience that is collaborative, governed, and driven by AI.
[00:12:01] Tobias Macey:
So the common challenge when dealing with all of these agent based systems, as you already pointed out, is this challenge of context engineering, where the alchemy of turning raw bits into useful information is the entire purpose of data engineering, to be very reductive about it. And so those two things are in tension, because the data engineer doesn't want to forego their purpose and hand it off to an AI, particularly if they have any pride in their work, because they know that the AI isn't going to do an appropriate job of understanding the business needs, the business context, all of the hard-won knowledge that they've already encoded into the data assets that they're building. Whereas the consumers of that data, to your point, don't wanna have to deal with learning all the information architecture. They don't want to have to dig through all the docs or go through all the pipelines to really understand what it is that they're actually looking at. And so I'm wondering if you can just talk to some of the ways that you're thinking about that context management and the handoff between the data teams, who are doing all the hard work of bringing all this information together and hydrating it with that business context, and the ways that the agentic analyst is able to actually retrieve and interact with that context to be able to understand how to map the probably very vaguely worded request from the stakeholders into a concrete plan of action and means of discovery and enumerating all of the information that's required to be able to fulfill that request.
[00:13:35] Nick Schrock:
Right. So there's a lot there. Just trying to tease the thesis apart here. So I guess, dispositionally, and because I also think it's how the world works, I think of the AI as a bicycle for your brain as opposed to a replacement for it. And in some ways, actually, almost in every way, these AIs make judgment and taste that much more leveraged. Because if you have good taste and judgment, you can get the AIs to do an extraordinary amount of work on your behalf that's high quality. But if you don't have that, then, you know, I call it a technical debt superspreader. It can copy bad patterns, and it can go off to the races and hallucinate lord knows what. So that's kind of my starting place: we build tools that keep humans in the loop, that are governed, and that accelerate and amplify the work of subject matter experts rather than sort of eviscerating it.
So, where do you want me to take it next? That was, like, kind of high-level philosophical.
[00:14:44] Tobias Macey:
So I think the interesting bit, and I'll admit to everybody listening, I've already witnessed the demo of Compass, so I already know a lot of the details of how this is operating. So these are somewhat leading questions and a little bit of inside baseball. But my understanding is that the Compass utility relies on a repository of context artifacts for being able to understand how to map some of the semantics of these analytical requests into the actual data assets that are available, with these data assets, at least in its current formulation, largely being restricted to a data warehouse environment for being able to create and execute SQL queries. And so I'm wondering if you can just talk to how you're thinking about the initialization of that context repository from that set of tables and data assets that already exist in a manner that isn't just a lot of extra busy work on the behalf of the data team, but provides all of the necessary information and guidance to the agentic system for being able to fulfill the requests of the business stakeholders.
[00:15:47] Nick Schrock:
Right. Okay. That makes total sense. So there is an initial step where we bootstrap the context store with as much information as possible. Setup is super easy: all you do is plop in your data warehouse creds, and you're good to go. What happens is that, for the tables that you allow us to see, we query the information schema. We get as much information as possible from there. We also sample the data, and then we programmatically generate context. I think where a lot of people go wrong with that type of step is that they aren't thoughtful enough about producing the precise context that your application expects.
They just, like, dump raw metadata into some context window and hope and pray that the agent figures stuff out. I'm very much of the belief that you need to very deliberately produce context programmatically in a way that's guided and specific to your application. That means there's a level of precision and control. Like, I firmly believe that increasingly we're gonna move from data pipelines to context pipelines, meaning that context will be computed. It'll be computed from other context and other data, and that is data pipelining. Right? So that's one of the reasons we have an engineering approach to it. And the automatically generated data docs are kind of an initial step to do that. The second critical piece that's in the context store is... oh, actually, I'll start with this: all of this is managed with Git, and we think that's very important. Context occupies this fascinating space that's sort of in between code and data, meaning that context is computed just like data, but it also very directly determines system behavior just like code does. This is why we've set out on this path of having programmatically generated context checked into a Git repository.
Because if it's checked in, you can track changes, you can revert things, you can write tooling over it to change it in place. It allows all sorts of flexibility and precision. Right? So imagine you're really developing this thing at scale. You have evals so you can evaluate the performance of the agent. Right? You can do, like, a bisect against the context just like code, and that's a super powerful dynamic. The second piece that we capture in Git are these manual context corrections that we get directly from the business stakeholders. So this is another critical piece of the context puzzle. The first one, again, being programmatic generation of context. And the second piece is actually getting the information out of the brains of your stakeholders, who actually know the domain, and into some governed context store where the agents can utilize it to give correct results. For example, one of our early customers uses the term core as a kind of special code word for a project that doesn't really mean core, and they flagged this. You know? They're like, this has screwed up AIs before, and they totally hallucinate because, obviously, the AI has its own idea of what core means from the foundation layer. So we created a context correction that very specifically laid out, like, okay, core actually means this and this, etcetera, etcetera, and the system performed well. Now what's really magical about the way Compass works is that all of this is captured either explicitly or in an ambient sense from the interactions that are happening in Slack. So, for example, the business stakeholder is presented within Slack with a data visualization that looks wrong. Like, the demo that we give is, some sales rep has a 90% win rate, and that never happens.
So the demo is, like, you look into it, you do some investigation, you figure out that the sales rep is actually a customer success manager, and that automatically submits a context correction back to the context store that says, like, hey, don't count CSMs as sales reps. And then it's checked into the context store. Knowledge is captured, and then the data team can take that and manage it very explicitly. And that's a very powerful model to bootstrap. You know, the alternative with more heavyweight systems like semantic layers is generally to do this upfront process that is very burdensome and complicated, and you need to send the business stakeholder to a custom tool or a web app or something. And they never do that. So the knowledge stays captured in their head, which does no good to anyone except for them, maybe. But as we demonstrated, humans have limited context windows too, and what's good for the goose is good for the gander here. So it's important to get that out of your head and into a place where the agents can take advantage of it. And it's really this lightweight interface within Slack that's the magic in that part of the process.
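To make the two context sources Nick describes concrete, here is a minimal sketch of what programmatic context generation plus a stakeholder context correction could look like as Git-tracked text artifacts. Everything here is hypothetical: the function names, the `context/` layout, and the file formats are illustrative assumptions, not Compass's actual (non-public) implementation.

```python
import json
from pathlib import Path

# Hypothetical root of a Git-tracked context repository.
CONTEXT_ROOT = Path("context")

def generate_table_doc(table: str, columns: dict[str, str], sample_rows: list[dict]) -> str:
    """Render a context doc from schema metadata (e.g. information_schema)
    plus a few sampled rows, instead of dumping raw metadata into a prompt."""
    lines = [f"# Table: {table}", "", "## Columns"]
    for name, dtype in columns.items():
        lines.append(f"- {name} ({dtype})")
    lines += ["", "## Sample rows"]
    for row in sample_rows[:3]:
        lines.append(f"- {json.dumps(row)}")
    return "\n".join(lines) + "\n"

def record_correction(term: str, meaning: str) -> str:
    """Render a stakeholder-supplied context correction, like the
    'sales rep vs. CSM' example from the demo."""
    return f"# Correction: {term}\n\n'{term}' actually means: {meaning}\n"

doc = generate_table_doc(
    "sales_reps",
    {"rep_id": "INTEGER", "name": "TEXT", "role": "TEXT"},
    [{"rep_id": 1, "name": "Ada", "role": "CSM"}],
)
correction = record_correction("sales rep", "excludes customer success managers (CSMs)")

# In a real pipeline these strings would be written under CONTEXT_ROOT and
# committed, so changes can be diffed, reverted, and even bisected against evals.
print(doc)
print(correction)
```

The point of keeping both artifacts as plain text in Git, as Nick notes, is that standard code tooling (diff, revert, bisect) then applies directly to the context.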
[00:20:35] Tobias Macey:
One of the other aspects of data teams who are responsible for the care and feeding of analytical systems, particularly when you're dealing with business intelligence, is that it's often very difficult to gain any real insight into how the stakeholders are interacting with those systems. You might be able to have some audit logs to see how frequently people are running certain queries or dashboards. But beyond that, you don't know why they're going to those dashboards, what they're doing with the information once they retrieve it. And I'm curious how you're thinking about the ways of bringing more visibility both to the data teams and at the organizational level of how the overall company is interacting with the data assets that you have and are creating and some of the ways that that can create some feedback loops to the data teams either to prune unused data assets or to understand what new data assets they need to generate to be able to fulfill the needs of the organization, and in particular, how using the conversational system of record for the company helps to provide some of that visibility?
[00:21:44] Nick Schrock:
Oh, it's a great question. And I think I could talk for hours on this subject because it's not just about what's currently happening; I think the roadmap on this front is super, super bright. You know, we're still in the early stages here. So, I guess, the first thing that happens is that the business stakeholder is in the same channel as the analyst, and the analyst can literally see how what the business stakeholder wants gets compiled to SQL in the data warehouse. That gap between business language and, like, your column and table names communicates so much about what is actually happening and what people actually want. With a traditional BI tool or exploratory data analysis tool, that translation does not exist in a format that is discernible by one of the analysts.
So I think that is the dynamic. Just the social dynamics end up really producing a ton of insight. And we are just at the beginning of being able to use, I like your term, the conversational system of record to drive more value out of that. The initial feature we have that I think demonstrates that is the ability to create data requests in a ticketing system automatically based on what's been happening in a specific thread. So let me give you an example of how this works. I was asking Compass specifically about product analytics, but it expands beyond that. I was asking what I thought was a very simple question: how many of our customers use declarative automation, which is one of our scheduling features? It turns out our warehouse didn't really have that particular feature explicitly modeled well. So what the AI did, and this was awesome and terrifying to watch, is this: we have Gong transcripts, Gong being a system that records and transcribes sales calls. The AI decided to use Snowflake's AI-based analysis features. It couldn't find the exact information, but what it did do is it found all the customers that had mentioned declarative automation in one of their sales calls, which was obviously imprecise, but gave me a hint, or at least a floor, of how many people use it, and interesting customers that use it. It was also terrifying because those capabilities are extremely expensive in Snowflake. But as a result of this experience, I was like, make a data request ticket so that we have this information first class in the data warehouse. And what the system does is it scoops up that entire conversation. And think about all the context you have. You have who asked for it. You have all this SQL that was generated that is navigating the warehouse and trying to figure out where things are. You have follow-up questions. You might have a conversation.
Maybe our analyst jumps in and says, like, oh yeah, we don't have that because of this historical reason. It scoops all that up, synthesizes it, and creates a data request. Right? And that's an LLM-assisted process. So we're really building what Databricks calls compound AI applications.
But what it means is injecting, where appropriate, LLM augmentation and processing into parts of the workflow. And I think we're just at the beginning of this. As we develop this product over time, we will be leveraging the conversational system of record much more. You can imagine doing post hoc processing on threads that allows you to discern with more detail what context corrections to suggest, for example, scooping up all that information. You can imagine having observability and insights tools across all the conversations happening in all the different channels across your business, so you can understand what's happening, what data is being requested the most frequently, what data is not being requested.
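The "scoop up the thread into a data request" step above can be sketched as a plain function over thread messages. This is an illustrative assumption about the shape of the data, not Compass's API: in the real product this synthesis is LLM-assisted, and the message fields (`user`, `kind`, `text`) and the `thread_to_data_request` helper are hypothetical names.

```python
# Hypothetical sketch: turn a Slack thread into a data-request ticket payload.
# In Compass this synthesis is LLM-assisted; a plain function stands in here
# so the shape of the inputs and output is visible.

def thread_to_data_request(messages: list[dict]) -> dict:
    """Synthesize a ticket from a thread: who asked, what they asked,
    the SQL the agent tried, and any analyst commentary."""
    requester = messages[0]["user"]
    question = messages[0]["text"]
    return {
        "title": f"Data request: {question[:60]}",
        "requester": requester,
        # Analyst/stakeholder commentary captured from the thread.
        "context": [m["text"] for m in messages if m.get("kind") == "comment"],
        # The queries the agent generated while exploring the warehouse.
        "generated_sql": [m["text"] for m in messages if m.get("kind") == "sql"],
    }

thread = [
    {"user": "nick", "text": "How many customers use declarative automation?"},
    {"user": "compass", "kind": "sql", "text": "SELECT ... FROM gong_transcripts ..."},
    {"user": "analyst", "kind": "comment", "text": "We never modeled this feature explicitly."},
]
ticket = thread_to_data_request(thread)
```

In practice an LLM would replace the mechanical field-picking here, summarizing the thread and proposing a title, but the inputs (requester, question, generated SQL, human commentary) are exactly the context Nick describes the thread already containing.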
You know, my vision is kind of like, you know how during COVID, with Google Trends, you could figure out where COVID was spreading, and it was ahead of any reporting, because people would start asking about, you know, "I can no longer taste anything"? You could kind of see that move through the country and be a leading indicator of the true metrics that get reported. And I'm imagining a future where, at a company, you can really get a sense of what people care about in aggregate. If the analytical queries are shifting in the business, that's actually a good insight in a large organization about what people care about, what they're worried about, etcetera, etcetera. So I think there's a huge space here to get broad-based intelligence from this conversational system of record and how the context is being accessed too. So I think it's a very perceptive question, and I think there's just a huge amount of greenfield around there.
[00:26:35] Tobias Macey:
It's interesting too. I was actually, just earlier today having a conversation with somebody who's building an agentic coding platform for doing software engineering in an autonomous fashion, and it brought up the whole idea of Conway's Law about how the structure of the software is defined by the communication patterns of the organization. And once you start introducing these agentic systems, that changes the communication patterns to also incorporate those LLMs, which by necessity modifies the structures of the software that gets created. And I'm wondering how you see that analogy play out in the context of these agentic analytical systems and the role that it plays in terms of the design and orchestration of the data assets that you're building and the ways that people are interacting with those data systems. But because we have these LLMs in play, it is no longer human to human interaction or human to deterministic machine interaction. The LLM then plays a role in that communication system and modifies the ways that people are interacting with it.
[00:27:43] Nick Schrock:
That is really interesting. The Conway's Law analogy I hadn't thought about, but it makes total sense. Because one of the things I think is happening in the AI era is that nearly every stakeholder relationship is going to be reimagined. And part of that is because the new consumption layers facilitate new team organizations, because of Conway's Law. So just as an example that's not in data platforms: there was, like, an "all software engineering is going away" boomlet. Right? That was kind of a big conversation, which I thought was a complete load of, you know, whatever.
It's a family show. Right, Tobias? But, you know, PMs vibe coding does not mean that software engineers are going away. However, the ability of PMs to prototype and build things in the native system of the engineers fundamentally and completely transforms their stakeholder relationship. And I think this is partially one of these Conway's-Law-esque effects. And that's also what's happening in Compass between the business stakeholder and the analyst, where the business stakeholder can now do vibe analytics in their own way and communicate directly in the native medium that the analyst can understand. So with this Conway's Law effect, I am incredibly bullish on this UI interaction of multiplayer agentic chat in B2B contexts. You know, single-player agentic chat, like ChatGPT and its competitors, has completely remade consumer software and is still in the process of doing so. And I think that this multiplayer collaborative chat is gonna be the same order-of-magnitude change in the enterprise.
You know, we're seeing it right now in Compass because, if you stack that on top of the data platform, you effectively don't need reporting functionality across all of your vertical SaaS apps. It's just in this one spot, which is super, super exciting. And I think this agentic chat is what does it, because you're bringing in people. You can bring in the random stakeholders. I think a lot of people's mental model of the agent is that someone's alone and talking to the agent. In an enterprise context, that doesn't make sense. What the Slack modality does is make the agent a participant in a collective conversation that incorporates workflows, and that is a super powerful dynamic that also changes the communication structures here. So I think people have the wrong mental model of this. There's also a boomlet about, oh, there's gonna be one-person startups that are billion-dollar companies. And I don't really think that's true either, because I just don't imagine a world where one human is talking to n agents and building a company like that.
I think it's more like there could be fewer people, more hyper-empowered people, but it's always gonna be hybrid, where there's lots of humans and lots of agents and the humans are sort of up-leveling their work. Maybe that's just the way I want the world to work, but I think it is the way the world will work.
[00:31:01] Tobias Macey:
I think it's also indicative of the overall tendency for people to take a proof of concept and extrapolate to a larger scale in a way that doesn't hold. As with anything in software and technology, it's, oh, I built this system in a weekend, so therefore I can build an entire production company by the end of the week. But the factors of scale are something that nobody ever properly accounts for, where you're dealing with exponential complexity but logarithmic capability. And so you're going to diverge sooner rather than later in terms of what you could actually feasibly maintain. And so, similarly, with that idea of the one-person company where I just have 50 different agents, your head's going to explode trying to keep up with them. And, eventually, you're going to hit the law of diminishing returns, where the inaccuracies of the agents are going to start compounding, and it's going to drive your multibillion-dollar company into the ground before it ever takes off. And I think it's also indicative of the hype cycle that came with the initial release of ChatGPT, saying, oh, well, AGI is now just around the corner. We're going to have it by the end of next year, and now it's the end of two years, or five years, and it keeps getting pushed back.
[00:32:13] Nick Schrock:
Yeah. And what even is AGI? You know? It's a very difficult thing to define. You know, I use the term technical debt superspreader and things like that. I actually think that's a specific instantiation of a more general trend that's gonna cut across many domains. Because I think with AI, we're effectively going to be entering a complexity crisis. The ability of agentic systems, and humans empowered by agentic systems, to produce complexity, junk content, and interrelated concepts that you fundamentally don't understand is very, very high. So I think that the ability to manage and model complexity will only become more and more leveraged. You know? And that's what I think about when I'm doing agentic engineering: really compartmentalizing complexity in a way where the agents can contribute the right things at the right time. But, yeah, it's going to be a very complicated world with all these agents running around.
[00:33:13] Tobias Macey:
And now bringing us back around to Compass and these agentic systems for AI and the role of the data infrastructure and the data teams in that landscape, what are some of the ways that the requirements of the data infrastructure change when you have to support these agentic systems, and what are some of the aspects that can remain the same and the agent is able to just use systems as they exist today?
[00:33:42] Nick Schrock:
You know, it's such a broad question. And the agentic systems are so new, and not that many people have deployed them at real scale, that I think it's actually very difficult to understand at this time exactly how they're going to impact everything. You know, a lot of people are like, oh, there's gonna be more unstructured data. I don't even know if that's true, for example, because for these AI systems to operate and do super-leveraged things, you actually want tons of structure, tons of metadata, tons of context. You know, I think that real-time, more complex workflows are going to be incredibly important.
I'm quite bullish on systems like Temporal, for example, to manage the complicated agentic workflows that go on, because the ability to pause and resume compute will be very important. One of the interesting things happening is that agentic workflows are so high latency. Right? Users are now trained to wait for a computer to think for minutes on end on their behalf, which is very different from, say, the web era, where every millisecond counted. And utilizing computational resources efficiently in those contexts, I think, is actually quite challenging. There's any number of things I could spout off about, but I think anyone who gives you definitive answers about how all this stuff is gonna impact infrastructure doesn't really know what they're talking about. Because, like I said, every layer in the stack is getting disrupted, and the consumption layer is changing.
That implies changes to the compute that's actually running, but AI is also impacting the way these infrastructure things are built. So there are multiple dimensions of variability right now, and I think it's very difficult to project beyond pure conjecture what's gonna be changing.
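The pause-and-resume idea discussed here can be sketched in miniature with a checkpointing loop: a toy illustration only, not the Temporal API, and all function and file names are hypothetical. The point is that each expensive, high-latency agent step persists its result, so an interrupted run resumes without re-paying for completed work.

```python
import json
import os

def run_workflow(steps, checkpoint_path):
    """Run named steps in order, checkpointing after each one so a
    paused or interrupted run can resume without redoing earlier
    (expensive, high-latency) steps."""
    state = {}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)  # resume prior progress
    for name, fn in steps:
        if name in state:
            continue  # already computed in an earlier run; skip the cost
        state[name] = fn(state)
        with open(checkpoint_path, "w") as f:
            json.dump(state, f)  # persist progress after every step
    return state

# Toy steps standing in for slow agent calls.
steps = [
    ("fetch", lambda s: 40),
    ("enrich", lambda s: s["fetch"] + 2),
]
result = run_workflow(steps, "workflow_state.json")
os.remove("workflow_state.json")  # clean up the demo checkpoint
```

A durable-execution system does this with far stronger guarantees (atomic persistence, retries, timers), but the cost-shaped control flow is the same.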
[00:35:40] Tobias Macey:
One of the other aspects of bringing an agent into the equation is obviously cost, because LLMs have very unpredictable cost patterns. And so you don't want to route every request through the LLM, especially having it do a huge body of work if it's something that you already have a stored data asset for. And I'm wondering how you're thinking about some of the methods for taking the common questions and interaction patterns with that agent and either caching them for quick retrieval or materializing them into a more durable asset so it's not something that gets recomputed every time, or just some of the ways to shape the interaction patterns of the stakeholders, to say, you don't have to ask the LLM this question every time, you can go here for it, or it's going to deliver this to you without you having to take any action, as some of the means of mitigating unbounded cost.
[00:36:42] Nick Schrock:
Yeah. You know, I felt this very personally, because I went from zero to, like, a 100 on agentic coding this summer, and I hadn't signed up for the Claude Max plan. I just used our corporate account, which doesn't have that sort of high usage limit. And in my first two weeks of Claude Code usage, I cost the company $3,000. We were able to get it under control, but, you know, you can consume a lot of cost doing that. I don't even wanna think about how much natural gas was burned to produce those, you know, 10,000 lines of code or whatever. So I mentioned a complexity crisis before. I also think there's a cost crisis coming. And I think the first answer here is that, earlier in this episode, I mentioned that I think context pipelines are kind of the new data pipelines.
One piece of that is that you wanna be precise about when and how you recalculate context. And that means it's a data pipelining problem: doing event-based computation, crafting the computation in a very specific way, and then producing it in a highly tailored way so it's perfect for your application. So writing data pipelines that become context pipelines, and then matching that with context engineering, meaning taking those produced artifacts and feeding them to the right model at the right time. The combination of those two techniques, I think, is going to be essential for controlling costs. You know, because the larger the context window, the more expensive the compute is. Prefill is quadratic with respect to context window size, and it determines a ton about model performance. But I think the cost crisis coming is real.
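The recompute-only-on-change behavior of a context pipeline can be sketched with a content fingerprint: a minimal illustration, assuming the upstream inputs are JSON-serializable, and all names here are illustrative rather than any real product API.

```python
import hashlib
import json

_cache = {}  # fingerprint -> rendered context artifact

def fingerprint(inputs: dict) -> str:
    # Stable hash of the upstream data the context depends on.
    blob = json.dumps(inputs, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()

def build_context(inputs: dict, render) -> str:
    """Return the rendered context artifact, re-running the expensive
    render/summarization step only when the inputs actually change."""
    key = fingerprint(inputs)
    if key not in _cache:
        _cache[key] = render(inputs)  # the costly step: only on change
    return _cache[key]

render_calls = []
def render(inputs):
    render_calls.append(1)  # count how often the expensive path runs
    return f"{len(inputs['tables'])} tables: " + ", ".join(inputs["tables"])

data = {"tables": ["orders", "users"]}
first = build_context(data, render)
second = build_context(data, render)  # unchanged inputs: cache hit
```

In a real pipeline the invalidation would be event-driven (a table changed, a schema migrated) rather than hash-on-read, but the shape of the saving is the same: the expensive computation runs once per change, not once per request.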
I think the chickens will come home to roost for a lot of these firms who aren't passing through enough of the compute cost to their customers, and their customers will have a rude awakening and churn. And I think some of the coding startups are encountering that challenge right now. So, yeah, I think there's gonna be a huge number of techniques, and those techniques will stay extremely relevant even as the models get better and even as they get cheaper too. Because some of this context management, I view almost like big-O notation or algorithmic complexity. Meaning that no matter how good Moore's Law is, an O(n squared) sort algorithm can only go so far, no matter how fast the processor is. And I think the same thing is gonna be true with context engineering. You know, we're even seeing this now. We're getting to a million tokens and even enormously large context windows, but they have enormous diminishing returns. And it can even be a negative thing if you pollute the context with contradictory information. This is famously called context poisoning or context rot and all this stuff. So I think context engineering is gonna be more and more expansive, and I think that is gonna be a common theme to control cost. And then beyond that, having more control over fine-tuning. I think there's a whole undiscovered country in terms of democratizing fine-tuning, and then having the model providers build in capabilities so you can do fine-tuning over their closed-source models. But it is early days. We are going to burn a lot of money and energy along the way, and it's gonna become increasingly important to control it.
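The caching idea raised in the question above can be sketched very simply: route a recurring question to a stored answer instead of re-invoking the model every time. This is a toy, and the normalizer is an assumption; a real system would also key the cache on data freshness and expire entries.

```python
import re

answer_cache = {}  # normalized question -> cached answer

def normalize(question: str) -> str:
    # Collapse casing, punctuation, and stray whitespace so trivially
    # rephrased duplicates hit the same cache entry.
    return re.sub(r"[^a-z0-9 ]", "", question.lower()).strip()

def ask(question: str, call_model) -> str:
    """Answer from cache when possible; fall back to the expensive
    model call only for genuinely new questions."""
    key = normalize(question)
    if key not in answer_cache:
        answer_cache[key] = call_model(question)  # the costly path
    return answer_cache[key]

model_calls = []
def fake_model(q):
    model_calls.append(q)  # stand-in for an LLM invocation
    return "revenue is up 12% week over week"

a1 = ask("What's revenue this week?", fake_model)
a2 = ask("whats revenue this week", fake_model)  # cache hit
```

String normalization only catches near-exact repeats; production systems often use embedding similarity for semantic caching, which trades some correctness risk for a much higher hit rate.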
[00:40:12] Tobias Macey:
As you have been going through this journey of building Compass, testing it out, and getting it in front of some early adopters, what are some of the most interesting or innovative or unexpected ways that you've seen teams apply these agentic capabilities on top of their existing data investments?
[00:40:30] Nick Schrock:
That is a good question. You know, in our funnel, we're still on the order of dozens of users, and, you know, this is the week of October 6, and we're opening the floodgates a little bit. We have hundreds of people on the wait list. I think the thing that has stuck out to me is that, effectively, once people connect their data warehouse to the system, we have a 100% retention on the platform, which is crazy. People start using it, the usage is intense, and they get tons and tons of stakeholders in the system. Right? And, internally, you know, we actually purchased some datasets that we're gonna make public that are effectively the moral equivalent of the PitchBook data: companies and their fundraising histories and their revenue numbers and all this sort of stuff. And then a people database, which is kinda like the LinkedIn dataset. And Compass plus those things makes, like, the best prospecting and recruiting tool (prospecting meaning salespeople finding customers that will be open to purchasing the product). It's more powerful than LinkedIn Sales Navigator.
It's crazy. You know, SQL is so powerful, and natural language on top of SQL, doubly so. So we've already seen every single ops role use this tool effectively: recruiting, HR, FinOps, RevOps, sales ops. There's lots of ops these days. Product ops queries, doing these sorts of things on our own data platform, in fact. So the breadth of use cases has been pretty awesome. And a lot of our early product-market fit is actually investment firms. They use it for interesting stuff. We thought they would use it for trying to find new companies to invest in, but they have a sales pipeline just like anyone else; it's just that their "sale" is investing in something. So they know what stage each company they're looking at is in. They have a pretty formalized pipeline. And, generally, there's one investment ops person who manages that, and they have to field requests from the partners, which are often very time sensitive and stressful. But they've actually gotten their partners, the people who run the firms, to use this tool directly, which has been both an efficiency gain and an incredible stress reducer, which is literally why, on our marketing site, we can have a pull quote that says, quote, unquote, Compass saved my life, which is always something you wanna hear as a founder.
But the reason that person said that is because we saved her not just time, but enormous amounts of stress dealing with time-sensitive requests from very important people. So I think this investor use case has been pretty interesting to see.
[00:43:24] Tobias Macey:
In your work of building the system and understanding the capabilities and use cases and limitations of an agentic analytics platform and how to tie it into existing data infrastructure and data assets, what are some of the most interesting or unexpected or challenging lessons that you learned in the process?
[00:43:44] Nick Schrock:
I mean, it's still early days. It's amazing how, once you go from one person on the go-to-market team being able to interact with the data warehouse to 80% of your team being able to interact with the data warehouse, you really start to see how many gaps there are, both in your data model and in your understanding of what people actually care about. So it has been super interesting to watch that roll out in real time.
[00:44:17] Tobias Macey:
And what are the situations where you would advise against going down the agentic path for these exploratory or analytical use cases?
[00:44:29] Nick Schrock:
Yeah. So, you know, we don't call it a BI tool. We call it exploratory data analysis because it's actually a very distinct use case. BI tools often drive absolutely mission-critical things, like revenue reporting that is subject to regulatory scrutiny, or comp decisions, or pricing decisions. And Compass is explicitly not designed for that use case. It is for exploratory, rapid, directionally correct data analysis, which is a very different use case. So we have no desire to be a replacement for those core BI assets. We think those should be managed by the BI tools. Kind of one of our principles here is what designers call truth in materials. We don't want to pretend it's not an LLM. We don't want to pretend that it's a 100% accurate or bulletproof. That's not its purpose. Right? We want it to be rapid, directionally correct, and eventually correct. And by eventual correctness, I mean that the context store gets added to, and the queries get more and more accurate over time to some kind of asymptotic level. So, you know, there are domains where absolute precision in all cases is absolutely required. Compass is not for that. It is for facilitating, as I said, directionally correct, rapid analyses.
[00:45:50] Tobias Macey:
And as you continue to invest in and iterate on this agentic exploratory analytics use case, what are some of the things you have planned for the near to medium term or any particular projects or problem areas or capabilities that you're excited to explore?
[00:46:07] Nick Schrock:
Yeah. So one thing I'm super interested in, I think for obvious reasons, is deep integration between Compass and Dagster and Dagster+. And this comes in many, many different forms: using data pipelines to produce and manage context, integrating the context store with our operational system of record, and then also using this tool itself. You know, we have this ability to create data requests, which can be very detailed, and then using those as the basis of agentic AI authoring workflows, which we actually have kind of working already and is very, very effective. So I'm very excited for that dimension, integrating Compass even more first class into data platforms. I'm very excited to work on a more enterprise SKU of Compass. I think these kinds of organizational observability features will be part of that, as well as sort of on-prem versions, which will have their own challenges but will really unlock usage in a ton of places, deliver a ton of value, and, we feel, be very successful in terms of being a healthy business. Yeah. And then, you know, the way this is set up, we can attack all sorts of interesting use cases one by one. Even in the initial stages, right, every dashboard in every vertical SaaS app is in effect our opportunity.
And that's very exciting to see. So much of the information and knowledge work that happens today is still drudgery: manually fielding a request to add such and such to this Salesforce dashboard and then, you know, hooking this and that up. And I think people are a little too pessimistic about, like, AI taking all of our jobs. I don't think that will happen. I think people will move up the stack and have to deal with much less drudgery. And that's kind of the way I approach this and what I seek to do in participating in and helping with this product. I think the future is bright. It kind of always comes up, and maybe I'm anticipating a question you might ask: should my kids study software engineering? Is software engineering gonna have a future? And blah blah blah. And I couldn't be more bullish about the future of software engineering. It's just gonna change the definition of what software engineering is. But the core foundations of learning how computation works, learning how to think about this stuff from first principles, will only become more leveraged.
[00:48:37] Tobias Macey:
Are there any other aspects of this space of agentic analytics, the work that you're doing on Compass, the leveraging of existing data infrastructure and data assets into this more AI driven interaction pattern that we didn't discuss yet that you'd like to cover before we close out the show?
[00:48:56] Nick Schrock:
No, I think we've covered a lot of ground, so I think we'll leave it here.
[00:49:02] Tobias Macey:
Alright. Well, for anybody who wants to get in touch with you and follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And as the final question, I'd like to get your perspective on what you see as being the biggest gap in the tooling or technology for data management today.
[00:49:16] Nick Schrock:
It's always extremely unfair when you ask vendors this because we're morally obligated to talk our own book. But I am super interested in, like, obsessed with, this context engineering notion. And I think it's gonna be a defining discipline for the next ten years. I think it's super, super early days. I actually think about it a lot because my other passion project right now is figuring out how to deploy AI and agentic authoring in real and large software systems. And I am very interested in a problem that sounds simple but I think is a big one:
keeping markdown files checked into a project up to date with the underlying code. Because these markdown files, generally computed by agents, are to me just token caches. Right? The LLM has evaluated a bunch of tokens in the codebase and then materialized that knowledge in a more condensed form. Right? And I think that's actually gonna happen recursively in large software projects. But keeping them up to date is actually another instance of a data pipelining problem, because you can't recompute them every time; it ends up being too expensive. So how can you do that intelligently and keep them up to date? I think it's just one pillar of what is gonna be needed to do AI-accelerated software engineering at scale. That's my term, by the way. I despise the term vibe coding and hope we don't talk about it here.
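The markdown-as-token-cache idea described here reduces to a freshness check: store a digest of the source files a summary was derived from, and re-run the expensive agent summarization only when that digest changes. A rough sketch under that assumption; every name here is illustrative.

```python
import hashlib

def source_digest(files: dict) -> str:
    """Stable hash over path -> contents, in sorted order, so the
    digest changes exactly when the underlying code changes."""
    h = hashlib.sha256()
    for path in sorted(files):
        h.update(path.encode())
        h.update(files[path].encode())
    return h.hexdigest()

def refresh_summary(files, stored_digest, stored_summary, summarize):
    """Re-run the expensive summarization only when sources changed;
    otherwise reuse the checked-in summary (the 'token cache')."""
    digest = source_digest(files)
    if digest == stored_digest:
        return stored_summary, digest  # cache still valid: no agent call
    return summarize(files), digest

agent_runs = []
def summarize(files):
    agent_runs.append(1)  # stand-in for an expensive agent invocation
    return f"# Module notes\nCovers {len(files)} files."

files = {"app.py": "def main(): ...", "db.py": "def query(): ..."}
summary, digest = refresh_summary(files, None, None, summarize)
# Same sources again: the stored summary is reused, no agent call.
summary2, _ = refresh_summary(files, digest, summary, summarize)
```

In practice the digest and summary would be committed alongside the markdown file, and the "recursive" version would summarize summaries, with each layer invalidated only by changes in the layer below.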
[00:50:37] Tobias Macey:
Yeah. An alternative term that I heard recently is AI native engineering.
[00:50:43] Nick Schrock:
That's pretty good. I will take it. I will take it. Agentic engineering is pretty good too, but I don't know. Agentic is like one of these words now, which I, like, only use as a last resort.
[00:50:54] Tobias Macey:
Absolutely. Well, thank you very much for taking the time today to join me and share the work that you've been doing on your agentic analytics system and the lessons that you've learned there. I appreciate that, and I hope you enjoy the rest of your day. Alright.
[00:51:09] Nick Schrock:
Thanks, Tobias. Thanks for having me.
[00:51:18] Tobias Macey:
Thank you for listening, and don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used, and the AI Engineering Podcast is your guide to the fast-moving world of building AI systems. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email hosts@dataengineeringpodcast.com with your story. And to help other people find the show, please leave a review on Apple Podcasts and tell your friends and coworkers.
Hello, and welcome to the Data Engineering Podcast, the show about modern data management. Data teams everywhere face the same problem: they're forcing ML models, streaming data, and real-time processing through orchestration tools built for simple ETL. The result: inflexible infrastructure that can't adapt to different workloads. That's why Cash App and Cisco rely on Prefect. Cash App's fraud detection team got what they needed: flexible compute options, isolated environments for custom packages, and seamless data exchange between workflows. Each model runs on the right infrastructure, whether that's high-memory machines or distributed compute.
Orchestration is the foundation that determines whether your data team ships or struggles. ETL, ML model training, AI engineering, streaming: Prefect runs it all, from ingestion to activation, in one platform. Whoop and 1Password also trust Prefect for their data operations. If these industry leaders use Prefect for critical workloads, see what it can do for you at dataengineeringpodcast.com/prefect. Are you tired of data migrations that drag on for months or even years? What if I told you there's a way to cut that timeline by up to a factor of six while guaranteeing accuracy? DataFold's migration agent is the only AI-powered solution that doesn't just translate your code; it validates every single data point to ensure perfect parity between your old and new systems.
Whether you're moving from Oracle to Snowflake, migrating stored procedures to dbt, or handling complex multisystem migrations, they deliver production-ready code with a guaranteed timeline and fixed price. Stop burning budget on endless consulting hours. Visit dataengineeringpodcast.com/datafold to book a demo and see how they turn months-long migration nightmares into week-long success stories. Your host is Tobias Macey, and today I'm welcoming back Nick Schrock to talk about building an AI analytical system that keeps data teams in the loop, in the form of Compass. So, Nick, can you start by introducing yourself for people who haven't heard any of your past appearances?
[00:02:09] Nick Schrock:
Yeah. Sure. And thanks for having me, Tobias. It's always a pleasure being on. So, briefly, I'm Nick Schrock. I'm the CTO and founder of Dagster Labs, which is the company behind Dagster, an open source data orchestration platform, and Dagster+, which is our commercial hosted product on top of that, and now an additional product called Compass, which I'm super excited to talk about. Before that, I cut my teeth at Facebook engineering, and the thing I was best known for was being one of the co-creators of GraphQL. So that's kind of my story.
I founded Dagster in 2018, so a while ago now, but really got the company off the ground in 2019, when I hired my first employee. And we've been working really hard for a long time, and we have an at-scale open source project and a really healthy commercial business, and we're looking forward to many more years of success.
[00:03:06] Tobias Macey:
You've been running Dagster for almost as long as I've been running this podcast.
[00:03:11] Nick Schrock:
It's true. Your podcast is actually one of the major ways I got up to speed on the domain. In particular, the episode you did about DataOps with Chris Bergh was, like, a real unlock for me. So I feel like you and I have kind of been on the journey together in some ways.
[00:03:27] Tobias Macey:
Absolutely. And it's been a crazy ride over the past, what, eight years now. So I guess the next stop in that ride is agentic systems. And because you're working in the technology space, you're obligated to build an agentic system. So I'm wondering if you could just give a bit of an overview of your thoughts on the application of agentic systems to data analysis, and some of the ways that you thought about the approach to Compass that keeps data teams in the loop without just leaving them on the sidelines and letting the AI run rampant over all of their hard work.
[00:04:06] Nick Schrock:
Yeah. It's been a fascinating journey, actually. I think both I and Dagster Labs as a company have been fairly conservative when it comes to AI and agentic systems up until now. You know, last summer I wrote a blog post about what I called the rise of medium code, and the properties a software system needs to have to be an amenable target for AI codegen. And it really focused on minimizing slop and limiting the technical blast radius so the AI can't do much damage to your system, and all that. So I've always thought about it in those terms, but I was always a little skeptical about how good the agents could get. And the progress has really exceeded my expectations in the last year. I didn't realize it at the time, but I think a huge release was the one in February where Anthropic released Sonnet 3.7, I think it was, and Claude Code in the same release.
Those two things were a simultaneous innovation at the model layer, but also, I think maybe even more importantly, at the application layer over that model layer. And that moment and the period right afterwards was a huge wake-up call for me: oh, these systems are super ready for prime time now if you apply the right tools and techniques. And the momentum has been building. Actually, in June, this term became part of the ether: context engineering. Tobi Lütke posted about it, and then it was canonized by Karpathy.
It's kind of a rebrand of prompt engineering, but it describes how you can programmatically inject the right context in the right place at the right time to the right model. And that mentality really clicked with me, in terms of: oh, me, the lowly product and infrastructure engineer who doesn't know the higher-order math necessary to build a foundation model, I can really participate in this in a super first-class way. So that's kind of the context of my journey. At this point, I'm very, you know, AI-pilled, I would say, in that I believe we are witnessing the disruption of multiple layers of the stack simultaneously, in a way that we've never experienced before. So AI is revolutionizing the way we build software, the way we structure infrastructure, the way our stakeholder relationships work, and also the consumption layer. Right? ChatGPT has been a dramatic change in the way that we interact with computer systems.
And that has not really reached the enterprise at all yet, which is a very interesting topic of discussion. So I think we're at this massive wave. And you kind of alluded to it: okay, you're in software, so you have to be thinking about agentic AI. And, unfortunately, that is kind of true, and I think some people are more obnoxious about it than others. But ignoring agentic AI would be like ignoring the Internet in the nineties, you know, and not really thinking about how that's gonna impact your system. And the Internet completely revolutionized all domains of computing. Right? Even the parts that weren't putatively affected. It kinda started with consumer, but then all our infrastructure changed too. So I think this is a similar wave. I think it's extremely exciting, and I'm just really enthusiastic about the future.
[00:07:52] Tobias Macey:
And to help frame the rest of the conversation, can you describe the scope and purpose of Compass and some of the problems that you're trying to solve with it?
[00:08:03] Nick Schrock:
Yeah. Without going into features or technologies, I think at the highest level, the problem we are trying to solve in human terms is to completely restructure the relationship between a data platform team and their stakeholders. Right now, I think data teams feel like they are cogs in a machine, that they are cost centers, that they are there to do a job. Business stakeholders ask for data. Business stakeholders ask for dashboards. But then you're kinda disconnected from those business users because your work is intermediated by these tools, which are often not that pleasant to deal with, BI tools being an example. I often joke that the BI category feels like it was invented by Dostoevsky because all BI tools are terrible, but they're all terrible in their own ways. And so, you mentioned the term self serve. Rather than think about it as complete self serve, we wanna redefine the relationship so that the data team is collaborating with the business stakeholders in real time in a highly positive way, where instead of being viewed as a cost center, they are the face of the value. So they're collaboratively working with their stakeholders, and they're empowering way more of those stakeholders.
Now that is the problem we're trying to solve. And in the end, that means much more accessibility to data, and you can leverage your data platform to do more things in the organization, thereby increasing its value. Okay. So that was a lot of stuff I just talked about. But we're redefining the stakeholder relationships such that the perceived and real value of a data platform is higher in the organization. Now how do we do that? You know, Compass looks fairly innocent at first blush.
It is a Slack native experience where you can interact with your data in natural language. It's processed by AI, which sort of acts like a junior analyst, and it can interrogate your data warehouse and do interesting analyses. That is the user experience. But it ends up being fairly transformative, dare I say revolutionary, and it has been internally, because you have the stakeholders interacting with this agentic tool, the analyst. But then, because there's an agent analyst, the data team members that are in that Slack thread with the business stakeholders are no longer analysts.
They act much more like data stewards. They're, like, guiding the user to do the correct analysis, and then they manage the context. And they manage the context store so they can govern the AI in a very scalable fashion. And then the business stakeholders never leave Slack. There's data viz right there. The stakeholders can make data requests. They can request context corrections. Often, the AI just figures out how to do that for them. So they plug into all sorts of workflows. They can schedule these analyses on a regular basis. And in our demos, we almost brag.
Right? Like, you're not gonna see a web UI during this entire demo, and that is super deliberate. Because who wants to learn another web UI? It bounces you out to a web UI. You have to auth to it. You have to learn a completely new information architecture. Right? You have to learn completely new concepts, and it bounces you out of your collaborative zone. You can no longer, like, at mention people, etcetera, etcetera. So, yeah, that's kind of the approach. It's a Slack native, natural language analytics experience that is collaborative, governed, and driven by AI.
[00:12:01] Tobias Macey:
So the common challenge when dealing with all of these agent based systems, as you already pointed out, is this challenge of context engineering, where the alchemy of turning raw bits into useful information is the entire purpose of data engineering, to be very reductive about it. And so those two things are in tension, because the data engineer doesn't want to forego their purpose and hand it off to an AI, particularly if they have any pride in their work, because they know that the AI isn't going to do an appropriate job of understanding the business needs, the business context, all of the hard-won knowledge that they've already encoded into the data assets that they're building. Whereas the consumers of that data, to your point, don't wanna have to deal with learning all the information architecture. They don't want to have to dig through all the docs or go through all the pipelines to really understand what it is that they're actually looking at. And so I'm wondering if you can just talk to some of the ways that you're thinking about that context management and the handoff between the data teams, who are doing all the hard work of bringing all this information together and hydrating it with that business context, and the ways that the agentic analyst is able to actually retrieve and interact with that context, to understand how to map the probably very vaguely worded request from the stakeholders into a concrete plan of action and means of discovery, enumerating all of the information that's required to fulfill that request.
[00:13:35] Nick Schrock:
Right. So there's a lot there. Let me just try to tease that apart. Dispositionally, and because I also think it's how the world works, I think of the AI as a bicycle for your brain as opposed to a replacement for it. And in some ways, actually, almost in every way, these AIs make judgment and taste that much more leveraged. Because if you have good taste and judgment, you can get the AIs to do an extraordinary amount of work on your behalf that's high quality. But if you don't have that, then, I call it a technical debt superspreader, it can copy bad patterns and go off to the races and hallucinate lord knows what. So that's my starting place: we build tools that keep humans in the loop, that are governed, and that accelerate and amplify the work of subject matter experts rather than sort of eviscerating it.
So where do you want me to take it next? That's kind of the high level philosophical answer.
[00:14:44] Tobias Macey:
So I think the interesting bit is, I'll admit to everybody listening, I've already witnessed the demo of Compass, so I already know a lot of the details of how this is operating. So these are somewhat leading questions and a little bit of inside baseball. But my understanding is that Compass relies on a repository of context artifacts for understanding how to map the semantics of these analytical requests onto the actual data assets that are available, with those data assets, at least in its current formulation, largely being restricted to a data warehouse environment for creating and executing SQL queries. And so I'm wondering if you can just talk to how you're thinking about the initialization of that context repository from the set of tables and data assets that already exist, in a manner that isn't just a lot of extra busy work on behalf of the data team, but provides all of the necessary information and guidance to the agentic system for fulfilling the requests of the business stakeholders.
[00:15:47] Nick Schrock:
Right. Okay. That makes total sense. So I think there is an initial step where we bootstrap the context store with as much information as possible. Setup is super easy. All you do is plop in your data warehouse creds, and you're good to go. Now what happens is that, for the tables that you allow us to see, we query the information schema. We get as much information as possible from there. We also sample the data, and then we programmatically generate context. I think where a lot of people go wrong with that type of step is that they aren't thoughtful enough about producing the precise context that your application expects.
They just, like, dump raw metadata into some context window and, like, hope and pray that the agent figures stuff out. We're very much of the belief that you need to very deliberately produce context programmatically in a way that's guided and specific to your application. That means there's a level of precision and control. Like, I firmly believe that increasingly, we're gonna move from data pipelines to context pipelines, meaning that context will be computed. It'll be computed from other context and other data, and that is data pipelining. Right? So that's one of the reasons we have an engineering approach to it. And the automatically generated data docs are kind of an initial step to do that. Before I get to the second critical piece that's in the context store, I'll note that all of this is managed with Git, and we think that's very important. Context occupies this fascinating space that's sort of in between code and data, meaning that context is computed just like data, but it also very directly determines system behavior just like code does. This is why we've set out on this path of having programmatically generated context checked in to a Git repository.
Because if it's checked in, you can track changes, you can revert things, you can write tooling over it to change it in place. It allows all sorts of flexibility and precision. Right? So imagine you're really developing this thing at scale. You have evals so you can evaluate the performance of the agent. Right? You can do, like, a bisect against the context just like code, and that's a super powerful dynamic. The second piece that we capture in Git are these manual context corrections that we get directly from the business stakeholders. So this is another critical piece of the context puzzle, the first one, again, being programmatic generation of context. The second piece is actually getting the information out of the brains of your stakeholders, who actually know the domain, and into some governed context store where the agents can utilize it to give correct results. For example, at one of our early customers, they use the term core as a kind of special code word for a project, where it doesn't really mean core, and they flagged this. You know? They're like, this has screwed up AIs before, and they totally hallucinate because, obviously, the model has its own idea of what core means in its foundation layer. So we created a context correction that very specifically laid out what core actually means, etcetera, etcetera, and the system performed well. Now what's really magical about the way Compass works is that all of this is captured either explicitly or in an ambient sense from the interactions that are happening in Slack. So, for example, the business stakeholder is presented within Slack with a data visualization that looks wrong. The demo that we give is, like, some sales rep has a 90% win rate, and that never happens.
So the demo is, you look into it, you do some investigation, you figure out that the sales rep is actually a customer success manager, and that automatically submits a context correction back to the context store that says, hey, don't count CSMs as sales reps. And then that's checked into the context store. Knowledge is captured, and then the data team can take that and manage it very explicitly. And that's a very powerful model. You know, generally, the alternative with more heavyweight systems like semantic layers is to do this upfront process that is very burdensome and complicated, and you need to send the business stakeholder to a custom tool or a web app or something. And they never do that. So the knowledge stays captured in their head, which does no good to anyone except for them, maybe. But as we demonstrated, humans have limited context windows too. And what's good for the goose is good for the gander here. So it's important to get that out of your head and into a place where the agents can take advantage of it. And it's really this lightweight interface within Slack that's the magic in that part of the process.
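As a rough illustration of the two pieces described above, programmatic context generation from warehouse metadata plus human context corrections checked into a Git-tracked store, a minimal sketch might look like the following. All function names, record fields, and file layouts here are illustrative assumptions, not Compass internals, and SQLite stands in for the data warehouse so the example is self-contained.

```python
import json
import pathlib
import sqlite3
import tempfile

def generate_table_context(conn, table, sample_rows=3):
    """Bootstrap step: read the schema and sample rows, emit a context doc."""
    cols = conn.execute(f"PRAGMA table_info({table})").fetchall()
    sample = conn.execute(f"SELECT * FROM {table} LIMIT {sample_rows}").fetchall()
    return {
        "table": table,
        "columns": [{"name": c[1], "type": c[2]} for c in cols],
        "sample_rows": [list(r) for r in sample],
        "corrections": [],  # stakeholder corrections accumulate here over time
    }

def record_correction(store_dir, doc, correction):
    """Correction step: append stakeholder knowledge and write the doc to a
    file tree that would then be committed to Git for review and revert."""
    doc["corrections"].append(correction)
    path = pathlib.Path(store_dir) / "context" / f"{doc['table']}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(doc, indent=2))
    return path

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE deals (rep TEXT, amount REAL, won INTEGER)")
conn.execute("INSERT INTO deals VALUES ('alice', 1200.0, 1)")

doc = generate_table_context(conn, "deals")
path = record_correction(
    tempfile.mkdtemp(), doc,
    "Do not count customer success managers (CSMs) as sales reps "
    "when computing win rates.",
)
print(path.read_text())
```

Because the context lives as plain files in a repository, changes like the CSM correction arrive as reviewable diffs rather than opaque state.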
[00:20:35] Tobias Macey:
One of the other aspects of data teams who are responsible for the care and feeding of analytical systems, particularly when you're dealing with business intelligence, is that it's often very difficult to gain any real insight into how the stakeholders are interacting with those systems. You might be able to have some audit logs to see how frequently people are running certain queries or dashboards. But beyond that, you don't know why they're going to those dashboards, what they're doing with the information once they retrieve it. And I'm curious how you're thinking about the ways of bringing more visibility both to the data teams and at the organizational level of how the overall company is interacting with the data assets that you have and are creating and some of the ways that that can create some feedback loops to the data teams either to prune unused data assets or to understand what new data assets they need to generate to be able to fulfill the needs of the organization, and in particular, how using the conversational system of record for the company helps to provide some of that visibility?
[00:21:44] Nick Schrock:
Oh, it's a great question. And I think I could talk for hours on this subject, because it's not just what's currently happening; I think the roadmap on this front is super, super bright. You know, we're still early stages here. So I guess the first thing that happens is that when the business stakeholder is in the same channel as the analyst, the analyst can literally see how what the business stakeholder wants gets compiled to SQL in the data warehouse. That gap between business language and, like, your column and table names communicates so much about what is actually happening and what people actually want. With a traditional BI tool or exploratory data analysis tool, that translation does not exist in a format that is discernible by one of the analysts.
So I think that is the dynamic. Just the social dynamics end up really producing a ton of insight. And we are just at the beginnings of being able to use, I like your term, the conversational system of record to drive more value out of that. The initial feature we have that I think demonstrates that is the ability to create data requests in a ticketing system automatically, based on what's been happening in a specific thread. So let me give you an example of how this works. I was asking Compass specifically about product analytics, but this expands beyond that. I was asking what I thought was a very simple question, which was, how many of our customers use declarative automation, which is one of our scheduling features? It turns out our warehouse didn't really have that particular feature explicitly modeled well. So what the AI did, and this was awesome and terrifying to watch: we have Gong transcripts, meaning Gong is a system that records and transcribes sales calls, and the AI decided to use Snowflake's AI based analysis features. It couldn't find the exact information, but what it did do is it found all the customers that had mentioned declarative automation in one of their sales calls, which was obviously imprecise, but gave me a hint, or at least a floor, of how many people use it, and interesting customers that use it. It was also terrifying because those capabilities are extremely expensive in Snowflake. But as a result of this experience, I was like, make a data request ticket so that we have this information first class in the data warehouse. And what the system does is it scoops up that entire conversation. And think about all the context you have. You have who asked for it. You have all this SQL that was generated that is navigating the warehouse and trying to figure out where things are. You have follow-up questions. You might have a conversation.
Maybe our analyst jumps in and says, oh, yeah, we don't have that because of this historical reason. It scoops all that up, synthesizes it, and creates a data request. Right? And that's an LLM assisted process. So we're really building what Databricks calls compound AI applications.
What that means is injecting, where appropriate, LLM augmentation and processing into parts of the workflow. And I think we're just at the beginnings of this. As we develop this product over time, we will be leveraging the conversational system of record much more. You can imagine doing post hoc processing on threads that allows you to discern suggested context corrections with more detail, for example, scooping up all that information. You can imagine having observability and insights tools across all the conversations happening in all the different channels across your business, so you can understand what's happening, what data is being requested the most frequently, what data is not being requested.
You know, my vision is kinda like, you know how during COVID, with Google Trends, you could figure out where COVID was spreading, and it was ahead of any reporting, because people would start asking, oh, I can no longer taste something. You could see that move through the country and be a leading indicator of the true metrics that get reported. I'm imagining a future where, at a company, you can really get a sense of what people care about in aggregate. If the analytical queries are shifting in the business, that's actually a good insight in a large organization about what people care about, what they're worried about, etcetera, etcetera. So I think there's a huge space here to get broad based intelligence from this conversational system of record and how the context is being accessed, too. So I think it's a very perceptive question, and there's just a huge amount of greenfield around there.
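The "scoop up the thread" step Nick describes, collecting the conversation, the generated SQL, and the participants into one payload that an LLM could then summarize into a data request ticket, can be sketched roughly as follows. The message shape, field names, and SQL are illustrative assumptions; the actual LLM summarization call is left out.

```python
# Assemble a Slack thread into a payload for LLM-assisted ticket synthesis.
# The structure below is a guess at what such a payload could contain.
def build_data_request_payload(thread):
    messages = thread["messages"]
    return {
        "requested_by": messages[0]["user"],                  # who asked
        "conversation": [m["text"] for m in messages],        # full dialogue
        "generated_sql": [m["sql"] for m in messages if "sql" in m],
        "participants": sorted({m["user"] for m in messages}),
    }

thread = {"messages": [
    {"user": "nick", "text": "How many of our customers use declarative automation?"},
    {"user": "compass",
     "text": "No explicit model found; searching Gong call transcripts instead.",
     "sql": "SELECT customer FROM gong_calls "
            "WHERE transcript ILIKE '%declarative automation%'"},
    {"user": "nick", "text": "Make a data request ticket so this is modeled first class."},
]}
payload = build_data_request_payload(thread)
print(payload["requested_by"], len(payload["generated_sql"]))
```

The point of the structure is that the ticket inherits the requester, the failed navigation attempts (the SQL), and the human discussion, rather than starting from a blank description field.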
[00:26:35] Tobias Macey:
It's interesting too. I was actually, just earlier today having a conversation with somebody who's building an agentic coding platform for doing software engineering in an autonomous fashion, and it brought up the whole idea of Conway's Law about how the structure of the software is defined by the communication patterns of the organization. And once you start introducing these agentic systems, that changes the communication patterns to also incorporate those LLMs, which by necessity modifies the structures of the software that gets created. And I'm wondering how you see that analogy play out in the context of these agentic analytical systems and the role that it plays in terms of the design and orchestration of the data assets that you're building and the ways that people are interacting with those data systems. But because we have these LLMs in play, it is no longer human to human interaction or human to deterministic machine interaction. The LLM then plays a role in that communication system and modifies the ways that people are interacting with it.
[00:27:43] Nick Schrock:
That is really interesting. The Conway's Law analogy, I hadn't thought about, but it makes total sense. Because one of the things that I think is happening in the AI era is that nearly every stakeholder relationship is going to be reimagined for this era. And part of that is because the new consumption layers facilitate new team organizations, because of Conway's Law. Just as an example that's not in data platforms: there was, like, an all-software-engineering-is-going-away boomlet. Right? That was kind of a big conversation, which I thought was a complete load of, you know, whatever.
It's a family show, right, Tobias? But, you know, PMs vibe coding does not mean that software engineers are going away. However, the ability of PMs to prototype and build things in the native system of the engineers fundamentally and completely transforms their stakeholder relationship. And I think this is partially one of these Conway's Law-esque effects. And that's also what's happening in Compass between the business stakeholder and the analyst, where the business stakeholder can now do vibe analytics in their own way and communicate directly in the native medium that the analyst can understand. Because of this Conway's Law effect, I am incredibly bullish on this UI interaction of multiplayer agentic chat in B2B contexts. So, you know, single player agentic chat, like ChatGPT and its competitors, has completely remade consumer software, or is in the process of doing so. And I think that this multiplayer collaborative chat is gonna be the same order of magnitude change in the enterprise.
You know, we're seeing it right now in Compass, because if you stack that on top of the data platform, you effectively don't need reporting functionality across all of your vertical SaaS apps. It's just in this one spot, which is super, super exciting. And I think this agentic chat is what does it, because you're bringing in people. You can bring in the random stakeholders. I think a lot of people's mental model of the agent is someone alone, talking to the agent. In an enterprise context, that doesn't make sense. What the Slack modality does is that the agent is a participant in a collective conversation that incorporates workflows, and that is a super powerful dynamic that also changes the communication structures here. So I think people have the wrong mental model of this. There's also a boomlet about, like, oh, there's gonna be one person startups that are billion dollar companies. And I don't really think that's true either, because I just don't imagine a world where one human is talking to n agents and building a company like that.
I think it's more like there could be fewer people, more hyper empowered people, but it's always gonna be hybrid, where there's lots of humans and lots of agents, and the humans are sort of up leveling their work. At least, maybe that's just the way I want the world to work, but I think it is the way the world will work.
[00:31:01] Tobias Macey:
I think it's also indicative of just the overall tendency for people to take a proof of concept and extrapolate to a larger scale that is not realistic. I mean, as with anything in software and technology, it's, oh, I built this system in a weekend, so therefore, I can build an entire production company by the end of the week. But the factors of scale are something that nobody ever properly accounts for, where you're dealing with exponential complexity but logarithmic capability. And so you're going to diverge sooner rather than later in terms of what you could actually feasibly maintain. And so I think similarly with that idea of the one person company where I just have 50 different agents: your head's going to explode trying to keep up with them. And, eventually, you're going to hit the law of diminishing returns, where the inaccuracies of the agents are going to start compounding, and it's going to drive your multibillion dollar company into the ground before it ever takes off. And I think it's also indicative of the hype cycle that came with the initial release of ChatGPT, saying, oh, well, AGI is now just around the corner. We're going to have it by the end of next year, and now it's two years out, or five years, and it keeps getting pushed back.
[00:32:13] Nick Schrock:
Yeah. And what is even AGI? You know? Like, it's a very difficult thing to define. You know, I use the term technical debt superspreader and things like that. I actually think that's a specific instantiation of a more general trend that's gonna play out across multiple domains. Because with AI, we're going to be entering a complexity crisis, effectively. Like, the ability of agentic systems, and humans empowered by agentic systems, to produce complexity, junk content, interrelated concepts that you fundamentally don't understand, is very, very high. So I think that the ability to manage and model complexity will become only more and more leveraged. You know? And that's what I think about when I'm doing agentic engineering: really compartmentalizing complexity in a way where the agents can contribute the right things at the right time. But, yeah, it's going to be a very complicated world with all these agents running around.
[00:33:13] Tobias Macey:
And now bringing us back around to Compass and these agentic systems for AI and the role of the data infrastructure and the data teams in that landscape, what are some of the ways that the requirements of the data infrastructure change when you have to support these agentic systems, and what are some of the aspects that can remain the same and the agent is able to just use systems as they exist today?
[00:33:42] Nick Schrock:
You know, it's such a broad question. And the agentic systems are so new, and not that many people have deployed them at real scale, that I think it's actually very difficult to understand at this time how exactly it's going to impact everything. You know? A lot of people are like, oh, there's gonna be more unstructured data. I don't even know if that's true, for example, because, actually, for these AI systems to operate over data and do super leveraged things, you actually want tons of structure and tons of metadata, tons of context. You know, I think that real time, more complex workflows are going to be incredibly important.
I'm quite bullish on systems like Temporal, for example, to manage the complicated agentic workflows that go on, because the ability to pause and resume compute will be very important. One of the interesting things happening is that agentic workflows are so high latency. Right? Users are now trained to let a computer think for minutes on end on their behalf, which is very different from, say, the web era, where every millisecond counted. And utilizing computational resources efficiently in those contexts, I think, is actually quite challenging. There's any number of things I could spout off about, but I think anyone who gives you definitive answers about how all this stuff is gonna impact infrastructure doesn't really know what they're talking about. Because, like I said, every layer in the stack is getting disrupted, and the consumption layer is changing.
That implies changes to the compute that's actually running, but also the AI is impacting the way these infrastructure things are built. So there's multiple dimensions of variability right now, and I think it's very difficult to project beyond pure conjecture what's gonna be changing.
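The pause-and-resume pattern Nick attributes to systems like Temporal can be illustrated, in a much simplified and non-durable form, with a Python generator: the workflow yields while waiting on a slow agentic step and resumes when the result arrives. A real durable-execution engine persists this suspended state across process restarts; the generator here only models the control flow.

```python
# Toy model of a pausable agentic workflow. Each `yield` is a suspension
# point where a real engine would persist state and release the worker.
def agent_workflow():
    prompt = yield "waiting_for_model"        # pause until the LLM responds
    plan = f"plan({prompt})"                  # resume with the model's input
    result = yield "waiting_for_warehouse"    # pause again for the slow query
    return f"answer({plan}, {result})"        # final answer on completion

wf = agent_workflow()
state = next(wf)                 # advance to the first suspension point
state = wf.send("q3 revenue?")   # deliver the prompt, suspend on the query
final = None
try:
    wf.send("rows=42")           # deliver the query result; workflow finishes
except StopIteration as done:
    final = done.value
print(final)
```

The design point is that no thread or process sits blocked during the minutes-long waits; compute is only consumed at the resumption points.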
[00:35:40] Tobias Macey:
One of the other aspects of bringing an agent into the equation is obviously cost, because LLMs have very unpredictable cost patterns. And so you don't want to route every request through the LLM, especially having it do a huge body of work if it's something that you already have a stored data asset for. And I'm wondering how you're thinking about some of the methods around taking the common questions and interaction patterns with that agent and either caching them for quick retrieval or materializing them into a more durable asset, so it's not something that gets recomputed every time, or just some of the ways to shape the interaction patterns of the stakeholders, to say, you don't have to ask the LLM this question every time, you can go here for it, or it's going to deliver this to you without you having to take any action, as some of the means of mitigating unbounded cost.
[00:36:42] Nick Schrock:
Yeah. No. You know, I felt this very personally, because I went from zero to, like, a 100 on agentic coding this summer, and I hadn't signed up for the Claude Max plan. I just used our corporate account, which doesn't have that sort of high usage limit. And in my first two weeks of Claude Code usage, I cost the company $3,000. We were able to get it under control, but, you know, you can consume a lot of cost doing that. I don't even wanna think about how much natural gas was burned to produce those, you know, 10,000 lines of code or whatever. So, I mentioned a complexity crisis before; I also think there's a cost crisis coming. And I think the first answer here is that, earlier in this episode, I mentioned that I think context pipelines are kind of the new data pipelines.
One part of that is that you wanna be precise about when and how you recalculate context. And that means it's a data pipelining problem: doing event based computation, crafting the computation in a very specific way, and then producing it in a highly tailored way so it's perfect for your application. So writing data pipelines that become context pipelines, and then matching that with context engineering, meaning taking those produced artifacts and feeding them to the right model at the right time, the combination of those two techniques, I think, is going to be essential for controlling costs. Because the larger the context window, the more expensive the compute is. Like, prefill is quadratic with respect to context window size, and it determines a ton about model performance. But I think that the cost crisis coming is real.
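The "be precise about when you recalculate context" idea can be sketched as an incremental pipeline that rebuilds a context artifact only when the upstream schema it derives from actually changes. The fingerprint trigger below is one plausible mechanism, an assumption for illustration, not a description of how Compass does it; the expensive step it guards would typically be an LLM call.

```python
import hashlib
import json

def schema_fingerprint(schema):
    """Stable hash of a table schema, used as the recompute trigger."""
    return hashlib.sha256(json.dumps(schema, sort_keys=True).encode()).hexdigest()

class ContextPipeline:
    def __init__(self):
        self._seen = {}   # table name -> fingerprint at last build
        self.builds = 0   # counts how often the expensive generation ran

    def context_for(self, table, schema):
        fp = schema_fingerprint(schema)
        if self._seen.get(table) != fp:
            self._seen[table] = fp
            self.builds += 1  # expensive context generation would run here
        return f"context for {table}@{fp[:8]}"

pipe = ContextPipeline()
schema = {"columns": ["rep", "amount"]}
pipe.context_for("deals", schema)
pipe.context_for("deals", schema)  # schema unchanged: no rebuild, no spend
pipe.context_for("deals", {"columns": ["rep", "amount", "region"]})  # rebuild
print(pipe.builds)
```

Two of the three requests reuse the cached artifact, which is the whole cost argument: the generation step runs on schema change events, not on every question.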
I think the chickens will come home to roost for a lot of these firms who aren't passing through enough of the compute cost to their customers, and their customers will have a rude awakening and churn. And I think some of the coding startups are encountering that challenge right now. So, yeah, I think there's gonna be a huge number of techniques, and those techniques will stay extremely relevant even as the models get better and even as they get cheaper too. Because some of this context management, I view it as almost like big O notation or algorithmic complexity, meaning that no matter how good Moore's Law is, an O(n squared) sort algorithm can only go so far, no matter how fast the processor is. And I think the same thing is gonna be true with context engineering. You know? Like, we're even seeing this now. We're getting to a million tokens, and even enormously larger context windows, but they have enormous amounts of diminishing returns. And it can even be a negative thing if you pollute the context with contradictory information. This is famously called context poisoning or context rot and all this stuff. So I think context engineering is gonna be more and more expansive. I think that is gonna be a common theme to control cost. And then beyond that, having more control over fine tuning. I think there's a whole undiscovered country in terms of democratizing fine tuning, and having the model providers offer built in capabilities so you can do fine tuning over their closed source models. But it is early days. We are going to burn a lot of money and energy along the way, and it's gonna become increasingly important to control it.
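One of the mitigations raised in the question above, caching answers to repeated natural-language asks so the model is not re-invoked every time, could look like the sketch below. Keying on a normalized question string is a deliberate simplification, an assumption for illustration only; a production system would need semantic matching and invalidation when the underlying data refreshes.

```python
def normalize(question):
    """Collapse trivial variation so repeated asks hit the same cache key."""
    return " ".join(question.lower().split()).rstrip("?")

class AnswerCache:
    def __init__(self, answer_fn):
        self.answer_fn = answer_fn  # the expensive LLM + warehouse path
        self.cache = {}
        self.llm_calls = 0          # track how often we actually pay

    def ask(self, question):
        key = normalize(question)
        if key not in self.cache:
            self.llm_calls += 1
            self.cache[key] = self.answer_fn(question)
        return self.cache[key]

# Stand-in for the real agentic pipeline.
cache = AnswerCache(lambda q: f"answer({q})")
cache.ask("What was Q3 revenue?")
cache.ask("what was q3 revenue")  # trivially rephrased: served from cache
print(cache.llm_calls)
```

The same shape generalizes to the heavier option in the question: instead of a dictionary, materialize the frequent answers as durable warehouse assets on a schedule.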
[00:40:12] Tobias Macey:
As you have been going through this journey of building Compass, testing it out, getting it in front of some early adopters. What are some of the most interesting or innovative or unexpected ways that you've seen teams apply these agentic capabilities on top of their existing data investments?
[00:40:30] Nick Schrock:
That is a good question. You know, we're still on the order of dozens of users in our funnel. This is the week of October 6, and we're opening the floodgates a little bit; we have hundreds of people on the wait list. The thing that has really stuck out to me is that, effectively, once people connect their data warehouse to the system, we have 100% retention on the platform, which is crazy. So people start using it, the usage is intense, and they get tons and tons of stakeholders in the system. Right? And, internally, you know, we actually purchased some datasets that we're gonna make public that are effectively the moral equivalent of the PitchBook data: companies and their fundraising histories and their revenue numbers and all this sort of stuff. And then a people database, which is kinda like the LinkedIn dataset. And Compass plus those datasets makes, like, the best prospecting and recruiting tool, where prospecting means salespeople finding customers that will be open to purchasing the product. It's more powerful than LinkedIn Sales Navigator.
It's crazy. You know, SQL is so powerful, and natural language on top of SQL, doubly so. So we've already seen every single ops role use this tool effectively: recruiting, HR, FinOps, RevOps, sales ops. There's lots of ops these days. Product ops queries, doing these sorts of things on our own data platform, in fact. So the breadth of use cases has been pretty awesome. And, yeah, a lot of our early product market fit is actually with investment firms, and they use it for interesting stuff. We thought they would use it to try to find new companies to invest in, but they have a sales pipeline just like anyone else; their "sale" is just investing in something. So they know what stage they're looking at a company in, and they have a pretty formalized pipeline. Generally, there's one investment ops person who manages that, and they have to field requests from the partners, which are often very time sensitive and stressful. But they've actually gotten their partners, the people who run the firms, to use this tool directly, which has been both efficient and an incredible stress reducer, which is literally why, on our marketing site, we can have a pull quote that says, quote, unquote, Compass saved my life, which is always something you wanna hear as a founder.
But the reason that person said that is because we not only saved her time but spared her enormous amounts of stress dealing with time sensitive requests from very important people. So I think this investor use case has been pretty interesting to see.
[00:43:24] Tobias Macey:
In your work of building the system and understanding the capabilities and use cases and limitations of an agentic analytics platform and how to tie it into existing data infrastructure and data assets, what are some of the most interesting or unexpected or challenging lessons that you learned in the process?
[00:43:44] Nick Schrock:
I mean, it's still early days. It's amazing how, once you go from one person on the go to market team being able to interact with the data warehouse to 80% of your team being able to, you really start to see how many gaps there are, both in your data model and in your understanding of what people actually care about. So that has been super interesting to watch roll out in real time.
[00:44:17] Tobias Macey:
And what are the situations where you would advise against going down the agentic path for these exploratory or analytical use cases?
[00:44:29] Nick Schrock:
Yeah. So, you know, we don't call it a BI tool. We call it exploratory data analysis because it's actually a very distinct use case. BI tools often drive absolutely mission critical things, like revenue reporting that is subject to regulatory scrutiny, or comp decisions, or pricing decisions. And Compass is explicitly not designed for that use case. It is for rapid, exploratory, directionally correct data analysis, which is a very different use case. So we have no desire to be a replacement for those core BI assets. We think those should be managed by the BI tools. One of our principles here is what designers call truth in materials. We don't want to pretend it's not an LLM. We don't want to pretend that it's 100% accurate or bulletproof. That's not its purpose. Right? We want it to be directionally correct and eventually correct. And by eventual correctness, I mean that the context store gets added to, and the queries get more and more accurate over time toward some kind of asymptote. So, you know, there are domains where absolute precision in all cases is absolutely required. That is not this tool. It is for facilitating, as I said, rapid, directionally correct analyses.
[00:45:50] Tobias Macey:
And as you continue to invest in and iterate on this agentic exploratory analytics use case, what are some of the things you have planned for the near to medium term or any particular projects or problem areas or capabilities that you're excited to explore?
[00:46:07] Nick Schrock:
Yeah. So one thing I'm super interested in, I think for obvious reasons, is deep integration between Compass and Dagster and Dagster+. And this comes in many different forms: using data pipelines to produce and manage context, integrating the context store with our operational system of record, and then also using the tool itself. You know, we have this ability to create data requests, which can be very detailed, and then using those as the basis of agentic AI authoring workflows, which we actually have working already and which is very, very effective. So I'm very excited for that dimension, integrating Compass even more first class into data platforms. I'm also very excited to work on a more enterprise SKU of Compass. I think these kinds of organizational observability features will be part of that, as well as on prem versions, which will have their own challenges but will really unlock usage in a ton of places, deliver a ton of value, and, we feel, be very successful in terms of being a healthy business. And then, the way this is set up, we can attack all sorts of interesting use cases one by one. In the initial stages, every dashboard in every vertical SaaS app is, in effect, our opportunity.
And that's very exciting to see. So much of the information and knowledge work that happens today is still drudgery: manually fielding a request to add such and such to this Salesforce dashboard and then, you know, hooking this and that up. And I think people are a little too pessimistic about, like, AI taking all of our jobs. I don't think that will happen. I think people will move up the stack and have to deal with much less drudgery. And that's the way I approach this and what I seek to do in participating in and helping with this product. I think the future is bright. You know, it kind of always comes up, and maybe I'm anticipating a question you might ask, but: should my kids study software engineering? Does software engineering have a future? And I couldn't be more bullish about the future of software engineering. It's just gonna change the definition of what software engineering is. But the core foundations, learning how computation works, learning how to think about this stuff from first principles, will only become more leveraged.
[00:48:37] Tobias Macey:
Are there any other aspects of this space of agentic analytics, the work that you're doing on Compass, the leveraging of existing data infrastructure and data assets into this more AI driven interaction pattern that we didn't discuss yet that you'd like to cover before we close out the show?
[00:48:56] Nick Schrock:
No, I think we've covered a lot of ground, so I think we'll leave it here.
[00:49:02] Tobias Macey:
Alright. Well, for anybody who wants to get in touch with you and follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And as the final question, I'd like to get your perspective on what you see as being the biggest gap in the tooling or technology for data management today.
[00:49:16] Nick Schrock:
It's always extremely unfair when you ask vendors this because we're morally obligated to talk our own book. But I am super interested in, like, obsessed with, this context engineering notion, and I think it's gonna be a defining discipline for the next ten years. It's super, super early days. I actually think about it a lot because my other passion project right now is figuring out how to deploy AI and agentic authoring in real and large software systems. And I am very interested in one problem that sounds simple but, I think, is a big one.
Keeping markdown files checked into a project up to date with the underlying code. I think of these markdown files, generally computed by agents, as just token caches. Right? An LLM has evaluated a bunch of tokens in the code base and then materialized that knowledge in a more condensed form. And I think that's actually gonna happen recursively in large software projects. But keeping them up to date is actually another instance of a data pipelining problem, because you can't recompute them every time; that ends up being too expensive. So how can you do that intelligently and keep them up to date? I think it's just one pillar of what is gonna be needed to do AI accelerated software engineering at scale. That's my term, by the way. I despise the term vibe coding and hope we don't talk about it here.
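One way to treat an agent-written markdown file as a token cache with pipeline-style invalidation is to stamp it with a hash of the source files it summarizes, and regenerate it only when that hash drifts. The sketch below is a hypothetical illustration of that idea; the marker format and helper names are invented, not from any real tool:

```python
import hashlib
from pathlib import Path

# Hypothetical stamp embedded in the generated markdown file.
MARKER = "<!-- source-hash: {} -->"


def code_hash(source_paths):
    """Fingerprint the contents of the source files the markdown summarizes."""
    h = hashlib.sha256()
    for p in sorted(source_paths):
        h.update(Path(p).read_bytes())
    return h.hexdigest()[:16]


def is_stale(md_path, source_paths):
    """The summary is stale if its embedded hash no longer matches the code."""
    path = Path(md_path)
    text = path.read_text() if path.exists() else ""
    return MARKER.format(code_hash(source_paths)) not in text


def refresh(md_path, source_paths, summarize):
    """Regenerate the 'token cache' only when the underlying code changed."""
    if is_stale(md_path, source_paths):
        body = summarize(source_paths)   # expensive agent / LLM step
        stamp = MARKER.format(code_hash(source_paths))
        Path(md_path).write_text(f"{stamp}\n{body}\n")
        return True                      # recomputed
    return False                         # cache hit, no tokens spent
```

Calling `refresh` on every commit is then cheap: the expensive summarization runs only when the hash check says the condensed knowledge has actually gone stale.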
[00:50:37] Tobias Macey:
Yeah. An alternative term that I heard recently is AI native engineering.
[00:50:43] Nick Schrock:
That's pretty good. I will take it. I will take it. Agentic engineering is pretty good too, but I don't know. Agentic is like one of these words now, which I, like, only use as a last resort.
[00:50:54] Tobias Macey:
Absolutely. Well, thank you very much for taking the time today to join me and share the work that you've been doing on your agentic analytics system and the lessons that you've learned there. I appreciate that, and I hope you enjoy the rest of your day.
[00:51:09] Nick Schrock:
Thanks, Tobias. Thanks for having me.
[00:51:18] Tobias Macey:
Thank you for listening, and don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used, and the AI Engineering Podcast is your guide to the fast moving world of building AI systems. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email hosts@dataengineeringpodcast.com with your story. And to help other people find the show, please leave a review on Apple Podcasts and tell your friends and coworkers.
Guest intro: Nick Schrock and Compass focus
Agentic systems hype, progress, and context engineering
What Compass is: Slack native, human-in-the-loop analytics
Keeping data teams in control: context management philosophy
Bootstrapping the context store and Git-governed corrections
From conversations to data requests and org visibility
Conway's Law, multiplayer agentic chat, and workflows
Impact on data infrastructure and architectural unknowns
Cost control: context pipelines, precision, and caching
Early user patterns, ops and investor use cases
Lessons learned and where agentic EDA does not fit
Roadmap: deeper Dagster integration and enterprise features
Biggest tooling gap: context engineering for AI-native work