In this episode of the Data Engineering Podcast, Marc Brooker, VP and Distinguished Engineer at AWS, talks about how agentic workflows are transforming database usage and infrastructure design. He discusses the evolving role of data in AI systems, spanning relational databases, vectors, and RAG. Marc explains why agents require serverless, elastic, and operationally simple databases, and how AWS offerings like Aurora and DSQL address these needs with rapid provisioning, automated patching, geo-distribution, and support for spiky usage. The conversation covers tool calling, improved model capabilities, state in agents versus stateless LLM calls, and the role of Lambda and AgentCore for long-running, session-isolated agents. Marc also touches on the shift from local MCP tools to secure remote endpoints, the rise of object storage as a durable backplane, and the need for better identity and authorization models. The episode highlights real-world patterns like agent-driven SQL fuzzing and plan analysis, while identifying gaps in simplifying data access, hardening operations for autonomous systems, and evolving serverless database ergonomics to keep pace with agentic development.
Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- Data teams everywhere face the same problem: they're forcing ML models, streaming data, and real-time processing through orchestration tools built for simple ETL. The result? Inflexible infrastructure that can't adapt to different workloads. That's why Cash App and Cisco rely on Prefect. Cash App's fraud detection team got what they needed - flexible compute options, isolated environments for custom packages, and seamless data exchange between workflows. Each model runs on the right infrastructure, whether that's high-memory machines or distributed compute. Orchestration is the foundation that determines whether your data team ships or struggles. ETL, ML model training, AI Engineering, Streaming - Prefect runs it all from ingestion to activation in one platform. Whoop and 1Password also trust Prefect for their data operations. If these industry leaders use Prefect for critical workflows, see what it can do for you at dataengineeringpodcast.com/prefect.
- Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.
- Your host is Tobias Macey and today I'm interviewing Marc Brooker about the impact of agentic workflows on database usage patterns and how they change the architectural requirements for databases
- Introduction
- How did you get involved in the area of data management?
- Can you describe what the role of the database is in agentic workflows?
- There are numerous types of databases, with relational being the most prevalent. How does the type and purpose of an agent inform the type of database that should be used?
- Anecdotally I have heard about how agentic workloads have become the predominant "customers" of services like Neon and Fly.io. How would you characterize the different patterns of scale for agentic AI applications? (e.g. proliferation of agents, monolithic agents, multi-agent, etc.)
- What are some of the most significant impacts on workload and access patterns for data storage and retrieval that agents introduce?
- What are the categorical differences in that behavior as compared to programmatic/automated systems?
- You have spent a substantial amount of time on Lambda at AWS. Given that LLMs are effectively stateless, how does the added ephemerality of serverless functions impact design and performance considerations around having to "re-hydrate" context when interacting with agents?
- What are the most interesting, innovative, or unexpected ways that you have seen serverless and database systems used for agentic workloads?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on technologies that are supporting agentic applications?
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
- Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
- AWS Aurora DSQL
- AWS Lambda
- Three Tier Architecture
- Vector Database
- Graph Database
- Relational Database
- Vector Embedding
- RAG == Retrieval Augmented Generation
- GraphRAG
- LLM Tool Calling
- MCP == Model Context Protocol
- A2A == Agent 2 Agent Protocol
- AWS Bedrock AgentCore
- Strands
- LangChain
- Kiro
Hello, and welcome to the Data Engineering podcast, the show about modern data management. Data teams everywhere face the same problem. They're forcing ML models, streaming data, and real time processing through orchestration tools built for simple ETL. The result, inflexible infrastructure that can't adapt to different workloads. That's why Cash App and Cisco rely on Prefect. Cash App's fraud detection team got what they needed, flexible compute options, isolated environments for custom packages, and seamless data exchange between workflows. Each model runs on the right infrastructure, whether that's high memory machines or distributed compute.
Orchestration is the foundation that determines whether your data team ships or struggles. ETL, ML model training, AI engineering, streaming: Prefect runs it all from ingestion to activation in one platform. Whoop and 1Password also trust Prefect for their data operations. If these industry leaders use Prefect for critical workloads, see what it can do for you at dataengineeringpodcast.com/prefect. Are you tired of data migrations that drag on for months or even years? What if I told you there's a way to cut that timeline by up to a factor of six while guaranteeing accuracy? Datafold's Migration Agent is the only AI powered solution that doesn't just translate your code; it validates every single data point to ensure perfect parity between your old and new systems.
Whether you're moving from Oracle to Snowflake, migrating stored procedures to dbt, or handling complex multi-system migrations, they deliver production ready code with a guaranteed timeline and fixed price. Stop burning budget on endless consulting hours. Visit dataengineeringpodcast.com/datafold to book a demo and see how they turn months-long migration nightmares into week-long success stories. Your host is Tobias Macey, and today I'm interviewing Marc Brooker about the impact of agentic workflows on database usage patterns and how they change the architectural requirements for your infrastructure. So, Marc, can you start by introducing yourself?
[00:02:09] Marc Brooker:
Hi. Yeah. I'm Marc Brooker. I'm a VP and Distinguished Engineer here at Amazon Web Services, and I spend most of my time working on agents and frameworks for building agents, infrastructure for building agents, and also on our database portfolio, especially focused on transactional applications, NoSQL, large scale applications, and serverless. And do you remember how you first got started working in data? So, you know, I was working on the EC2 team at AWS back in 2011, and one of the things we were doing was essentially being an amateur DBA team running our own database infrastructure.
And that was painful in a bunch of ways. We ran these huge MySQL clusters of, eventually, thousands of machines. As operationally painful as that was, and as much as I wouldn't want to repeat the experience, it really gave me a taste of what was exciting and interesting about databases, and an inspiration for a set of problems that I wanted to solve on behalf of teams like mine and customers like mine. It took another nine years before databases were really my day job, but those experiences running, operating, and designing that fairly large scale system really stuck with me. I was very inspired to want to solve some of those hard problems so that the next round of folks like me didn't have to worry about them as much.
[00:03:37] Tobias Macey:
And now the world has obviously changed a lot since 2011, especially in the technology realm. For a long time, we've had a very established architectural pattern for the happy path of an application, where the three tier architecture gained a lot of ground, is widely deployed, and is well understood. Now we are entering a new era where agentic workloads are gaining prominence and utility, which turns a lot of our assumptions about usage patterns, traffic patterns, and persistence requirements on their head. And so I'm wondering if you could start by giving your understanding of the role that the database plays in a, I'll use the term typical, although it's still a very evolving space, a typical agentic workflow.
[00:04:29] Marc Brooker:
Yeah. Obviously an evolving space, with a very exciting amount of innovation going on in the agent world right now. And fundamentally, agents and generative AI, and AI more broadly, is a data game. Data is the core of training. It's the core of model building. But even more so at inference time, at generation time, at agent run time, access to data is how these things are successful. There's no way for you to build all of the relevant data into a model. You couldn't do that, and you wouldn't want to because of access requirements and so on. So data really is at the core of these AI architectures in the same way that it has been at the core of that traditional three tier architecture and the architectures that came before it. If we look at the agent space specifically, there are a couple of different cases. I think about data for agents to use. This would be the case where you have a business intelligence agent, for example, and you say, go and find me five interesting things, or go and find me the five customers who grew most in the last week and what was common about them. There's a set of access patterns related to that, which I can go into. And then there are agents as builders. Agents as developers. Agents that are, either autonomously or much more commonly working with humans, building architectures in the cloud. And here, they could be building architectures of other agents.
Even more commonly, they're building architectures that look, from a block diagram level at least, like the more traditional architectures. They're building those microservices. They're building those SOA services. They're building those back ends for websites. They're writing that React code, that Java code, that Rust code. And there, the access patterns aren't that different. The access methods might become quite different, and there are big differences in requirements for things like authorization and authentication.
But what is very different is the operational requirements. How do you know if the database is scaling? How do you know if it's healthy? Those requirements really have moved on. So it's a broad space. And then maybe I'd think of a third set of agents, the agents that are operating my as-built cloud infrastructure, that are looking at it and saying, is my latency good? Is my cost optimal? Are my customers having a good time? And if the answers are no, making changes to the system, scaling things up, scaling things down, adding capacity, removing capacity, reconfiguring, and so on. With all of those agent use cases, there are different ways that they're affecting data access patterns, but all of them fundamentally still have their architecture built around datasets, whether those are datasets in traditional relational databases, datasets in things like graph and vector databases, or much more open ended data lakes where there is a whole lot of data, some of it structured, some of it unstructured.
[00:07:34] Tobias Macey:
Another interesting aspect of the agentic workload pattern that I've heard anecdotally is that for Neon, the serverless Postgres database that was acquired by Databricks, as well as some of the databases available on services like Fly.io, the majority of database instances are actually created by agents and not humans. They ended up changing parts of their product road map to optimize for those use cases and usage patterns, beyond the human driven pattern of "I need a new database for my application, I'm going to create it one time, and there will be a fairly monotonic level of increased usage." That's not nearly as spiky as some agent that is maybe spinning off dozens or hundreds of sub-agents to do various workloads and store their state somewhere. So in terms of agent workload patterns, I'm wondering what behaviors you've seen as typical in terms of the means of allocating and interacting with that persistence layer.
[00:08:47] Marc Brooker:
Yeah. Obviously, I can't speak to the accuracy of those reports, but it's certainly true that this agent-as-builder pattern is changing the way people think about the requirements of database systems, and especially relational databases. And agents are operators too. If you think about what's different: look back in time, and a database was this very specialized piece of hardware. I would order it with months of lead time. I would install a piece of software on it that I got from a vendor, with a single license. I'd spend potentially weeks setting it up, tweaking it, getting it ready for its workload, and doing a bunch of testing. So often the lead time, even looking back twenty years, the lead time for database provisioning was months, maybe at best weeks, best case days. What happened with the cloud is bringing that down to hours and then down to minutes. Serverless really started this push, but agents are accelerating it. Folks want to say: I want a relational database. I want it to have this data in it, and I want it to be available in seconds. And then I'm going to use it. Maybe I'm going to use it forever. Maybe for minutes. Maybe for hours, and that has to be economical. And if it does run for a long period of time, it's got to be able to deal with scaling. It's got to be able to deal with failures. It's got to keep itself up to date, secure, and patched. So look at what we've been doing at AWS with DSQL, where all of that patching and all of that scalability is built into the product and is not something the operator needs to worry about. And in Aurora, where we've been improving automatic patching, reducing downtime, and significantly accelerating the creation of databases, for both Aurora DSQL and Aurora Postgres: significant scale up and scale down work, creation time work, and pricing to make these more dynamic data use cases viable for customers.
A lot of that work has been accelerated by agents, by the needs that these agents as builders and agents as operators have, and by the requirements they're pushing onto database vendors. It's a super exciting time in the data space because of this. As an engineer, my motivation to solve hard problems is inspired by my customers, and this rate of change makes it just a super exciting time. It's also kind of fulfilling, as somebody who spent a lot of time working on serverless, working on Lambda, growing that business, and really seeing the serverless trend grow and the customer value it's brought: this is an acceleration of the things that folks were caring about in that serverless space. So there's great innovation going on across the industry.
And I think as agent-as-builder becomes a more and more ubiquitous pattern in the industry, we're going to see this become a set of hard requirements for database products, and that applies to relational and nonrelational databases.
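To make the "database in seconds" idea concrete, here is a minimal sketch of the kind of provisioning call an agent-as-builder might make. It assumes the boto3 `dsql` client and its `create_cluster`/`get_cluster` operations as described in current AWS documentation; treat the exact parameter names and the `ACTIVE` status value as assumptions rather than a definitive recipe.

```python
# Hedged sketch: programmatic, near-instant provisioning of an Aurora DSQL
# cluster. Assumes the boto3 "dsql" client per current AWS docs; the tag
# value is a hypothetical example.
import time

import boto3

dsql = boto3.client("dsql", region_name="us-east-1")

# No instance sizes, no storage allocation: the request is essentially
# "give me a Postgres-compatible endpoint."
cluster = dsql.create_cluster(
    deletionProtectionEnabled=False,
    tags={"owner": "build-agent-42"},  # hypothetical tag for the agent
)

# Poll until the cluster is ready to accept connections.
while True:
    status = dsql.get_cluster(identifier=cluster["identifier"])["status"]
    if status == "ACTIVE":  # assumed status value; check the docs
        break
    time.sleep(2)

print(f"cluster {cluster['identifier']} is ready")
```

The notable design point, relative to the "months of lead time" world Marc describes, is that nothing in this call specifies capacity: scaling, patching, and failure tolerance are the service's problem, which is exactly what makes it usable by an agent.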
[00:12:01] Tobias Macey:
The other interesting aspect is that somebody architecting their own application from scratch will typically have a set of requirements and constraints that guide them to one, or possibly several, of a particular set of database categories: a relational database because my usage patterns are largely transactional, or maybe a document database because of the flexibility of input and not knowing exactly what type of data I'm going to be storing, or maybe just a key value store because it's a very simple usage pattern. As you bring agents and their autonomy into the mix, I'm curious how you're seeing that change the calculus that teams have as far as determining which persistence layers they want to expose to those agents, for managing their own state or for building their own systems, in use cases where an orchestrator agent is maybe federating out to sub-agents.
[00:13:08] Marc Brooker:
You know, if I think about persistence needs, there are some things that are new and some things that are just not new. I'll start with the not new: choosing the right database has always been an exercise in understanding the shape of your data and the needs of your access patterns. If you have data that is structured, having a schema, putting it in a relational database can provide incredible benefits in terms of being able to normalize and to enforce business constraints at the database level. However, if that data is unstructured documents, putting it in a relational database might give you a place to index it, but isn't buying you a whole bunch of value, so you could put it in an object store like S3. If your data is fundamentally graph shaped, putting it in a graph database can provide some really beneficial, different ways to access that data that can really help applications.
And I think where people get into trouble in the data space is following trends and thinking: oh, well, graph databases are hot right now, so I'm going to take all of my data and try to force it into that shape. Or: everyone's talking about object stores right now, so I'm going to take all of my data and try to force it into that shape. And that doesn't work. Frankly, there is a reason that there are these decades-long durable patterns for different ways of structuring and accessing data. Obviously, relational databases and SQL are super powerful and super flexible, and typically when I talk to people about their architectures, I recommend that as the first place they should look. If you're unsure, look at relational. Look at SQL. Does it fit into that model? But it isn't the be-all and end-all. Certainly, there is space for NoSQL, for key value, for graph, for object, for document, and for all of these other structures and access patterns. Those are the things that haven't changed. Then the thing that has changed, what we've seen with AI over the last five years or so, is this real emergence of new access patterns and new access methods. Probably the most interesting one of those is vector. If you had told me when I was studying computer science that we would have robust ways to take a piece of text, extract the meaning from that text, turn that meaning into a set of numbers, and put those numbers into a multidimensional space so that other pieces of text with similar meaning would be nearby in that space, I would have thought that was science fiction. But that is the way that vector databases work and what makes them so powerful: this idea that I'm going to take text and turn it into numbers that honestly just express meaning.
And when I look it up, I can say: show me the bits of text that are related to this one semantically, that mean similar things, that contain similar concepts. And that's new. In a theoretical sense, it's not new; the underlying techniques have existed for thirty years. But the emergence of vector search as a mainstream technique is new. A mainstream technique that is powering everything from semantic search to RAG to a whole lot of more emerging database techniques. And again, the place folks are getting into trouble with that technique is to say: well, vector is the only thing you need for AI, so I'm going to take my data, whatever shape it is, and try to force it into a vector database. Clearly, that doesn't work. There is still data that is relational, data that is graph, data that is documents, data that is keys and values.
And there is nothing inherent to AI, generative AI, or agentic AI that stops them from using those other, more traditional access methods very efficiently and very effectively. So vector has given us another tool in the toolbox rather than a replacement, and that tool has in turn enabled a whole bunch of new techniques, like RAG and GraphRAG, that have allowed AIs to become more useful and more powerful, and to more easily consume data that isn't in the training set or the fine tuning set. So that's what's been happening in that access space. But I do expect some of the flashy newness around vector to settle down, and we'll go to a world where we have this new tool in our toolbox, but a lot of the agents we're building are using relational interfaces. They're using those document interfaces. They're using lookup by primary key, lookup by secondary index, lookup by geo. All of these things that have existed in the database space for decades, and now we also have one more, which is lookup by semantic meaning, which is very exciting and new and powerful.
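For readers who want "nearby in that space" made concrete, here is a minimal sketch of lookup by semantic meaning: rank documents by cosine similarity between embedding vectors. The `embed` step is deliberately left abstract; it stands in for whatever embedding model you call, and is not tied to any particular service.

```python
# Minimal sketch of semantic lookup over embedding vectors. The vectors
# are assumed to come from some embedding model (a stand-in here); the
# ranking itself is just cosine similarity.
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Similarity in [-1, 1]; closer to 1 means more similar meaning."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def semantic_search(query_vec: np.ndarray, docs: list[str],
                    doc_vecs: list[np.ndarray], k: int = 3):
    """Score every document against the query and return the top k."""
    scored = [(cosine_similarity(query_vec, v), d)
              for v, d in zip(doc_vecs, docs)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
```

Production vector databases replace the brute-force scan with approximate nearest neighbor indexes, but the contract is the same: vectors in, semantically nearest items out.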
[00:18:08] Tobias Macey:
Another interesting aspect of the decision process for which style of persistence layer to use is the capability of the model, where, particularly a couple of years ago, a lot of models could not form properly structured JSON to save their lives. There was also the initial advent of text-to-SQL, but it had a decently high failure rate, one reason being that SQL is based on mathematical primitives, and LLMs are hilariously bad at math, or at least have been up until the past year or so. I'm wondering how you're seeing some of those considerations factor into the style of persistence that somebody might select for either a particular model or a particular style of agent.
[00:19:00] Marc Brooker:
Yeah. So I think there are two things going on there. Certainly, one of them is an improvement in model capabilities. Models have become significantly more capable over time. We have, as you pointed out, seen this real improvement in reasoning, or something like it: models that can go off and think and talk to themselves for a little bit and kind of noodle their way through problems, and that's led to a new set of capabilities. Models certainly have gotten better at outputting data in certain forms. You want JSON, you want XML, you just want stuff tagged in a certain way: they're actually pretty good at that now. Certainly not infallible, but they have improved markedly over the last year, and that's made some things possible that were very difficult before. The other thing that's happening, and I think this is a bigger trend and almost more important than the improvement in model capabilities, has been the mainstreaming of tool calling as a pattern. We've seen that with MCP. We've seen that with A2A. We've seen that with these agentic frameworks, where we've been able to give models tools they can use and describe those tools in a way that allows them to choose the right tool. So if you say, hey, I want this JSON data as a CSV: that's something you could write in probably three lines of Python and get great performance, or in probably ten lines of Rust and get fantastic performance. If you did that with an LLM, it would be many orders of magnitude more expensive, many orders of magnitude slower, and more fallible. So being able to say: here, LLM, here is a code interpreter that you can use as a tool, and if you run into one of these problems, instead of trying to noodle through it yourself, write that little piece of Python code and execute it. Or: here is a database API, here is a microservice API as a tool, here is a SaaS application as a tool, and you can use those things to access data in a structured way. Models have become much, much better at selecting those tools, calling them successfully, and knowing when the tools are the right things to use and when they help. If you look at the cutting edge research, the frontier of models and the frontier of reasoning, what you're seeing in the benchmarks isn't just the model doing it itself. It's the model doing it with a tool, and often with a code interpreter. And sometimes that code interpreter has a scientific library, maybe a symbolic math library; certainly it's got CSV parsers and JSON generators and all of these other things that are super powerful. You asked about something like NL-to-SQL. You certainly still need the ability for the model to think: what data do I want from this database, and what schema is there? It's gotten better at doing that, but we've also gotten better at saying: here's a tool that is a specialized interface for running these kinds of things, for accessing data with the right structure, maybe hiding some of the complexities from the model that it doesn't need to worry about.
And that has provided a whole new set of capabilities to people building end-to-end systems with models. It's allowed us to use bigger and more capable models because we can do the heavy lifting of data movement in plain old boring code. In turn, that's helped spin the flywheel: you've got a more capable model, it's better at using the tools, and it's better at building its own tools when it needs to. So I think the combination of those model capabilities, tools, and tool calling is really spinning the flywheel of models and agents right now, and we're going to see continuing improvement on both. But I believe the tool calling side, figuring out which tools are most useful and making those tools available to models, is probably going to be the bigger deal over the next few years than model capability improvements.
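The shape of the loop Marc describes is worth seeing in code. This is a hedged, framework-agnostic sketch, not any specific vendor SDK: `call_model` and its `tool_specs` argument are hypothetical stand-ins for a model client, while the tool itself, the JSON-to-CSV conversion he uses as the example, is plain deterministic code.

```python
# Hedged sketch of a tool-calling loop: the model is offered tool
# descriptions, decides whether to answer directly or call a tool, and
# the orchestrating code executes the tool and feeds the result back.
# call_model() is a hypothetical stand-in for a real model API.
import csv
import io
import json


def json_to_csv(payload: str) -> str:
    """Deterministic tool: convert a JSON array of objects to CSV.
    Plain code does this faster and more reliably than an LLM would."""
    rows = json.loads(payload)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


TOOLS = {"json_to_csv": json_to_csv}


def agent_step(messages: list[dict], call_model) -> str:
    reply = call_model(messages, tool_specs=list(TOOLS))  # hypothetical signature
    if reply.get("tool_call"):
        name = reply["tool_call"]["name"]
        args = reply["tool_call"]["input"]
        result = TOOLS[name](**args)
        # Hand the tool result back so the model can continue reasoning.
        messages.append({"role": "tool", "name": name, "content": result})
        return agent_step(messages, call_model)
    return reply["content"]
```

The division of labor is the point: the model chooses *when* to convert JSON to CSV; three lines of `csv.DictWriter` do the actual conversion.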
[00:23:11] Tobias Macey:
Another interesting aspect of persistence and state management in the context of these models and agents is that the models themselves are effectively stateless: every single time you make a request, you have to provide the appropriate context or preexisting state for them to do anything useful, because they don't have any actual memory beyond some tertiary persistence layer that they might rely on for retrieving additional context. And in terms of the actual runtime of the agent or the LLM, because every interaction is an API call, there is no real benefit to maintaining persistent state within the process that is mediating that interaction, aside from network latencies.
Given your experience working on the Lambda tool chain at AWS, I'm wondering how you think about the benefits and trade offs of using something as ephemeral as a Lambda function for something as ephemeral as an LLM call, and the varying layers of persistence, network retrieval, etcetera, in that overall interaction pattern.
[00:24:27] Marc Brooker:
Yeah. I think the parallel between Lambda style stateless functions and the statelessness of LLMs is really interesting. But while it's true that these LLM calls are pretty much stateless, except for things like prompt caching, which has started to introduce soft state into LLM interactions, the agents themselves, the agent code, have a lot of ongoing state. They've got a lot of memory. If you build an agent with a framework like Strands or a framework like LangChain, there is a bunch of state, just variables, stuff it remembers as it's doing an LLM step, then maybe a tool call, then maybe integrating with something else, then going back to the LLM.
So I think, for short lived, single shot agents, Lambda can be a great fit. But a lot of agents aren't like that. They're longer running. They've got more state. They're using more imperative code inside their agent frameworks. That's why, a couple of months ago, we announced and shipped the preview of a new set of AWS capabilities called AgentCore. One of the pieces of AgentCore is the AgentCore Runtime. You can look at AgentCore Runtime, and in some senses it looks a lot like Lambda. It is this completely serverless runtime for chunks of code. You can bring a container. You can bring some code. You can bring your code in your framework, and we can run it. But unlike Lambda, AgentCore Runtime invocations can run for hours, and they are isolated in different ways. I'll talk about sessions in a minute. This lets you build an agent with a framework like Strands or LangChain that runs for a long period of time and mixes LLM interactions with tool interactions, other services, imperative code, local communication, and all of the other things. It lets you pull from the toolbox basically whatever you want for agent building. But one of the huge benefits of Lambda has been that single-shot nature. The relative statelessness is a powerful tool for thinking through the security model of Lambda functions, and we wanted to keep that. So what we did with AgentCore Runtime is that every agent session, where you bring a session ID and say, this is my interaction with this agent, gets its own micro VM, its own environment, and strong isolation from all other sessions. If I'm interacting with an agent for a task at home and for a task at work, those are strongly isolated. If you're interacting with an agent and I'm interacting with an agent, those are strongly isolated in AgentCore with VM isolation. There's a real security boundary.
And the combination of these things, this fine grained isolation model at the session level and the ability to have these sessions run for hours and hours, relaxes some of the constraints you would associate with something like Lambda while providing a lot of the same benefits, both operationally and security wise, that you would get from a serverless compute runtime. That's really been the goal of the team: to provide both of those things as we build these primitives for running agents. But as I said, there's nothing magic about agents. There will be some agents that are a great fit for AgentCore Runtime. There will be some agents that are a great fit for Lambda. And there will be some agents that want their own 192 core box with a terabyte of RAM, and you'll buy them a whole EC2 instance to run on.
And I expect that over time, we're going to see agents of all shapes and sizes that fit essentially across the whole of the current compute ecosystem.
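A sketch of what session isolation looks like from the caller's side is below. It assumes the boto3 `bedrock-agentcore` client and its `invoke_agent_runtime` operation as described in current AWS documentation; the ARN, session IDs, and payload shape are placeholders, and the exact parameter names should be treated as assumptions.

```python
# Hedged sketch of session-scoped invocation: distinct session IDs map
# to distinct isolated micro-VM environments, so "home" and "work"
# interactions never share state. Names per current AWS docs; treat
# them as assumptions rather than a definitive API reference.
import json

import boto3

client = boto3.client("bedrock-agentcore", region_name="us-east-1")


def invoke(session_id: str, prompt: str) -> str:
    response = client.invoke_agent_runtime(
        agentRuntimeArn="arn:aws:bedrock-agentcore:us-east-1:123456789012:runtime/my-agent",  # placeholder
        runtimeSessionId=session_id,  # distinct IDs get distinct micro-VMs
        payload=json.dumps({"prompt": prompt}),
    )
    return response["response"].read().decode()


# Two sessions, two isolation boundaries (IDs are placeholders):
invoke("home-session-0000000000000000000000000001", "Plan this weekend's trip")
invoke("work-session-0000000000000000000000000001", "Summarize yesterday's tickets")
```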
[00:28:36] Tobias Macey:
The other aspect of agents is that, because there's not necessarily any human associated with them, their usage patterns, both in terms of frequency and distribution, are unbound from physical constraints. Which brings me to the DSQL project, which, to my understanding, is a geo-distributed, horizontally scalable relational engine. I'm curious how you see that fitting in with this overall use case of expanding the autonomy of these agentic workflows and their ability to operate at whichever geo-replicated locations are most conducive to the particular task they're performing for a given end user.
[00:29:25] Marc Brooker:
Yeah. I think there is a lot of overlap between those requirements and the needs of agents. I'll start with DSQL: it's about a year old, slightly less, about a ten month old product that we announced at re:Invent last year. It is a serverless relational database. It's Postgres compatible and scales up and down: down to zero, and up to essentially whatever size you need. Serverless operations: there are no boxes, all of the patching is done for you, and all of the failure tolerance is built in. And it can run in an optional mode where, instead of being fault tolerant only within a single AWS region, which offers a great level of fault tolerance already, it runs across multiple AWS regions with a full replica. That means a whole AWS region can become unavailable, and your database keeps going, with strong consistency, with no data loss, and close to the customers that matter. Those things are obviously important for traditional applications. Think about highly regulated industries, financial industries, things like retail, where having the business available no matter what happens is critical. Those were the workloads we really had in mind as we were thinking about DSQL and kicking it off. But as agents have emerged, a lot of the same requirements have been there. The operational requirements: I appreciate not having to patch. I appreciate not having to worry about failures. I don't need something to fix things. It's just this endpoint that my agents can use that is always available. Then, as you said, agents are unconstrained by the constraints of humans, so we should expect agentic workloads to be much spikier in scale and in time. Hey, I want to go and ask 10,000 agents to cluster around a problem. They're going to work, maybe do millions of transactions a second for a few hours, and then nothing. And so that scalability becomes super important.
The pay as you go pricing model becomes super important. The ability to handle both big data and small data becomes super important. And then, as you say, there's also geography to think about, and there are two aspects to geography. One of them is that, because of the demand for inference, some workloads need to go geographically where there is inference capacity available for them. Having the flexibility to say, I'm going to run this interesting agentic workload closest to the most economical inference capacity, is a real differentiator for a lot of workloads and is practically very, very useful.
The other thing with geography is about being close to those human end users, giving people really quick access to the data they need and to the results of their agents. So having a strongly consistent database, a fault tolerant database, and a database that is global across multiple AWS regions, potentially on different continents, is extremely powerful and extremely useful in providing yet another flexible building block for agents to use. So again, those workloads are not new. They've been workloads in the background, but they're becoming much, much more mainstream as agents drive a new set of access patterns and new, different kinds of architectures, both the needs of running agents and the needs of running the architectures that agents are building.
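Since DSQL's endpoint is Postgres compatible, connecting to it looks like ordinary Postgres with one twist: a short-lived IAM auth token in place of a password. This sketch assumes the boto3 `dsql` client's `generate_db_connect_admin_auth_token` helper per current AWS documentation and the `psycopg` driver; the hostname is a placeholder, and the helper's exact name and signature should be treated as assumptions.

```python
# Hedged sketch: connect to a DSQL cluster's Postgres-compatible
# endpoint using a short-lived IAM token instead of a password.
# Hostname is a placeholder; helper name assumed per current AWS docs.
import boto3
import psycopg

region = "us-east-1"
host = "my-cluster.dsql.us-east-1.on.aws"  # placeholder endpoint

token = boto3.client("dsql", region_name=region).generate_db_connect_admin_auth_token(
    Hostname=host, Region=region
)

with psycopg.connect(
    host=host, user="admin", password=token, dbname="postgres", sslmode="require"
) as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT now()")
        print(cur.fetchone())
```

From an agent's perspective, this is the attraction: no long-lived credentials to manage and no hosts to patch, just an always-available SQL endpoint behind rotating tokens.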
[00:33:08] Tobias Macey:
And you've also been involved, to my understanding, with the Kiro project, which is Amazon's new entrant into the agentic development ecosystem. That gives you a different perspective on agents that operate more in a local mode versus in a deployed environment. I'm wondering what you're seeing as some of the common patterns, the ways they vary between those local versus deployed environments, and then, especially given the context of our conversation, the ways that you need to think about state management across those boundaries.
[00:33:51] Marc Brooker:
Yeah. So Kiro is super exciting. Kiro is the IDE that we released about two months ago, and the core idea behind Kiro is specification driven development: the idea that, instead of having a tight loop with an LLM where I'm doing this vibe coding thing of asking it to do things step by step, I instead, maybe with the help of the LLM, write a specification. I write down what I want my program to do, what I want my service to do. And then, with that specification in hand, the agent can have a lot more autonomy to go off and make long term decisions, can see the bigger picture, can iterate on things, and can highlight: hey, I actually didn't understand this part of the specification. Let's go back and fix this ambiguity.
And so specification driven development really drives two things. One of them is taking the, I don't know, chaos of vibe coding and turning it into a much more repeatable software engineering practice. The other is giving the development agent a lot more context and a lot more work to take on as a chunk, unleashing it to do a lot more. And doing a lot more could be doing it locally, or it could be doing it remotely. You might want to say: hey, I don't want this running on my laptop right now. I'm going to close my laptop, go home, and have the development task continue running elsewhere. And then you asked about local versus remote. If you look at the current MCP ecosystem around data, for example, everybody, including AWS, started with local. And we started with local because it's easier. You don't have to worry about having the endpoint available to the outside. Authorization and authentication are much easier. Connectivity is much easier. You don't have all of the scalability challenges.
Typically, those local MCP servers can be simple single threaded programs that are proxies around something you already have access to on your laptop. The remote ecosystem is more complex. It has real authentication and authorization requirements. It has multi tenancy requirements. Often those things will be exposed to the Internet or to large scale corporate networks, where they need web application firewall type controls, DDoS protection type controls, billing and quotas and API definitions and all of the other things that real Internet scale services need. So I think the whole ecosystem started local, but it's going remote, because remote is much more powerful and much more flexible. But the requirements are also higher. There's a higher bar to cross. The biggest trend, I think, is that we're going to move away from local MCP servers and toward web services and SaaS services and all sorts of other services having real remote MCP endpoints, or real remote endpoints that LLMs can use as a tool call.
Authenticating on behalf of a human, or authenticating machine to machine, fine grained authorization, and all of these other things. So I think that is the trend. We're going to see this move from local to remote over the next six months to a year or so as the protocols get more mature, and that's something we're contributing to heavily at AWS, as offerings like our own AgentCore Gateway get more mature and make it easier and easier to offer those remote MCP servers. So I think that's going to be the big trend. There is always going to be some stuff locally. There are going to be local resources I want to interact with locally, like my file system, like this tar file I just downloaded, maybe local datasets. Maybe I want some local data for dev and test, to have a really tight feedback loop on development and testing. So I think local will really be focused on dev and test, on tools that power daily developer work and provide access to local resources.
And then anything that starts to look like production data, shared data, or multi tenant data is going to look more and more like a web service, but with MCP instead of HTTP REST, as a way to make it easier for an LLM or for agents to call those resources. I'm sure there's some innovation that's going to go into local. I'm sure there's going to be innovation in developer tools that makes it easier to work with remote datasets locally and get the benefits of that. But I think we're going to see a very similar software development life cycle in the agentic development world as we see in the cloud, where folks develop locally, run simple testing locally, and then step out to a real CI/CD pipeline, where things run against real versions of services for integration testing and then, finally, in-production testing. The other part of that picture is obviously evaluation, which is this very AI- and agent-specific problem where you want to ask questions like: well, is my agent working?
And the closer you can do that to production, the higher fidelity that answer is, because it's running against the real services, ideally with real data, in the real environment. Much like in the traditional software development life cycle, the closer you test to production, the more bugs you find. I think we're going to see the same with evaluations. So there's a lot going on in that space. I think local will survive, but I think we'll see less emphasis on local than we have over the last six months or so.
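To ground the local-versus-remote discussion, here is a minimal sketch of an MCP server exposing one data tool, using the Python MCP SDK's `FastMCP` helper (names assumed per its current documentation). Run over stdio it's a local tool; switch the transport and put it behind the authentication, quota, and firewall controls discussed above, and it becomes the remote endpoint Marc describes. The tool's data is stubbed for the sketch.

```python
# Hedged sketch of an MCP server with one tool. Assumes the Python MCP
# SDK's FastMCP helper; the tool returns stubbed data for illustration.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("orders-data")


@mcp.tool()
def top_customers(limit: int = 5) -> list[dict]:
    """Return the fastest-growing customers (stubbed data for the sketch)."""
    sample = [{"customer": f"acct-{i}", "growth_pct": 40 - i} for i in range(20)]
    return sample[:limit]


if __name__ == "__main__":
    # stdio by default (local); an HTTP transport, e.g.
    # mcp.run(transport="streamable-http"), makes it a remote endpoint.
    mcp.run()
```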
[00:39:42] Tobias Macey:
And in your experience of working across these various tool chains and product suites, what are some of the most interesting or innovative or unexpected ways that you've seen serverless and database systems, in combination or in isolation, used for these agentic workloads?
[00:39:59] Marc Brooker:
Yeah. You know, this is where it becomes tricky, because I have to think about which of the great customer stories I can share. But one of the things I really liked: the team developing DSQL invested a lot in fuzz testing, testing DSQL against Postgres to make sure that if you run the same queries and the same transactions, you get the same results, and they built a quite sophisticated automatic fuzzer that could generate SQL. Then, as they were thinking about how to add capabilities to the fuzzer, the team realized one of those capabilities is to say to an LLM: hey, go off and generate a bunch of weird SQL. Go off and generate a bunch of weird transactions with weird costs and join orders and all of these other things, and then we can run them against these two databases. So that was a case of using generative AI as an additional fuzzing technique as we test that database, but also using agentic patterns as a way of running those techniques and looking at the results. In a similar vein, I was talking to a customer just the other day who had built a really cool little agent to look at EXPLAIN plans from Postgres and from MySQL and turn those into hints for their developers to optimize their code. They had taken a previous engagement between their teams and their DBAs, where an expert would go and look over a piece of SQL, which was time consuming and expensive and so often skipped, and built that same expertise into an agent they could use as part of their development cycle: run the queries with EXPLAIN or EXPLAIN ANALYZE against the database, get back the plans, have the LLM, plus some fixed code, analyze those plans for good practices and optimality, and even directly send a code review: you are running the query this way; you should run it this way instead; you'll get the same results in a tenth of the time and save a bunch of cost in the database. That's the kind of development time innovation I think there's a huge explosion of. I think production workloads are earlier. There's a ton of excitement around them, and there are a ton of great stories, but I don't think I have any to share yet of folks seeing really great business impact from in-production agents that their business users, end users, or customers are interacting with.
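A hedged reconstruction of that plan-review pattern, not the customer's actual code, might look like the following: run `EXPLAIN ANALYZE` with JSON output, hand the plan to a model, and turn the answer into a review comment. `ask_llm` is a hypothetical stand-in for whatever model call you use; the SQL is standard Postgres.

```python
# Hedged sketch of the plan-review agent described above. ask_llm() is
# a hypothetical model call; EXPLAIN (ANALYZE, FORMAT JSON) is standard
# Postgres and returns the execution plan as JSON.
import json

import psycopg


def review_query(conn_str: str, query: str, ask_llm) -> str:
    with psycopg.connect(conn_str) as conn:
        with conn.cursor() as cur:
            cur.execute(f"EXPLAIN (ANALYZE, FORMAT JSON) {query}")
            plan = cur.fetchone()[0]  # the JSON plan document
    prompt = (
        "You are a database performance reviewer. Given this Postgres "
        "execution plan, flag sequential scans on large tables, missing "
        "indexes, and poor join orders, and suggest a rewrite:\n"
        + json.dumps(plan)
    )
    return ask_llm(prompt)  # becomes the body of a code-review comment
```

The fixed code does the mechanical parts, running the query and fetching the plan, while the model supplies the judgment a DBA used to provide on the rare occasions the review actually happened.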
[00:42:46] Tobias Macey:
And in your own experience and work of developing in this space and working to help your customers stay up to speed with best practices and evaluate the ecosystem, what are some of the most interesting or unexpected or challenging lessons that you've learned?
[00:43:02] Marc Brooker:
Yeah. It's just been a bunch of unexpected things. I think the relatively unpredictable curve of model capability has caught us a little bit off guard. If you look at the previous wave of agent building patterns, there was this real dichotomy between plain text agents, where it's just prompting, I'm just going to ask the agent for what I want, and purely imperative agent frameworks, where you say do this step, then this step, then this step. If you look at the newer agent building frameworks like Strands, they get around the lack of power of the pure prompting approach by allowing you to bring imperative code, and they get around the fact that imperative code wasn't really taking advantage of model capabilities by making that code much more declarative. It's much more about: here's the thing I want to achieve. Maybe I'm going to express that in natural language, maybe in code, and I'm going to have a lot more success building agents from that. Even a year ago, I think people would have been fairly surprised that that mix of declarative programming, imperative programming, and prompting would turn out to be the most successful general way of building agents, although obviously there are great success stories across the whole spectrum of tools there. The other thing that we shouldn't find surprising, but that I think has caught folks off guard, has been the strength of some of these, for want of a better term, fashion cycles. Hype. There was this huge hype about a year ago, maybe eighteen months ago, around RAG. RAG was going to be the be-all and end-all of data access patterns, and vector databases were going to be the only thing folks used for GenAI. Then models came out with bigger context windows, and there was this whole trend of folks saying: oh, well, RAG is dead. We're just going to throw all the data in the world into the context window.
Well, obviously, you can't do that either. So RAG isn't dead. RAG has just joined our toolbox as one of the patterns, but it also hasn't turned out to be the be-all and end-all. I think what catches people off guard is this blend of what's really changing fast and what hasn't changed. And what hasn't changed is that our data has particular shapes. There is value in modeling data in particular ways. Relational, a model that is ancient in the grand scheme of computing, remains very, very relevant to agentic applications today.
And I think that's what's going to keep catching the whole industry off guard. It's going to continue to be hard to predict what remains the same and what changes. Obviously, there will be some things that remain nonnegotiable. The security and privacy of my customers' data is going to remain critical to me and critical to them, no matter what else happens in the world. Agentic applications are going to have to make business sense. They're going to have to do more than they cost, and that's not going to change. So there are some things to stand on that we know will remain relatively constant, but it is going to remain quite hard to predict, for all the rest, what is going to be replaced by something new and what is going to remain one of these durable pieces of value, like the relational model, the graph model, and the document model, throughout this whole AI revolution.
And, yeah, it's going to be exciting to see what happens there. It's hard to predict, and these hard to predict times are the most exciting times to be working and innovating in a space.
[00:46:50] Tobias Macey:
Are there any other aspects of this overall topic area of persistence and the various agentic patterns and how they interplay with the need for storage and retrieval of information that we didn't discuss yet that you would like to cover before we close out the show?
[00:47:07] Marc Brooker:
Yeah. I think there's one other one we haven't touched on, and that is the emergence of object storage as the go-to back end for data applications of all types. Even though DSQL is this very transactional, very OLTP focused system, the bottom of its durability story is S3. It is object storage. We've seen the analytics and data warehousing and data lake world really embrace object storage, and embrace these formats for data in object storage, over the last few years, and that has become the way it's done. We're seeing more and more vector databases and transactional databases and graph databases that are essentially smart, optimized front ends on top of object storage as the durable store. And that makes a ton of sense as an architecture, as a cloud architecture, as the endpoint, I think, of twenty years of cloud innovation.
But it's also super convenient in this AI world, because, as I started off with, AI is fundamentally a data game. It is fundamentally a thing that gets better with access to data. If you have all of that data together in your object storage, in a format that can be accessed together, that data is much more powerful for building applications, for training models, for fine tuning models, and for access from agents and from the architectures that agents are building. It's a trend that I think is going to turn out to be super valuable long term, one that's very aligned with the needs of AI, but also a continuation of a trend of innovation that has come from twenty years of cloud architectures and large scale architectures and iterating on those things. So that's another thing I'm really keeping my eye on, and that we're investing in here at AWS: making sure that our customers and their customers have easy access to all of their data, through the right interface for the right data structure, which really has been the key to success in this industry for its entire four decades of history.
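A small example of that object-storage-as-backplane pattern, under the assumption of pyarrow and an existing bucket (the bucket and path are placeholders): write a table as Parquet directly to S3 and read it back, so the same bytes can serve analytics engines, training pipelines, and agent tools alike.

```python
# Sketch of the "formats for data in object storage" trend: Parquet on
# S3 as a shared, durable backplane. Assumes pyarrow with its built-in
# S3 filesystem support; bucket and path are placeholders.
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table({"customer": ["a", "b"], "growth_pct": [12.5, 7.2]})

# Write once to object storage...
pq.write_table(table, "s3://my-bucket/growth/part-0.parquet")

# ...and any engine that speaks Parquet-on-S3 can read the same bytes.
print(pq.read_table("s3://my-bucket/growth/").to_pandas())
```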
[00:49:24] Tobias Macey:
Alright. Well, for anybody who wants to get in touch with you and follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And as the final question, I'd like to get your perspective on what you see as being the biggest gap in the tooling or technology that's available for data management today.
[00:49:39] Marc Brooker:
Yeah. Tooling and technology. I think there are a couple of gaps. One of them is that we still haven't reached this ideal of simplifying business user access to all of the data a business has. When I see folks around me, even though at AWS we have quite a sophisticated architecture around that, wanting to answer questions across all of the data available to them, that is still way more difficult than it needs to be. I think that's a place where we're going to continue to see a big wave of innovation, both from AWS and across the industry. Second, I think we're going to see some of the gaps in the way that relational databases, especially, think about identity and authorization become more visible as folks want to build agents. You want to express things like: this agent is working on behalf of these five people. There's no way to express that in a model like Postgres's grant model. So there's going to be a gap there, and it needs to be addressed both with great security and with the flexibility to meet business requirements. And then the third thing is operational. We've made a ton of progress on databases in the cloud; there are a ton of really great low-operations database options. But as we see more and more data systems being automated through these operations agents, it's going to become even more necessary that those systems have very simple operations, very simple security models, and very simple scaling models. In a lot of ways, serverless has won, even though that hasn't really sunk in yet, as agents really take over, and there do remain gaps in product offerings across the industry. I'm super optimistic about that one, by the way. I think it's going to be, again, a real driver of a wave of innovation and a wave of simplification.
[00:51:45] Tobias Macey:
Alright. Well, thank you very much for taking the time today to join me and share your thoughts and experiences on these various agentic patterns and the ways that the persistence layers and execution environments will help to facilitate and shape the ways that people are bringing these new capabilities to bear. So I appreciate all of the time and energy that you're putting into being an enabler for all of that, and I hope you enjoy the rest of your day. Yeah. Thank you. It's been a great opportunity. Thank you for listening, and don't forget to check out our other shows.
Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used, and the AI Engineering Podcast is your guide to the fast moving world of building AI systems. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email hosts@dataengineeringpodcast.com with your story. And to help other people find the show, please leave a review on Apple Podcasts and tell your friends and colleagues.
Introductions and episode focus: agentic workflows and databases
Marc Brooker’s background: from EC2 ops pain to database passion
What is an agentic workflow? Roles of data and databases at runtime
Agents changing provisioning: instant, ephemeral, and economical DBs
Choosing the right persistence: relational, NoSQL, graph, object
Vectors as a new access method—powerful, not a replacement
Model improvements and the rise of tool calling for data tasks
Stateless LLMs, stateful agents: Lambda vs. AWS AgentCore runtime
Geo-distributed needs and DSQL: serverless Postgres with global reach
Kiro IDE and the shift from local MCP to remote, production-grade tools
Real-world patterns: AI for SQL fuzzing and plan optimization
Lessons learned: blending declarative, imperative, and prompting
Object storage as the durable backbone for AI-era data
Gaps ahead: unified data access, authZ for agents, ops simplicity
Closing thoughts and sign-off