In this crossover episode, Max Beauchemin explores how multiplayer, multi‑agent engineering is transforming the way individuals and teams build data and AI systems. He digs into the shifting boundary between data and AI engineering, the rise of “context as code,” and how just‑in‑time retrieval via MCP and CLIs lets agents gather what they need without bloating context windows. Max shares hard‑won practices from going “AI‑first” for most tasks, where humans focus on orchestration and taste, and the new bottlenecks that appear — code review, QA, async coordination — when execution accelerates 2–10x. He also dives deep into Agor, his open‑source agent orchestration platform: a spatial, multiplayer workspace that manages Git worktrees and live dev environments, templatizes prompts by workflow zones, supports session forking and sub‑sessions, and exposes an internal MCP so agents can schedule, monitor, and even coordinate other agents.
Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- Data teams everywhere face the same problem: they're forcing ML models, streaming data, and real-time processing through orchestration tools built for simple ETL. The result? Inflexible infrastructure that can't adapt to different workloads. That's why Cash App and Cisco rely on Prefect. Cash App's fraud detection team got what they needed - flexible compute options, isolated environments for custom packages, and seamless data exchange between workflows. Each model runs on the right infrastructure, whether that's high-memory machines or distributed compute. Orchestration is the foundation that determines whether your data team ships or struggles. ETL, ML model training, AI Engineering, Streaming - Prefect runs it all from ingestion to activation in one platform. Whoop and 1Password also trust Prefect for their data operations. If these industry leaders use Prefect for critical workflows, see what it can do for you at dataengineeringpodcast.com/prefect.
- Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.
- Composable data infrastructure is great, until you spend all of your time gluing it together. Bruin is an open source framework, driven from the command line, that makes integration a breeze. Write Python and SQL to handle the business logic, and let Bruin handle the heavy lifting of data movement, lineage tracking, data quality monitoring, and governance enforcement. Bruin allows you to build end-to-end data workflows using AI, has connectors for hundreds of platforms, and helps data teams deliver faster. Teams that use Bruin need less engineering effort to process data and benefit from a fully integrated data platform. Go to dataengineeringpodcast.com/bruin today to get started. And for dbt Cloud customers, they'll give you $1,000 credit to migrate to Bruin Cloud.
- Your host is Tobias Macey and today I'm interviewing Maxime Beauchemin about the impact of multiplayer, multi-agent engineering on individual and team velocity for building better data systems
- Introduction
- How did you get involved in the area of data management?
- Can you start by giving an overview of the types of work that you are relying on AI development agents for?
- As you bring agents into the mix for software engineering, what are the bottlenecks that start to show up?
- In my own experience there are a finite number of agents that I can manage in parallel. How does Agor help to increase that limit?
- How does making multi-agent management a multi-player experience change the dynamics of how you apply agentic engineering workflows?
Links
- Agor
- Apache Airflow
- Apache Superset
- Preset
- Claude Code
- Codex
- Playwright MCP
- Tmux
- Git Worktrees
- Opencode.ai
- GitHub Codespaces
- Ona
Hope you enjoy this crossover episode of the AI Engineering podcast, which is another show that I run to act as your guide to the fast moving world of building scalable and maintainable AI systems. As generative AI models have grown more powerful and are being applied to a broader range of use cases, the lines between data and AI engineering are becoming increasingly blurry. The responsibilities of data teams are being extended into the realm of context engineering as well as designing and supporting new infrastructure elements that serve the needs of agentic applications. This episode is an example of the types of work that are not easily categorized into one or the other camp.
Hello, and welcome to the Data Engineering podcast, the show about modern data management. Data teams everywhere face the same problem. They're forcing ML models, streaming data, and real time processing through orchestration tools built for simple ETL. The result, inflexible infrastructure that can't adapt to different workloads. That's why Cash App and Cisco rely on Prefect. Cash App's fraud detection team got what they needed, flexible compute options, isolated environments for custom packages, and seamless data exchange between workflows. Each model runs on the right infrastructure, whether that's high memory machines or distributed compute.
Orchestration is the foundation that determines whether your data team ships or struggles. ETL, ML model training, AI engineering, streaming, Prefect runs it all from ingestion to activation in one platform. Whoop and 1Password also trust Prefect for their data operations. If these industry leaders use Prefect for critical workloads, see what it can do for you at dataengineeringpodcast.com/prefect. Composable data infrastructure is great until you spend all of your time gluing it back together. Bruin is an open source framework driven from the command line that makes integration a breeze. Write Python and SQL to handle the business logic and let Bruin handle the heavy lifting of data movement, lineage tracking, data quality monitoring, and governance enforcement. Bruin allows you to build end to end data workflows using AI, has connectors for hundreds of platforms, and helps data teams deliver faster.
Teams that use Bruin need less engineering effort to process data and benefit from a fully integrated data platform. Go to dataengineeringpodcast.com/bruin today to get started. And for dbt Cloud customers, they'll give you a thousand dollar credit to migrate to Bruin Cloud. Your host is Tobias Macey, and today, I'm welcoming back Maxime Beauchemin to talk about the impact of multiplayer, multi agent engineering on the individual and team velocity for building better data and AI systems. So, Max, for anybody who's not familiar, can you give a quick introduction?
[00:02:48] Maxime Beauchemin:
Yeah. I'll do a quick intro. So I've been on the show maybe, like, what, five or six times, maybe more. I've done a lot of work in data engineering in the past. I started Apache Airflow in 2014. Airflow probably doesn't need much of an introduction anymore; it's used pretty much everywhere. And then over the past decade, I've been working on Apache Superset. That's a data visualization and exploration platform that's fully open source, in the space of business intelligence. So, like, trying to push Tableau and Looker and these proprietary software solutions out of the way and displace them with open source. Superset has gotten really, really good over the past few years. It's a super great app. So if you haven't checked it out, I would recommend you check it out and stop using proprietary software for data visualization and data consumption. What else? Yeah. Maybe a thing or two about AI. I've been adopting AI since, you know, GPT-3.5 came out. But over the past, I'd say, six to nine months, I really onboarded on Claude Code and Codex and all the agentic coding solutions, and I've been really thinking about how to parallelize as many agents as possible. These agents got so good and they're getting better. So since, I would say, April or May, that's when I discovered Claude Code.
I had, like, just enlightenment, you know, hit me. And I was like, okay. How do we really squeeze the juice out of this super amazing technology? Realized very quickly that the world of software engineering was gonna change very, very quickly, and onboarded hard. And since then, I've been pretty much trying to figure out how to get more and more agents to do, like, you know, really great software engineering. So how do you team up with agents? How do you work as a team with agents? And at Preset (I started the company five or six years ago), we onboarded hard on Claude Code. So everyone got an Anthropic Max plan, and we've been really reshaping the way that we work internally as a team to really get as much of a boost from these AI coding agents as possible.
[00:05:00] Tobias Macey:
And for the day to day usage of these AI engineering agents, what are some of the typical types of tasks or workloads that you're farming out to them, and what is your heuristic for when you want to keep something for a human operator to actually maintain versus handing it off to the AI?
[00:05:19] Maxime Beauchemin:
Pretty much everything. So, you know, as a founder, you get to do a lot of things, including, like, legal and operations, and of course some product development, which is the main focus and the main thing. But I would say, since GPT-3.5 and more and more so over time, I developed an AI-first reflex for just about everything. Excuse me. So the bulk of what I do, I try to do with an AI, and more and more, I just fire up, you know, Claude Code or Codex for, I would say, 95% of what I'm doing, including operations-type work, including, yeah, legal, research, planning, preparing talks, writing blogs.
So I've been pretty much using it for everything. I would say the thing that's kinda left to me and that, you know, remains my control plane would be orchestrating agents, really. So I see my role more and more as, you know, orchestrating agents. And, of course, there's the people component: managing a team and wanting to grow and foster leaders, you know, inside the company. So that's much more of a human thing. But since the get go, I've been pushing everyone to use AI at everything it's good at, and it can really accelerate and multiply efficiency across the board. So I've been pretty radical about using AI myself and pushing others to do so as well.
[00:06:38] Tobias Macey:
Yeah. One of the common themes that I've been seeing from the various blog posts and communications of other folks in the space is that the thing that is going to remain distinctly human is being sort of the tastemaker, the person who has opinions about things versus the person who's actually doing the execution. And one of the other interesting aspects that I've seen is that at a certain point, there are only so many agents that you can manage just in terms of your own cognitive overhead, where you're trying to keep track of what are all the different things that they're doing. And I'm curious what you're seeing as you do accelerate more of the actual execution phase, how that changes where the bottlenecks show up in your own work and in your team.
[00:07:21] Maxime Beauchemin:
Yeah. I think that's a good question. And, you know, that's been shifting quite a bit as people adapt their workflows. I wrote a blog over the weekend around how AI is changing software engineering, and I really see the future going towards agent orchestration, and started building a tool that we'll probably talk about in this episode: tooling to make it easier for you to stay sane while managing a lot of agents and also allowing you to collaborate with your team. So bring, you know, an army of agents, but also bring your team, to share AI sessions and Git worktrees and potentially, like, dev environments too. So it's kind of bringing back dev environments to the early UNIX days where multiple people work on the same box, so they can share a development environment that's live. But the bottlenecks have shifted very, very quickly. Right? People report different degrees. Some people will say, like, I'm 20% faster, or I'm slower. Personally, since the get go, I've been, like, two, five, 10x. Somewhere, I would say, between 2x and 10x on coding specifically, or, you know, the act of designing and shipping software. Some of the areas that are kind of becoming a traffic jam now are, you know, code review for sure, and QA. Right? So you still need to go in your app, though you can get a lot of help from the agents to write good unit tests and integration tests, or even, like, Playwright MCP, which is super great. If you haven't tried it, you should try it. Playwright MCP is, like, browser-type automation. So agents can go in your app, look at the console, take screenshots, use multimodality while using your app. So I would say everything is accelerated, even the areas that are more bottleneck-y now, like code review. Like, agents are great at code review. You know? Now I don't push a PR without having Codex 5.1, which has been out for a week now, do a first round of review on most of my Claude Code-edited code. So I think all these areas can also be accelerated by AI, but I would say the real traffic jam is around, like, async workloads. Right? So if you code 10 times faster, there's a lot of PRs out, and you probably need people to come review your code, depending on your internal policy. You know, there's an interesting question I'm thinking about more now: can we just say that the AI wrote the code and I'm the reviewer, so I don't need to bring in another human, at least for some PRs?
So being able to elect to say, listen, the AI wrote the code, I'm the reviewer, I don't need another human in the loop, that can probably greatly accelerate things. So there's that bottleneck of, you know, you ask for code review, and what's the mean time to first review? Right? Twenty-four hours, forty-eight hours. Like, if you're in a very dynamic organization that has shifted their workflow because of AI, maybe you can get a review the same day. But in that workflow, your context, your own personal human context, evaporates. Right? And then someone else has to build up that context to do code review. And then it's been forty-eight hours and you get back in that AI session, if you still have that CLI open. Right? If you kinda saved it in a tmux window, the worktree with the Docker environment, maybe you can get back to it, or maybe you have to fire up a new session and load up the context in your brain inside the AI's context window. So that, I would say, is really the thing, or one of the things, I'm trying to address with the agent orchestration tooling I'm building right now.
[00:10:59] Tobias Macey:
One of the other interesting aspects of these agentic capabilities that I found as I was going through that adoption curve is that at the outset, there's a little bit of uncertainty as to what you can use it for because you haven't started to explore their boundaries, and you don't necessarily have the workflow set up for being able to easily give the agents the appropriate context or you don't necessarily have all the tools available that make it easy for the agent to validate the changes that it's making. And I'm curious how you're using some of that adoption curve of using these agentic engineering processes to build the tools that are necessary to further accelerate the agentic use cases.
[00:11:45] Maxime Beauchemin:
Yeah. I think it's interesting how you hear from some people that they tried AI at something and it didn't work, and therefore AI is bad at x. Right? But I think those boundaries are shifting really fast. So if you think about the horizon of what AI can do well, or what you know that AI can do well, that's shifting very, very quickly. So you need to, like, keep testing the boundaries. So maybe AI was not good at using browser automation to run some tests the last time we tried three months ago. But now there's Playwright MCP, Claude Code's got better, the latest version of the models got better multimodality, and the tooling has improved. So I think you need to constantly revisit the boundaries that you found in previous experiments. It's not because it didn't work a month ago or three months ago that it won't work now. So that's something to keep in mind. Now, if you have this AI-first reflex, there's no question of whether you're gonna use AI or not. So it's like, how do I get an AI in the right context to help me accomplish this task? Right? So there, for sure, maybe the first reflex when you're working on a new repo, assuming you're doing software engineering: fire up Claude Code, run /init, and you get a CLAUDE.md, you know, in that repo. Maybe you install the GitHub Actions on that repo to have something like Claude Code or Codex review, so you can say, "@claude, can you review this PR?" first. And then maybe you start building a little context folder with context nuggets that are useful for the task at hand and might be reusable or useful in future AI tasks. So that's one thing. You know, I think the first reflex was to create a CLAUDE.md and cram more and more stuff into it. That doesn't scale. Right? Like, you cannot just have one context file. So on our side, what we discovered is we're starting to have a context folder in all of our repos where we keep what I call context nuggets, just markdown files that are, you know, generally less than 500 lines of documentation that can be retrieved for different tasks. Right? So if you're getting an agent to write unit tests, then you bring in as context your front-end conventions, maybe your testing guidelines. Right? So if you have this portfolio of context nuggets, then it becomes pretty easy to retrieve the context that's required, and just that. Right? Not just, like, vomit the context over the context window, but pick the things that are gonna be useful. So our CLAUDE.md or AGENTS.md at Preset has become more and more just a glossary of these context nuggets.
So treating context as code is, like, you know, probably the first thing to do. That's definitely an important thing to do early on and systematically.
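To make the context-nugget idea more concrete, here is a minimal sketch (not from the episode; the folder layout, file names, and task labels are hypothetical) of keeping small markdown nuggets in a context/ folder, with a glossary that assembles only the pieces relevant to a given task instead of cramming everything into one CLAUDE.md:

```python
from pathlib import Path

# Hypothetical layout: a context/ folder of small markdown "nuggets",
# each well under ~500 lines, plus a glossary mapping task types to nuggets.
CONTEXT_DIR = Path("context")

# A slim CLAUDE.md / AGENTS.md can simply point at this glossary.
GLOSSARY = {
    "frontend-tests": ["frontend-conventions.md", "testing-guidelines.md"],
    "dbt-model": ["warehouse-naming.md", "dbt-style.md"],
    "api-endpoint": ["backend-conventions.md", "api-error-handling.md"],
}

def assemble_context(task_type: str) -> str:
    """Concatenate only the nuggets relevant to this task,
    instead of dumping every document into the context window."""
    parts = []
    for name in GLOSSARY.get(task_type, []):
        nugget = CONTEXT_DIR / name
        if nugget.exists():
            parts.append(f"<!-- {name} -->\n{nugget.read_text()}")
    return "\n\n".join(parts)

if __name__ == "__main__":
    # e.g. paste this ahead of the prompt asking an agent to write unit tests
    print(assemble_context("frontend-tests"))
```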
[00:14:28] Tobias Macey:
And I think too in the data context, it becomes particularly interesting as far as that context engineering challenge of how do you wire in the tools for the agent to be able to understand the overall organizational context and the system context for what data is being processed and how it needs to be transformed, because the data itself is going to immediately blow up your context window. So how do you determine what are the appropriate pieces of metadata? How do you structure that metadata effectively? How do you wire that into various MCP tools or whatever other command line utilities you want to pass to the agent, to be able to have it help you effectively identify issues with your pipelines or design new pipelines or manage your overall warehouse schema, particularly if you're talking about enterprise scale, because just the schema itself, not even with any of the data, can easily explode your context window. So I'm curious how you're seeing teams use some of these agentic coding practices in the context of the data ecosystem, where even just the metadata could be too large for the agent to be able to properly encompass?
[00:15:35] Maxime Beauchemin:
Right. So there's maybe different categories of context, if you think about it. Or maybe one way to segment this idea of context is: context that can be retrieved just in time. Right? So if you need to get sample data or retrieve a schema, do you need to put that in a markdown file or in documentation? Probably not. Right? Because your agent's got access to dbt run, or, you know, the various dbt commands if you're working with dbt, or the Airflow CLI. So MCP here is probably a much better construct, or having the right CLIs. Now, I haven't done a whole lot of agentic coding in the data engineering space. I've been mostly working on Apache Superset and on Agor itself, which we'll talk about in a minute. So I've been doing more software engineering with AI. But I've got, like, anecdotal evidence, maybe, call it a dozen or so tasks that I did on our data warehouse, mostly dbt stuff. And, you know, my CLAUDE.md doesn't have my schema or my data in there. Right? I just give the agent the CLI. I tell it that it has access to SQL. So we have a nifty little Superset CLI where you can run arbitrary SQL. So you provide the tool to the agent to run arbitrary SQL. And just with that, it can use information_schema for the tables, information_schema for the columns. If it wants to know the percentage of nulls or value statistics, it can just run an on-the-fly query. So we refrain from building context documents that are duplicative of what the agent can retrieve just in time, you know, from the database itself or from the tooling itself. So in data, there's the state of MCP for databases, or some really good CLIs. You just offer your CLI and you let the AI know, like, hey, I've got this SQL-running thing. Or you can use the BigQuery, you know, Google Cloud CLI to run arbitrary SQL, and you can run dbt run, you can run the Airflow CLI, it's all set up for you. Just go and fetch the context you need as you go. And that was really the big breakthrough with Claude Code. Instead of having to copy-paste or prepare context or do prompt engineering, fetching the context you need for the task, the agent just has the tools to go fetch it. So just rely on that. That's where the magic truly happens here. So then what does belong in context at that point? For data engineering, I would say the things that are tribal knowledge, that are not stamped in the database or in the data. Right? So things like your high-level data flow diagram of, like, where does this Airflow job fit in my bigger data flow diagram for my company, or things that are external to the repository, like business logic and things like that. So things that are not accessible to the agent but valuable for the execution of the tasks you want the AI to perform for you. Those are the things you wanna curate as context and treat as code as much as possible. I'll say one more thing, which is there's a lot of information that is specific to your repo, and you can store it all in your repo. But there's a lot of things that might exist in other repos or in, you know, Notion, what you'd call institutional knowledge.
And it's unclear exactly where you put that, but you could imagine that you would have some sort of knowledge repository somewhere that your agents would be given access to. And there's some work in that space of, like, trying to get all of your institutional knowledge into some sort of knowledge base that you can, you know, RAG with your agents. We haven't really hit that too much. Now for us at Preset, I would say most tasks are within the scope of a repository, and you usually don't require things that are external to that, or the user can kinda pass it in the context of the task, you know, as they craft their PRD or their design doc for the task at hand.
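As an illustration of the just-in-time retrieval pattern described above (the `superset-cli sql` command, its flags, and the table names below are invented stand-ins for whatever SQL-capable tool you actually expose to the agent), the agent can pull schema and profiling details on demand instead of having them baked into a context file:

```python
import subprocess

def run_sql(query: str) -> str:
    """Shell out to a SQL-capable CLI the agent has been told about.
    The command name and flags here are illustrative, not a real interface."""
    result = subprocess.run(
        ["superset-cli", "sql", "--format", "csv", query],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

# The kinds of on-the-fly metadata queries an agent can issue instead of
# having the warehouse schema pasted into CLAUDE.md:
columns = run_sql("""
    SELECT column_name, data_type
    FROM information_schema.columns
    WHERE table_name = 'fct_orders'
""")

null_rate = run_sql("""
    SELECT COUNT(*) - COUNT(customer_id) AS null_customer_ids, COUNT(*) AS total
    FROM analytics.fct_orders
""")

print(columns, null_rate)
```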
[00:19:27] Tobias Macey:
Yeah. In particular for me, over the past few weeks, I found it extremely helpful to be able to tell the GitHub Copilot CLI, you have access to kubectl and Helm to help me diagnose this problem, and it rapidly accelerates the time to resolution for determining what bugs I've just introduced in my infrastructure as code or understanding why a particular service is behaving in a certain way. And, yeah, the just in time retrieval and hydration of context is absolutely game changing when you are able to wire those tools in. Of course, that also brings in a lot of considerations around security and access control and what tools do you want to give to the agent in the event that it makes a wrong choice. And so making sure that you're properly supervising exactly which commands it's running when it's potentially going to be manipulating state that will impact other people.
[00:20:18] Maxime Beauchemin:
Yeah. So sandboxing becomes really important. Right? Especially because the agents will get naggy and ask for a lot of permissions. It's really tempting to bypass the permissions so you can go fix yourself a coffee while the agent is cooking. But yeah, in my case, I run a lot of these agents in Docker, with limited damage they can inflict, you know, on an environment. I think that's somewhat important. And yeah, you're right about giving the agent access to the tools that you would use to do the work. Right? Whether it's kubectl or the gh CLI to interact with GitHub. These agents do so well with git and gh. They do so well with things like, yeah, you give it a CLI that can run SQL, and it will just go to town with this thing. So give it access to the tools. I think there's also this DevEx idea. Maybe three or four months ago, I was thinking a lot about DevEx and what I call AIX, DevEx for AI. So if you have a really good DevEx and it's really easy to run a unit test in your repo or fire up a Docker environment, where you have a really good docker compose, really good unit tests and integration tests, and you can fire up Docker and clear volumes on Docker, you give that to the agent and it's gonna do so much better than if you have a really convoluted DevEx. So I think DevEx becomes much more important. So instead of coding being, like, 90% of the time you spend, now if that goes five to 10x faster, then you end up spending a lot more time firing up environments, pulling branches, and running that one unit test. Like, whatever is glitchy is gonna become apparent really, really quickly. So DevEx, I think, becomes much more important now that we're coding, you know, 10 times faster. I had this interesting analogy I've used before, which is: if you had a car that could go, like, 10,000 miles per hour, but there were some 10-mile-per-hour speed limit zones, you know, somewhere in the world, you would seemingly spend all of your time in the 10-mile-per-hour zones. Right? Because you're going so fast elsewhere, you really, really feel the speed bumps. So I think it becomes important to figure out how to remove these zones or bypass them altogether. Right? If that zone is where you end up spending the bulk of your time, let's fix that zone, or let's find a detour around this new kind of bottleneck area.
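One way to picture the sandboxing point mentioned above is a rough sketch like the following, where an agent CLI runs inside a constrained container so a permission-bypassing session has a limited blast radius; the image name, resource limits, and mount paths are placeholders, not a vetted configuration:

```python
import subprocess

def run_agent_sandboxed(worktree: str) -> None:
    """Run an agent CLI inside Docker with only the worktree mounted,
    so a misbehaving command can't touch the rest of the host."""
    cmd = [
        "docker", "run", "--rm", "-it",
        "--network", "bridge",             # or "none" if the task needs no network
        "--memory", "4g", "--cpus", "2",   # cap resources per agent session
        "--read-only",                     # keep the image filesystem immutable
        "-v", f"{worktree}:/workspace:rw", # the only writable path is the worktree
        "-w", "/workspace",
        "agent-sandbox:latest",            # placeholder image with the CLI installed
        "claude", "--dangerously-skip-permissions",
    ]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    run_agent_sandboxed("/home/me/worktrees/feature-x")
```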
[00:22:50] Tobias Macey:
And I think that transitions as well to what you're building with Agor as far as being able to run more of these systems in parallel and be able to monitor them and keep them occupied without having to worry about, oh, I've got two different terminal sessions open, and now that I've got these two different agents, they're starting to conflict with each other's changes. I know you can use Git worktrees to address some of that, but it also speaks to the need to have very well engineered developer environments where you can easily spin up copies of them and have all of the dependencies necessary, all of the context necessary, all of the tools necessary for the agent to be able to actually execute, especially if you want them to be able to run when you close your laptop and then come back to it. I'm just wondering if you could talk to what were the motivating factors that led you to decide that you actually needed to build a new utility to facilitate that and some of the core requirements and use cases that you were trying to address.
[00:23:50] Maxime Beauchemin:
Yes. I would say it's gonna be a little difficult to talk about Agor. So Agor is a new tool. I would call it an agent orchestration platform. It kinda relates to Airflow: Airflow is, like, data pipeline orchestration, but this is very much an agent orchestration platform. And I would say the main problem that it's after is this idea of, like, I want to work with a lot of agents, I wanna put an army of agents to work, and do that effectively, efficiently, without losing my mind. Right? So the place where it came from is, you know, using tmux and Docker and Git worktrees. So I started doing that. I discovered Git worktrees. I'd heard of them, but I was like, oh, man, I need this stuff. Now I'm cloning the same repo five or six times to work on five or six features at a time. Found Git worktrees probably in May or so. And then redid a lot of our DevEx stuff in Superset, basically, to be able to run multiple Dockers without them conflicting. Right? So I could have five features going at the same time, with five AIs working on those features, and maybe still be able to pull someone's PR and fire up an environment, with all of these environments, you know, running in parallel. So I got myself a Mac Studio at the time because I needed something like a server to be able to run all of these Docker containers. And I would say since May, I've been, not necessarily suffering, but juggling with, like, five to 10 AI sessions in parallel in tmux. So tmux, for people who are not familiar, is just, like, a terminal pane manager, pane as in p-a-n-e. So you can have multiple tabs in your terminal, you can split the screen. So tmux works somewhat well, but you go kinda crazy quickly if you have, like, five environments working on five different ports and five tmux tabs. You know, each one is split screen, and you're doing what I call session hopping: you're going from one AI session to the next, trying to figure out which one has returned and what the heck you were doing in there. So Agor is a tool that will essentially manage your Git worktrees for you. Maybe we'll start from the ground up. The first thing is you add your repos to Agor, and it's a super visual interface. Right? So you get into this Agor interface, and the first thing you would do is bring in the Git repos you wanna work with. And then it will manage your worktrees for you. So you can say, I wanna create a worktree to work on this specific feature. And people gotta look at the website while listening to this, because it's really so rich visually.
You know, it's almost like a data visualization tool in many ways. But Agor is a spatial-first type of environment. If you're familiar with Figma, it's the idea that you have a board and you have cards on this board. So every time you create a new worktree, that's essentially a project that is represented as a card on a board. And because it's a board, it's a 2D landscape, right, a 2D layout. So you can start grouping things inside your board. You can say everything that's related to this project, I'm gonna put in the, you know, top left corner. And then there's this notion of a zone. So a zone, think of it like a kanban. Right? Kanban being the vertical split of where different projects or issues or PRs are in your development life cycle. So you can create these zones in Agor. They don't have to be vertical. Right? They're just, like, blocks you put on your spatial canvas. And you might create, you know, one for analysis, development, testing, ready for review. So you create your zones, and then you can move your cards into these zones. And these zones are set up in a way that you can use templated prompts against them. So every zone can have a prompt associated, and the prompt might be something like: look at this GitHub issue, with a template pointer that points to the URL, do some deep analysis around it and propose a solution, and file the design doc into a specific location, if you want it to work that way. So you drag and drop your card, it will get the issue URL, you know, if you had put it as metadata on your worktree, and the agent gets to work. So then the agent's gonna do a bunch of work. You see a little spinner on that card, and you go to the next card where you can start up another session there. So it would really help to have visual support to talk about Agor, but I encourage people to go see the website at agor.live, and then you'll get a sense very quickly, visually, for what it looks like. But each card is a Git worktree on the board, and each card can have one or many AI sessions inside of it. Currently, when you click the little plus button to create a new AI session inside your worktree, you have the choice between Claude Code, Codex, Gemini, or opencode.ai, which is an open source one that opens up, you know, like, 70 different models. So here you're in an environment where you can use all these agentic coding tools with the variety of models that they expose.
And you can put some AI to work and kind of organize the work spatially so you really know what's happening there.
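To give a sense of the zone-plus-templated-prompt idea, here is a hypothetical sketch (the configuration format, prompt text, and metadata values are invented for illustration and are not Agor's actual schema) of zones whose prompt templates get rendered with a card's metadata when a worktree is dragged into them:

```python
from string import Template

# Hypothetical zone definitions: each zone carries a prompt template that is
# rendered with the card's metadata (e.g. the GitHub issue URL) on drag-and-drop.
ZONES = {
    "analyze": Template(
        "Look at this GitHub issue: $issue_url\n"
        "Do a deep analysis against the codebase and propose a solution.\n"
        "Write the design doc to docs/design/$branch.md."
    ),
    "develop": Template(
        "Implement the design doc at docs/design/$branch.md on branch $branch.\n"
        "Keep the change small enough to review in one PR."
    ),
}

def prompt_for(zone: str, card_metadata: dict) -> str:
    """Render the zone's templated prompt for the card that was just dropped into it."""
    return ZONES[zone].substitute(**card_metadata)

if __name__ == "__main__":
    print(prompt_for("analyze", {
        "issue_url": "https://github.com/example/repo/issues/123",  # made-up example
        "branch": "fix-dashboard-filter",
    }))
```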
[00:29:06] Tobias Macey:
I think that's one of the interesting aspects where you're talking about having these templated prompts, is that in my own experience of building these systems, I'm developing my own best practices for how I like to prompt these systems, how I like to structure the inputs, how I like to structure the guidance. But that is very difficult to then propagate and popularize throughout the rest of the team. I guess I could write markdown documents that I put in a repo, but people aren't necessarily going to read them or use them. And so I'm interested in how you're seeing that impact the ways that your team converges on certain structures and best practices: how they think about what prompts to use for what types of use cases, how they segment up the work of the agents, where you don't necessarily wanna have an agent running too long because then it'll start to explode the context window and start forgetting things, and you also don't necessarily wanna have it making changes that are too sweeping because then they're harder to review. And just how having that multiplayer canvas changes the ways that people are thinking about the scoping and structure of the work that they're having the agents do at the team level?
[00:30:12] Maxime Beauchemin:
Yeah. Good question. So when I started to build Agor, I was like, I want this to be fully multiplayer. So I made architectural decisions so that the app would be all WebSockets and all instantaneous, and so that we would have features like, you know, a Slack-like interface on the left panel, so you can have a conversation. Spatial annotations, a little bit like you have in Figma, where you can drop a little pin with a comment and have a conversation there on the board itself. And a bunch of other social features: if you're two on the same board, or if you have five people on the same board, you'll see everyone's cursors moving around. So a little bit of eye candy so you can see what everyone else is doing on the board. Now I would say we're very much still in a phase where software development, or at least local software development, is, like, everyone's got their own process. Right? And especially working with AI, I don't think teams have converged on any kind of workflow or prompts or structure to use, because every project is different, every repo is different, everyone develops their own technique in terms of how they like to prompt the AI and how they like to categorize this stuff. The beauty in Agor is you can really just define your own framework. It's an open canvas. You know, you can define whatever zones you want and whatever prompts you want and whatever workflow you want. But I would say Agor is still, like, an early experiment. The project got open sourced two weeks ago. We just started, you know, having a shared box internally at Preset to unleash a bunch of people on the same boards.
And we have to figure out, you know, if those workflows and prompts are gonna converge. I really think they will. But, you know, until now, we just had no visibility into how other people were working with AI. Right? Like, I just have no visibility into your terminal. I have no idea how you're prompting. So now, maybe it's the first opportunity for a lot of us to expose or open up how we work with AI. So very much an experiment. I think things will tend to converge. I think every project is different, and, you know, we need a very open-ended framework like Agor for people to just define their zones and define how they wanna work with things.
Something I wanted to mention that's kinda interesting, that I think is groundbreaking in many ways, is that Agor manages your Docker environments, your development environments, for you. So for each card, right, every card is a worktree. If you set it up, you give Agor a set of templated commands that say, here's how you start my app, here's how you stop my app, here's the link to my app, and here's how you do a health check on my app. Once you configure your repo to have these templated commands, you can start an environment by hitting the play button. So that's super useful. Right? Because normally you would just run your docker compose up command, but then you have to manage the ports and all this stuff, and you gotta make sure that it won't conflict. But once it is set up, once for a repo, then anyone who's working on this Agor board can create a new worktree and start an environment. But the beauty is these dev environments now are live. So if the AI changes some code, then the app will change right away. Right? You have your app in watch mode, and that means that you can bring in a PM or a QA person or your code reviewer, and they can just hit up your live environment.
They can see your AI sessions. Right? So they could prompt one of your sessions to say, like, okay, can you change the, you know, I'm looking at the app right now, the button color is not right, can you just switch it to this particular color? So then you end up in a shared dev environment with all of your AIs kinda ready to go on a live Git worktree and a live app. So that's, I think, something that's really groundbreaking. In my career, I haven't ever seen shared dev environments. Right? Like, if you think about it, if you wanna review some code, you gotta pull the branch, you gotta fire up your environment and hopefully that stuff works, you might get tangled up, it might conflict with something, you have to stop this other Docker that was already running and start this new one, and now you have to switch branches too. Now you can truly have, like, a shared development environment, and it's still unclear, you know, how people are gonna rally around that. So a bit of an experiment.
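A rough sketch of the per-repo templated environment commands described above (the command strings, port templating, and directory layout are invented for illustration; Agor's real configuration may differ):

```python
import subprocess

# Hypothetical per-repo environment commands, templated so each worktree
# gets its own compose project name and port, and environments don't collide.
ENV_COMMANDS = {
    "up":     "docker compose -p {worktree} up -d --build",
    "down":   "docker compose -p {worktree} down -v",
    "health": "curl -fsS http://localhost:{port}/health",
    "link":   "http://localhost:{port}",
}

def run_env(action: str, worktree: str, port: int) -> None:
    """Render and run one of the templated environment commands for a worktree."""
    cmd = ENV_COMMANDS[action].format(worktree=worktree, port=port)
    if action == "link":
        print(f"App for {worktree}: {cmd}")
        return
    subprocess.run(cmd, shell=True, check=True, cwd=f"worktrees/{worktree}")

if __name__ == "__main__":
    run_env("up", worktree="fix-dashboard-filter", port=9001)
    run_env("health", worktree="fix-dashboard-filter", port=9001)
```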
[00:34:37] Tobias Macey:
I'm interested in digging a bit more into, in particular, how you and your team are provisioning that environment, because being the creators and early adopters of this utility, I'm wondering how you're addressing some of those challenges of, do I just have one beefy EC2 server somewhere that has a lot of copies of the repo? Are you digging into some of the GitHub Codespaces? I know that there's the Ona project, that used to be Gitpod, that's leaning heavily into agentic coding environments. Hugging Face is investing in agentic environments, though I think that that's a different use case where it's more for deployed agents. I'm just wondering how you're thinking about, near term, what is the quick hack that you got running? And as you continue to invest in this and plan forward, what are some of the ways that you're thinking about how to actually make that shared developer environment something that is sustainable and maintainable over time?
[00:35:34] Maxime Beauchemin:
Yeah. So first, I wanna say we're kinda in the first inning, but in some ways, at the bottom of the seventh too, because the project, I wrote the first line of code, like, six weeks ago, and it has gotten so good so quickly. Right? And I would kinda rate my productivity developing Agor with Agor as, like, maybe 50x a pre-AI type of workflow. So we were pretty early on when it became something shareable, where we could bring a handful of people onto the same Agor instance. I use Codespaces. So Codespaces, for people who are not familiar, is hosted by GitHub. It's essentially an EC2 instance with some GitHub hooks.
It's pretty sweet. It works well. Actually, if you wanna try Agor, there's a single-click button on agor.live where it will create a GitHub Codespace for you. And GitHub Codespaces, you know, it's hosted by GitHub, and you can make the ports that come out of it public. So you can invite other people. It's kinda convenient. The Codespaces will pause themselves if no one is using them for a while, and they're really not meant for a group. Right? So only one person. I don't think there's a way to invite someone else inside your Codespace. They're very much designed to be private, but then they expose public URLs. So you can serve an app and make that app public and shared. But we found, working with that, there were some issues around state persistence. Right? So the database was SQLite by default, no Postgres support. That's less of an issue now, but it feels like an ephemeral construct that's really private. So that was until, I believe, last week, when we set up at Preset an EC2 instance, like a beefy EC2 instance, to run Agor. And then we provisioned it pretty aggressively so that it could run a lot of Docker, because that box is a shared dev box that we're gonna have, you know, a dozen-plus engineers hammering at very soon. So right now, we're still super early. I shipped Postgres support last night; I merged a PR.
So, you know, we're still fairly early. But the vision is you crank out a big EC2 instance. You open it up to your team, probably behind a VPN. We haven't talked about security; we probably shouldn't get too deep into that. But currently the Agor daemon is running as, you know, an Agor user, a single user. So everything on that box becomes essentially shared across people. And then, once you have this instance running, there are some really good user settings in Agor. So if you go into your own user settings, you can set your own API keys for your coding tools and your own preferences. It's pretty rich; I would say there's, like, six or seven tabs in the user settings where you can define your API keys, your Codex home if you want, tons of features, your audio settings so that it will chime at you when a session returns. So super rich features here. And then there's the whole multiplayer and multi-agent side. The multi-agent part I've used a lot. So I'll put, like, 12 agents in parallel, a mix of Codex and Claude Code mostly, some Gemini, you know. So multi-agent works awesomely well. It's super great. So single-player Agor is fantastic.
That's part of the reason why I was able to build it in such little time. I built an app with a lot of product surface very quickly. So it works extremely well. The multiplayer stuff, we have yet to see how it's gonna work out. You know, what happens when you invite a QA person to come see the live app that's essentially your dev environment and let them, you know, prompt against it? So it's pretty interesting to see what's gonna emerge here.
[00:39:11] Tobias Macey:
Are you tired of data migrations that drag on for months or even years? What if I told you there's a way to cut that timeline by up to a factor of six while guaranteeing accuracy? Datafold's Migration Agent is the only AI powered solution that doesn't just translate your code. It validates every single data point to ensure perfect parity between your old and new systems. Whether you're moving from Oracle to Snowflake, migrating stored procedures to dbt, or handling complex multisystem migrations, they deliver production ready code with a guaranteed timeline and fixed price. Stop burning budget on endless consulting hours. Visit dataengineeringpodcast.com/datafold to book a demo and see how they turn months long migration nightmares into week long success stories.
Another interesting shift that I can anticipate when you have that agentic session being long running and multi user, where it's not just one person prompting, giving feedback, steering the agent until you decide that it's done and then shutting it down. It leaves it open for you to maybe start with something and say, this is something that I am speculating on, I think it might be useful. You get an initial set of changes done, and then you can actually hand it off to the next person, where maybe you are doing some exploration of a potential feature, and then you hand it off to your product manager who's going to have more opinions on how it should be run. And then they can hand it off to the QA person to say, hey, this looks like it does all the things that I want, now I need you to make sure that it's actually going to do what it needs to do, and they can just take over that session to add in the appropriate fixes, and then you can say, okay, this has gone through all the cycles, it's ready for actually doing the code review. Rather than making some changes, doing the code review, doing a test deploy, getting feedback, saying, oh, no, you actually need to redo it, and just adding a lot of extra cycles, versus being able to actually hand off that work to other people who have different perspectives and different views on the overall problem space.
[00:41:11] Maxime Beauchemin:
Yeah. I think, you know, that's where software engineering is going in many ways: orchestrating a lot of agents and then bringing people to rally and work together on a shared environment. And so Agor certainly allows for these kinds of workflows, and we have yet to see how it's gonna manifest itself. I think the idea of a handoff is good. You know, Git branches have been so private forever. When you think about it, your Git branch, it's like, would you let someone else access your file system and mess with your branch and commit in your branch? No. It's private. Right? So I think in a lot of ways, people might not be ready for this, and there's some question of, like, what happens when someone else is prompting your session. So, formalizing the idea of, this is a private worktree, this is my branch, no one else can prompt in it. I've been working on designing some RBAC solutions. So you can say this worktree is private, I'm the only owner of it, or I trust someone and I wanna bring them in as an owner of this worktree. Right? So they can prompt my sessions, they can change the Git state. And I think these controls are vital. And then, you know, for the rest of the people, are they visible to others or private, so they can't even see that you have a worktree? So I'm baking that into the framework right now, so that you can define how you wanna work with others and have some gatekeeping of, like, I don't want anyone else to prompt this worktree. I didn't talk too much about this, but we're not in a CLI. If you've been living in Claude Code or Codex like I have, they're pretty sweet CLIs, they've done wonders in terms of, like, the UX of what you can do inside a CLI, but they're still CLIs. Right? So here, things are visually much more clear. So there's a bunch of features, like showing you your context window as a progress bar, so it's very self-evident, very visual. A lot of information about how much every prompt cost in terms of tokens and dollars. So you get all this visual feedback with rich tooltips and all of that. So that's super valuable. Another thing that's really valuable is we capture the Git transitions. So for every prompt that you do, you have, like, a collapsible bar. Usually, you just look at the latest one. But in that little prompt header, you'll see what the Git state was before and after this prompt. So what was the Git SHA? It will say, you're on this SHA, and it's dirty. And if the agent doesn't commit, you'll see the Git transition. So you can, at any point in time, as metadata goes, see, like, hey, when the QA person prompted this, what was the Git SHA at the time, and I wanna revert to that point. So point being, a really highly visual interface to your AI session is really, really useful. We also capture all that metadata so you can do analysis on it. There are some features that are pretty cool around, like, running some leaderboards of who burned the most tokens, or what are the worktrees that cost the most money or burned the most tokens. So we capture all this metadata. So the half-life of a session is, you know, longer in Agor. But yeah, as to how people are gonna work around it, I think it's largely to be discovered.
I think we're in a new space in terms of being able to share dev environments and AI sessions. You know, think about how much we've been working with AI over the past, like, six to nine months, and I've never seen any other software engineer's AI session. Like, I don't know how they prompt. Right? It's completely opaque to me. So Agor kind of casts a light. You can go see other people's prompting styles, right? You might run an agent to say, do an analysis, compare my prompting style with this other software engineer who's not seeing, you know, 10x acceleration.
Like, let's try to identify the patterns. Doing statistics, metadata analysis, becomes possible instead of having these super ephemeral sessions. I wanted to bring up another feature I think is super groundbreaking. So I said earlier, it's hard to talk about Agor because there's so much to it. Right? There's a spatial canvas, there's a super rich interface to AI conversations, there's the multiplayer option and its impact. But the one thing I haven't talked about, that I think is super groundbreaking, is that Agor has an internal MCP server about itself. So every agent that you start inside Agor has access to the Agor system through the Agor internal MCP. So what does that mean? That means that any agent that you fire up can go see what another agent is doing, or can go prompt another agent. You know, there's also a scheduler inside Agor. So that means, like, you could say, every hour, I want an agent to wake up and go see what all the other agents are doing, and based on what it finds, do some prompting or move some Git worktrees to a different zone if they're ready to be pushed. Or you can schedule an agent. And because the agents have access to Agor, you can tell your agent, hey, can you fire up my Docker environment and look at the logs? And then it will use the MCP to start the environment for the worktree, and then it will call the MCP to check the logs for that environment. And it's like, oh, the environment did not boot up because there's a TypeScript compilation error. Let me go fix that. Okay, let me restart the environment now. So the agents are Agor-aware in Agor. That means they can do pretty much anything. Right? An example of that is when I invited my team, I took the DL, just the email addresses. You know, I went in Gmail and just got a list of emails as CSV, then pasted that in an Agor session and said, can you create one Agor user for each one of these emails? Give them this password, make them all admins. And then Agor just went and created the 12 users. You know? So sky's the limit there in terms of what you can do. Agor's got deep support for MCP, so you can just put your MCP configuration in one place. Only Codex, I couldn't wire up the SDK to work with MCP, but Gemini and Claude Code have full access to MCP. So if you wanna have a Slack MCP or an Airflow MCP, you set it once and it should work with all of your agents. So that means you could have an agent that says, hey, every week, go look at all the Agor chat sessions, produce a leaderboard of who burned the most tokens, and post it on Slack. You know? So you put that on a schedule. So very meta. I don't think it's been done before. Right? I mean, agents connected to agents, yes. But, like, agent orchestration where each agent can go and use the system, use the framework, to do some pretty intricate things. I'll give just another example of that, something I did recently. I prompted an agent to say, hey, go look at the Superset repository that I work on, look at the past week or so of GitHub issues, and let's identify, you know, a dozen or so GitHub issues that are good for us to work on, to push forward. And once you do that, let's curate the list. And then we identified, say, five or six issues I wanted to work on. And I said, can you create a Git worktree for each one of these? I asked for that.
And, like, I saw six cards land on my board, you know, with the GitHub issue URL attached as metadata. So I get my six new cards, and then I just drag and drop them all into my analyze-this-GitHub-issue zone. And then all six agents, you know, got to work, and they started retrieving the GitHub issue, parsing it, looking at it against the code, and ultimately producing an analysis of the issue.
And then my next zone after that is, okay, I like your design doc, let's go and start coding this up. So I could just drag my cards and all the work, you know, gets done, just with a little bit of drag and drop. Something interesting is I could have just gotten another agent to say, look at these six cards on this board, evaluate whether they're ready to be dropped into the next zone, and just move them to the start-coding zone and fire up the prompt. Right? So there's all this agent automation that's emerging that is possible because of Agor's internal MCP server. So to put it all together, it's, like, quite a few super groundbreaking features. Right? The internal MCP, the spatial layout, the multiplayer stuff, the managed shared dev environments. You know, I think it really has the potential to reshape or define how we do software engineering through agent orchestration over the next, what, six months, twelve months, until, like, agents take over the world maybe.
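To make the idea of Agor-aware agents more concrete, here is a hedged sketch of the kind of scheduled supervisor pass described above; the method names are invented stand-ins for whatever tools Agor's internal MCP actually exposes, and the stub class exists only to make the example runnable:

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Illustrative only: this is not Agor's real API. In practice an agent would
# call MCP tools; the stub below just mimics the shape of such calls.

@dataclass
class FakeAgorMCP:
    """A stand-in for an Agor MCP client, seeded with two fake cards."""
    cards: List[Dict] = field(default_factory=lambda: [
        {"id": 1, "zone": "analyze", "last_output": "Design doc written to docs/design/issue-1.md"},
        {"id": 2, "zone": "analyze", "last_output": "Still investigating the stack trace..."},
    ])

    def list_cards(self, zone: str) -> List[Dict]:
        return [c for c in self.cards if c["zone"] == zone]

    def move_card(self, card_id: int, zone: str) -> None:
        next(c for c in self.cards if c["id"] == card_id)["zone"] = zone

    def prompt_session(self, card_id: int, prompt: str) -> None:
        print(f"[card {card_id}] prompt -> {prompt}")

def supervisor_pass(mcp: FakeAgorMCP) -> None:
    """The scheduled job from the episode: wake up, see what other agents did,
    and advance the cards that look ready into the next zone."""
    for card in mcp.list_cards(zone="analyze"):
        if "design doc" in card["last_output"].lower():  # crude readiness check
            mcp.move_card(card_id=card["id"], zone="develop")
            mcp.prompt_session(card["id"], "The analysis looks done. Start coding it up on this worktree.")

if __name__ == "__main__":
    supervisor_pass(FakeAgorMCP())
```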
[00:49:27] Tobias Macey:
And as you've been building this system and introducing your team to it and seeing how it shifts your own practices and your team practices, what are some of the most interesting or unexpected or challenging lessons that you learned in that process?
[00:49:41] Maxime Beauchemin:
So we're fairly early there. Right? Open sourced two weeks ago. I'm still personally, like, reorganizing my workflow around this groundbreaking new set of features and capabilities. My workflow has become a lot more parallel. Now, it's really hard to change people's workflows, and it takes a moment for people to develop new ones. I would say we're tiptoeing, and the rest of the team just got introduced to Agor, you know, a few weeks ago. Some have played with it. I'm getting good feedback. You know, you gotta set up the system. So I think a few people have really started defining their zones and defining their worktrees and, you know, reorganizing their workflow, but we're still fairly early. The tool is, like, so much more mature than I thought it would be at this point. Lots of product surface, and the first line of code was written six weeks ago. The app is just so slick and works so well. You know, I think it has been transformative for me. Now I gotta push people. Part of the reason I come on the podcast is, like, I built this thing, it's amazing, now people need to hear about it and play with it. And it's gonna take a moment, I think, for people to really reshape their workflow and start transitioning to using this primarily.
[00:50:53] Tobias Macey:
And are there any cases from your early experimentation where you would advise against adopting Agor, or advise using some other agentic scaffolding instead?
[00:51:05] Maxime Beauchemin:
Yeah. So I would say the main thing, for multiplayer, some of the areas where I'd be cautious, is the fact that everything runs as a daemon currently. So there's no UNIX-level isolation. That means that if you work in Agor, there's also a nice little in-browser terminal you can fire up, and it uses the team box behind the scenes. But you have to essentially trust the people that you share the box with as if they were on that box with sudo access, pretty much. So there's some security stuff I'm sorting out around making sure each user gets a unique UNIX user, so that if you decide to run in multiplayer mode and you decide to enable the UNIX OS-level isolation, there will be a way so that everyone gets their own home directory and no one can look at your SSH keys or your API keys. So currently, on the security side, I tell people it's as if you invited someone onto your laptop and they had full access to your file system. And there are agents in there too that effectively run as the Agor user, and they can break stuff too. Until recently, I was working only on SQLite, so you might lose the state. So, you know, it's fairly early, and the security stuff we gotta sort out. But to some extent, if you're gonna have a development environment and a box shared with others, there's always some risk. Right? There's nothing that says your agent isn't gonna check whether your GitHub token environment variable is there and run an echo of it, and that response goes into the Agor database as a message. So I would say the area where I would advise caution is around multiplayer: bring only people you trust onto the box that you set up for it. There's gonna be some work there too, thinking about getting that UNIX-level isolation and security, but it still is, in many ways, a breeding ground for things to get uncontained or for secrets to get exposed to other people potentially.
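For readers curious what the UNIX-level isolation mentioned above could look like, a common pattern in a Node-based daemon is to launch each person's processes under a dedicated OS user, so home directories and keys stay private. This is a minimal sketch of that generic pattern with a hypothetical user mapping, not Agor's actual implementation or roadmap.

```typescript
// Sketch of per-user process isolation in a Node daemon: run each person's agent
// processes under their own UNIX uid/gid so they cannot read each other's home
// directories or keys. Generic pattern only; the mapping below is hypothetical.
import { spawn } from "node:child_process";

const unixAccountFor: Record<string, { uid: number; gid: number; home: string }> = {
  max: { uid: 2001, gid: 2001, home: "/home/agor-max" },
  alice: { uid: 2002, gid: 2002, home: "/home/agor-alice" },
};

function runAgentAs(agorUser: string, command: string, args: string[]) {
  const who = unixAccountFor[agorUser];
  if (!who) throw new Error(`no UNIX account provisioned for ${agorUser}`);
  // The daemon needs the privilege to switch uid/gid for this to take effect.
  return spawn(command, args, {
    uid: who.uid,
    gid: who.gid,
    cwd: who.home,
    env: { HOME: who.home, PATH: process.env.PATH ?? "" },
    stdio: "inherit",
  });
}
```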
[00:53:13] Tobias Macey:
Are there any other aspects of the work that you're doing on Agor, the ways that it's changing your own workflows, or how you're thinking about this multi agent, multiplayer setup that we didn't discuss yet that you'd like to cover before we close out the show?
[00:53:26] Maxime Beauchemin:
Yeah. So the meta stuff was kind of a discovery along the way. Right? Early on, I was like, hey, it'd be cool if Agor exposed this MCP service to itself, so it can be used internally, you know, but externally too. And since I've had that, I've added a lot of MCP tools. Right? Get the list of sessions, get the list of worktrees, inspect a session, fire up an environment, look at the logs for an environment, create a user. I added a leaderboard service so you can get the sum of tokens burned by worktree or by user. But the whole agents-orchestrating-agents thing, the agent-to-agent cross communication, is completely new to me, and I'm just starting to do some cool stuff with that.
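To make the "internal MCP server about itself" idea concrete, here is a minimal sketch of how tools like a session listing or a token leaderboard could be registered with the TypeScript MCP SDK. The tool names, parameters, and in-memory data are illustrative assumptions, not Agor's real schema or server code.

```typescript
// Illustrative sketch of a system exposing itself as an MCP server.
// Tool names, parameters, and data are hypothetical.
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({ name: "agor-internal", version: "0.1.0" });

// Stand-in for the orchestrator's session store.
const sessions = [
  { id: "s1", worktree: "issue-1234", user: "max", tokens: 120_000 },
  { id: "s2", worktree: "issue-5678", user: "alice", tokens: 45_000 },
];

server.tool("list_sessions", { user: z.string().optional() }, async ({ user }) => {
  const rows = user ? sessions.filter((s) => s.user === user) : sessions;
  return { content: [{ type: "text", text: JSON.stringify(rows) }] };
});

server.tool("token_leaderboard", async () => {
  const byUser: Record<string, number> = {};
  for (const s of sessions) byUser[s.user] = (byUser[s.user] ?? 0) + s.tokens;
  return { content: [{ type: "text", text: JSON.stringify(byUser) }] };
});

// Every agent launched by the daemon connects back to this server,
// which is what makes the agents "system-aware".
await server.connect(new StdioServerTransport());
```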
You know, it opens so many doors. So recently I added a queuing mechanism for prompts. Until recently, you couldn't prompt an agent that was busy, but now we have a queuing system. So at any point in time, an agent can put up a prompt for another agent, and it will just get queued up. That opens so many possibilities that I don't know how people are gonna use this stuff. Right? There's this idea of a chain reaction of agents: if you ask an agent to fire up another agent, which fires up three agents, then there are potential issues with chain reactions and, you know, kind of an explosion of agents. I'm realizing now I'm gonna need policies, like, never more than 12 agents on this board or something like that. Because, like, who knows what's gonna happen there. Another thing I didn't talk about that I think is really interesting is session trees. So inside the CLI, I don't know if people have discovered that in the wild, but if you've been using Claude Code, you've probably discovered the --resume CLI flag. When you fire up Claude Code, you can say claude --resume, and it will show your sessions. There's also a fork flag, so you can fork sessions. And what does that mean? What does that fork?
So when you fork a Claude Code session or a Codex session, it basically takes the context window and forks it. So I added some primitives in Agor: when you prompt an agent, you can always fork the session, which is really interesting. That allows you to manage your context window better. Say you want to ask the agent to compile a report of everything that's been done so far; you can fork the session, and it will burn some context in that new session without polluting the main session. That's a pretty cool feature that I'm still figuring out how to use more. You know, I've found some use cases for it. So call it forking sessions, which forks the context window. And then recently, I added spawning sub-sessions. So the power users of Claude Code have figured out the task tool, or the sub-agent workflow. Right? The sub-agent workflow is when you ask your main agent to fire up another agent to do a task, and that runs inside its own fresh context window, from scratch. Right? The parent agent prepares a prompt for the sub-agent, and then there's an event loop: the parent agent will wait for the sub-agent to be done, get a report, and then keep going. So I added much better support for this inside Agor. So say you have a Claude session; you can say, spawn a sub-session for Codex and for Gemini to review this code. And what you'll see is your session breaking into a tree. Right? Your session inside your card is gonna break into two child sessions, and you'll see a Gemini session and a Codex session. And when they're done, they will call back the parent agent to say, I'm done reviewing the code, here's my report. So there's this multi-agent spawning of tasks in parallel that's extremely powerful.
It's a little bit dangerous just in terms of, like, agents running away, or context management at some point; you're like, okay, what do I have going on here? But one thing I've used it for is to nail down a specific refactor, like a JavaScript-to-TypeScript migration on a single file. So really define how to do that, and then ask my agent to spawn 10 sub-sessions to process 10 files in parallel. So then you see new sessions pop into existence, do some work, and report back to the parent agent. And what's cool, compared to the task tool for people familiar with Claude Code specifically, is that the session remains after it's done. So you can still go into, say, the sub-session that processed a certain file, and you can post-prompt that session, or you can fork that session, or you could spawn another sub-session from there. So crazy stuff around, call it, session trees, with forking and spawning sub-sessions. I'm still finding use cases for it, but I'm curious to see how the world is gonna take advantage of these primitives in the wild, because it really opens up some crazy potential workflows.
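As a sketch of the session-tree shape that forking and sub-session spawning imply (a fork copies the parent's context window, while a sub-session starts from a fresh one and reports back to the parent when it finishes), here is a small illustrative model. The type and the helper are assumptions for illustration, not Agor's data model.

```typescript
// Illustrative session-tree model; field names are assumptions, not Agor's schema.
type AgentKind = "claude" | "codex" | "gemini";

interface AgentSession {
  id: string;
  agent: AgentKind;
  parentId?: string;
  // "fork" copies the parent's context window; "subsession" starts fresh.
  kind: "root" | "fork" | "subsession";
  children: AgentSession[];
}

let nextId = 0;

// Spawn sub-sessions that run a prompt in a fresh context and report back to the
// parent. runAgent is a stand-in for actually dispatching work to a CLI agent.
async function spawnSubSessions(
  parent: AgentSession,
  agents: AgentKind[],
  prompt: string,
  runAgent: (agent: AgentKind, prompt: string) => Promise<string>
): Promise<string[]> {
  const children = agents.map((agent): AgentSession => ({
    id: `session-${++nextId}`,
    agent,
    parentId: parent.id,
    kind: "subsession",
    children: [],
  }));
  parent.children.push(...children);

  // Run all sub-sessions in parallel, e.g. ten files of a JS-to-TS migration,
  // or a Codex reviewer and a Gemini reviewer looking at the same diff.
  return Promise.all(children.map((c) => runAgent(c.agent, prompt)));
}
```

Because the child sessions persist after completion in this model, they can later be post-prompted, forked, or used to spawn further sub-sessions, which is what produces the tree.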
[00:58:21] Tobias Macey:
Definitely a very interesting and unpredictable world that we're in these days. For anybody who wants to follow along with the work that you're doing on Agor and experiment with that, I'll have you add your preferred contact information to the show notes. And, yeah, I appreciate you taking the time today to join me and share your explorations and experiments with this exciting new space and some of the ways that you're using it to accelerate your own work and your team's work. I definitely look forward to digging into it further myself, and I hope you enjoy the rest of your day.
[00:59:00] Maxime Beauchemin:
Awesome. Yeah. I highly encourage people to check out Agor live. The docs are really good; somehow I got, you know, agents to write really good docs with rich screenshots, some GIFs, and it's super easy to install. It's a single command: npm i -g agor-live. You could run it on your laptop in the next minute if you want to. So it's really easy to fire up. And I think you'd assume that an open source GUI is gonna be a little rough around the edges, but this thing is slick. It's snappy, it works super well, it's so feature rich, visually too. It's so much better than using a CLI. So even if all you want to do is just use it to run your Claude Code or Codex session, you get a lot of benefits, and then worktree management and environment management is something you can grow into. You don't have to have the problem of "I wanna orchestrate 20 agents." If you just want a little bit more recall and a little bit more control over your AI sessions, it's super easy to use, super easy to fire up, and, you know, a pretty slick, awesome application.
[01:00:04] Tobias Macey:
Alright. Well, thanks again for putting in that work, and have a good rest of your day.
[01:00:08] Maxime Beauchemin:
Awesome. Thank you so much.
[01:00:17] Tobias Macey:
Thank you for listening, and don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used, and the AI Engineering Podcast is your guide to the fast-moving world of building AI systems. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email host@dataengineeringpodcast.com with your story. And to help other people find the show, please leave a review on Apple Podcasts and tell your friends and coworkers.
Kickoff and topic: multiplayer, multi‑agent engineering
Guest intro: Max Beauchemin on Airflow, Superset, Preset
AI‑first workflows and what humans still do
New bottlenecks: code review, QA, async workloads
Pushing boundaries and building agent tooling
Context as code: CLAUDE.md vs context nuggets
Context in data engineering and just‑in‑time retrieval
Tooling access, security, sandboxing, and DevX/AIX
Removing speed bumps in developer experience
Introducing Agor: an agent orchestration platform
Spatial boards, zones, and templated prompts
Team prompts, convergence, and multiplayer canvas
Managed dev environments and live apps in Agor
Provisioning: Codespaces vs shared EC2 server
Parallel agents and early multiplayer learnings
Hand‑offs across roles and continuous sessions
Privacy, RBAC, visibility, and rich session metadata
Agor’s internal MCP: agents orchestrating agents
Scheduling, leaderboards, and cross‑tool MCP
From issues to cards: drag‑and‑drop multi‑agent flows
Groundbreaking pieces: spatial UI, MCP, shared envs
Adoption lessons and changing workflows
When to be cautious: multiplayer security today
Meta features, queues, and preventing agent explosions
Session trees: forking and spawning sub‑sessions
Closing thoughts, docs, and how to try Agor