Summary
In this episode of the Data Engineering Podcast, Arun Joseph talks about developing and implementing agent platforms to empower businesses with agentic capabilities. From leading AI engineering at Deutsche Telekom to his current entrepreneurial venture focused on multi-agent systems, Arun shares insights on building agentic systems at an organizational scale, highlighting the importance of robust models, data connectivity, and orchestration loops. Listen in as he discusses the challenges of managing data context and cost in large-scale agent systems, the need for a unified context management platform to prevent data silos, and the potential for open-source projects like LMOS to provide a foundational substrate for agentic use cases that can transform enterprise architectures by enabling more efficient data management and decision-making processes.
Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.
- This episode is brought to you by Coresignal, your go-to source for high-quality public web data to power best-in-class AI products. Instead of spending time collecting, cleaning, and enriching data in-house, use ready-made multi-source B2B data that can be smoothly integrated into your systems via APIs or as datasets. With over 3 billion data records from 15+ online sources, Coresignal delivers high-quality data on companies, employees, and jobs. It is powering decision-making for more than 700 companies across AI, investment, HR tech, sales tech, and market intelligence industries. A founding member of the Ethical Web Data Collection Initiative, Coresignal stands out not only for its data quality but also for its commitment to responsible data collection practices. Recognized as the top data provider by Datarade for two consecutive years, Coresignal is the go-to partner for those who need fresh, accurate, and ethically sourced B2B data at scale. Discover how Coresignal's data can enhance your AI platforms. Visit dataengineeringpodcast.com/coresignal to start your free 14-day trial.
- Your host is Tobias Macey and today I'm interviewing Arun Joseph about building an agent platform to empower the business to adopt agentic capabilities
- Introduction
- How did you get involved in the area of data management?
- Can you start by giving an overview of how Deutsche Telekom has been approaching applications of generative AI?
- What are the key challenges that have slowed adoption/implementation?
- Enabling non-engineering teams to define and manage AI agents in production is a challenging goal. From a data engineering perspective, what does the abstraction layer for these teams look like?
- How do you manage the underlying data pipelines, versioning of agents, and monitoring of these user-defined agents?
- What was your process for developing the architecture and interfaces for what ultimately became the LMOS?
- How do the principles of operating systems help with managing the abstractions and composability of the framework?
- Can you describe the overall architecture of the LMOS?
- What does a typical workflow look like for someone who wants to build a new agent use case?
- How do you handle data discovery and embedding generation to avoid unnecessary duplication of processing?
- With your focus on openness and local control, how do you see your work complementing projects like Oumi?
- What are the most interesting, innovative, or unexpected ways that you have seen LMOS used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on LMOS?
- When is LMOS the wrong choice?
- What do you have planned for the future of LMOS and MASAIC?
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
- Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
- LMOS
- Deutsche Telekom
- MASAIC
- OpenAI Agents SDK
- RAG == Retrieval Augmented Generation
- LangChain
- Marvin Minsky
- Vector Database
- MCP == Model Context Protocol
- A2A (Agent to Agent) Protocol
- Qdrant
- LlamaIndex
- DVC == Data Version Control
- Kubernetes
- Kotlin
- Istio
- Xerox PARC
- OODA (Observe, Orient, Decide, Act) Loop
[00:00:11] Tobias Macey:
Hello, and welcome to the Data Engineering Podcast, the show about modern data management. Data migrations are brutal. They drag on for months, sometimes years, burning through resources and crushing team morale. Datafold's AI powered migration agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.
Poor quality data keeps you from building best in class AI solutions. It costs you money and wastes precious engineering hours. There is a better way. Coresignal's multi source, enriched, cleaned data will save you time and money. It covers millions of companies, employees, and job postings and can be accessed via API or as flat files. Over 700 companies work with Coresignal to develop AI solutions in investment, sales, recruitment, and other industries. Go to dataengineeringpodcast.com/coresignal and try Coresignal's self-service platform for free today. Your host is Tobias Macey, and today I'm interviewing Arun Joseph about building an agent platform to empower the business to adopt agentic capabilities. So, Arun, can you start by introducing yourself?
[00:01:34] Arun Joseph:
Hey. Hello, Tobias. Thanks for inviting me to this podcast. So I'm Arun, Arun Joseph. I'm based out of Bonn, Germany. Well, I've always built distributed systems, and I've been leading engineering organizations. In my most recent role, I was heading the AI engineering program for Deutsche Telekom Group, where we built something referred to as LMOS, which is the agent platform, and which is open source. And right now I'm on my entrepreneurial path, building something around multi agent systems, because this has been close to my heart for quite a long time. I'm originally from India. I used to work for large scale enterprises, leading engineering teams across the globe, including in San Francisco: Juniper, then Merck Pharmaceuticals, where I built their industrial IoT platform. I've always built large scale distributed systems, but those were mostly entrepreneurial endeavors inside an organization. This is the first time I'm jumping headfirst into entrepreneurship outside one. So I'm passionate about distributed systems and large scale AI systems, which are going to define the future.
[00:02:51] Tobias Macey:
And do you remember how you first got started working in data and also in the AI and ML space?
[00:02:58] Arun Joseph:
Yeah. I think my first introduction to large scale data was when I used to work for Nielsen, which is a market research company. Nielsen had this interesting subsidiary called Arbitron, and Arbitron had a device, placed in the homes of people who signed up, which listened to radio signals and collected listenership data. That data used to go through large data pipelines to build listenership market intelligence. So that was the first time we started thinking about large scale data pipelines. I think this was right around the time Hadoop was brought in. And AI/ML pipelines and MLOps were also there in several of their endeavors, but that's essentially the more recent experience.
[00:03:49] Tobias Macey:
And so in terms of the overall space of building agentic systems, particularly when you're looking at the organizational scale and not just a little toy implementation or a proof of concept, there is a lot of complexity involved. The early stages were very oriented around just simple RAG bots, and I say simple in air quotes because they could be very complex. But as we move to more agentic capabilities, where we're adding to the sets of tools and giving more free rein to the language models to make decisions and determine the execution paths in nondeterministic fashion, there is a lot more that goes into it. And particularly, if you are trying to empower nontechnical stakeholders who don't necessarily have the deep domain expertise in how these systems work, you wanna make sure that they're fairly foolproof.
And I'm wondering if you can talk to some of the ways that you thought about the different components and domain segments that go into building something like an agentic system and then being able to scale that to the overall organization, just how you started to approach that overall problem definition.
[00:05:02] Arun Joseph:
Yeah. Absolutely. So this is a fascinating topic, because there are several constructs to unpack here, of which one of the most important is that English is the new programming language, or your thought is the new design. Essentially, what is happening right now is this thing called agentic orchestration, which is merely a feedback loop if you look at it. Right? You provide a goal to an intelligent system. This is how even people work: you give a goal, and the entity attempts to achieve it, given enough tools.

If it doesn't work, it loops through, gets new insights, and then iterates. This orchestration loop means that if you define your instructions very precisely, like a program, add two numbers, it doesn't have to go through that loop multiple times. But if you specify something in an abstract manner, I have two chocolates and somebody gives me some more and I need to figure out the best way forward without knowing the construct of addition, then the loop first figures out what mathematical construct to apply. Okay, this is good, now let's apply it. Great. And then let's verify it. So in this example, what just happened? You move from programming, which was very deterministic and used to be written by SMEs, where specialized expertise was required, to a capability that allows people to describe in a broader manner what they want, and the loop takes care of it to some degree. The implications are profound, especially in enterprises.

What do most enterprise information systems do? They merely move data from one place to another and then do some kind of transformation. And, of course, there is resilience, reliability, and distributed systems. But most of the requirements are around moving data from one place to another, and some business stakeholder would say, I want it this way, I want it that way. And now, tying it back to the analogy: the engineers of an organization need not build systems exactly as the business people say, but rather build systems which allow the business to state what is required, and the system makes it come true. That is the fundamental mental model. But in order to make that possible, you need great models, a robust way to connect data and tools, a robust way to run orchestration loops, a robust way to manage the growing complexity as you build more such programs, and the complexity of how these programs interact with each other. That, in essence, is the anatomy of a multi agent system.
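To make that loop concrete, here is a minimal sketch of the goal, act, observe cycle described above, assuming an OpenAI-style chat completions client. The single add tool and the model name are illustrative placeholders, not part of LMOS:

```python
import json
from openai import OpenAI

client = OpenAI()

def add(a: float, b: float) -> float:
    return a + b

TOOLS = [{
    "type": "function",
    "function": {
        "name": "add",
        "description": "Add two numbers",
        "parameters": {
            "type": "object",
            "properties": {"a": {"type": "number"}, "b": {"type": "number"}},
            "required": ["a", "b"],
        },
    },
}]

def run(goal: str, max_turns: int = 5) -> str:
    messages = [{"role": "user", "content": goal}]
    for _ in range(max_turns):  # the feedback loop: act, observe, iterate
        reply = client.chat.completions.create(
            model="gpt-4o-mini", messages=messages, tools=TOOLS
        ).choices[0].message
        if not reply.tool_calls:  # the model considers the goal met
            return reply.content
        messages.append(reply)
        for call in reply.tool_calls:  # execute tools, feed results back
            args = json.loads(call.function.arguments)
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": str(add(**args)),
            })
    return "gave up after max_turns"

print(run("I have two chocolates and someone gives me three more. How many?"))
```

A precise request resolves in one pass; a vaguer goal simply spends more turns in the same loop.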
[00:07:56] Tobias Macey:
And in terms of the primitives that you have available to build a system like that, that's obviously a very rapidly evolving ecosystem. To your point about multi agent systems, that of course brings to mind the A2A protocol that Google just donated to the Linux Foundation. In terms of tool call definitions, it brings up the hype that's been growing around the Model Context Protocol. There are obviously custom tool definitions that you can create, and the idea of handing tools to these language models has been around for at least a year or two. And I'm wondering how you think about what the stable primitives are that you have available to build an enterprise scale system on top of, without risking that one of these protocol authors introduces a new revision that massively changes the capabilities or requires a lot of rework in terms of how you actually think about your implementation?
[00:08:53] Arun Joseph:
Yeah. This is a very, very good question, because we went through the other side of it. We built it all first, before MCP, before A2A, in LMOS, the platform that we built. But then these protocols came up, and our stakeholders as well as our team started to ask, what are you gonna do now? So this is a valid question, but let's unpack a few things here. We went into production back in the first quarter of 2024 as an agentic platform; the first two agents went to production in the first quarter of 2024. They used to connect to telco APIs, which are really complex.

So there was no MCP. There was no A2A. But we could do it. The first principles don't change; that's the whole point, essentially. And then, when we started to build more such agents against more such APIs, the need for some sort of harmonization came in. What is the pattern you tell an engineer so they don't reinvent the wheel? This was the beginning of what we refer to as the LMOS protocol, which was based on Web of Things, because LMOS was built on the fundamental construct that everything should be open and we should not reinvent the wheel. So let's pick the best protocols out there: without inventing a new protocol, how do you allow agents to collaborate? At the same time, organizations needed models to connect to data, and MCP was the first one which came out addressing that topic. It immediately gained a lot of adoption because of the size and scale of Claude and Claude Desktop, and because it could connect the models to the tools quickly. But the first principle of how you connect your data is not going to change. Most of these enterprise APIs have been there for years. When GraphQL came in, a lot of people tried to build GraphQL wrappers on top. Now MCP comes in, and people are trying to retrofit MCP onto the traditional APIs. And then, on top of it, A2A. My suggestion is, before betting on any big protocol: it's pretty simple to connect your existing API to any of these LLM prompts.

Start there, experiment, and then, if it scales, bring in additional layers. MCP looks like it is being adopted by OpenAI, Anthropic, and Microsoft, and there are enough MCP servers that it is good enough to bet on, but large scale asynchronous communication is not solved yet in MCP. And with A2A, similarly, whether large scale agent collaboration is even required is still a question, because if you need large scale agent collaboration, you need large scale state management to hold true; otherwise two agents cannot collaborate. So A2A is still a shady area in my view, and most enterprise systems might not require every problem to be seen as agents. That's another thing. In microservices, we say don't start with microservices, right? Start with the monolith.

So it's easier to build your particular department's use case as a single system. Connect your data with MCP, or if it's a legacy API, don't try to MCPify it. Keep it simple, the KISS principle: keep it simple, stupid. And then think about scaling it to other protocols.
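In that spirit, here is a minimal sketch of the start-simple advice, assuming a hypothetical legacy billing endpoint: the existing API is wrapped as a plain Python function and registered as a tool in the same kind of loop shown earlier, with no MCP or A2A layer involved. The URL and field names are illustrative:

```python
import requests

def get_open_invoices(customer_id: str) -> list[dict]:
    """Fetch open invoices from a (hypothetical) legacy billing API."""
    resp = requests.get(
        f"https://billing.internal.example/v1/customers/{customer_id}/invoices",
        params={"status": "open"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["invoices"]

# Register this as a function tool exactly like the `add` example above.
# Only when this pattern repeats across many teams does a protocol layer
# like MCP earn its keep.
```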
[00:12:39] Tobias Macey:
The other interesting challenge when you're dealing with an organizational scale enablement system, particularly for something like agentic use cases, is to your point, you need to be able to connect it to the underlying data that the agent needs to be able to use either to operate against or to use as context. And that's a substantial challenge in and of itself, again, because of the fact that data in and of itself is challenging, but, also, you're working with people who don't have that domain knowledge to understand all of the complexity that goes into all of the data prep, all of the ETL pipelining, how to manage the different chunking strategies, etcetera.
And so I'm wondering how you think about that element of exposing the existing organizational data and then streamlining the work of actually doing the additional preparatory work for being able to load that context into something like a vector database and standardizing on that core technology.
[00:13:39] Arun Joseph:
In fact, this clearly has been a missing piece in most enterprises. Enterprises used to have data pipelines, but mostly for transactional data: transactional systems would emit data, which went through huge data pipelines, your NiFis and so on, landing in warehouses. What has emerged now is a need for unstructured natural language data to be parsed into semantic vectors, which can be queried and finally collapsed into an answer by the LLMs. So this is a new technical skill set that most organizations need to learn, but the first principles of how large scale data ingestion works still apply, which means the data still needs to go through phases and pipelines. For example, at Deutsche Telekom, when we wanted to ingest the corporate knowledge base and FAQs for customer support people, if you just ingested it as is, the search endpoints did not perform as we wanted them to. The accuracy rate was subpar.

So we had to clean up the domain objects and domain topics to create an ontology of the knowledge for the customer support groups, and build that ontology into the pipeline. It requires two skill sets. One is the technical skill set: what is a chunking strategy, what is an embedding model, what kinds of vectorization approaches can you use. The other is the domain knowledge: hey, my customer support is broken into the billing domain, the customer domain, contracts, Magenta TV, and all that. Both kinds of knowledge are essential, and they need to come hand in hand to create those pipelines. At the same time, there are only a few levers you can pull to get the best answers, especially in RAG. One is definitely the vector database. The second is the pipeline itself: how you create the ontology and do the cleanup.

And the third is the actual search endpoint that you build, which should be able to rely not just on vectorization but also on additional dimensions, to create a hybrid search approach. These are the levers you have. And I think the vector side plays a huge role, in the sense that not many developers are going to create their own vector databases, for sure. The rest of the pieces they can tweak: what kind of search algorithm to put in, what kind of pipelines to build. So the choice of vectorization approach, and how that system will scale with large ingestion pipelines, is something which needs to be really thought through. At DT we brought in Qdrant, which served the purposes really well with the operational simplicity that Qdrant baked in. In terms of vectorization, Qdrant has a brilliant user interface which shows the entire embedding space, and the operational simplicity was unparalleled.

And this helped the developers focus on the ingestion pipelines and not on the operational characteristics of maintaining the topology of the vector databases, because you need multi tenancy and all that. This resulted in the pipeline we referred to as Wurzel, which was developed by a couple of engineers in my team, Thomas Weigel among them. Wurzel, in German, means root. It was built as pipelines that are like roots going into the soil: the enterprise world is the soil, and the pipeline goes in and collects the nutrients, which is the data, in many different formats. Essentially, it did not try to replace anything, but rather asked: what is the best embedding strategy, or the best framework? A lot of frameworks were coming up, and within each framework there would be one or two capabilities which were really good. How do you club together these capabilities and not bet on only one LlamaIndex or something like that? So this was Wurzel, a pipeline: it ingests the data, lands it in the Qdrant cluster that we had, and then the search endpoints were built.
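A minimal sketch of those levers with the qdrant-client library follows: ontology tags stored in the payload, a vector index for similarity, and a search that combines the two. The collection name, vector size, and domain field are illustrative, not the actual Deutsche Telekom setup:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance, FieldCondition, Filter, MatchValue, PointStruct, VectorParams,
)

client = QdrantClient(url="http://localhost:6333")
client.recreate_collection(
    collection_name="support_kb",
    vectors_config=VectorParams(size=768, distance=Distance.COSINE),
)

def ingest(chunk_id: int, text: str, vector: list[float], domain: str) -> None:
    # The ontology work shows up here: each chunk carries its cleaned-up
    # domain (billing, contract, Magenta TV, ...) as payload metadata.
    client.upsert(
        collection_name="support_kb",
        points=[PointStruct(id=chunk_id, vector=vector,
                            payload={"text": text, "domain": domain})],
    )

def hybrid_search(query_vector: list[float], domain: str, limit: int = 5):
    # Vector similarity narrowed by an ontology filter: one simple form of
    # the "additional dimensions" mentioned above.
    return client.search(
        collection_name="support_kb",
        query_vector=query_vector,
        query_filter=Filter(must=[FieldCondition(
            key="domain", match=MatchValue(value=domain))]),
        limit=limit,
    )
```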
[00:18:12] Tobias Macey:
In terms of the embeddings and the additional context that you need to bring in, one of the common pieces of wisdom when you're building RAG systems is that you don't know what chunking strategy you want to use until after you've tried it and experimented a little bit. And so I'm wondering how you thought about either determining that this is the standard strategy you're going to use because it's good enough for most use cases, or how much control you wanted to give to the end builders of these agents, to say: you know the content that you're working with, so you're going to set some parameters to determine which chunking approach to use. And just how to think about building that into a flexible system without making it flexible to the point of being useless.
[00:19:00] Arun Joseph:
Yeah. Absolutely. I think too much abstraction is a risk; it's a balance we had to strike, and that thinking went into the system as well. Essentially, we were not building a platform right from the start; we were supposed to solve these use cases. We were measured by one metric: how accurate are the answers from the German customer service bot for the customer? And in order to achieve that, we did not start by thinking about what flexibility we needed in the framework. We did the heavy lifting of figuring out the right chunking strategy. Even for embedding models, we ended up with our own fine tuned embedding models, because it was German. We started building our own embedding models back then, which proved more accurate than the off-the-shelf models from OpenAI at the time. But once we figured this out, the next country came in, which, I believe rightly, was Croatia.

And then you needed to abstract away the best way to create the Croatian knowledge ingestion pipeline, such that the data scientists and the people who know Croatian, at the Croatian Telekom group, can manage that pipeline. So we built Wurzel in such a manner that they could play around with multiple chunking strategies, not at a UI level, because these are small atomic units. It was built on DVC, the DVC pipelines, right? So Wurzel was built on top of DVC, and you have something called Wurzel steps.

One of the steps is a default chunking strategy step, which, again, is not exposed as a UI. It's a simple module, a Wurzel step, which is a Python program. You can bring in your favorite library with better utilities for the chunking strategies you might want, import it into this unit, try it out, and then plug it into the rest of the pipeline, and it will work. So it's the Unix pipes approach that we used to build the Wurzel pipeline, as sketched below. This flexibility was crucial if you want to expose it to people who know the language, because otherwise you are limited to the engineering group, who might know only one language. That's how we started to expand it to Hungarian and Croatian, so that the data scientists in those groups are not working against a black box with some UI dropdown of chunking strategies; they can bring their favorite library into the step and change only that. The rest of the pipeline remains the same.
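A sketch of what such a swappable step might look like follows. The step signature is illustrative, not the actual Wurzel API; the point is the Unix-pipes shape, where only the chunker changes and the rest of the pipeline stays fixed:

```python
from typing import Callable, Iterable

Chunker = Callable[[str], list[str]]

def default_chunker(text: str, size: int = 500) -> list[str]:
    # Naive fixed-width chunking as the out-of-the-box behavior.
    return [text[i:i + size] for i in range(0, len(text), size)]

def chunking_step(docs: Iterable[str],
                  chunker: Chunker = default_chunker) -> Iterable[str]:
    """One atomic pipeline step: documents in, chunks out."""
    for doc in docs:
        yield from chunker(doc)

# A data scientist swaps in a favorite library by matching the signature,
# e.g. (hypothetically):
#   from langchain_text_splitters import RecursiveCharacterTextSplitter
#   chunks = chunking_step(docs, RecursiveCharacterTextSplitter().split_text)
```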
[00:21:42] Tobias Macey:
So based on the work that you did at Deutsche Telekom, you ultimately created the LMOS open source project. And one of the perennial challenges of releasing something as open source that you first built for a particular organizational use case is that the design of the system, as implemented in the business, implicitly encodes a lot of the organizational patterns of that organization, which don't necessarily translate externally. And so I'm wondering how you approached that translation process: figuring out which pieces are generally applicable, but not so general that they're useless, and then designing it in a way that it can be used as a foundational substrate that other people can build on top of, without making it so complicated or confusing that they're never going to start down that path?
[00:22:34] Arun Joseph:
That's a very sharp question. It's like the Conway's law design of a framework, right? Two things really worked out well in terms of reaching a just-enough point in abstraction. For example, when we started in 2023, there were these frameworks; LangChain was a major one in 2023. We started by picking LangChain, but almost all our transactional systems and APIs, the profile API, the billing API, all of that was built on the JVM stack. The second point is that almost the entire operational estate was built on the Kubernetes CNCF stack: Kubernetes, the observability, Grafana, Prometheus, everything. Now comes a new framework invented somewhere else, based on some other use cases. How do you fit this in with the people who actually know the data and the domain, and what happens to all the client SDKs that have been built? That's the reason we went back to the drawing board in creating LMOS. And we did not start creating LMOS by wanting to create a framework. The problem statement was quite simply this: we implemented something in LangChain, it took a couple of months, and after that no one knew how to build it into a platform because it was so chaotic, or how it actually fit into the rest of the platform stack. So we went back to the drawing board and came to a realization: only if we provide the right amount of tooling to the people who know these APIs and domains, and let them build it, will this scale. Otherwise you'll have a new team building something else, and then you need to ask for data from this team, and Jira tickets, and this and that: Conway's law. The second point is that you cannot dictate how the framework should be. What you can build is a way to shorten the loop of doing an experiment, say, changing something in an agent and taking it to staging. For that, you need a robust pipeline: from the moment a developer changes some behavior in an agent, an ephemeral environment is immediately spun up to test it in isolation.

And this is all coming from the distributed systems background, right, from the Kubernetes world. So this is how we approached it: not as a framework, but as a way to shorten the feedback loop for testing, because no one knows how to build these systems reliably. That resulted in the stack, and LMOS is not a framework. LMOS has something called ARC, the agent reactor as we call it, right, like the Jarvis arc reactor. It was built in Kotlin so that we could build a DSL with just enough surface for engineers who know their APIs, who are on the JVM, to build agents, without having to figure out hundreds of new APIs, Spring AI, and this and that. Then these agents need to live somewhere, which is lifecycle management. So we built the LMOS platform to deploy these agents with one Git push, for example; it spins up an ephemeral environment. The LMOS platform is entirely Kubernetes based, which means you're not reinventing the wheel. Agents were created as first class citizens there, so you could do kubectl get agents, and the lifecycle management of an agent is taken care of. When you push an agent, you could say: I'm a billing agent, I can handle billing queries, and I can handle billing disputes.

I advertise this as an agent to the network when I am deployed into the Kubernetes platform. So you use the discovery mechanisms already proven in Kubernetes, through the Istios of the world, and bring that into the Kubernetes registry, the Istio registry. You're not reinventing an A2A registry or something like that. It's all a stack that enterprise and operational teams are familiar with. And the LMOS stack should be universally applicable for distributed systems teams, without trying to do too much inside ARC. Wurzel, for example, is similar. We knew exactly where we wanted to stop. If you try to build a UI with dropdowns for chunking strategy and this and that, you immediately lose the flexibility of bringing in the best Python frameworks or stacks on the pipeline side. So we just stopped there, and picked DVC for point-in-time recovery and all that, and bet on large scale ingestion pipelines based on Kubernetes, etcetera. So it's universally applicable in enterprises, would be my answer.
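To illustrate the agents-as-first-class-citizens idea, here is a hedged sketch of what kubectl get agents looks like programmatically, using the Kubernetes Python client. The group, version, plural, and capabilities field are hypothetical stand-ins, not the actual LMOS custom resource definition:

```python
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

# List a hypothetical Agent custom resource across the cluster.
agents = api.list_cluster_custom_object(
    group="lmos.example.org", version="v1", plural="agents"
)
for item in agents["items"]:
    spec = item.get("spec", {})
    # Each agent advertises what it can handle on deploy, so discovery is
    # a registry lookup instead of a bespoke A2A registry.
    print(item["metadata"]["name"], spec.get("capabilities", []))
```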
[00:27:18] Tobias Macey:
Interestingly, the LMOS acronym is language model operating system, and operating systems have been a ubiquitous concept in computing for decades now. I'm curious what you were thinking, in terms of the naming, about what you're trying to convey with that concept of operating systems in this language model context, and which of the core theories of operating system design you brought into this framework to enable such a generalized substrate for agentic use cases?
[00:27:53] Arun Joseph:
Yeah, this is clearly one of the reasons I'm starting the startup itself, because I was so fascinated by the idea when AI took off. Different people saw it as, oh, it's magic, right? But I love thinking in terms of fundamental abstractions, whether in physics or biology or computational systems. And I used to wonder: I wasn't there when Linux was born, when a new era of computing was emerging and the question was how you build programs. When I first interacted with the language models, it felt like you now have a new microprocessor, made of some magical silicon. Instead of precise x86 instructions, you can give natural language instructions, and it's going to emit some response into your registers. So, suddenly, it flips.

This would mean you need to build new programming and operating constructs from scratch for a new computational unit, which could be agents; that was the thought process. That shift, language models as microprocessors, led to thinking: let's build all the layers above, starting with how you interact with these models. Models are going to emit strings as tokens, and those strings and tokens need to control the program flow, which is a totally different programming paradigm, right? And how do you build scheduling on top of it? For example, say a request comes in and the program needs to do some planning while, at the same time, it needs to respond to a task. So you need a scheduler which optimizes the resource, which in this case is the language models, whether for cost or for units of time. All of those constructs needed to be revisited, along with how you fundamentally handle nondeterminism in a program. That resulted in the thought process behind LMOS, and since Linux had Tux the mascot, we brought Sesame Street's Elmo into this as the mascot for the next stack for agentic computing. This was the vision in 2023, like the Xerox PARC group: that was how most of the engineers joined the team as well, you know, like when personal computing took off and Xerox PARC came up with object oriented programming and all that. We were a couple of engineers passionate about building something great. So: let's build the foundations for agentic computing, and call it LMOS. Let's build agent communication protocols, agent computational units, and the scheduler for interacting with language models, while at the same time solving the customer use cases for Deutsche Telekom. That was the background and the storyline behind LMOS.
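The scheduling analogy can be made concrete with a small sketch: treat the model as the scarce resource and order pending calls by priority within a cost budget. This is entirely illustrative, not LMOS's actual scheduler:

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class LlmTask:
    priority: int  # 0 = interactive reply, 9 = background planning
    estimated_cost: float = field(compare=False)
    prompt: str = field(compare=False)

def run_schedule(tasks: list[LlmTask], budget: float) -> list[str]:
    """Serve the highest-priority LLM calls first, within a spend budget."""
    heapq.heapify(tasks)
    served = []
    while tasks and budget > 0:
        task = heapq.heappop(tasks)
        if task.estimated_cost > budget:
            continue  # defer work the remaining budget cannot cover
        budget -= task.estimated_cost
        served.append(task.prompt)  # here you would actually call the model
    return served
```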
[00:30:51] Tobias Macey:
One of the other interesting challenges, which has been exacerbated by the scale of capabilities of these language models, is the high degree of variability in their pricing structures and the complete unpredictability of the number of input and output tokens that get used for the context. How do you think about reporting on and managing the costs and budgeting of these different agents, to make sure that you're not going to bankrupt the company in the process of solving its problems?
[00:31:25] Arun Joseph:
Absolutely. Essentially, this also needs to be looked at fundamentally from a computational point of view. Most of the agent building process these days falls into the category of: let's use LLM invocations for every computation. Let's say there is a flow where a customer requests a refund: call the refund API, get the response, then do the account update API, etcetera. What is being observed today is that most people are using LLMs all the time for these invocations. When you take a step back and think about how computation works in nature, for energy conservation, we don't reinvent the same process using our brains every time. Once something becomes deterministic, like the habit formation loop, you put it into a low energy execution phase. For agents, too, this sort of paradigm needs to emerge, the way I'm thinking. Instead of invoking LLMs all the time, if the agent has figured out that 80% of the use cases can be solved by a deterministic flow, the agent itself constructs that deterministic flow, uses it for the invocations that come in, and falls back to the LLM for the remaining 20%. But all of this requires rigorous instrumentation of costs. Essentially, just like in an operating system, right: for each process you assign CPU and memory, and then you do the accounting and observation via process IDs.

You need to think of the unit, in this case the agent, in terms of such resource allocation, then monitor it, with massive observability platforms and intelligent decision making platforms keeping this in check. So, two things. It's like the OODA loop, right? Observe: you need to keep observing what is going on. And then you need to decide what needs to be optimized; in this case, computation needs to be optimized. It doesn't make any sense to do LLM calls all the time. It is soon going to come crashing down, is what I bet.
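A minimal sketch of that 80/20 split, with hypothetical handler names: known intents are routed around the model entirely, and only the long tail falls back to the full agent loop:

```python
def call_refund_api(req: dict) -> str:      # stand-in for a real transactional API
    return "refund issued"

def call_account_update(req: dict) -> str:  # stand-in for a real transactional API
    return "address updated"

def classify_cheaply(text: str) -> str:
    # In practice a small or local model; here a trivial keyword check.
    return "refund_request" if "refund" in text.lower() else "unknown"

def llm_orchestrate(req: dict) -> str:
    # The full agent loop from earlier in the conversation would run here.
    return "escalated to LLM orchestration"

DETERMINISTIC_FLOWS = {
    "refund_request": call_refund_api,
    "address_update": call_account_update,
}

def handle(request: dict) -> str:
    intent = classify_cheaply(request["text"])
    flow = DETERMINISTIC_FLOWS.get(intent)
    if flow is not None:
        return flow(request)         # low-energy path: no LLM call at all
    return llm_orchestrate(request)  # the remaining ~20%: full agent loop

print(handle({"text": "I want a refund for last month"}))
```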
[00:33:46] Tobias Macey:
And then there's the focus on openness and local control for businesses that want to capitalize on these agentic capabilities: the ability to use local models for more cost control, or for more predictability, being able to freeze or pin a specific model version so that you're not at the whim of whatever the model provider does under the covers of their API. I'm curious how you see the work that you're doing complementing or overlapping with other projects in the ecosystem, such as Oumi, which is focused more on enabling organizations to build their own foundation models and control more of that element of the life cycle as well.
[00:34:33] Arun Joseph:
Yeah. So this is also one of the reasons I'm going on this entrepreneurial journey, betting on one of these key constructs. What we have observed in enterprises is that once a pattern has been set, let's say somebody has built on the best model available for some use cases, there is no incentive for it to be shifted later. In enterprises, typically, once something has been set, it's very difficult to shift. But the worrisome point is all the information in the feedback loop. For example, if you had control over fine tuned models, or rigorously tracked the model outputs, that is a wealth of information you could use going forward to do fine tuning, maybe bring down the computational cost, and build better intelligence for your organization. That opportunity is missed if you keep betting on these large models. For that to happen, you need a platform or layer which allows this mitigation strategy to be baked in. When you call the completions API, instead of going directly to, let's say, OpenAI or Gemini or whatever it is, there should be a way in which the call is mediated by a semi proxy layer which allows different models to be quickly plugged in. And, essentially, as OpenAI recently showed with the Responses API, which is a higher order API, there is a beautiful example where, with three lines of code, you write a call saying, add this toner pad to my shopping cart, against a Shopify MCP server whose tools are plugged into the Responses API. That is the only input given, plus the MCP server configuration. There is no additional code. It does the agentic orchestration, calls search product, add product, and checkout shopping cart, all underneath the platform. So if organizations start to use these APIs, it is super simple; the simplicity of such an API, and the value it brings, is tremendous. You could write any number of use cases in a given day. But the problem is, if you don't have the mitigation strategy for how your organization is orchestrated against some black box API, you will miss out on a lot of information which you could have preserved for the time when model tuning becomes much cheaper, and you will soon be tied to these large model companies, just consuming some black box higher order API. So the layer that I'm building is something like the Responses API. One of the components we are building allows the same OpenAI-like structure, but different models can be plugged in while it is deployed on your premises. And that data is immediately accessible for you to fine tune your models, immediately traceable, with the same simplicity as OpenAI: in four lines of code, you can build a number of orchestrations.
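A hedged sketch of that mediation layer, using the OpenAI Python SDK's Responses API with a configurable base_url so the model behind the call can be swapped and every exchange logged. The gateway URL and the MCP server are illustrative, not Masaic's actual component:

```python
from openai import OpenAI

# Point the standard SDK at an on-premises proxy instead of api.openai.com.
client = OpenAI(base_url="https://llm-gateway.internal.example/v1")

response = client.responses.create(
    model="gpt-4.1",  # the gateway may remap this to a self-hosted model
    input="Add this toner pad to my shopping cart",
    tools=[{
        "type": "mcp",
        "server_label": "shopify",
        "server_url": "https://example-shop.example/mcp",  # hypothetical
        "require_approval": "never",
    }],
)
print(response.output_text)
```

Because every request and response passes through the gateway, the organization keeps the interaction data it would otherwise lose to a black box API.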
[00:37:47] Tobias Macey:
One of the other interesting elements of this new world of agentic capabilities is the broad applicability of the problem spaces that they can be applied to. And I'm wondering how you see the potential for something like LMOS to be employed in the context of a data team, to build their own agents to help manage some of the pipeline design and implementation, run some of the analytical queries that you want to expose to the business, or provide some of the other end user capabilities that are particularly interesting or innovative, that you have either conceived of or seen in action?
[00:38:30] Arun Joseph:
Yeah. So, essentially, there was recently a meetup for the Eclipse Software Defined Vehicle group, I think it was in Copenhagen, where we demonstrated, with LMOS, vehicles emitting telemetry data. The Software Defined Vehicle group is a consortium of some of the major companies in Europe focusing on telemetry data emitted from vehicles, software defined vehicles. We demonstrated a simple agent, built with LMOS ARC, the agent developer framework, the agent reactor framework, which could connect to this telemetry data. It was a time series database, and the query building was being done by the agent using OpenAI APIs. The fun part is that you could ask questions like, how many of my vehicles are running low on fuel, among maybe a thousand vehicles, and all of that comes back, in a few lines of code. So, essentially, what this means is that for data analytics teams, it is all about querying.

Right? The query language could be SQL or Cypher; a number of approaches are there. But this acts as a layer on top, letting them quickly build dynamic query creation agents even though the underlying systems don't change. You don't have to change your Athena, your Redshift, or your Snowflake. You could build an agent that constructs these queries, because language models are also good at understanding existing query languages. And it can be done pretty easily, as we have seen so far.
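A small sketch of that dynamic query-building pattern, with an illustrative schema and a crude read-only guard (a production version needs much stricter validation): the model writes the query, and the underlying database stays unchanged:

```python
import sqlite3
from openai import OpenAI

client = OpenAI()
SCHEMA = "vehicle_telemetry(vehicle_id TEXT, ts TIMESTAMP, fuel_pct REAL)"

def answer(question: str, conn: sqlite3.Connection) -> list:
    # Ask the model to translate the question into a single SELECT statement.
    sql = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": f"Schema: {SCHEMA}\nWrite one SQLite SELECT statement "
                       f"(no prose, no markdown) answering: {question}",
        }],
    ).choices[0].message.content.strip()
    if not sql.lower().startswith("select"):  # crude read-only guard
        raise ValueError(f"refusing to run: {sql}")
    return conn.execute(sql).fetchall()

# e.g. answer("How many vehicles are below 10% fuel?", sqlite3.connect("fleet.db"))
```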
[00:40:15] Tobias Macey:
And in your experience of building these agentic frameworks and figuring out how to manage the creation and maintenance of the context corpus that enables these agents to work effectively, what are some of the most interesting or unexpected or challenging lessons that you've learned in the process?
[00:40:36] Arun Joseph:
Yes. Andrej Karpathy recently pointed out that all of information engineering is going to converge into context engineering. Simple questions are easy to handle, but as the system grows in complexity, how do you pass the right context to the same agent or to different agents? There are many approaches emerging; nobody has fully figured out this pattern yet, because as it grows and scales, this is going to become challenging. But one of the things is this: take an ecommerce organization, a company whose business is commerce. If different departments start to build different agents, what you end up with is today's spaghetti systems.

You would lose out on deriving the true value of AI, because the context is now segregated. The true value of AI in enterprises is going to emerge only from unifying at least the foundation of the context, so that other agents can be built. What that means is a big shift in enterprise architectures. The microservices world taught that small is good: small, nimble teams that can go and do whatever they want. But with AI there is a problem, because if you don't have a unified view of the truth of your enterprise stored somewhere, the agents being built will not be as effective as they could have been. That brings in the need for a core context and model management platform, where you connect your enterprise tools so that your departments and engineers can build easily without asking for permission.

Hey, can I use your department's data? Because what matters is the business outcome. Suddenly, if you tell your orchestrator, I want to optimize my profits for teenagers, what suggestions exist? If the platform has the tools, you know, a sales projections analysis tool, a buyer patterns tool, an inventory tool, and a market analysis tool, then with that one question the agentic orchestration kicks in and comes up with immediate results. So this layer is the essence. Building your organization for agentic orchestration: that is the definition of AI native, as I would call it, the architectural equivalent of an AI native. How do you prepare your organization to be AI native? Models are going to get better and cheaper, and new programming paradigms will emerge. But one thing certainly is not going to change: you need this context building platform layer for the world which is coming, and that is going to be the core of your AI native transformation journey.
[00:43:30] Tobias Macey:
And I think, extending your reference to the microservices architecture, we went through a similar problem, where as you go from the monolith to the microservices, you still have to figure out where that state gets created and maintained and how it gets distributed. And that also leads to all of the complexity of reconstituting the relevant context in the warehouse, where you pull the data out of all of these different microservice systems and then have to figure out what the linkages are, recreating some of the API based connections at the data layer to bring it all back together and make it semantically meaningful. And we're in a similar stage with these AI systems: if you have that warehouse catalog, you can use it to a certain extent, but we need to figure out how to feed a lot of these LLM interactions back into it, to help maintain and grow and evolve the corpus without having it shard into different domains that then have to be reconstituted, if you even know that they exist in the first place.
[00:44:38] Arun Joseph:
Absolutely. Absolutely spot on. And that's what most people are seeing. From what I've seen, and some of the stories from other places, the spaghetti we saw in the previous world is only going to get multiplied. Everyone is building three agents on top of an existing microservice. Now the microservice is four units of computation, and no one knows the context anymore. My agent can only respond with the customer address; now you think about what protocol you should use, A2A, to connect to the order agent. And now you have scrum meetings, and it's insanity.
[00:45:22] Tobias Macey:
Alright. For people who are figuring out how to build their own enablement layer for these agentic use cases in their organization, what are the cases where you would say that LMOS is the wrong choice?
[00:45:37] Arun Joseph:
So LMOS has multiple components. LMOS itself has the agent framework per se, which is built around Kotlin. So if you're a non-Kotlin, non-JVM organization, it doesn't make sense to use that agent framework. The platform layer is still in the incubation phase; it's standard Kubernetes, and it will at least provide a lot of inspiration for you to pick up, irrespective of whether you run Python agents or something else. So for organizations that are not into the JVM and Kubernetes, or are entirely based on Python or C#, it might not make sense, though there is still a lot to learn from it about managing the life cycle of agents. That's also why the learnings, and where this goes, extend beyond agent frameworks, which is one of the components being built under the new startup: the Masaic open responses component.

Essentially, it does not restrict you to any framework. As I described earlier, it's the complete Responses API, just like OpenAI's. Once this component is deployed, every agent framework you can think of, the OpenAI Agents SDK, AutoGen, Kotlin, or whatever, should work, because it addresses the problem of context building for large scale enterprises and model switching. If the OpenAI developer platform were deployed on your premises, what would it be like? That's the way we are thinking. That's the essence of what you need to build in your organization, irrespective of whether you use this component or not.
[00:47:20] Tobias Macey:
And as you continue to contribute to LMOS and continue along your own venture that you're building, what are some of the new capabilities that you have planned? What are some of the ecosystem evolutions that you're paying particular attention to, or any of the capabilities that you're particularly excited to dig into building and growing?
[00:47:44] Arun Joseph:
Yeah. I think on the LMOS side there are now two initiatives. One is LMOS, which is under the Eclipse Foundation, where we are listening to the industry on what is actually required and not trying to make it bloated with, you know, let's add this feature or that feature. At least the agent framework is very stable; the Kotlin framework is used in production. The LMOS platform we originally built so that you could deploy any agent, Python or otherwise, and convert them into agents as first class citizens in Kubernetes.

This we are rethinking, because if you try to fit everybody, it will never work. So we want to convert that into a minimalistic registry based on Kubernetes, which does external orchestration. For agents on the JVM stack, LMOS should hold true today; I would say it's one of the best frameworks out there, in that it allows business and engineers to work on the same thing, which is very rare. If you're using ARC, the business can define the use case: I want to define the billing dispute use case. The business writes it in the English language, and we are able to run it, because if you just randomly write prompts it will never work; you need a runtime for the English language, a meta language which we created, referred to as ADL in ARC. So it works. The part which I'm most excited about is this open responses component, which is being built by my cofounders, Jasbir and Amanth.

They are building this component to become, like I mentioned, that orchestration layer for models and the context building layer for the whole enterprise, so that your developers and your departments, and no one is going to change the department structure, can still build agents on their favorite stack, with the minimum guidance: prepare yourself to change models quickly, don't lose context, have the ability to connect to any number of tools, and don't bring your own siloed tool registry and all that. That layer is called the Masaic open responses layer. And by the way, Masaic is the name of the company I'm building; Masaic stands for multi agent systems, MAS. So this is the thing which I'm most looking forward to. It's an open core model: open source, then building on top. Yeah.
[00:50:17] Tobias Macey:
Are there any other aspects of the work that you're doing on LMOS or Masaic, or the overall space of enabling agentic use cases and all of the data requirements that go into it, that we didn't discuss yet that you'd like to cover before we close out the show?
[00:50:32] Arun Joseph:
Yeah. I think we went into the idea of orchestration, and I would say everything revolves around the idea of orchestration from here on. And the idea of orchestration is simply feedback loops: feedback loops of experimentation, like in the real world. Think of the Apollo model versus the SpaceX model. In the Apollo model, failure was not an option; the SpaceX model fails fast and learns. So orchestration is a programming, engineering, and systems engineering view which is going to transform organizations. And it's more than a programming construct; it's a mindset construct, a behavioral-change-inducing thing, which enterprises need to think through as well. If you try to retrofit your existing processes and ways of working onto any framework and AI, it will not work. It should be used as a lever to become the SpaceXes of the world, so that you focus on only one thing: shortening the feedback loop, learning, and reapplying. That's the AI nativity part, I would rather say.
[00:51:38] Tobias Macey:
Well, for anybody who wants to get in touch with you and follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And as the final question, I'd like to get your perspective on what you see as being the biggest gap in the tooling or technology that's available for data management today.
[00:51:54] Arun Joseph:
I have my personal opinion on this. First of all, I have seen large corporations try everything, and essentially it's time to take things out rather than add more things. That's the way I would rather see it. Every tool that has come in has resulted in: there is a transactional system, then there is a data pipeline, then there is some other team sitting somewhere building some other new tool. And when a business stakeholder asks, hey, what is going wrong, now you're raising Jira tickets between five teams, and every team would say, I have the greatest framework and tool. So I think it is time for a cleansing, and it's not about adding more but taking things out. For example, can there be new kinds of systems which can be queried both transactionally and operationally, because agents might be able to reconstruct the truth using computation as well? If I were to frame it as one statement: over engineering has crept into the data space as well. It's time to clean it up, and AI provides a good lever to do it.
[00:52:59] Tobias Macey:
Alright. Well, thank you very much for taking the time today to join me and share the work that you've been doing on building LMOS and the thought that you've put into how to enable the creation and maintenance of these agentic use cases at organizational scale. It's definitely a very interesting, complex, and timely problem, so I appreciate the time and energy that you're putting into making it more tractable for everyone else. Thank you, Tobias. I really enjoyed the conversation as well. So thanks for the invite, and thanks for the thoughtful questions. Thank you for listening, and don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. And the AI Engineering Podcast is your guide to the fast moving world of building AI systems.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email hosts@dataengineeringpodcast.com with your story. Just to help other people find the show, please leave a review on Apple Podcasts and tell your friends and coworkers.
Hello, and welcome to the Data Engineering Podcast, the show about modern data management. Data migrations are brutal. They drag on for months, sometimes years, burning through resources and crushing team morale. Datafold's AI powered migration agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.
Poor quality data keeps you from building best in class AI solutions. It costs you money and wastes precious engineering hours. There is a better way. Coresignal's multi source, enriched, cleaned data will save you time and money. It covers millions of companies, employees, and job postings and can be accessed via API or as flat files. Over 700 companies work with Coresignal to develop AI solutions in investment, sales, recruitment, and other industries. Go to dataengineeringpodcast.com/coresignal and try Coresignal's self-service platform for free today. Your host is Tobias Macey, and today I'm interviewing Arun Joseph about building an agent platform to empower the business to adopt agentic capabilities. So, Arun, can you start by introducing yourself?
[00:01:34] Arun Joseph:
Hey. Hello, Tobias. Thanks for inviting me to this podcast. So I'm Arun, Arun Joseph. I'm based out of Bonn, Germany. Well, I've always built distributed systems, and I've been leading engineering organizations. In my most recent role, I was heading the AI engineering program for Deutsche Telekom Group, where we built something referred to as LMOS, which is the agent platform, which is open source. And right now, I'm on my entrepreneurial path, building something around multi agent systems, because this has been close to my heart for quite a long time. Well, I'm originally from India. I used to work for large scale enterprises, leading engineering teams across the globe, also in San Francisco, at Juniper, and then Merck Pharmaceuticals, where I built their industrial IoT platform. I've always built large scale distributed systems. But it was mostly intrapreneurial kinds of endeavors; this is the first time I'm jumping headfirst into, you know, entrepreneurial stuff outside an organization. So I'm passionate about distributed systems and large scale AI systems, which are gonna define the future.
[00:02:51] Tobias Macey:
And do you remember how you first got started working in data and also in the AI and ML space?
[00:02:58] Arun Joseph:
Yeah. I think the first introduction to large scale data was when I used to work for Nielsen, which is a market research company. Nielsen had this interesting subsidiary called Arbitron. And Arbitron had this device, placed in the homes of people who signed up, which used to listen to radio signals and collect listenership data. And this used to go through these large data pipelines to build listenership market intelligence. So this was the first time we started thinking about large scale data pipelines. I think this was right around the time Hadoop was brought in. And AI/ML pipelines and MLOps were also there in several of their endeavors, but that's essentially the more recent experience.
[00:03:49] Tobias Macey:
And so in terms of the overall space of building agentic systems, particularly when you're looking at the organizational scale and not just a little toy implementation or a proof of concept, there is a lot of complexity involved. The early stages were very oriented around just simple RAG bots. I say simple in air quotes because they could be very complex. But as we move to more agentic capabilities, where we're adding to the sets of tools and we're giving more free rein to the language models to make decisions and determine the execution paths in nondeterministic fashion, there is a lot more that goes into it. And particularly, if you are trying to empower nontechnical stakeholders who don't necessarily have the deep domain expertise in how these systems work, you wanna make sure that they're fairly foolproof.
And I'm wondering if you can talk to some of the ways that you thought about the different components and domain segments that go into building something like an agentic system and then being able to scale that to the overall organization, just how you started to approach that overall problem definition.
[00:05:02] Arun Joseph:
Yeah. Absolutely. So this is a fascinating topic, because there are several constructs to unpack in here, of which one of the most important is: English is the new programming language, or your thought is the new design. And, essentially, what is happening right now is this thing called agentic orchestration, which is merely a feedback loop if you look at it. Right? You provide a goal to an intelligent system. This is how even people work. You give a goal, and the entity attempts to do it, given enough tools.
If it doesn't work, loop through it, get new insights, and then iterate. So this orchestration loop makes it possible that if you're defining your instructions very precisely, like a program, add two numbers, it doesn't have to go through that loop multiple times. But if you're specifying something in an abstract manner, I have two chocolates and somebody's giving me something and I need to figure out the best way to handle it without knowing the construct of addition, then the loop first figures out what mathematical construct to apply. Okay, this is good. Now let's apply this. Great. And then let's verify it. So in this example, what just happened? You moved from programming, which was very deterministic and used to be written by SMEs, where specialized expertise was required, to a capability that allows people to describe what they want in a broader manner, and the loop takes care of it to some degree. The implications are profound, especially in enterprises.
What do most enterprise information systems do? They merely move data from one place to another and then do some kind of transformation. And, of course, there is resilience, reliability, and distributed systems. But most of the requirements are around moving data from one place to another, and some business stakeholder would say, I want it this way, I want it that way. And now, tying it back to the analogy, the engineers of an organization need not be building systems exactly as the business people describe them, but rather build systems which allow the business to tell what is required, and the system makes it come true. So this is the fundamental mental model. But in order to make that possible, you need great models. You need a robust way to connect data and tools, a robust way to do orchestration loops, a robust way to manage the growing complexity as you build more such programs, and the complexity of how these programs can interact with each other. That, in essence, is the anatomy of a multi agent system.
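To make that loop concrete, here is a minimal sketch of a goal driven orchestration loop, assuming the OpenAI Python SDK. The model name and the verify() check are illustrative placeholders supplied by the caller, not part of any framework discussed here.

```python
from openai import OpenAI

client = OpenAI()

def orchestrate(goal: str, verify, max_turns: int = 5) -> str:
    """The loop: attempt the goal, check the result, feed failures back in."""
    messages = [{"role": "user", "content": goal}]
    for _ in range(max_turns):
        answer = client.chat.completions.create(
            model="gpt-4o-mini",               # any chat model works here
            messages=messages,
        ).choices[0].message.content
        ok, feedback = verify(answer)          # domain specific check, supplied by the caller
        if ok:
            return answer                      # precise instructions exit on the first pass
        messages += [                          # abstract goals iterate with new insight
            {"role": "assistant", "content": answer},
            {"role": "user", "content": f"That attempt failed: {feedback}. Try again."},
        ]
    raise RuntimeError("goal not reached within the loop budget")
```

A precise instruction exits on the first pass; an abstract goal consumes more turns, which is exactly the tradeoff described above.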
[00:07:56] Tobias Macey:
And in terms of the primitives that you have available to build a system like that, that's obviously a very rapidly evolving ecosystem. To your point on multi agent systems, that of course brings to mind the A2A protocol that Google just donated to the Linux Foundation. In terms of tool call definitions, it brings up the hype that's been growing around the Model Context Protocol. There are obviously custom tool definitions that you can create. The idea of handing tools to these language models has been around for at least a year or two. And I'm wondering how you think about what are the stable primitives that you have available to build an enterprise scale system on top of, without risking that one of these protocol authors is going to introduce a new revision that massively changes the capabilities or requires a lot of rework in terms of how you actually think about your implementation?
[00:08:53] Arun Joseph:
Yeah. This is a very, very good question, in the sense that we went through the other side of it. Because we built it all first, before MCP, before A2A, into LMOS, the platform that we built. But then these MCPs came up, and our stakeholders as well as our team started to ask, what are you gonna do now? So this is a valid question, but let's unpack a few things here. We went into production back in the first quarter of 2024 as an agentic platform. The first two agents went to production in the first quarter of 2024. They used to connect to telco APIs, which are really complex.
So there was no MCP. There was no A2A, but we could do it. The first principles don't change. That's the whole point, essentially. And then, when we started to build more such agents against more such APIs, the need for some sort of harmonization came in. What is the pattern that you tell an engineer to follow without reinventing the wheel? This was the beginning of what we refer to as our LMOS protocol, which was based on Web of Things, because LMOS was built on the fundamental construct that everything should be open and should not reinvent the wheel. So let's pick up the best protocols out there: without inventing a new protocol, how do you allow agents to collaborate? But at the same time, especially when models needed to connect to data, MCP was the first one which came out for that topic. And it immediately gained a lot of adoption because of the size and scale of Claude and Claude Desktop, because it could connect the models to the tools quickly. But the first principle of how you need to connect your data is not gonna change, because most of these enterprise APIs have been there for years. When GraphQL came in, a lot of people tried to build GraphQL wrappers on top. And now MCP comes in, and people are trying to retrofit MCP onto the traditional APIs. And then, on top of it, A2A. My suggestion is, before betting on any big protocol, it's pretty simple to connect your existing API to any of these LLM prompts.
Start there, experiment with it, and then, if it scales, bring in additional layers. And MCP, it looks like, is adopted by OpenAI and also Anthropic and Microsoft, and there are a lot of MCP servers, so it's good enough to bet on. But large scale asynchronous communication is not solved yet in MCP. And A2A, similarly: whether large scale agent collaboration is required or not is still a question. Because if you need large scale agent collaboration, you need large scale state management, a shared view of truth. Otherwise, two agents cannot collaborate. So A2A is still a shady area in my view. And most enterprise systems might not require every problem to be seen as agents. That's another thing. In microservices, we say don't start with microservices. Right? Start with the monolith.
So it's easier to build your particular department's use case as a single system. Connect your data with MCP. Or, if it's a legacy API, don't try to MCPify it. Keep it simple, the KISS principle: keep it simple, stupid. And then think about scaling it to other protocols.
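As a concrete illustration of that advice, here is a hedged sketch of connecting one existing API to an LLM with plain tool calling, no MCP or A2A layer in between, assuming the OpenAI Python SDK. The billing endpoint URL and its schema are hypothetical.

```python
import json

import requests
from openai import OpenAI

client = OpenAI()

# describe the API you already operate as a plain tool; no new protocol required
tools = [{
    "type": "function",
    "function": {
        "name": "get_invoice",
        "description": "Fetch a customer's latest invoice from the legacy billing API.",
        "parameters": {
            "type": "object",
            "properties": {"customer_id": {"type": "string"}},
            "required": ["customer_id"],
        },
    },
}]

def get_invoice(customer_id: str) -> dict:
    # a direct call to the existing endpoint (hypothetical URL)
    return requests.get(f"https://billing.internal/invoices/{customer_id}").json()

messages = [{"role": "user", "content": "What does customer 42 owe?"}]
msg = client.chat.completions.create(
    model="gpt-4o-mini", messages=messages, tools=tools
).choices[0].message
if msg.tool_calls:
    args = json.loads(msg.tool_calls[0].function.arguments)
    print(get_invoice(**args))
```

If this pattern scales for the use case, a protocol layer can be added later; if it never needs to, nothing was over built.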
[00:12:39] Tobias Macey:
The other interesting challenge when you're dealing with an organizational scale enablement system, particularly for something like agentic use cases, is, to your point, that you need to be able to connect it to the underlying data that the agent needs, either to operate against or to use as context. And that's a substantial challenge in its own right, because data in and of itself is challenging, but also because you're working with people who don't have the domain knowledge to understand all of the complexity that goes into the data prep, the ETL pipelining, how to manage the different chunking strategies, etcetera.
And so I'm wondering how you think about that element of exposing the existing organizational data and then streamlining the work of actually doing the additional preparatory work for being able to load that context into something like a vector database and standardizing on that core technology.
[00:13:39] Arun Joseph:
In fact, I mean, this clearly has been a missing piece in most enterprises. Enterprises used to have data pipelines, but mostly for transactional data. Transactional systems would emit the data, which goes into huge data pipelines, NiFi and the like, landing in warehouses. What has emerged right now is a need for unstructured natural language data to be parsed into semantic vectors, which can be queried and finally collapsed into an answer by the LLMs. So this is a new technical skill set that most organizations need to learn, but the first principles of how large scale data ingestion works still apply, which means the data still needs to go through phases and pipelines. For example, at Deutsche Telekom, when we wanted to ingest the corporate knowledge base and FAQs for customer support people, if you just ingested it as is, the search endpoints were not performing as we would have wanted. The accuracy rate was subpar.
So we had to create the ontology, cleaning up the domain objects and the domain topics to create the ontology of the knowledge for the customer support groups, and build that ontology into the pipeline. So it requires two skill sets. One is a technical skill set: what is a chunking strategy, what is an embedding model, what kinds of vectorization approaches can you use. The other is the domain knowledge: hey, my customer support is broken into the billing domain and the customer domain and contracts, I have Magenta TV, and all that. Both kinds of knowledge are essential, and they need to come hand in hand to create those pipelines. But at the same time, there are only a few things that you can tweak to bring the best answers, especially in RAG. One is definitely the vector database. The second is the pipeline itself: how you create the ontology and do the cleanup.
And the third is the actual search endpoint that you build, which should be able to rely not just on vectorization, but also on additional dimensions, to create a hybrid search approach. These are the levers which you have. And I think the vector side plays a huge role, in the sense that not many developers are going to create their own vector databases, for sure. The rest of the pieces they can tweak: what kind of search algorithm they need to put in, what kind of pipelines they need to build. So the choice of vectorization approach, and how that system would scale with large ingestion pipelines, is something which needs to be really thought through. I think at DT, we brought in Qdrant, which served the purpose really well given the operational simplicity that Qdrant baked in. And in terms of the vectorizations, Qdrant has this brilliant user interface which shows the entire embedding space, and the operational simplicity was unparalleled.
And this helped in streamlining, at least for the developers, to focus on the ingestion pipelines and not on the operational characteristics of maintaining the topology of the vector databases, because you would need multi tenancy and all that. This resulted in the pipeline that we referred to as Wurzel, which was developed by a couple of engineers in my team, including Thomas Weigel. Wurzel in German means root. So it was built as pipelines that are like roots that can go into the soil. The enterprise world is like soil, and the pipeline needs to go in and collect the nutrients, which is the data, which is in many different formats. So, essentially, it did not try to replace anything. Rather, a lot of frameworks were coming up, and there would be one or two capabilities within a framework which would be really good. What's the best embedding strategy, what's the best framework? How do you club together these capabilities and not bet on only one LlamaIndex or something like that? So this was Wurzel, which is a pipeline: it ingests the data, and it can land in the Qdrant cluster that we had. And then the search endpoints were built.
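This is not the actual Wurzel code, but a minimal sketch of the same pipes idea: small composable steps (chunk, embed, load) landing documents in a Qdrant collection, assuming the openai and qdrant-client packages. The collection name, chunk size, and ontology label are illustrative assumptions.

```python
from openai import OpenAI
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

oai = OpenAI()
qdrant = QdrantClient(url="http://localhost:6333")
# 1536 dims matches text-embedding-3-small; adjust for your embedding model
qdrant.recreate_collection("faq", vectors_config=VectorParams(size=1536, distance=Distance.COSINE))

def chunk(text: str, size: int = 800) -> list[str]:
    """Naive fixed-size chunking; a real pipeline would tune or replace this step."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(chunks: list[str]) -> list[list[float]]:
    res = oai.embeddings.create(model="text-embedding-3-small", input=chunks)
    return [d.embedding for d in res.data]

def load(doc_id: int, text: str, domain: str) -> None:
    """domain carries the ontology label (billing, contract, ...) into the payload."""
    chunks = chunk(text)
    points = [
        PointStruct(id=doc_id * 1000 + i, vector=v, payload={"text": c, "domain": domain})
        for i, (c, v) in enumerate(zip(chunks, embed(chunks)))
    ]
    qdrant.upsert(collection_name="faq", points=points)

load(1, "Your invoice is issued on the first of each month. ...", domain="billing")
```

The ontology work described above shows up here only as a payload label; the real value is deciding those labels with the people who know the domain.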
[00:18:12] Tobias Macey:
In terms of the embeddings and the additional context that you need to bring in, one of the common pieces of wisdom when you're building RAG systems is that you don't know what chunking strategy you want to use until after you've tried it and experimented a little bit. And so I'm wondering how you thought about either determining, this is the standard strategy we're going to use because it's good enough for most use cases, or how much control you wanted to give to the end builders of these agents to say, you know the content that you're working on, so you're going to set some parameters to determine which chunking approach to use. And just how to think about building that into a flexible system without making it flexible to the point of being useless.
[00:19:00] Arun Joseph:
Yeah. Absolutely. I think too much abstraction, it's a right balance that we need to think about, and that went into the system as well. Essentially, we were not building a platform right from the start; we were supposed to solve these use cases. So we were only measured by one metric: how accurate are the answers for the German customer service bot for the customer? And in order to achieve that, we did not start by thinking about what flexibility we need to have in the framework. We did the heavy lifting of figuring out what is the right chunking strategy. And even embedding models: we ended up having our own fine tuned embedding models, because it was German. So we started building our own embedding models back then, which proved to have better accuracy than the off the shelf models from OpenAI at the time. But once we started figuring this out, the next country came in, which, I believe rightly, was Croatia.
And then you needed to abstract away the best way for the Croatian knowledge ingestion pipeline to be created, such that the data scientists and the people who know Croatian, at the Croatian telecom group, can manage that pipeline. So we built Wurzel in such a manner that they can play around with multiple chunking strategies, not on a UI level, because these are small atomic units. It was built on DVC, the DVC pipelines. Right? So Wurzel was built on top of DVC, and you have something called the Wurzel steps, or the Wurzel roots.
One of the steps is a default chunking strategy step, which is, again, not exposed as a UI. It's a simple module, a Wurzel step, which is a Python program. And you can bring in your favorite library which has better utilities for the chunking strategies that you might want, import it into this unit, try it out, and then plug it into the rest of the pipeline, and it will work. So it's the UNIX pipes approach that we used to build the Wurzel pipeline. This flexibility was crucial if you want to expose it to other people who know the language, because otherwise you are limited by the engineering group, who might know only one language. That's the way we started to expand it to Hungarian and Croatian, so that the data scientists in those groups are not working with a black box with some UI dropdown chunking strategy; they can bring in their favorite library into the step and only change that. The rest of the pipeline remains the same.
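A hedged sketch of that pluggable step idea, not Wurzel's real interface: the chunking stage is just a Python callable with a fixed signature, so a local team can swap in its preferred library without touching the rest of the pipeline.

```python
from typing import Callable, Iterable

ChunkStep = Callable[[str], list[str]]  # the only contract a step must honor

def default_chunker(text: str, size: int = 800) -> list[str]:
    """Naive fixed-size default, the step teams are free to replace."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def run_pipeline(docs: Iterable[str], chunker: ChunkStep = default_chunker) -> list[str]:
    # every other stage stays identical; only the one step is swapped
    return [chunk for doc in docs for chunk in chunker(doc)]

# e.g. a language-aware replacement, sketched here as a plain sentence split;
# a Croatian team could import whatever splitter handles their language best
def sentence_chunker(text: str) -> list[str]:
    return [s.strip() for s in text.split(".") if s.strip()]

chunks = run_pipeline(["Prvi dokument. Druga rečenica."], chunker=sentence_chunker)
```

The design choice being illustrated is the narrow seam: the step boundary, not a UI, is where flexibility lives.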
[00:21:42] Tobias Macey:
So based on the work that you did at Deutsche Telekom, you ultimately created the LMOS open source project. And one of the perennial challenges of releasing something as open source that you first built for a particular organizational use case is that the design of the system as it's implemented in the business implicitly encodes a lot of the organizational patterns of that organization, which don't necessarily translate externally. And so I'm wondering how you approached that translation process of figuring out what are the pieces that are generally applicable, but not so general that they're useless, and then designing it in a way that it can be used as a foundational substrate that other people can build on top of without making it so complicated or confusing that they're never going to start down that path?
[00:22:34] Arun Joseph:
That's a very sharp question. It's like the Conway's law design of a framework. Right? So, a few things really worked out well in terms of reaching a just enough point in abstraction. For example, when we started in 2023, there were these frameworks; LangChain was a major one in 2023. We started picking LangChain, but almost all our transactional systems and APIs, the profile API, the billing API, all of that were built on the JVM stack. The second point is, almost the entire operational side, the distributed systems, was built on the Kubernetes CNCF stack. So you have Kubernetes, the observability, the Grafana, the Prometheus, and everything. Now comes a new framework which was invented somewhere else based on some other use cases. How do you fit this in with the people who actually know the data and know the domain, and what happens to all the client SDKs that have been built? That's the reason we went back to the drawing board in creating LMOS. And we did not start out creating LMOS by wanting to create a framework. The problem statement was quite simply this: we implemented something in LangChain, it took a couple of months, and after that, no one knew how to build it into a platform because it was so chaotic. And how does it actually fit in with the rest of the stack? So we went back to the drawing board and came to the realization: only if we provide the right amount of tooling to the people who know these APIs and domains, and let them build it, will this scale. Otherwise, you'll have a new team building something else, and then you need to ask for data from this team, and Jira tickets and this and that. Conway's law. Now, the second point is, you cannot dictate how the framework should be. What you can build is a way to shorten the loop of doing an experiment, let's say changing something in an agent and taking it to staging. For that, you need a robust pipeline so that, from the moment a developer changes some behavior in an agent, an ephemeral environment is immediately spun up to test it in isolation.
And this is all coming in from the distributed systems background, right, from the Kubernetes world. So this is how we approached it: not as a framework, but as how do we shorten the feedback loop for testing, because no one knows how to build these systems reliably. That resulted in the stack, and it's not a framework. LMOS is not a framework. LMOS has something called ARC, the agent reactor as we call it. Right? Like the Jarvis arc reactor. It was built on Kotlin so that we could build a DSL, with just enough surface for the engineers who know their APIs, who are on Java, to build agents, without having to figure out hundreds of new APIs and Spring AI and this and that. Then these agents need to live somewhere, which is lifecycle management. So we built the LMOS platform to deploy these agents with one Git push, for example. Right? It spins up an ephemeral environment. And the LMOS platform is entirely Kubernetes based, which means you're not reinventing the wheel. Agents were created as first class citizens there, so you could do kubectl get agents. And then the lifecycle management of an agent is taken care of. When you push an agent, you could say, I'm a billing agent, I can handle billing queries, and I can handle billing disputes.
I advertise this as an agent. I advertise it to the network when I am deployed into the Kubernetes platform. So you use the discovery mechanisms which are already proven in Kubernetes, right, through the Istios of the world, and bring that into the Kubernetes registry, the Istio registry. So you're not reinventing an A2A registry or something like that. It's all a stack which enterprises and operational teams are familiar with. And the LMOS stack should be universally applicable for distributed systems teams, without trying to do too much in ARC. Wurzel, for example, is similar. We knew exactly where we wanted to stop. If you try to build a UI with a dropdown chunking strategy and this and that, you immediately lose the flexibility of bringing in the best, you know, Python frameworks or stacks on the pipeline side. So we just stopped at that, which allows that flexibility, and picked DVC for the point in time recovery and all that. It also bets on large scale ingestion pipelines based on Kubernetes, etcetera. So it's universally applicable in enterprises, would be my answer.
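To illustrate what agents as first class Kubernetes citizens implies, here is a small sketch using the official Kubernetes Python client to list a hypothetical Agent custom resource and the capabilities it advertises. The group, version, plural, and field names are assumptions for illustration, not the actual LMOS CRD.

```python
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() when running inside the cluster
api = client.CustomObjectsApi()

# the programmatic equivalent of `kubectl get agents`, given an Agent CRD
agents = api.list_cluster_custom_object(
    group="lmos.example.org",   # hypothetical API group
    version="v1alpha1",
    plural="agents",
)
for a in agents["items"]:
    caps = a.get("spec", {}).get("capabilities", [])
    print(a["metadata"]["name"], "advertises:", caps)  # e.g. billing queries, billing disputes
```

The point of the sketch is the reuse: discovery, registry, and lifecycle all ride on machinery operations teams already run.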
[00:27:18] Tobias Macey:
Interestingly, the LMOS acronym is language model operating system, and operating systems have been a ubiquitous concept in computing for decades now. I'm curious what you were thinking in terms of the naming, what you're trying to convey with that concept of operating systems in this language model context, and some of the core theories around operating system design that you brought into this framework to enable such a generalized substrate for agentic use cases?
[00:27:53] Arun Joseph:
Yeah, this is clearly one of the reasons I'm starting the startup itself. Because I was so fascinated by the idea when AI took off. Different people saw it as, oh, it's magic. Right? But I love thinking in terms of fundamental abstractions, whether in physics or biology or computational systems. And I used to wonder, well, I wasn't there when Linux was born, when computing was emerging and a new way of building programs was taking shape. When I first interacted with the language models, it felt like, now you have a new microprocessor, and this one is made up of some magical silicon. Instead of precise x86 instructions, you can give natural language instructions, and it's gonna chunk out some response into your registers. So now, suddenly, it flips.
Oh, this would mean you would need to build new programming and operating constructs from scratch for a new computational unit, which could be agents. That was the thought process. So that shift, from language models as microprocessors, resulted in thinking, let's build all the layers above, starting with how you interact with these models. Models are going to emit strings as tokens, and these strings and tokens need to control the program flow, which is a totally different programming paradigm. Right? And how do you build scheduling on top of it? For example, let's say there is a request coming in, and it needs to do some planning while, at the same time, the same program needs to give a response to a task. So you need a scheduler which optimizes the resources, which in this case is the language models, whether the unit is cost or time. All those constructs needed to be revisited. And how do you handle nondeterminism fundamentally in a program? So that resulted in this thought process of LMOS. And we thought, Linux had the Tux mascot, so let's bring in Sesame Street's Elmo, this being the next stack for agentic computing. This was the vision in 2023, like the Xerox PARC story, when personal computing took off and Xerox PARC came up with object oriented programming and all that. This was how most of the engineers joined the team as well. We were a couple of engineers passionate about building something great. And then: let's build the foundations for agentic computing, and it's called LMOS. Let's build agent communication protocols and agent computational units. Let's build the scheduler for interacting with language models, while at the same time solving the customer use cases for Deutsche Telekom. So this was the background and the storyline behind LMOS.
[00:30:51] Tobias Macey:
One of the other interesting challenges that has become exacerbated by the scale of capabilities of these language models is the high degree of variability in their pricing structures and the complete unpredictability of the number of input and output tokens that are used for the context. How do you think about reporting on and managing the costs and budgeting of these different agents to make sure that you're not going to bankrupt the company in the process of solving its problems?
[00:31:25] Arun Joseph:
Absolutely. So, essentially, this needs to be fundamentally looked at from a computational point of view. Most of the agent building process these days falls into the category of, let's use LLM invocations for every computation. Let's assume there is a flow: when a customer requests a refund, call the refund API, get the response, then do the account update API, etcetera. What is being observed today is that most people are using LLMs all the time for this invocation. When you take a step back and think about how computation works in nature, for energy conservation, we don't try to reinvent the same process using our brains all the time. Once it becomes deterministic, like the habit formation loop, you put it into a low energy execution phase. So for agents, too, this sort of paradigm needs to emerge, is the way I'm thinking. Instead of invoking LLMs all the time, if the LLM, or the agent, has figured out that 80% of the use cases can be solved by a deterministic flow, the agent itself constructs that deterministic flow. For the invocations that come in, it uses this deterministic flow and falls back to the model for the remaining 20%. But all of this requires rigorous instrumentation of costs. So, essentially, it's just like in an operating system. Right? For each process, you assign CPU and memory, and then you do the stamping and observation with the process IDs.
You would need to treat that unit, which in this case is the agent, with the same kind of resource allocation, and then monitor it, with massive observability platforms and intelligent decision making platforms keeping this in check. So, two things. It's like the OODA loop. Right? Observe: you need to keep observing what is going on, and then you need to decide what needs to be optimized. In this case, computation needs to be optimized. It doesn't make any sense to do LLM calls all the time. It is soon going to come crashing down, is what I bet.
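A minimal sketch of that habit formation idea, assuming the OpenAI Python SDK: requests whose intent matches a learned deterministic route are served cheaply, only the rest pay for an LLM call, and every call is stamped with its token usage per agent, like process accounting. The route table and the intent field are illustrative.

```python
from openai import OpenAI

client = OpenAI()

# learned fast paths: deterministic flows the agent has already figured out
ROUTES = {"refund": lambda req: f"refund issued for order {req['order_id']}"}
usage_by_agent: dict[str, int] = {}  # token stamping per agent

def handle(agent: str, req: dict) -> str:
    if req.get("intent") in ROUTES:                 # the ~80% case: low energy execution
        return ROUTES[req["intent"]](req)
    resp = client.chat.completions.create(          # the ~20% case: fall back to the model
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": req["text"]}],
    )
    usage_by_agent[agent] = usage_by_agent.get(agent, 0) + resp.usage.total_tokens
    return resp.choices[0].message.content
```

The observe and decide halves of the OODA loop would sit on top of usage_by_agent, promoting recurring LLM-solved flows into ROUTES over time.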
[00:33:46] Tobias Macey:
And then, in terms of the focus on openness and local control for businesses that want to capitalize on these agent capabilities, there's the ability to use local models for more of that cost control, or for more predictability in being able to freeze or pin to a specific model version so that you're not at the whim of whatever the model provider is going to do under the covers of their API. I'm curious how you see the work that you're doing complementing or overlapping with other projects in the ecosystem, such as Oumi, which is focused more on enabling organizations to build their own foundation models and control more of that element of the life cycle as well.
[00:34:33] Arun Joseph:
Yeah. So this is also one of the reasons I'm going on this entrepreneurial journey, betting on one of these key constructs. What we have observed in enterprises is, once a pattern has been set in, let's assume somebody has built with the best model available for some use cases, there is no incentive for it to be shifted later. In enterprises, typically, once something has been set, it's very difficult to shift. But the worrisome point is all the information in that feedback loop. For example, if you had control of fine tuned models, or were rigorously tracking the model outputs, that is a wealth of information you could use going forward to do fine tuning, maybe bringing down the computational cost and building better intelligence for your organization. That opportunity is missed if you keep betting on these large models. For that to happen, you need a platform or a layer which allows this mitigation strategy to be baked in. So when you call the completions API, instead of going directly to, let's say, OpenAI or Gemini or whatever it is, there should be a way in which this is mediated by a proxy layer, which allows different models to be quickly plugged in. And, essentially, as OpenAI recently has shown with the Responses API, which is a higher order API, there is a beautiful example where OpenAI shows that with three lines of code you write a call and say, add this toner pad to my shopping cart, against a Shopify MCP server whose tools are plugged into the Responses API. This is the only input that was given, plus the MCP server configuration. There is no additional code. It does the agentic orchestration: it calls the search product, add product, and checkout tools, all of that underneath the platform. So if organizations start to use these APIs, it is super simple. The simplicity of such an API, and the value it brings, is tremendous. You could stand up any number of use cases in a given day. But the problem is, if you don't have that mitigation strategy, and your organization is orchestrated through some black box API, you will miss out on a lot of information you could have preserved for the time when model tuning becomes much cheaper, and you will soon be tied to these large model companies, just consuming some black box higher order API. So the layer that I'm building is something like the Responses API. One of the components that we are building offers the same OpenAI-like structure, but it allows different models to be plugged in while being deployed on your premises. And this data is immediately accessible for you to fine tune your models. It's immediately traceable, with the same simplicity as OpenAI, where in four lines of code you can build a number of orchestrations.
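A hedged sketch of such a mediation layer, not Masaic's actual OpenResponses implementation: callers use one completions style function, a routing table decides which backend serves it, and every exchange is logged so the fine tuning data stays yours. The backend names, local server URL, and log path are assumptions.

```python
import json
import time

from openai import OpenAI

BACKENDS = {
    "default": OpenAI(),  # the hosted API you use today
    "local": OpenAI(base_url="http://localhost:8000/v1", api_key="unused"),  # e.g. a vLLM style server
}

def complete(messages: list[dict], route: str = "default", model: str = "gpt-4o-mini") -> str:
    resp = BACKENDS[route].chat.completions.create(model=model, messages=messages)
    answer = resp.choices[0].message.content
    with open("traces.jsonl", "a") as f:  # keep every exchange for future fine tuning
        f.write(json.dumps({"ts": time.time(), "route": route,
                            "messages": messages, "answer": answer}) + "\n")
    return answer
```

Because both backends speak the same API shape, switching a workload from the hosted model to a pinned local one is a one argument change, which is the lock-in mitigation being argued for.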
[00:37:47] Tobias Macey:
One of the other interesting elements of this new world of agentic capabilities is the broad applicability of the problem spaces that they can be applied to. And I'm wondering how you see the potential for something like LMOS to be employed in the context of a data team, to build their own agents to help manage some of the pipeline design and implementation, do some of the analytical queries that you want to expose to the business, or some of the other end user capabilities that are particularly interesting or innovative that you have either conceived of or seen in action?
[00:38:30] Arun Joseph:
Yeah. So, essentially, there was recently a meetup for the Eclipse software defined vehicle group, I think it was in Copenhagen, where we did a demonstration with LMOS around vehicles emitting telemetry data. The software defined vehicle group is a consortium of some of the major companies in Europe focusing on telemetry data emitted from vehicles. Software defined vehicles. We demonstrated a simple agent, built with LMOS ARC, the agent developer framework, the agent reactor framework, which could connect to this telemetry data. It was a time series database, and the query building was being done by the agent using OpenAI APIs. But the fun part is, you could ask questions like, how many of my vehicles are running low on fuel, among maybe a thousand vehicles, and all of that is returned in a few lines of code. So, essentially, what this actually means for data analytics teams is that it is all about querying.
Right? The query language could be SQL or Cypher; a number of approaches are there. But this acts as a layer on top, for them to quickly build these dynamic query creation agents, even though the underlying systems won't change. You don't have to change your Athena. You don't have to change your Redshift. You don't have to change your Snowflake. You could build an agent that constructs these queries, because language models are also good at understanding the existing query languages. And it can be done pretty easily, as we have seen so far.
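A minimal sketch of that pattern, assuming the OpenAI Python SDK and an in memory SQLite stand-in for the time series store: the agent's only job is to turn the question into a query against the system you already run. The schema and question are illustrative, and in production the generated SQL should be validated before execution.

```python
import sqlite3

from openai import OpenAI

client = OpenAI()

# stand-in for the telemetry store; swap for your Athena/Redshift/Snowflake driver
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE vehicle_telemetry (vehicle_id TEXT, fuel_pct REAL)")
db.executemany("INSERT INTO vehicle_telemetry VALUES (?, ?)",
               [("v1", 8.0), ("v2", 72.0), ("v3", 11.5)])

question = "How many of my vehicles are running low on fuel, below 15 percent?"
sql = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content":
        "Schema: vehicle_telemetry(vehicle_id TEXT, fuel_pct REAL). "
        f"Return only a raw SQL query, no markdown, answering: {question}"}],
).choices[0].message.content.strip()

print(db.execute(sql).fetchall())  # sketch only: validate generated SQL before running it for real
```

The underlying warehouse is untouched; only a thin query construction layer is added on top, which is the point made above.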
[00:40:15] Tobias Macey:
And in your experience of building these agentic frameworks and figuring out how to manage the creation and maintenance of the context corpus that enables these agents to work effectively, what are some of the most interesting or unexpected or challenging lessons that you've learned in the process?
[00:40:36] Arun Joseph:
Yes. Andrej Karpathy recently pointed out that all of information engineering is gonna converge into context engineering. Simple questions are easy to handle, but as the system grows in complexity, how do you pass the right context to the same agent, or to different agents? There are many approaches emerging. Nobody has fully figured out this pattern yet, because as it grows and scales, this is gonna become challenging. But one of the things is, if you have a platform or a layer, let's take an ecommerce organization, a company which champions commerce. If different departments in that company start to build different agents, then what you end up with is already today's spaghetti systems.
You would lose out on deriving the true value of AI, because the context is now segregated. The true value of AI in enterprises is going to emerge only from unifying at least the foundation for the context, so that other agents can be built on it. What that would mean is a big shift in enterprise architectures. The microservices world taught that small is good: small, nimble teams that can go and do whatever they want. But with AI, there is a problem, because if you don't have a unified view of the truth of your enterprise stored somewhere, then the agents that are being built will not be as effective as they could have been. Which brings in the need for a core context and model management platform where you connect your enterprise tools, so that your different departments and engineers can build easily without asking for permission.
Hey, can I use your department's data? Because what matters is the business outcome. Suddenly, you ask your orchestrator, I want to optimize my profits for the teenager segment, what suggestions exist? Now, if the platform has the tools, you know, a sales projections analysis tool, a buyer patterns tool, an inventory tool, and a market analysis tool, then with this question, the agentic orchestration immediately kicks in. It's able to come up with immediate results. So this layer is the essence. Building your organization for agentic orchestration, that is the definition of AI native, as I would call it, the architectural equivalent of an AI native. What is the definition of an AI native? How do you prepare your organization to be one? Models are gonna get better and cheaper, and new paradigms will emerge in programming. But one thing certainly is not gonna change, which is that you need this context building platform layer for the world which is going to come. And that is going to be the core of your AI native transformation journey.
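As a sketch of what such a shared layer could look like at its simplest, here is a hypothetical central tool registry that departments contribute to once and any orchestrator can discover from. The registry shape and tool names are assumptions, not a description of any existing platform.

```python
from typing import Callable

REGISTRY: dict[str, dict] = {}  # one shared registry instead of per-department silos

def register(name: str, description: str, fn: Callable) -> None:
    REGISTRY[name] = {"description": description, "fn": fn}

# each department contributes once; everyone can discover without asking permission
register("sales_projection", "Project sales by customer segment",
         lambda segment: {"segment": segment, "projected_units": 1200})
register("inventory_levels", "Current stock per SKU",
         lambda: {"sku-1": 40, "sku-2": 0})
register("buyer_patterns", "Purchase patterns by cohort",
         lambda cohort: {"cohort": cohort, "repeat_rate": 0.31})

def discover(keyword: str) -> list[str]:
    """What an orchestrator would call to assemble tools for a goal."""
    return [n for n, t in REGISTRY.items() if keyword in t["description"].lower()]

print(discover("sales"))  # ['sales_projection']
```

The question about teenager segment profits above is answerable only because all four tools live behind one discoverable surface rather than four team boundaries.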
[00:43:30] Tobias Macey:
And extending on your reference to the microservices architecture, I think we went through a similar problem where, as you go from the monolith to microservices, you still have to figure out where that state gets created and maintained and how it gets distributed. And that also leads to all of the complexity of reconstituting the relevant context in the warehouse context, where you pull the data out of all of these different microservice systems, and then you have to figure out what the linkages are, recreating some of the API based connections at the data layer to bring it all back together and make it semantically meaningful. And we're in a similar stage with these AI systems, where if you have that warehouse catalog, you can use it to a certain extent, but we need to figure out how to feed a lot of these LLM interactions back into it, to help maintain and grow and evolve the corpus without having it shard into different domains that then have to be reconstituted, if you even know that they exist in the first place.
[00:44:38] Arun Joseph:
Absolutely. Absolutely spot on. And that's what most people are seeing. From what I've seen, and some of the stories from other places, the spaghetti that we have seen in the previous world is only going to get multiplied. Everyone is building three agents on top of the existing microservice. Now the microservice is four units of computation, and no one knows the context anymore. My agent can only respond to customer address queries. Then you think about what protocol you should use, A2A, to connect to the order agent. And now you have scrum meetings, and it's insanity.
[00:45:22] Tobias Macey:
Alright. For people who are figuring out how to build their own enablement layer for these agentic use cases in their organization, what are the cases where you would say that LMOS is the wrong choice?
[00:45:37] Arun Joseph:
So LMOS has multiple components. LMOS itself has the agent framework per se, which is built around Kotlin. So if you're a non Kotlin, non JVM organization, it doesn't make sense to use that agent framework. The platform layer is still in an incubation phase. It's standard Kubernetes, and it will at least provide a lot of inspiration for you to pick up, irrespective of whether you run Python agents or something else. So for organizations which are not into the JVM and Kubernetes, or which are entirely based on Python or C#, it might not make sense. But there is still a lot to learn on how to manage the life cycle of agents. That's also the reason the learnings, and where this goes, extend beyond agent frameworks, and one of the components being built under the new startup is the Masaic OpenResponses component.
So, essentially, it does not restrict you from using any framework. As I described earlier, it's the complete Responses API, just like OpenAI's. Once this component is deployed, every agent framework that you can think of, the OpenAI Agents SDK, AutoGen, or Kotlin or whatever, should work, because it addresses this problem of context building for large scale enterprises and model switching. If the OpenAI developer platform were to be deployed on your premises, what would it be like? That's the way we are thinking. That's the essence of what you need to build in your organization, irrespective of whether you use this component or not.
[00:47:20] Tobias Macey:
And as you continue to contribute to LMOS and continue along your own venture that you're building, what are some of the new capabilities that you have planned? What are some of the ecosystem evolutions that you're paying particular attention to, or any of the capabilities that you're particularly excited to dig into building and growing?
[00:47:44] Arun Joseph:
Yeah. I think on the LMOS side, there are now two initiatives. One is LMOS, which is under the Eclipse Foundation, and we are listening to the industry on what is actually required, not trying to make it bloated with, you know, let's add this feature or that feature. At least the agent framework is very stable; the Kotlin stack is used in production. The LMOS platform we originally built so that you could deploy any agent, Python or otherwise, and convert them into agents as first class citizens in Kubernetes.
This we are rethinking, because if you try to fit everybody, it will never work. So we just wanna convert that into a minimalistic registry based on Kubernetes, which does external orchestration. So for agents on the JVM stack, LMOS should hold true today. I would say it's one of the best frameworks out there, in that it allows business and engineers to work on the same thing, which is very rare. With ARC, the business can define the use case: I want to define the billing dispute use case. The business writes it in the English language, and we are able to make that work, because if you just randomly write prompts, it will never work. You need a runtime for the English language, a meta language, which we created, referred to as ADL in ARC. So it works. The part which I'm most excited about is this OpenResponses component, which is being built by my cofounders, Jasbir and Amanth.
They are building this component to become, like I mentioned, that orchestration layer for models and the context building layer for the whole enterprise, so that your developers and your departments, and no one is gonna change the department structure, can still build agents on their favorite stack. The minimum guidance is: prepare yourself to change models quickly, and don't lose context. You should have the ability to connect to any number of tools, and don't bring in your own siloed tool registry and all that. That layer is the Masaic OpenResponses layer. And by the way, Masaic is the name of the company I'm building, and Masaic stands for multi agent systems, MAS. So this is the thing which I'm most looking forward to. It's also an open core model: open source at the core, then building on top. Yeah.
[00:50:17] Tobias Macey:
Are there any other aspects of the work that you're doing on LMOS or Masaic, or the overall space of enabling agentic use cases and all of the data requirements that go into it, that we didn't discuss yet that you'd like to cover before we close out the show?
[00:50:32] Arun Joseph:
Yeah. I think we went into the idea of orchestration. I would say everything revolves around the idea of orchestration from here on. And the idea of orchestration is simply feedback loops: feedback loops of experimentation, like in the real world, as in the Apollo model versus the SpaceX model. In the Apollo model, failure was not an option; the SpaceX model failed fast and learned. So orchestration is a programming, engineering, systems engineering view which is gonna transform organizations. And it's more than a programming construct. It's a mindset construct, a behavioral change inducing thing, which enterprises need to think through as well. If you try to retrofit your existing processes and your ways of working into using any framework and AI, it will not work. It should be used as a lever to become the SpaceXes of the world, so that you focus on only one thing, which is shortening the feedback loop, learning, and reapplying. That's the AI nativity part, I would rather say.
[00:51:38] Tobias Macey:
Well, for anybody who wants to get in touch with you and follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And as the final question, I'd like to get your perspective on what you see as being the biggest gap in the tooling or technology that's available for data management today.
[00:51:54] Arun Joseph:
I have my personal opinion on this. First of all, having seen large corporations, essentially, it's time to take things out rather than add more things. That is the way I would rather see it. Every tooling that has come in has resulted in: there is a transactional system, then there is a data pipeline, then there is some other team sitting somewhere building some new other tools. And when a business stakeholder asks, hey, what is going wrong? Now you're tracing Jira tickets between five teams. And every team would say, I have the greatest framework and tool. So I think it is time for a cleansing, and it's less about adding more and more about taking things out. For example, can there be new kinds of systems which can be queried both transactionally and operationally, because agents might be able to reconstruct the truth using computation as well? If I were to frame it as one statement: over engineering has crept into the data space as well. It's time to clean it up, and AI provides a good lever to do it.
[00:52:59] Tobias Macey:
Alright. Well, thank you very much for taking the time today to join me and share the work that you've been doing on building LMOS and the thought that you've put into how to enable the creation and maintenance of these agentic use cases at organizational scales. It's definitely a very interesting and complex and timely problem, so I appreciate the time and energy that you're putting into making it more tractable for everyone else. Thank you, Tobias. I really enjoyed the conversation as well. So thanks for the invite, and thanks for the thoughtful questions. Thank you for listening, and don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. And the AI Engineering Podcast is your guide to the fast moving world of building AI systems.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email hosts@dataengineeringpodcast.com with your story. Just to help other people find the show, please leave a review on Apple Podcasts and tell your friends and coworkers.
Introduction to Arun Joseph and His Work
Journey into Data and AI
Building Agentic Systems at Scale
Enterprise Data Management and Agentic Systems
Challenges in Data Contextualization
LMOS: From Enterprise to Open Source
The Vision Behind LMOS as an Operating System
Cost Management in Agentic Systems
Open Source and Local Control in AI
Agentic Systems in Data Teams
The Future of AI and Enterprise Architecture
Closing Thoughts and Future Directions