The Data Model That Captures Your Business: Metric Trees Explained

Hello, and welcome to the Data Engineering podcast, the show about modern data management.

Data teams everywhere face the same problem. They're forcing ML models, streaming data, and real-time processing through orchestration tools built for simple ETL.

The result: inflexible infrastructure that can't adapt to different workloads.

That's why Cash App and Cisco rely on Prefect.

Cash App's fraud detection team got what they needed: flexible compute options, isolated environments for custom packages, and seamless data exchange between workflows.

Each model runs on the right infrastructure, whether that's high memory machines or distributed compute.

Orchestration is the foundation that determines whether your data team ships or struggles.

ETL, ML model training, AI engineering, streaming: Prefect runs it all, from ingestion to activation, in one platform.

Whoop and 1Password also trust Prefect for their data operations.

If these industry leaders use Prefect for critical workloads, see what it can do for you at dataengineeringpodcast.com/prefect.

Are you tired of data migrations that drag on for months or even years? What if I told you there's a way to cut that timeline by up to a factor of six while guaranteeing accuracy?

Datafold's migration agent is the only AI-powered solution that doesn't just translate your code. It validates every single data point to ensure perfect parity between your old and new systems.

Whether you're moving from Oracle to Snowflake, migrating stored procedures to dbt, or handling complex multisystem migrations, they deliver production-ready code with a guaranteed timeline and fixed price.

Stop burning budget on endless consulting hours. Visit dataengineeringpodcast.com/datafold to book a demo and see how they turn months-long migration nightmares into week-long success stories.

Your host is Tobias Macey, and today I'm interviewing Vijay Subramanian about metric trees and how they empower more effective and adaptive analytics. So, Vijay, can you start by introducing yourself?

Hey, Tobias. Great to be here. I'm Vijay Subramanian, the founder and CEO of a metric-tree-based analytics startup called Trace.

And do you remember how you first got started working in data?

Oh, yes. Back in 2010, I joined a then seed-stage startup called Rent the Runway to head up all things data. That became an almost ten-year journey to IPO, and I had a front-row seat to the full birth and evolution of what is now considered the modern data stack.

I don't know if you remember, but back in 2010 the rage was the Hadoop ecosystem. That's how far back that was. We actually went with Vertica early on; we had an on-prem Vertica data warehouse before we migrated to Snowflake. We had all these custom scripts for data pipelines before we tried a tool called Luigi and finally settled on dbt. And early on, all of our reporting was just spreadsheets before we moved to Tableau and then Looker. So we were basically the poster child for adopting the modern data stack.

And that is exactly why I'm doing what I'm doing now. I had a front-row seat to the innovation that enabled the data producer to ingest data and to build and maintain pipelines, but I didn't see as much happening on the consumer side. In fact, I would even argue that enabling the producer almost led to a proliferation of data assets, dashboards and data models, without regard to the consumer and how they're going to piece everything together to make their workflows work.

So we're hoping that metric trees are the framework and the platform on which we can regulate this and also really provide value to the end consumer.

Yeah, absolutely. It's definitely interesting how the late 2010s into the early 2020s was really a sea change in terms of investment in and application of data at a broader scale than had previously been the case. For a long time, organizations would have some form of business intelligence, but it was generally fairly small scale, limited, and focused on a very small set of data sources largely driven by their core applications.

Then the introduction of Hadoop brought in the idea of big data: just capture everything, and eventually it'll be useful. But the investment required to actually run an operation of that scale meant that it was largely limited to the big tech players and early adopters in the enterprise.

As we moved into the late 2010s, things like Redshift and Snowflake helped reduce the barrier to entry for data warehousing and business intelligence and led to the rise of the modern data stack that you mentioned, as well as the introduction of things like Fivetran and Airbyte, all of these data ingest tools that made it easier to onboard new data sources. Yet, to your point, that led to this proliferation of data without any real solution for how to actually integrate it and make it useful.

Going back to data warehouses, in the nineties there was the introduction of Kimball-style star schemas. There was Bill Inmon with his third normal form data warehousing. I think the early to mid 2000s saw the introduction of Data Vault. So we had all of these frameworks for, in the abstract, how to organize data into reusable structures. But, again, that required a lot of investment. It required a team that was very well versed in those practices as well as in the business and how it operated.

Given all that context, I'm wondering if you could describe how the idea of metric trees fits into this evolutionary ecosystem, where we are today with this proliferation of data sources, and the challenges of actually activating that data, especially now that it is being applied to a much broader range of use cases than just retrospective analytics.

Yeah, exactly. That was a great timeline. In fact, if you follow that timeline, we kind of got away from data modeling for a bit in the modern data stack. We just thought we'd throw all the data in. Compute is cheap. Storage is cheap. We'll just run whatever we need to for the business.

And we got away from a lot of the principles, which I think we're coming back to now. If you really read those classics, if you read Kimball and Inmon, they were actually obsessed with the idea of gathering business requirements in order to do the data modeling. It wasn't just go and build a bunch of data models. Everything derived from what the business needed to do and working backwards to build the data models. If you look at it through that lens, a metric tree is simply an evolution of that concept.

You can imagine going to the growth team and saying, what are the five or six output metrics you care about, and what are the factors that drive them? Let's design those. Those become your metric trees. Now let's work backwards to design the data models you need to populate those metric trees. You're really going business first before you think about the data. From that lens, I think metric trees are a very logical evolution of what data modeling can be.

If I were to step back and define what metric trees are before we dive deep into it, that is the producer angle. From a data producer angle, it's just the next evolution of data modeling. Maybe it's the final frontier of data modeling. From a consumer standpoint, a metric tree can be seen as a metric template that captures the business process or the business model. So I combine these two concepts, and I like to define a metric tree as the data model that captures the business model.

So why are we doing that? What's the purpose of doing all this work? Suppose you could capture the business model through data, and capture that in code. We are spending so much time today on tedious, manual, repetitive, time-consuming, even boring work, trying to understand what is happening to our metrics. Why are they up? Why are they down? How are we performing versus budgets, plans, and OKRs? What does the forecast look like if things stay the course? What does it look like if things change? Are my experiments working? Are my features working? If I move this metric x, what does it do to revenue?

All of these questions that animate organizations, we could actually start streamlining them in software with this data model that captures the business model. In fact, the most ambitious vision would be, can we automate them? What if there is a certain set of metric tree templates that govern the various business models in the world? And what if there is a finite, contained list of analytical operations, or functions if you will, that can be applied to metric trees to power the workflows that users need? What if that were true? That is the purpose of metric trees that I'm excited about.

And metric trees are another layer that builds on top of those dimensional structural elements of warehousing. In terms of the nomenclature, they invite a comparison to the idea of the metric layer, also known as the semantic layer, also known as headless BI, which became popular in the 2021 to 2022 time frame. It gained a lot of popularity on its initial introduction and has since faded a little in terms of the hype around it, but it is still very much part of the data ecosystem. I'm wondering if you can talk to the ways that the work you're doing with metric trees relates to the principles, and maybe the technology layers, that were brought in with the metric layer or the semantic layer.

Yeah. Maybe the best way to illustrate this is with an example, so we can make this whole thing more concrete for us and for the audience. Let's take Uber or Lyft, because we all know how they operate. In Uber or Lyft, the key metrics would be number of rides, number of riders, and revenue.

If you had a semantic layer or a metrics layer where you could define these metrics, you could then ask that system, hey, give me the number of rides in New York City in the month of September, and the system will generate the SQL and give you the output. That's the purpose of a metric layer or semantic layer.

In a metric tree formulation of this problem, individual metrics don't really play a role on their own, because you're defining the metric tree. The output metric, let's say, is revenue, and you would define that as a function of number of rides, ride frequency, average price of the ride, maybe a promotion rate if that's applied, and the take rate that Uber keeps, that is, one minus the commission they're paying to the drivers. So you express this function and you say these five or six different things ladder up to this thing called revenue. That's what a metric tree is.

Now, you can obviously see that a metric layer, if you have already built it, could be an input to hydrating and populating a metric tree. If you have already defined this metric layer and you can access it in a nice way, then you can populate the metric tree from it. But you also don't need a metric layer to populate a metric tree, because you can define a metric tree and push a lot of these metric definitions into the metric tree system itself. You don't need a separate layer to define the metrics. So the way I think about this is, if there is a standard metric layer in the ecosystem, then, yes, that can be consumed to hydrate and populate the metric tree, but you don't necessarily need it for a metric tree to operate on its own.

I should make this point because it is also a question of purpose. A metric layer's purpose is almost self-evident. You define the metric, then you ask for the metric, and it gives you the metric, and that ends the purpose. But with a metric tree, simply hydrating the tree is not sufficient, in my opinion, to drive value. It's just a visualization of a bunch of metrics on a canvas with a bunch of boxes. I don't think the value is realized unless you're building these analytical functions I spoke about, unless you're building a real workflow around it. So I do think there's a lot more to be built around metric trees than what the metric layer ecosystem has today.
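To make the revenue tree described above concrete, here is a minimal Python sketch. The exact formula is an assumption for illustration: it treats the first term as active riders and multiplies riders, ride frequency, average price, one minus the promotion rate, and the take rate. The `Metric` class, the node names, and all numbers are hypothetical and are not Trace's data model.

```python
from dataclasses import dataclass, field
from math import prod
from typing import List


@dataclass
class Metric:
    """A node in a metric tree: a leaf input or the product of its children."""
    name: str
    value: float = 0.0  # used only when the node has no children
    children: List["Metric"] = field(default_factory=list)

    def evaluate(self) -> float:
        # Leaves return their own value; parents multiply their children.
        if not self.children:
            return self.value
        return prod(child.evaluate() for child in self.children)


# Hypothetical inputs for one market and one month.
revenue = Metric("revenue", children=[
    Metric("riders", 1000),
    Metric("ride_frequency", 4.0),      # rides per rider
    Metric("average_price", 20.0),      # gross fare per ride
    Metric("net_of_promotions", 0.95),  # 1 - promotion rate
    Metric("take_rate", 0.25),          # 1 - driver commission
])

print(f"{revenue.name}: {revenue.evaluate():,.2f}")
```

A fuller version would probably store an operator per node, since real trees mix products with sums, differences, and ratios. The point of the sketch is only that the tree is an explicit, saved structure rather than a set of independent metric queries.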

Another interesting aspect of this question of metrics, particularly in juxtaposition to the dimensional schemas that have been the bread and butter of data warehousing for the past thirty years, is this: if you're doing the dimensional modeling correctly, why do you even need a separate representation of metrics? It also invites comparisons to things such as the differentiation between a data warehouse and an OLAP server, as well as to practices such as master data management and data governance policies and how they are reflected in the technical representation. The underlying idea is that raw data is only useful if you contextualize it, and the way you contextualize it is to relate it to the questions of business process.

The canonical example of why you need effective definitions of metrics is that different people across the business have different concepts of what a specific phrase might mean. "Customer" means something different to a marketing team versus a finance team versus a sales team, so the business events that cause somebody to go from a prospect to a customer have different thresholds across those use cases. I'm wondering if you can wrap that all together and explain why the practice of dimensional data modeling alone is not sufficient, or which aspects of it as a general practice and policy leave gaps that necessitate a new concept and a new technology layer.

Totally. I think it really gets down to the purpose question. The Kimball framework and the Inmon framework were designed to thoughtfully build data models, with the use case being that a business is able to retrieve flexible calculations of metrics inside some BI tool, for example. So, for Uber: give me revenue in New York City. Give me rides. Give me riders. Give me September. Give me the rolling twenty-eight days. Cut it by UberX versus Uber share. All of these various cuts can be provided in a well-governed way, but that was the end purpose. The purpose is, can I generate these metrics, which is really what the metric layer is also trying to do in a sense. If you look at it through that lens, the metric layer is just a formalization, on top of Kimball, of a structure that also captures the metric itself as a concept. So it's not just facts and events and dimensions. It's also metrics and dimensions. It's one layer further up in terms of abstraction.

Whereas I think of a metric tree as getting as close to the business model as you can, back to the point of it being a data model that captures the business model. And that is a back end on which we can build analytical functions that directly affect the workflows around data in the organization. So I don't think there was any drawback to Kimball. That framework was useful for generating metrics. Now we're talking about users who want to activate the data, to use your phrase. They want to use the data. They want to get insights quickly. They want to make decisions quickly. How do we make all that happen? The answer is we've got to keep pushing the abstraction, keep pushing the frontier of modeling forward, and that's where metric trees really fit in.

So, digging into that purpose, the definition of the metrics and their relations, and how that factors into the questions that the business is asking and needs answers to: what are some of the strategies that you have found most effective, and that you recommend, for teams who are in the early stages of trying to adopt metric trees and understand what the definitions are, how they relate to each other, and how they ladder up from the very fine-grained events into the more abstracted, contextualized business questions being asked about the data that underlies those calculations?

That's a really interesting question. I helped a startup go from seed stage to IPO, so I worked with the business extremely closely. If you talk to any business folks about what the drivers are, what input levers they have, what the output metrics and KPIs are, you will realize very quickly that they already have a rough mental model that kind of looks like a metric tree, if you think about it in that context.

They wouldn't call it a metric tree. They definitely don't have it fleshed out, thought through, and refined, but they're already thinking about the connections. They think about the metrics they are tracking and how those are connected, and that's what they're trying to accomplish when they log in to a BI tool and look at a metric, then hop into another dashboard, apply a filter, and look at that metric. They're kind of dancing around these dashboards or spreadsheets, in a way tracing the connections between the metrics, because they know these are all laddering up to something important and there are connections between them.

So the job to be done here is not anything radically new. It is pushing the frontier of how Kimball and Inmon and those classics thought about it: can we work with the business to define and flesh out how they really think about all the different metrics and the levers and how they ladder up to the outcomes they care about? In a sense, the reason dashboards have proliferated is that these requirements come in piecemeal. Hey, give me this cut of this data. Give me this cut of that metric. And then it starts to grow and grow. But there is a connective tissue underlying all of that, which is the framework in which they are thinking about what the input levers are, what they are moving, and which metrics are moving in the output layer.

So I think the job to be done is simple to say but kind of hard to do in an organization: have a very active conversation between data teams and business teams and flesh out what the metrics are, how they are connected, what your levers are, what your outputs are, and keep refining that mental model as you learn more and more about the business.

By the way, a financial P&L, which is probably one of the oldest data models ever built, is actually a metric tree. It just ladders up a bunch of concepts to arrive at net margin and EBITDA. A financial model, a financial P&L, might be the OG metric tree. If you go to any FP&A team at any organization of reasonable scale, they have a sophisticated financial model, and they're operating on it because it describes how they make money and how cash flows. All we're talking about here is taking that model, taking that discipline, and ensuring that we can apply it to all these different functions, sales and marketing and product and growth and the exec team, and how they think about metrics. Taking the discipline of modeling the business into all these various functions and operations, making that come to life, and working with the business on it is really what we're talking about here.
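The P&L comparison can be shown with a small Python sketch of an additive tree, where each line item either adds to or subtracts from its parent. The line items, the `LineItem` class, and the amounts are hypothetical and chosen only to show the roll-up; a real financial model has many more levels.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LineItem:
    """A P&L node: a leaf amount or the signed sum of its children."""
    name: str
    amount: float = 0.0  # used only by leaf items
    sign: int = 1        # +1 adds to the parent, -1 subtracts from it
    children: List["LineItem"] = field(default_factory=list)

    def total(self) -> float:
        # Leaves return their amount; parents add up their signed children.
        if not self.children:
            return self.amount
        return sum(child.sign * child.total() for child in self.children)


# Hypothetical figures for one period.
ebitda = LineItem("ebitda", children=[
    LineItem("gross_profit", children=[
        LineItem("revenue", amount=1_000_000.0),
        LineItem("cost_of_goods_sold", amount=400_000.0, sign=-1),
    ]),
    LineItem("operating_expenses", amount=350_000.0, sign=-1),
])

print(f"{ebitda.name}: {ebitda.total():,.0f}")
```

The same roll-up discipline is what the conversation proposes extending to sales, marketing, product, and growth metrics.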

You mentioned too that a dedicated metrics layer as a separate technology implementation is not a requirement for creating and using metric trees, and I'm curious what your overall recommendations are as far as the storage and technology implementation of the metric tree. Is it just another set of tables within an overarching data warehouse? Is there some other access layer or security layer? What are some of the considerations as you're starting to think, okay, metric trees as a practice seem beneficial, so I'm going to go ahead and define the different calculations and how they roll up into the top-level events that I care about, so that I can also support this drill-down approach? And what are some of the overall planning and implementation details that go into actually creating and activating these metric trees in the business context?

I should mention upfront that we're very early in this, and we're going to figure all this out as the market evolves. It's a very important question, because it gets to the utility question. Why are we doing this work? Why are we storing metric trees? What are we generating? How are we retrieving it? What are we going to do with it? I think it goes back to the point I made earlier about the analytical functions that we can operate on top of the metric trees to provide value to the business. Maybe the best way to think about this is to do an example again, so let's go back to the Uber example.

Let's assume you had a metric tree of revenue with these terms: rides, ride frequency, average price, commission rate, etcetera. And let's say revenue is down 5%, and you want to understand what is driving that. First, you have to apply an analytical function to see which of these terms are driving that 5%. Now let's assume that ride frequency is the driver, and that it explains 4.3 points of the 5%, hypothetically.

You may want to go to ride frequency and ask, how does it vary across all these markets, New York, the US, globally? And when you do that, you may also want to know: is it because frequency is going down within certain markets? Or is it because the mix of markets is shifting, so that a higher-frequency market like New York is losing share and a lower-frequency market is gaining share, and that shift alone is causing the overall ride frequency to go down? These are the things that the business wants to understand instantly.
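The two explanations in that last question, frequency falling within markets versus the market mix shifting, can be separated with a standard rate-and-mix split. Here is a minimal Python sketch, assuming overall ride frequency is the share-weighted average of per-market frequencies. The function name, market names, and numbers are made up for illustration, and weighting the rate effect by the starting shares is one convention among several.

```python
from typing import Dict, Tuple


def rate_mix_split(
    share_before: Dict[str, float],
    rate_before: Dict[str, float],
    share_after: Dict[str, float],
    rate_after: Dict[str, float],
) -> Tuple[float, float]:
    """Split the change in a share-weighted average into rate and mix effects.

    rate effect: markets keep their old share, but their own rate changes.
    mix effect:  shares move between markets, valued at the new rates.
    The two effects add up exactly to the total change.
    """
    rate_effect = sum(
        share_before[m] * (rate_after[m] - rate_before[m]) for m in share_before
    )
    mix_effect = sum(
        (share_after[m] - share_before[m]) * rate_after[m] for m in share_before
    )
    return rate_effect, mix_effect


# Hypothetical data: per-market ride frequency is unchanged, but the
# high-frequency market loses share of riders to the low-frequency one.
share_before = {"new_york": 0.5, "other": 0.5}
share_after = {"new_york": 0.4, "other": 0.6}
rate_before = {"new_york": 6.0, "other": 3.0}
rate_after = {"new_york": 6.0, "other": 3.0}

rate_effect, mix_effect = rate_mix_split(
    share_before, rate_before, share_after, rate_after
)
print(f"rate effect: {rate_effect:+.2f}, mix effect: {mix_effect:+.2f}")
```

In this made-up case no market's frequency moved, yet the overall average falls from 4.5 to 4.2 rides per rider, so the entire decline shows up as a mix effect.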

So take that question and apply it to how you would think about storing and retrieving metric trees. First, you need to define the core revenue tree and save it somewhere: this revenue tree has these terms. Then you need to be able to apply an analytical function to it to figure out which term is driving the change.

Then you take that term and mutate the tree, because you're now saying, okay, let me take ride frequency and further decompose it, if you will, into all the different markets, or maybe into UberX and Uber share. So you want to mutate it and then apply another function to further explain what is causing the frequency to go down. You can see that, if you look at it this way, it is not sufficient to just generate the metrics and populate the metric tree. You really need to think of the metric tree almost as a back end.

The metric tree data model is almost like a back end to a software system that empowers the workflows for the consumer. I make this point because I have definitely seen people out there talk about metric trees where what you see is a visualization of a bunch of boxes of metrics and arrows, and they're populated.

Yes, you can run SQL and populate these boxes, and there is some value in seeing all the metrics in one place. You can think of it almost as a dashboard of dashboards, with all the different dashboards in one. But I think the utility is fairly limited unless you're operating on it, unless you can apply analytical functions to it, unless you can mutate it, unless you can do things with it in an interactive system. That's why I said metric layers can be useful as a mechanism to populate the data. But, ultimately, you have to treat the metric tree itself as a key data artifact that can be the back end of a larger system.
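The define, attribute, mutate loop described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Trace's implementation: the tree is a plain dictionary, revenue is assumed to be a pure product of positive terms, the attribution uses a log-ratio split (each term's contribution is proportional to the log of its own ratio, scaled by the logarithmic mean of the two totals), and every name and number is hypothetical.

```python
import math
from typing import Dict, List


def attribute_change(before: Dict[str, float], after: Dict[str, float]) -> Dict[str, float]:
    """Split the change in a product of positive terms across those terms."""
    total_before = math.prod(before.values())
    total_after = math.prod(after.values())
    # Logarithmic mean of the two totals; it makes the pieces sum to the change.
    if math.isclose(total_before, total_after):
        weight = total_before
    else:
        weight = (total_after - total_before) / math.log(total_after / total_before)
    return {name: weight * math.log(after[name] / before[name]) for name in before}


def mutate(tree: Dict[str, List[str]], term: str, segments: List[str]) -> Dict[str, List[str]]:
    """Return a copy of the tree with one term broken out into segments."""
    expanded = dict(tree)
    expanded[term] = [f"{term}[{segment}]" for segment in segments]
    return expanded


# Step 1: define and save the core revenue tree (parent -> child terms).
tree = {"revenue": ["riders", "ride_frequency", "average_price", "take_rate"]}

# Hypothetical values for two periods.
before = {"riders": 1000.0, "ride_frequency": 4.0, "average_price": 20.0, "take_rate": 0.25}
after = {"riders": 1000.0, "ride_frequency": 3.84, "average_price": 19.8, "take_rate": 0.25}

# Step 2: apply an analytical function to find the term driving the change.
contributions = attribute_change(before, after)
top_driver = min(contributions, key=contributions.get)  # most negative contribution

# Step 3: mutate the tree so the next function can explain that term by market.
tree = mutate(tree, top_driver, ["new_york", "rest_of_us"])
print(top_driver, tree[top_driver])
```

The design point is that the tree is data that functions read and rewrite, which is what separates a back end from a populated diagram. Other attribution methods would slot into step 2 without changing the loop.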

To your point of the metric tree being something that can be used as a dashboard of dashboards, it also brings up the question of which systems are going to be interacting with the metric tree. Is this just an optimization of the business intelligence dashboard, where you're doing retrospective analysis, or maybe using some sort of forecasting model to give projections in the dashboard, but ultimately it's a very static and manual process of interacting with that data? Or do you see metric trees as something that feeds into a broader range of data consumption use cases? And what are some of the ways that you're seeing metric trees feed into broader applications of data, beyond these static, dashboard-focused use cases?

Yeah. That's the key question here: how are things going to play out? The way I'm thinking about it is, let's take all the workflows that users want to do, and let's try to use the metric tree as a back end to empower those workflows.

Can we automate them? That's the ambitious state here. Can we automate this work? How much can we automate? In that sense, yes, it is almost like we're taking all the work that's being done manually today within BI tools or spreadsheets, and building workflows around it.

Now, does that mean the metric tree will be a back end that feeds into existing tools, and they're going to adopt that framework? Or is it that we need a new set of tools that will do the functions we're talking about? Obviously, our vision is that we want to be the place where you can do all those functions, but you never really know how it's going to play out.

You already know that there are players out there, older tools. Take Mixpanel, for example, a tool that does web analytics and product analytics, and they are releasing metric trees into their framework. So we just have to see how this all plays out, whether this is another input into another tool or whether it can actually be a net new thing on top of the BI infrastructure. Honestly, if you asked me for my grandest vision, I would say this probably supplants BI. BI as it is formulated today, which is a catalog of metric dashboards that can be created and disseminated, will be replaced by analytical functions that are directly embedded within the business user's workflow. What does that look like? That's what I think metric trees can enable, and we have to see how it all plays out.

Because metric trees have that relational aspect to them as well, relational in the graph sense and not just the relational database sense, they invite comparison to things like GraphRAG, where the introduction of a graph, with the connections and relationships that objects have to each other, enhances the ability of generative AI models to perform better reasoning tasks across the underlying data asset. I'm wondering how you're seeing the utility of metric trees as a means of context engineering and context management for these generative AI cases, whether that's question answering such as text-to-SQL or talk-to-your-data use cases, or using those AI models to generate new derived data assets, with the metric elements as context to understand what the resulting output actually means beyond just being some scalar value.

That's a super exciting topic. We've been chasing this holy grail of self-serve in data for a long time. What's interesting to me is how self-serve has always been formulated as self-serve data. A business user asks some question to retrieve some custom dataset, and the back-end system finds the right tables, does the right joins and aggregations and filters, writes the SQL, and gives back a dataset that satisfies the request.

But the interaction is still, give me some data. And I can tell you, I've been in this space for two decades now, and the vast, vast majority of business folks really do not want to spend hours poring over datasets, trying to make sense of them, doing their own analysis.

They just want very contextual insights into what they need at that moment. What is working, what's not working, what do I do next? Through that lens, I think the text-to-SQL, chat-with-your-data formulation of AI is very much in that same vein. I can ask a question, and the AI goes and does all this work, generates the SQL, and gives me the data. But is that really the right problem to be solving? I'm really not convinced, honestly.

And that's not to mention the complexities of doing this in a deterministic, reliable way and making sure that the numbers you're generating are accurate. So the way I'm thinking about generative AI is this. Let's use the word agents, because we haven't used the word agents in almost half an hour in a podcast, and that's criminal. So we've got to say AI agents. Suppose you give the AI this graph, as you said, the graph that actually represents the business model: here are the 15 or so different ways in which the business thinks about its metrics and how they all relate to each other. And you give it access to the analytical functions we talked about. Can different agents blend these two concepts together and perform workflows for the business user?

That is the vision I'm actually pretty excited about: AI is not necessarily in the business of going to the raw data, trying to write some SQL, and generating some data. It is using the framework of how the business thinks about metrics, and the kinds of operations that need to happen, and cobbling these together to drive a workflow for the business user. That's the vision we want to work towards, and I think it is probably the more exciting problem to solve than, can we do some text-to-SQL and give the business users some datasets.

The other aspect of this is, as you mentioned, there is a purpose to this, but it's also not free. It requires work in terms of the initial definition and creation of these metric trees and the underlying metrics and the computation and derivation and storage thereof. It also, once you have it in place, requires ongoing maintenance and monitoring and validation, as well as occasional pruning as either you discover that an initial metric that you thought was useful turns out to not be useful, or the business evolves such that those metrics are no longer relevant to the questions that are being actively asked and sought after. And I'm wondering if you can just talk to some of that overall life cycle and workflow management of metric trees, and maybe some of the places where you're seeing the responsibility lie in the organizational structure for the investment in and maintenance of those metric trees as a core organizational data asset?

Yeah. So I think the producer angle is probably a bit more clear and straightforward, because you can think of the metric tree artifact as an extension of the data pipeline, if you will. So you have your ingest. You build your dimensional models, which we talked about. You have your facts, your dimensions. And while a metric layer will build maybe metric cubes, for lack of a better word, a metric tree is building a metric tree cube. It's generating all these different metrics and how they relate to each other, and that could be viewed as an extension of the existing data models that are being generated.

So through that lens, the creation and the maintenance of these assets will follow sort of the same principles that have already been established in the data domain. You know, you care about things like, is the data up to date? Is it fresh? Is it running reliably? Is it accurate?

And so the same ideas that we've been working on for the last decade will still apply to those assets. Now, again, if you wanted to think about, you know, what if I change the metric tree? What if I build a new metric tree? The same sort of ideas will apply. Right? Now you have to generate a new asset. Can you borrow from existing nodes in the metric tree into this new metric tree, as opposed to recreating the business logic again for that? So the same ideas again will apply there because you can think of a metric tree as a graph. It has DAG-like properties. So if you're deriving a new metric tree that can be derived from the first two metric trees, then you wanna be able to borrow those nodes so you don't have to rerun the data for those nodes.

So in a sense, it falls nicely within the work that we've been doing for the last decade and builds nicely on those principles.
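The point about borrowing nodes between metric trees can be made concrete with a small sketch. This is only an illustration, not how any particular product implements it: the `MetricNode` class, the `evaluate` helper, the metric names, and all the numbers are assumptions introduced here. It treats a metric tree as a DAG and caches each node's value, so a second tree that reuses the `revenue` node does not recompute it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class MetricNode:
    """A node in a metric tree: a name, its input nodes, and a formula."""
    name: str
    inputs: List["MetricNode"] = field(default_factory=list)
    # Leaf nodes have no formula; their values come from `raw_values`.
    formula: Optional[Callable[..., float]] = None


def evaluate(node: MetricNode, raw_values: Dict[str, float],
             cache: Dict[str, float], computed: List[str]) -> float:
    """Evaluate `node`, reusing any result already stored in `cache`."""
    if node.name in cache:
        return cache[node.name]
    if node.inputs:
        args = [evaluate(child, raw_values, cache, computed) for child in node.inputs]
        value = node.formula(*args)
    else:
        value = raw_values[node.name]
    cache[node.name] = value
    computed.append(node.name)  # record each node the first time it is computed
    return value


# Hypothetical leaf metrics and values, for illustration only.
raw_values = {"orders": 200.0, "avg_order_value": 50.0, "marketing_spend": 2500.0}

orders = MetricNode("orders")
avg_order_value = MetricNode("avg_order_value")
marketing_spend = MetricNode("marketing_spend")

# First tree: revenue = orders * average order value.
revenue = MetricNode("revenue", [orders, avg_order_value], lambda o, a: o * a)

# Second tree borrows the existing `revenue` node instead of redefining it.
return_on_spend = MetricNode(
    "return_on_spend", [revenue, marketing_spend], lambda r, s: r / s
)

cache: Dict[str, float] = {}
computed: List[str] = []
revenue_value = evaluate(revenue, raw_values, cache, computed)
ros_value = evaluate(return_on_spend, raw_values, cache, computed)

print(revenue_value, ros_value)
print(computed)  # `revenue` appears once even though two trees use it
```

In a real pipeline the cache would more likely be a materialized table per node than an in-memory dictionary, but the reuse idea is the same as in any DAG-based data workflow.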

Digging now into some of the challenges and edge cases of metric trees, it's definitely very easy to see the value when you have clearly defined hierarchies of events that roll up into one another or things that are easily attributable based on a cause and effect relationship. But in business, as in life, there are numerous situations where you maybe have a correlation but not a causation, but you still want to try to understand what are the correlated activities that roll up into some higher order insight. And I'm wondering if you can just talk through some of the ways that you think through how to manage some of those fuzzier relations of events and how they roll up into metrics, and some of the nondeterministic, but at least tangentially related, business events that you want to be able to report on.

Yeah. I mean, I do get that question a lot, and it's surprising how often it comes up in conceptual thinking. But when you go and work with the business, you realize that 95% of how they are operating the business is actually modelable through relationships, whether it's mathematical, whether it's through dimensional cuts, or what have you. So, yeah, I would probably put that in the category of the known unknowns, to use the Rumsfeld phrase.

There might be metrics that are floating around that you don't know how they relate to each other. By the way, they may or may not have a relationship to each other, and they might even be unknown unknowns. Right? Things that are related that you don't know yet. The way we think about that is that the best that humans can do at this point is put them in a tool and try to run, as you said, correlations and regressions and see if there are relationships there. If you can come up with a tool that automatically figures out causality, then I think we're approaching AGI. Right? Because you basically can have a metric tree that models the business, and there is this machine that's basically figuring out what is driving the business and starts moving the levers automatically. So that would be an extremely exciting time to live in, but I don't think we're anywhere close to that right now.

So yeah. So while metric trees can sort of hint at causality, as you said, we're very much at this point focused on modeling the known relationships. And the best we can do for the unknowns is to try to regress and come up with some metrics, for lack of a better word. But it's not, at this point, a tool where you can automatically just divine causality. Right? Because that's an incredibly hard problem.

There's definitely no magic bullet, as you said. And so from what I'm gathering of your response, there's no definitive way for you to be able to intuit those correlations. But building on top of the core principles of metric trees and their definition, you can create the relations as you go through the process of defining how they fit into the overall business processes. But I'm wondering if maybe there are some primitives in terms of how you think about the definition of metric trees that convey some of the level of uncertainty, or the lack of a strict deterministic relationship between those two metrics, that can be reflected in the resultant analysis or the resultant display and presentation of that information.

Right. You're referring to the fact that we wanna be careful not to imply a relationship when the relationship may not be there. Right? Yeah. I mean, this happens a lot even when the financial team builds a financial model. They'll just go in and say, what if I move this cell? The output metric changes because you've expressed a relationship, but is that actually valid at all times? It may not be valid in the future. Like, if you throw more money at paid marketing, it doesn't mean you're gonna grow revenue proportionally. You expect the conversion rates to go down. Right?

So I would think about what you may be hinting at differently. I don't think the relationship is completely unknown, or completely confusing or fuzzy. I think it's that the relationships are changing over time, that there may be boundaries within which they apply and boundaries where they may not apply. So if you think about the example I was giving, as you spend more money, maybe you expect another metric to go down, and then there's a net result of that, which is your cost of acquisition. So I do think we can model those kinds of concepts in the product. Right? We can say, okay, we know that these two metrics ladder up to this output metric, and we know that these two metrics tend to go in inverse directions, that as you spend more money, the other one goes down. Right? But, by the way, is that true for every company in every regime? Probably not. In some regimes, you probably can spend more money and still grow at the same profitable rate. So there are definitely some contextual business aspects of this that are not as straightforward as just the mathematical expressions themselves.

But the concept that these metrics can behave in certain ways in relation to each other can be expressed.

But the key is to get the relationship modeled in the first place, I think, because that's what's not happening today.
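To illustrate the paid marketing example, here is a minimal sketch of two input metrics laddering up to cost of acquisition. Everything in it is an assumption made for illustration: the linear decay of conversion rate with spend, the `cost_per_visit` figure, and the function names are hypothetical, and as noted above the relationship may not hold for every company or regime.

```python
def conversion_rate(spend: float, base_rate: float = 0.05,
                    decay_per_1k: float = 0.002, floor: float = 0.01) -> float:
    """Assumed diminishing-returns curve: the rate falls as spend rises."""
    return max(floor, base_rate - decay_per_1k * spend / 1000.0)


def acquisition_metrics(spend: float, cost_per_visit: float = 2.0) -> dict:
    """Roll spend and conversion rate up into customers and cost of acquisition."""
    visits = spend / cost_per_visit
    rate = conversion_rate(spend)
    customers = visits * rate
    return {"spend": spend, "conversion_rate": rate,
            "customers": customers, "cac": spend / customers}


low = acquisition_metrics(5_000.0)
high = acquisition_metrics(10_000.0)

for row in (low, high):
    print(row)
```

Under these assumed numbers, doubling spend lowers the conversion rate, so customers grow by less than double and cost of acquisition rises. Swapping in a flat `conversion_rate` would model a regime where spend scales without that penalty.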

I don't think we're expressing relationships explicitly on paper for the organization to align and have a discussion around. And this metric tree is just a way to force that conversation. And I don't mean that only for things that are explicitly related, like cost of acquisition and conversion rate. I mean even for metrics that you think are related. Right? If I improve my customer service response time, I'm gonna drive retention of my customers.

Those are fuzzier. There's no mathematical relationship. It's sort of implicitly there.

But people will talk about it. But who actually goes in and takes those two metrics and actively runs a regression today?

Not a lot of companies, actually. You'd be surprised. So a lot of the discussions happen in theory. They say, hey, I think this will be affecting that, so I'm gonna go work on this. But you don't actually express it and try to run something to validate or invalidate it. So, really, I just think of this ultimately as a way to start forcing these conversations, putting these things on paper. Let's actually see if there's a regression even if it's not causation.

Let's see. Is there even a correlation between customer response time and retention rate? Right? Let's see what the data says. So, yeah, that's kinda how I think about it, that this is just a forcing function to align the business and the execs and the data teams around the metrics that matter. And then the actual analytical functions and the causality and those hard things will just follow over time.
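The check described here, whether response time and retention even correlate, takes only a few lines. The sketch below uses made-up monthly observations and a hypothetical helper, `pearson_and_slope`, that computes the Pearson correlation and a simple least-squares slope with the standard library. A strong correlation in such data would still say nothing about causation.

```python
import math
from typing import List, Tuple


def pearson_and_slope(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Return (Pearson correlation, least-squares slope of y on x)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y), cov / var_x


# Made-up monthly observations, for illustration only.
response_hours = [1.0, 2.0, 4.0, 6.0, 8.0, 12.0]
retention_rate = [0.92, 0.90, 0.88, 0.85, 0.83, 0.78]

r, slope = pearson_and_slope(response_hours, retention_rate)
print(f"correlation: {r:.3f}")
print(f"change in retention rate per extra hour of response time: {slope:.4f}")
```

With real data, the two series would come from the corresponding nodes of the metric tree, aligned on the same time grain.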

That brings up an interesting question too as far as the behavioral impact of having metric trees as a more declarative representation of the cause and effect of different business activities and the ability to do some of that analysis, and also maybe how that fits into some of the patterns around experimentation, both in terms of technical systems, doing things like A/B testing of different features, as well as organizational practices of saying, hey, we're going to experiment with maybe changing the allocation of our salespeople and the regions and how we define those regions and seeing how that impacts the overall outcomes. And maybe, before you embark upon those experiments, doing some of that definition of, okay, well, we think that doing this activity is going to have this outcome. But before we bother making that change, let's go ahead and do that definition of these metric relations so that we can get a clearer before and after picture and see what is the actual causation and not just correlation.

That's an excellent point, actually. Right? Because the bleeding edge companies that are operating on data, who can afford to run experiments at scale, do think about this all the time. Right? What is driving what? Can we isolate the impacts? Can we run an A/B test? And only a small, select group of people are doing that today. And in a sense, I'm hoping the metric tree is a simple but more accessible and more intuitive way that enables the business and the data teams and exec teams to have this conversation.

And you don't necessarily need a cutting edge data science team doing experimentation for that. Right? It just becomes part of how they talk about everything, as opposed to just, give me this dashboard.

And in your work of building a business focused on metric trees as this core data asset and encouraging its use in organizational exploration of their practices, what are some of the most interesting or innovative or unexpected ways that you've seen metric trees used or implemented?

Yeah. I mean, when we started working on this, of course, we knew that the most obvious metrics would be time series metrics like revenue and users and whatnot, and that is still the case. But we have built a bunch of customer journeys, for lack of a better word. You take a cohort and you track how they evolve over time in order to ultimately ladder up to some lifetime value. We're modeling all the critical junctures of a customer and what the business actually wants them to do versus what they're actually doing, which, if you think about it, is really a very pragmatic way to think about levers. Because, ultimately, you're building something to facilitate a customer's existing demand or try to change their behavior on the margin or radically.

So I found that to be a pretty interesting use case of building a tree, to track a customer journey over time. You know, as businesses, we've all done those, but to see that in a tree format is pretty powerful.

And as you have been digging into this ecosystem and building a company around it, what are some of the most interesting or unexpected or challenging lessons that you've learned in the process?

Well, yeah. So I think, honestly, I've been surprised by a few things. One is, I was a practitioner myself for a decade. And as I talk to a lot of companies now, with all this focus on data and the investments that have gone into it, I'm surprised how many companies are still early in their journey on metrics maturity, for lack of a better word: knowing deeply how their metrics connect to each other and what the drivers are, what the hypotheses around the drivers are. I've been surprised by that because I think it reflects my original point at the very beginning, that so much focus has gone into, can we ingest all the data? Can we put it in a place? Can we make it accessible?

And I don't think enough time has been spent on what is the business model. Are we capturing it? Are we understanding the drivers? I'm saying something very obvious, but I've been surprised by that conversation that we're having with organizations, which has not even been a function of the company size or scale. Right? Even with large companies, I'm finding that to be the case. The other thing that is surprising me is, I'll come across two companies that are very similar to each other. And they will talk about some overlapping metrics.

But they think about these metric trees and these templates quite differently, which I found fascinating as well. It's pretty much the same business model, and I can clearly see one company is doing a better job of expressing it than the other one. You know? I can see the differences.

So all this to say, I've been generally surprised at the maturity curve and the standardization, if you will, of metrics and how people think about it, because that's the lifeblood of data. Right? That's what data is really for, to capture the business. Which just means there's a lot of opportunity out there for organizations to level up and really spend their time on this problem, and also level up in terms of building your metrics in a way that is best of breed for your business model. And it's not a discussion that happens a lot publicly. Maybe it's because companies believe they might have an arbitrage opportunity here. I don't know. But that part of the equation is just not discussed as much as we discuss what database we're using and what BI tool we're using. Right? So that's interesting to observe.

Yeah. I think it's also interesting as you're talking about the ways that businesses, even if they're effectively doing the same thing, are going to reflect their metric trees in different ways because they think about their business in different ways. And the lack of maybe standardization or componentization of those metric trees also brings to mind the work that, for instance, companies like Segment were trying to do or are trying to do in the realm of things like customer data platforms, where you say, oh, you're a business that is working in ecommerce, so we're going to presume that you're using Shopify and HubSpot and Marketo or whatever. And so we're going to standardize the way that we ingest those datasets and the way that we generate the schemas. And so we're going to try and create a one size fits all solution for this segment of business or this style of industry. And over the many times and many versions of that effort that have happened over the past several decades, it never actually quite works because everybody has a different way that they're thinking about their data. Maybe they're able to get some initial value from that out of the box standardized set of visualizations and reports, but they always wanna ask different questions than their neighbor who's doing the same thing. And so it's just an interesting perspective and an interesting insight into the fact that even if it's the same business, it's not going to operate the same way.

Yeah. I think that's actually a really good point. So the question really is, is it because they have different strategies and tactics to drive what they're doing that the way they think about how the metrics connect to each other is different? Or is one of them doing something wrong and the other doing it right? And I think this is the interesting question. And this is obviously independent of the data sources, which I think you were talking about. You know, can you standardize the data models? That I can see being very complicated because there are edge cases. Right? You need two customers in the same business to be using exactly the same data sources end to end, and it's always the tail that gets you. Even if you both use Shopify for your main reporting, there's probably this additional tool you're using for customer service that the other company is not, and now that whole premise that you can build a unified data model sort of breaks, I think. At least that's my 2¢ on it. Whereas on the consumption side, I'm actually intrigued. Like, why is it that two companies in the same business are thinking about it differently? And maybe it is the fact that they have different approaches to how they wanna grow, and both might be valid for the phases of the company that they're at. But I will also say I see a hierarchy for sure. It's not necessarily that A and B are just different. I do see some companies are much more mature and thoughtful. They've thought about how their business works in a much deeper way than the other company. And, honestly, I don't think that has anything to do with tooling. I think it's a people thing, honestly. I think it's just that the individuals there have been thinking about this, and they are thinking about things at a deeper level than the other organization, for whatever reason. That would be my humble hypothesis, but I don't know if you have any other thoughts on that.

Yeah. I mean, I think it's also maybe just a factor of people wanting to be contrary and not being pigeonholed and saying, no. Even if you have these out of the box reports and maybe they're useful, I wanna think about it differently. And so I'm going to do the work to customize it because I have my own opinions about it.

Right. So we're creating work to give ourselves value is what you're saying. Okay.

And so as people become aware of and explore the idea of metric trees, what are the situations where you would say that that is the wrong abstraction and they should just lean on standard dimensional models, or maybe there is some other approach to determining that causation or determining the appropriate context for a given business process?

Yeah. I mean, wrong might be a strong word, but I've definitely come across conversations where it might be either overkill or just fundamentally not interesting or applicable. Maybe that's a better way of thinking about it for me. Like, a few weeks ago, I had a conversation with someone who has an early stage startup, and they are running, like, a few hundred projects. And these projects go through complicated steps to lead to an outcome in the end where they realize their revenue in the physical world. And they were interested in the idea of modeling that through metric trees so they can identify bottlenecks and optimize the process.

Yes. We could do that. Right? That's certainly doable. But my gut reaction was like, you have a hundred projects, just put that into a spreadsheet and pivot your way to glory. Maybe I was wrong, but that was my gut reaction. It's like, do you really want to build a metric tree? And maybe that's more a reaction to the effort it takes to get something up and running. I don't know what it is, but I felt a gut reaction that maybe it's overkill right now for you to do this. In terms of applicability, I think it really comes down to the business model itself. Right? I've definitely come across business models where, let's say you're highly partnership driven and you have a few deals that drive your business. That's not the best use case for even metrics in the first place. Right? I mean, even metrics are not that useful fundamentally.

What are you tracking, really? And consequently, metric trees and metric layers are also not very useful. So you do need, I guess, some level of data volume and some level of breadth and complexity, such that the business can be measured through data, for it to start becoming useful. And that's where I've come across cases where it's probably overkill or just not applicable at all.

And as you continue to explore and popularize and define this concept of metric trees, what are some of the things you have planned for the near to medium term, either just as far as metric trees and how to think about and popularize them or the work that you're doing at Trace to make them operational?

Yeah. I mean, I have to go back to this formulation I had of these analytical functions. So a big vector in our road map is to keep building out these functions and these use cases. It's one thing to do retrospective trend analysis. It's another thing to then do, how am I doing versus my budgets? Right? It's a different kind of analysis; you're doing a variance analysis.

It's another thing to then compare how my channel A and channel B are doing. It's another thing to report on experiment outputs and A/B test outputs. So we're really focused on these analytical functions or workflows or what have you. The other vector, which I'm genuinely excited about, and not because everyone's talking about it, is the use of AI agents.

I keep thinking that if we have this abstraction that captures a business model and it's already hydrated with data, it's already sitting there, and I have these functions that I know people want, it seems like the perfect tailor-made case for an agent to figure out how to combine them on demand for a business user's request. And maybe in the future, even connect to some activation tool, right, as you mentioned. Maybe it leads to some action in some tool, and that can all be orchestrated in a layer. And in the long run, maybe there is no such thing as static dashboards and everything is a fluid system of agents figuring out the business drivers and starting to push the users into taking action. So that's obviously a super exciting vision that I can paint, but making it reality is gonna be incredibly hard. So we are starting to take baby steps in that direction as well.

Are there any other aspects of this overall concept and application of metric trees that we didn't discuss yet that you'd like to cover before we close out the show?

Yeah. Maybe the one aspect that jumps out to me is, and I may have made this point earlier, when I talk to folks in the wild who have never heard this term before and I start describing it, their first reaction is that it's a visualization. Right? We're gonna draw some boxes, and you can do that in a canvas tool. You can do that in Miro or Mural. You can draw boxes.

Okay. Now let's say the boxes can be hydrated with some data. By the way, I've seen people do that in spreadsheets. Right? You just draw some boxes, and then in that cell, you have a SQL query running or something, and you have some data populated.

And they can do that, and then they ask, okay, let's say I do that. Let's say I visualize it. That's cool. Now I can have a dashboard of dashboards. I can see them all in one place. But then they start to hit the question of, what is the value? Right? What is the long term value? Is there sustainable value? These are all very important questions.

So I do wanna make the point again that our realization is that the visualization is interesting and useful as an alignment vehicle. But I think the real value is going to be when that can serve as a back end to a system where you can actually operate on them and you can actually build workflows around them. So that's kinda what I'm really excited about. And I think that's not super obvious in the discourse around metric trees. It's just seen as, oh, yet another visualization. We have tables, we have charts, and now we have a graph. And I'm like, yes, that is true superficially, but I think there is a lot more to it, and there's a lot more complexity to making that real in organizations, and that's kinda why we're doing what we're doing. Otherwise, why would I be building a company around it? So yeah.

Alright. Well, for anybody who wants to get in touch with you and follow along with the work that you and your team are doing, I'll have you add your preferred contact information to the show notes. And as the final question, I'd like to get your perspective on what you see as being the biggest gap in the tooling or technology that's available for data management today?

Well, I mean, no surprise here. I feel that we have spent enormous effort, and will continue to, on how we ingest data and clean data and build pipelines and monitor them and maintain them. To me, I hope the next decade is really about how do we activate, which I love the word, by the way. I think I'm gonna borrow that. How do we activate data? How do we make it operational? How do we make it useful? Can we build an operating system for business folks? Right? So it's not just self-serving data, but they can actually do workflows around data.

So that's the thing that I think is the biggest gap. That's why companies build data teams and invest millions of dollars and the execs still feel frustrated, honestly. Because I think that's the gap that people aren't quite able to articulate.

And I also think the problem is interpreted incorrectly. The execs think, like, we built all this. Now data should just tell me what to do, which is not how data works. Right? It's a reflection of reality. It's not causal. In a magic world where it can causally find the drivers, then you're reaching AGI. Right? So I don't think that's data's job today. I think it's to reflect the business and help you be smarter and more thoughtful in how you operate the business and the levers. So I think that's the biggest gap: how do we make it useful and operational, where people feel the value day in and day out. And, honestly, maybe to a point where it becomes almost invisible. Can we aspire to a world where you don't have these dashboards of numbers and cells floating around and it's just invisible in the way the business operates? That, I think, is the goal I want us to work towards. I mean, is it achievable in our lifetimes? I don't know, but that's what I wanna aspire to, where data feels as invisible as software and AI or whatever framework du jour we're dealing with today.

And I think too, on that point of breaking out of the static dashboard mindset, I've also been seeing some recent work as far as moving that analytical and question asking and answering workflow into more of the communication platforms of the company, where the data analysis is happening using agentic workflows in the context of Slack, where you can talk to an agent and say, hey, I'm curious about the sales numbers from last week and how they correlate with the number of calls that my salespeople made or what have you. And then the agent is able to, using some of these contextual cues, such as the work being done with metric trees, retrieve that information, generate an analysis, summarize it, and provide a visualization on demand in a fashion where it also invites other people in the business to participate, rather than dashboards being more of a single player option where everybody can look at the same dashboard, but they're doing it in isolation unless they all happen to be in the same room at the same time talking about it. Bringing that more into a contextualized and conversational workflow, I think, is an interesting evolution that we're going through now as well.

Totally. Yeah. I think that's a great way of thinking about it. I think that you wanna go away from static dashboards and you want dynamic, on demand computations and insights.

You do need that. You do need a back end that is going to be powerful, and the back end has to understand the business. It has to understand the analytical operations, or functions, which is the term I've been using internally in my company. And if all that comes together, then, yeah, the UI is just a slick interface where you ask a question and you get back a very powerful answer that you don't have to dig five levels deep to understand. So yeah. You're absolutely right. I mean, I'm seeing it happen as well. I think that's probably what it'll eventually morph into, and you can just then tell your chat system to go do something as well. And then it goes and does something in response to that.

Absolutely. All exciting stuff. But we have to make it a reality, and we have to make it reliable, right, so that it actually can do these things correctly.

Well, thank you very much for taking the time today to join me and share your thoughts and experiences around this idea of metric trees and the work that you're doing to help support them. It's definitely a very interesting new addition and a new style of data asset that I think will be very valuable. So I appreciate all the work you're doing on that, and I hope you enjoy the rest of your day.

Thank you so much. It's a pleasure to be here and chatting with you.

Thank you for listening, and don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used, and the AI Engineering Podcast is your guide to the fast moving world of building AI systems. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email host@dataengineeringpodcast.com with your story. To help other people find the show, please leave a review on Apple Podcasts and tell your friends and coworkers.
