Summary
The key to providing your users with excellent service is understanding them and offering a personalized experience. Unfortunately, many sites and applications take that to the extreme and collect too much information. To make it easier for developers to build customer profiles in a way that respects user privacy, Serge Huber helped create the Apache Unomi framework as an open source customer data platform. In this episode he explains how it can be used to build rich and useful profiles of your users, the system architecture that powers it, and some of the ways that it is being integrated into organizations' broader data ecosystems.
Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
- Struggling with broken pipelines? Stale dashboards? Missing data? If this resonates with you, you’re not alone. Data engineers struggling with unreliable data need look no further than Monte Carlo, the world’s first end-to-end, fully automated Data Observability Platform! In the same way that application performance monitoring ensures reliable software and keeps application downtime at bay, Monte Carlo solves the costly problem of broken data pipelines. Monte Carlo monitors and alerts for data issues across your data warehouses, data lakes, ETL, and business intelligence, reducing time to detection and resolution from weeks or days to just minutes. Start trusting your data with Monte Carlo today! Visit dataengineeringpodcast.com/montecarlo to learn more. The first 10 people to request a personalized product tour will receive an exclusive Monte Carlo Swag box.
- Are you bored with writing scripts to move data into SaaS tools like Salesforce, Marketo, or Facebook Ads? Hightouch is the easiest way to sync data into the platforms that your business teams rely on. The data you’re looking for is already in your data warehouse and BI tools. Connect your warehouse to Hightouch, paste a SQL query, and use their visual mapper to specify how data should appear in your SaaS systems. No more scripts, just SQL. Supercharge your business teams with customer data using Hightouch for Reverse ETL today. Get started for free at dataengineeringpodcast.com/hightouch.
- Your host is Tobias Macey and today I’m interviewing Serge Huber about Apache Unomi, an open source customer data platform designed to manage customer, lead, and visitor data and help personalize customer experiences
Interview
- Introduction
- How did you get involved in the area of data management?
- Can you describe what Unomi is and the story behind it?
- What are the goals and target use cases of Unomi?
- What are the aspects of collecting and aggregating profile information that present challenges to developers?
- How does the design of Unomi reduce that burden?
- How does the focus of Unomi compare to systems such as Segment/Rudderstack or Optimizely for collecting user interactions and applying personalization?
- How does Unomi fit in the architecture of an application or data infrastructure?
- Can you describe how Unomi itself is architected?
- How have the goals and design of the project changed or evolved since it started?
- What are some of the most complex or challenging engineering projects that you have worked through?
- Can you describe the workflow of using Unomi to manage a set of customer profiles?
- What are some examples of user experience customization that you can build with Unomi?
- What are some alternative architectures that you have seen to produce similar capabilities?
- One of the interesting features of Unomi is the end-user profile management. What are some of the system and developer challenges that are introduced by that capability? (e.g. constraints on data manipulation, security, privacy concerns, etc.)
- How did Unomi manage privacy concerns and the GDPR?
- How does Unomi help with the new third party data restrictions?
- Why is access to raw data so important?
- Could cloud providers offer Unomi as a service?
- How have you used Unomi in your own work?
- What are the most interesting, innovative, or unexpected ways that you have seen Unomi used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on Unomi?
- When is Unomi the wrong choice?
- What do you have planned for the future of Unomi?
Contact Info
- @sergehuber on Twitter
- @bhillou on Twitter
- sergehuber on GitHub
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
Links
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to the Data Engineering Podcast, the show about modern data management. When you're ready to build your next pipeline, or want to test out the projects you hear about on the show, you'll need somewhere to deploy it. So check out our friends over at Linode. With their managed Kubernetes platform, it's now even easier to deploy and scale your workflows or try out the latest Helm charts from tools like Pulsar, Pachyderm, and Dagster. With simple pricing, fast networking, object storage, and worldwide data centers, you've got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today.
That's L-I-N-O-D-E, and get a $100 credit to try out a Kubernetes cluster of your own. And don't forget to thank them for their continued support of this show. Are you bored with writing scripts to move data into SaaS tools like Salesforce, Marketo, or Facebook Ads? Hightouch is the easiest way to sync data into the platforms that your business teams rely on. The data you're looking for is already in your data warehouse and BI tools. Connect your warehouse to Hightouch, paste a SQL query, and use their visual mapper to specify how data should appear in your SaaS systems. No more scripts, just SQL. Supercharge your business teams with customer data using Hightouch for reverse ETL today. Get started for free at dataengineeringpodcast.com/hightouch.
Your host is Tobias Macey. And today, I'm interviewing Serge Huber about Apache Unomi, an open source customer data platform designed to manage customer, lead, and visitor data and help personalize customer experiences. So, Serge, can you start by introducing yourself? Yes. Thank you, Tobias.
[00:01:46] Unknown:
So I'm Serge Huber. I actually have two hats. I'm the CTO of a DXP company called Jahia, and I'm also the project management committee chair of the Apache Unomi project at the Apache Software Foundation. That's me in a nutshell. And do you remember how you first got involved in the area of data? Basically, we were a CMS vendor to start with at Jahia. Very quickly, we realized that a lot of people wanted to have data associated with what people were doing on our systems. So we started looking into that and trying to figure out what the best way would be to associate data and content.
We started looking at things that were out there to see what we could integrate with, or whether we should develop our own thing, and we didn't find anything really compelling, so that's sort of how I got started. And we started two projects. First, a specification around customer data exchange, which we did at the OASIS Open foundation. Initially, it was called the Context Server specification. So we were working around data, but we called it context because we were in the context of the web. And we were realizing that there was an emerging field called customer data.
So that's really how I came to it and got involved with this. The other idea was also to do things in an open fashion, not using closed platforms or closed tools.
[00:03:18] Unknown:
In terms of the Unomi project, I'm wondering if you can give a bit of an overview about what it is that you're building there, some of the story behind how it got started, and some of the early decisions that you had to make in terms of what the capabilities
[00:03:33] Unknown:
needed to be and how you wanted to architect it? So that actually ties in pretty well with what I was just saying. We were basically looking for a way to integrate data with our platform, but we were looking for open source tools, because we've always been based on open source. And we found a few things, but nothing that really fit the bill. So the idea was that we worked on the specification, and the specification was really about trying to figure out a standardized way to access data and, you know, push it in and pull it out. This was pretty abstract, and I was getting very confused by it, to be honest. So I started hacking around, and that became the basis for the Unomi project, which started as a project at our company, but very quickly we realized that it made more sense to do this in the open. And we were dealing with data that can be sensitive, that can be specific to visitors; I mean, you have people that are coming onto your systems, and you know more and more about them.
We were really early in the concerns of data privacy, data management, and how to actually do something in the most transparent way possible. And that's where the idea of doing this in the open, and then contributing it to the Apache Software Foundation, came in: we wanted full transparency on what's going on with the data. And the best way to do that is to have an independent project that shares the full source code of what's going on with that data. And, yeah, that's the basis of the Unomi project.
[00:05:08] Unknown:
As far as the overall aspect of customer data, there have been a few different systems that have branded themselves as customer data platforms, possibly the most well known being Segment, and then some of the followers on, like, RudderStack. And I'm wondering if you can just talk to some of the goals and target use cases that you're focusing on with Unomi and how that might compare to what the Segments and RudderStacks of the world are doing in terms of their system design and architecture?
[00:05:37] Unknown:
Yeah. That's a very good question. So, I mean, you know, we really started as a tool to fit our needs. And our needs were pretty basic. It was to have something that was capable of collecting, in real time, the event behavior of visitors that were coming into our system. And so, from that starting point, it's very similar to what Segment or RudderStack are doing. But the biggest difference was that we quickly expanded that to be agnostic to the incoming system. It doesn't need to be a web platform. It could be a native mobile application. It could be a CRM system. It could be anything that can talk to Unomi's API to provide context and information about who the user is, what he's doing, and how we can personalize his experience and provide better data sharing between the systems.
So very early on, Unomi became a platform to unify all the data that's inside of different platforms. And we really focused on having something based on an API first, so that it would be really easy for people to push things in and pull them out. The other thing is the data model, which is, I would say, more opinionated than most customer data platforms, though it's not very restrictive. It can do a lot of things, but it does have a few basic objects that make it pretty easy to understand what's going on. So we do have profiles. We do have events, sessions, and then, of course, any kind of other data you need. So, yeah, I think that's basically the difference, but there are similarities with those platforms as well. In terms of the
[00:07:31] Unknown:
types of information that you're collecting and some of the aggregations that you're doing for building the profile information, I'm wondering what are some of the challenges that developers often experience when trying to do this on their own without the support of a dedicated platform such as Unomi, and some of the ways that the design of Unomi helps to reduce that burden and allow developers to integrate that information more fluidly and more seamlessly into the applications that they're building, without having to spend a lot of cycles on building it all from scratch.
[00:08:05] Unknown:
Yeah. Again, that's a very important point, because, in the same way, we built Unomi to serve our needs so we wouldn't have to do this from scratch. I mean, we did start the project, but one of the ideas was really to help the platform improve with more people using it. And so the initial idea of Unomi was to make it flexible enough that it could address a lot of different use cases, and at the same time easy enough to understand and integrate with, so that it would be compelling in terms of not having to rebuild something from scratch for future needs. And this ties into two examples of usage of Unomi that really surprised me. One of them was a company that was already doing event collection and had their own way of collecting events, but then they realized that the way you could use Unomi to analyze those events and build real time personalization was more powerful. So they chose to do that, but Unomi had some problems with doing that, because they were doing something very strange with events. Most people consider events as immutable data, I mean, data that's timestamped and, you know, stored and never touched again. In their case, they were actually modifying the event data over time.
That was not a use case that was compatible with Unomi at the time, but they contributed an extension to the project that made it possible. And the idea of the project is to have that flexibility and to be able to adapt to use cases that might not have been anticipated, and that way you can address a lot more people. Another use case that was very interesting was people actually not doing this with customers, but with patients. So we had an application where people were using Unomi to track people that have addictions, you know, like addictions to drugs or stuff like that. And these are people you have to keep track of, otherwise they will relapse and things like that. So they were using Unomi in that way. And, again, it's something that I don't think people would have done if they had to start from scratch or build something based on another platform.
[00:10:17] Unknown:
As far as the architecture of Unomi, I know that in terms of the system requirements it's actually fairly lightweight, where a lot of systems such as RudderStack and Segment require you to have some sort of cloud data warehouse in place. Unomi instead orients itself around Elasticsearch as the core architectural element. And I'm wondering what the thinking was in terms of the design requirements, the system requirements, and the target user group for Unomi that influenced your decision to focus on Elasticsearch as that core storage location, and maybe what the technological landscape was at the time that you started it that also fed into that decision.
[00:11:00] Unknown:
Yeah. So that's a decision that was quite controversial when the project started. Because when we started the project, as I said, the early project was just something where I was trying out ideas. And using Elasticsearch as a storage system was, you know, dubious to say the least at the time. But, actually, over time it's proven quite reliable, and we have a lot of people using this in production now. Of course, Elasticsearch has greatly improved since the project started. But the objectives were always sort of the same. The system had to be highly scalable. The idea was really to make sure that we had targets. I mean, I can't say that it's always easy to hit those, but the idea I have in mind is always to be under 100 to 200 milliseconds in terms of full request time. And in order to make that scalable, that means that your whole platform has to be able to scale, so that you can always compensate for the extra load and not let bottlenecks be problematic.
So that's why this very simple architecture was put in place, because from previous experience, if you have, like, state management, node to node communication, these types of technical challenges, that makes scaling and node independence very difficult. And Elasticsearch did a pretty good job of implementing things like that. We were quite happy with that, and so it served the purpose of having a back end that was high performance and scalable. And then Unomi added on top of that the rule system that was needed to offer the real time personalization and segmentation.
[00:12:40] Unknown:
So digging more into Unomi itself, I'm wondering if you can talk to just the overall system design and the capabilities that it provides, how it might fit into an existing application architecture or somebody's data infrastructure, the designs and decisions that have gone into the API specification, and how you think about where to deploy it and when to use it?
[00:13:05] Unknown:
There are different parts to it. But if I take, let's say, the use case of integration with a web based system, where you would start is that we have endpoints in the API to collect events and to also track users or visitors. And so, basically, the main endpoint will be the one that's taking those things as input, keeping an identifier, and building a progressive profile of the person interacting with Unomi. At the same time, we have a system where you can actually send queries to Unomi to know if this user matches a specific set of conditions, and that's used for personalization.
So, basically, the basic use case is: is this user from a specific country? I mean, a really very simple condition. But you could also have more complex ones where you say, has this user come in the last x days and visited this page, and whatever? You can build really complex things. And that way you can then build personalization based on that, and you can access profile data at any time. So that's what I'd say would be the basic API for real time personalization, tracking, and event collection. Then, at the core of Unomi's functionality, is the rule engine.
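Before moving on to the rule engine, here is a rough sketch of what such a context request could look like from the application's side: one event describing what the visitor just did, plus a personalization question about the profile. The field names and condition types below are illustrative of Unomi's JSON style rather than copied from the official schema, so treat them as assumptions and check the project documentation for the exact shapes:

```python
# Sketch of a Unomi-style context request: report a page-view event and
# ask whether the visitor matches a personalization condition.
# Field names and condition types here are illustrative, not authoritative.
import json


def build_context_request(session_id: str, page_path: str) -> dict:
    """Assemble the JSON body for a hypothetical context endpoint call."""
    return {
        "sessionId": session_id,
        # Events describing what the visitor just did.
        "events": [
            {
                "eventType": "view",
                "scope": "mysite",  # illustrative scope name
                "properties": {"path": page_path},
            }
        ],
        # Ask the server: does this profile match these conditions?
        "personalizations": [
            {
                "id": "from-france",
                "strategy": "matching-first",
                "contents": [
                    {
                        "filters": [
                            {
                                "condition": {
                                    "type": "profilePropertyCondition",
                                    "parameterValues": {
                                        "propertyName": "properties.country",
                                        "comparisonOperator": "equals",
                                        "propertyValue": "France",
                                    },
                                }
                            }
                        ]
                    }
                ],
            }
        ],
    }


body = build_context_request("session-1234", "/pricing")
print(json.dumps(body, indent=2))
```

An application would POST a body like this to Unomi's context endpoint and use the returned profile data and matching personalization IDs to decide what to render.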
The rule engine is basically this: a rule is a very simple thing. You have a condition, and when that condition matches, you execute a set of actions. And conditions always react to events. So, for example, an event comes in: a user has clicked on something, or he has viewed a specific screen if you're in the case of a mobile application, or, in a support system, he has opened an issue; whatever the event is, you can associate it with a rule. And if that rule's condition matches the incoming event, you can then execute actions. Unomi has an extremely flexible system of conditions and actions.
You can build your own conditions that you deploy to Unomi, as well as your own actions. So Unomi comes with built-in conditions and actions, but you can provide your own. And this rule system is, you know, a very basic thing to explain, but it's incredibly flexible, and the rules are executed in real time. So as an event is coming in, it will immediately trigger rules, and those rules will perform actions in real time. So an action could be: update the user profile with a specific property. Or it could be: call a third party system, for example a CRM, see if the CRM has any information about this profile, then pull that data back and perform any action you want with it. So the rule system can serve as an orchestration system, and it can all happen dynamically.
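As a concrete illustration of the "when this condition, then these actions" shape described above, a rule could be expressed roughly as follows. The specific type names (`eventTypeCondition`, `setPropertyAction`) are my approximation of Unomi's built-in vocabulary, so verify them against the Unomi documentation before relying on them:

```python
# Sketch of a Unomi-style rule: when a "view" event arrives, set a
# property on the visitor's profile. Type names are illustrative.
import json

rule = {
    "metadata": {
        "id": "viewed-pricing-page",
        "name": "Viewed pricing page",
        "description": "Flag profiles that have viewed the pricing page",
    },
    # The condition reacts to incoming events.
    "condition": {
        "type": "eventTypeCondition",
        "parameterValues": {"eventTypeId": "view"},
    },
    # Actions run in real time whenever the condition matches.
    "actions": [
        {
            "type": "setPropertyAction",
            "parameterValues": {
                "setPropertyName": "properties.viewedPricing",
                "setPropertyValue": "true",
            },
        }
    ],
}

print(json.dumps(rule, indent=2))
```

A rule like this would be registered once through the rules API; from then on every matching event triggers its actions, which is what makes the engine usable as a lightweight orchestration layer.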
And, of course, other usual services provided by CDPs, such as segmentation, are also built in. There's also import, export, and all kinds of other services that you find in typical CDPs. Now, one thing I should mention is that there's a lot of confusion around the CDP term. I would describe Unomi more as a CDP engine than what a lot of people consider a CDP, because Unomi does not have a UI. It's an API based platform. If people are looking for a CDP as a full product that has a UI, that's not what Unomi is, at least not right now. Given the fact that it is API only and doesn't have that UI element to it, it's definitely
[00:17:06] Unknown:
very much a developer centric project. And I'm wondering how that focus has influenced some of the feature decisions and prioritization as far as engineering effort, and some of the limitations that it imposes in terms of where you might want to use it? That's where the community comes in, I would say.
[00:17:27] Unknown:
The nice thing about having a project at Apache is that it's very visible, I mean, in the software engineering world. So people will pretty quickly discover the project and then come with their use cases. So they've sort of done the work already at that point of saying, okay, does this look like it's in the field of my needs, and will it match or not? And then that's where they will start a discussion with the other community members and say, okay, can this work or not? Do we need anything that's not there? And, you know, that's something that works pretty well. One of the things that's pretty surprising about the community is that there's a huge silent community that we still want to talk to a little bit more, because we just had our first meetup, and a lot more people showed up at the meetup. We have over 150 people in our Slack channel that are constantly in the channel and that are mostly not very active, but they're clearly following the discussions. So there's clearly a community of people, developers, of course, that are using the system, that are interested in it, and that are contributing to their own abilities.
That's definitely something that's happening. Now, one of the challenges is that, Unomi being a Java based platform, there are a lot of people using Unomi as an API, so they don't need to know any Java to use the system. But if they want to contribute extensions, that's where it becomes a little bit more tricky if they don't have experience with Java. We are working on that, and we are, for example, adding in the new versions the possibility to build actions using other languages such as Groovy. I haven't seen somebody come to me and say, you know, I don't know how to use this. That's a rare scenario. But we do have a lot of people saying, okay, I have this usage.
Will that fit well or not? And in the case I described, with the people doing these mutations on events, that took a little bit of work, but it worked out in the end. In terms of the
[00:19:32] Unknown:
kind of use cases for Unomi, given the fact that you are building up these customer profiles, you could potentially use the information that's aggregated across these series of interactions to feed back into your application: to tailor the experience, maybe provide some recommendations to the user, or help to remember some of their preferences, maybe across a suite of products that exist within an organization. I'm wondering if you can just talk to some of the types of user experiences that can be powered by the Unomi platform, and ways that you might tap into that as an application developer to pull in maybe some of those recommendations, or use that profile information to enrich the context of the application that the user is interacting with?
[00:20:15] Unknown:
Again, I know the web case best, because that's one of the ones that I see the most often. But, for example, you could tie in any kind of information coming from behavior. So if you have the user interacting with the system and he's viewing content, or editing content, or putting stuff in a cart, the ecommerce scenario. The ecommerce scenario is interesting because there you have another system, which is the ecommerce system, and you could, for example, say: if you see this visitor coming back, and Unomi is building this progressive profile, you could have Unomi actually query the ecommerce system to say, okay, can you give me more information about, for example, previous purchases?
That way I could push some content to him, or recommendations, or whatever. In the case of a sales scenario, you could do the same thing; instead of an ecommerce back end, you could have a CRM such as Salesforce. Unomi does have a Salesforce plugin that's already in the project. And in the case of Salesforce, you can just say, okay, I've identified this person with this email. Do you have somebody in the system with that same email, so that I can augment the information that's inside of Unomi? For example, if there's any kind of indication from the CRM that this person is hesitant about a sale or something like that, then I could build a personalized experience for this person using Unomi that would be based on the information coming from the CRM.
Now, if we look at some of the scenarios from other users of Unomi, in the case of those people that were modifying events, what they're doing is they're actually in the business of building reviews on products. And so they're using Unomi to study the behavior of people on reviews: what they're doing, the type of language they're using in their reviews, and stuff like that. And that could be used for anything from detecting spam to actually suggesting more reviews based on the products that they buy, for example, if it's tied in with the ecommerce system. So you can really, with, I would say, a very small amount of work, tie into a lot of systems. Unomi is more dependent on external systems providing access than the opposite. Usually, Unomi is mostly an open book, and so you will be able to do whatever you want with the data that's in Unomi. But if you're trying to access data from other systems, you'll be more dependent on what those systems let you access.
[00:22:52] Unknown:
As far as the ways that people are maybe building up these different user experiences, if they're not already using Unomi, what are some of the alternative infrastructures or architectural and software patterns that they might lean on to be able to get a similar outcome for their end users?
[00:23:12] Unknown:
Well, we do see a lot of people building their own data solutions, if I can call them that, ranging anywhere from log analysis to behavior analysis. But I think that ship has sailed. Now, you mentioned RudderStack, you mentioned Segment; there are other players where, basically, instead of building the whole thing from scratch, you will look at reusing things. But I still think there aren't that many purely open source solutions that will help you. So you could, of course, also use traditional CDP vendors to address that. But then again, you will be dependent on what those CDP vendors give you access to, because some of them might be more open than others, and so it could be problematic.
One of the worst, to my knowledge, in terms of data access, is people using stuff like Google Analytics, because you have very little access to raw data there, or you have to pay, I mean, I don't know the exact prices, but you have to pay quite a large amount of money to get access to the raw data. So one of the things I tell people is: be careful what data you're giving to third parties that will sort of lock you into the platform. Because once you've gone in that direction, you kind of lose control of that data, and you become dependent on what that provider gives you access to, or how he monetizes it. So that can be a challenge. But, yeah, you don't have to start from scratch.
You know, we use Unomi ourselves, so we don't see that many other systems being implemented. Although, among our customers, we have seen people coming with other systems that they have right now, but most of them are not already on CDP platforms. They're more on CRMs or analytics platforms like Google Analytics, something like that. So CDP is still pretty new in terms of adoption.
[00:25:16] Unknown:
To your point about raw data access, and how it might be limited or nonexistent if you're using a third party service, I'm wondering what you see as some of the important capabilities that are enabled by having access to the actual raw events, to be able to maybe reprocess them, or add different logic, or do multiple different aggregations across them? I would say the sky's the limit, because
[00:25:40] Unknown:
we've identified some of those over the time that we've been working on it. But one of the biggest issues is privacy and regulation now. Especially in Europe with the GDPR, you have to have control over that data in some way, or you have to ask the user's consent. But the problem is that, basically, as soon as you lose control of that data flow, things can become very fishy. So that's one problem. The other problem is analytics, basically, where you want to do other types of analytics than what the platform offers by default. The way I explain that is: if you're using Google Analytics to do analytics, basically, every single person has the same analytics.
So you can't use that as a competitive advantage. And that's a big problem, because if you cannot use data freely to actually build a competitive advantage, then you're missing out on a lot. Some of the most successful companies now use data analysis and data processing as a way to be very competitive and ahead of the competition. I mean, even politicians do that. A lot of people think that part of the success of Donald Trump's election was due to the data processing and the data collection and stuff like that. So I think that not having access to raw data is really cutting people off from a lot of opportunities, and from being able to do whatever you can imagine with their data. And a lot of these cases might be things you identify at one specific point in time, but they might also come up a few years from now. And so maybe four or five years from now, you suddenly say, okay, we would really want access to this kind of event data or this kind of profile data.
And if the solution you've chosen doesn't give you that raw access, then you have problems. And, of course, the worst case scenario is a company going out of business, if you're relying on a big cloud provider. I mean, I don't see that as a problem with Google, but with others you could wonder, you know, if suddenly their platform disappears, what happens? So that's another case where it could be problematic. And from an end user point of view, the idea was really to be as transparent as possible. So having all the access, all the way from production to the storage of the data, and having that flow be completely open, makes it possible for anybody to review it and anybody to understand it, so that you can actually say, okay, I know what's happening with the data. We don't have to believe anybody; we can actually trace the whole thing. In terms of the broader integration
[00:28:27] Unknown:
of Unomi into somebody's data infrastructure: because everything is living in Elasticsearch, you can use it as an application developer, but a lot of other uses of that data might be in a broader data warehouse context, or people might want to do some of the reverse ETL or operational analytics aspect of pushing that data back into CRM systems or sales platforms, things like that. What are some of the patterns that you've seen in that regard for integrating Unomi into user applications, but also having it be a first class citizen within the broader data platform, with pipelines that feed out of Unomi into the data warehouse or other downstream systems?
[00:29:17] Unknown:
We've seen people try to do this, let's say, the traditional way, which is handling batches of data and basically saying, okay, I have this data either coming from Unomi, or I want to send it to Unomi from the data warehouse or data lake or whatever. And we try to go back to the basic use cases of what people are trying to achieve in the end, because it is possible: with Unomi, you can either talk to Unomi at the API level, or you could go directly to Elasticsearch and access that layer and do whatever you need there.
So you can say, I want to do these batches or these ETLs, either into Unomi or out of it, or into Elasticsearch or out of it. But I think Unomi really helps more when you can do whatever you need to do in, I would say, close to real time. And so a lot of the time we say, okay, if you need to have a copy of the data that Unomi is collecting, then either Unomi can send it directly, or in that case, we can look at batching scenarios. But if you're, for example, saying, we need something from the data warehouse or from another system, we would ask: is there any way that Unomi can access that as quickly as possible, in real time, as the user is interacting with it? And although that does challenge some people sometimes, because they're not necessarily used to working that way, in the end it provides experiences that are really compelling. As I mentioned, for example, in the cases of ecommerce or support, we could imagine scenarios like fraud detection, where you do have lots of data being processed, and Unomi will be able to pick that up and immediately react upon it. So I think those are some of the use cases that are the most compelling. But as I said, we also have people who are actually using Kafka to inject batches of events into Unomi, and then Unomi is just processing that as it comes in, in batches, and uses that to augment the profiles and things like that. So it's still possible, but I don't think that's where the biggest benefit comes from.
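To make the real-time event flow described here concrete, below is a minimal Python sketch of building an event payload and posting it to a Unomi server. The endpoint path, port, scope name, and field names are assumptions drawn from Unomi's documented REST event collector, not guarantees; verify them against your own deployment before relying on this.

```python
import json
from datetime import datetime, timezone

# Assumed default Unomi endpoint; adjust host/port/path for your deployment.
UNOMI_URL = "http://localhost:8181/eventcollector"

def build_event(profile_id: str, event_type: str, properties: dict) -> dict:
    """Assemble one event for a known profile, timestamped in UTC."""
    return {
        "events": [
            {
                "eventType": event_type,
                "scope": "example-site",  # illustrative scope name
                "profileId": profile_id,
                "timeStamp": datetime.now(timezone.utc).isoformat(),
                "properties": properties,
            }
        ]
    }

def send_event(payload: dict) -> None:
    """POST the payload to Unomi; needs the `requests` package and a live server."""
    import requests  # third-party; pip install requests
    requests.post(UNOMI_URL, json=payload, timeout=5).raise_for_status()

payload = build_event("profile-123", "productViewed", {"productId": "sku-42"})
print(json.dumps(payload, indent=2)[:60])
```

In the batching scenario the interview mentions, the same `build_event` shape could be produced by a Kafka consumer and flushed to Unomi in groups instead of one call per interaction.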
[00:31:44] Unknown:
In terms of being able to have ownership of the data and use this information in various ways, some of the other aspects that come into play, particularly when dealing with customer information, are these increasingly prevalent regulatory regimes: GDPR, CCPA, and some of the other privacy considerations that might be up and coming. And I'm wondering what are some of the ways that Unomi is designed to help manage some of those considerations, and some of the ways that people who are running Unomi can easily delete data for people who request that.
[00:32:20] Unknown:
So that was one of the nice things about having a project that's that independent and open: as the GDPR was coming into effect, you could actually see all the companies getting really nervous about many things. And for Unomi, it was pretty simple to directly add the things that are needed to, for example, anonymize data. So one of the things you can do inside of Unomi is say, okay, these properties are personal properties. So either they should not be stored, or they should be erased upon a request. In the API, you can directly say, for example, delete all my personal information. And Unomi actually went a little bit further than that: Unomi also has a built-in consent API.
So you can directly store consents inside of the profiles. In a way, it's like a consent aggregator, where you can use Unomi and say, okay: for example, if you have data that's being sent to third parties, you could then have inside the Unomi profile a consent that says, we have this consent, and it will allow us to send the data to the third party. Also, the consents were designed to fit with the requirements of the GDPR. I mention the GDPR because I focused on that one; since it was one of the early ones, it went pretty far in terms of specifying things. And one of the requirements, and I'm still wondering how many people are actually compliant with this, is that consent is not infinite.
Consent has, if I remember this correctly, a maximum duration of 2 years. I'd have to double check that. But the system that was developed builds that into the consent. For example, if you grant a consent for having mail sent, the consent would automatically have an expiration date 2 years from now. Things like that. So I'm not saying it's perfect. A lot of the APIs were done very early on, and they are more tools than opinionated frameworks. For example, consents will not directly trigger actions. So if you change a consent, you will actually need to set up a rule that says, if this consent is granted or not granted, then perform such an action.
But what that also means is that you can orchestrate consent changes. So for example, you could say, I have a consent for third parties. Somebody says, I want to erase my data from all these third parties; then you just need an action that will connect to all the third parties and say, you need to get rid of the data of this person. So it's not something that is set in stone. It offers tools, and I think it will help a lot of people deal with these regulations, because we've worked on being helpful for the GDPR, but more and more of these regulations are going to appear, because everybody's building some form of them.
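The two-year consent expiry discussed here can be sketched as follows. This is an illustrative Python model, not Unomi's actual consent schema; the field names (`typeIdentifier`, `status`, `statusDate`, `revokeDate`) are assumptions loosely modeled on its consent objects.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Roughly the 2-year maximum consent duration discussed in the interview.
CONSENT_MAX_DURATION = timedelta(days=2 * 365)

def grant_consent(type_id: str, granted_at: Optional[datetime] = None) -> dict:
    """Build a granted consent whose expiry is stamped at grant time."""
    granted_at = granted_at or datetime.now(timezone.utc)
    return {
        "typeIdentifier": type_id,  # e.g. "newsletter" or "thirdPartySharing"
        "status": "GRANTED",
        "statusDate": granted_at.isoformat(),
        # Consent is not infinite: record when it lapses automatically.
        "revokeDate": (granted_at + CONSENT_MAX_DURATION).isoformat(),
    }

consent = grant_consent("newsletter",
                        granted_at=datetime(2021, 6, 1, tzinfo=timezone.utc))
print(consent["revokeDate"])  # prints: 2023-06-01T00:00:00+00:00
```

As described above, the expiry itself would not trigger anything directly; a separate rule or action would watch for consents whose `revokeDate` has passed.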
[00:35:19] Unknown:
Struggling with broken pipelines, stale dashboards, missing data? If this resonates with you, you're not alone. Data engineers struggling with unreliable data need look no further than Monte Carlo, the world's first end-to-end, fully automated data observability platform. In the same way that application performance monitoring ensures reliable software and keeps application downtime at bay, Monte Carlo solves the costly problem of broken data pipelines. Monte Carlo monitors and alerts for data issues across your data warehouses, data lakes, ETL, and business intelligence, reducing time to detection and resolution from weeks or days to just minutes. Start trusting your data with Monte Carlo today. Go to dataengineeringpodcast.com/montecarlo to learn more. The first 10 people to request a personalized product tour will receive an exclusive Monte Carlo swag box.
In terms of the customer consent and the information that's managed: as end users of a lot of these platforms, it can often be very opaque what information is even held by these systems. And I know that one of the capabilities that's built into Unomi is the idea of a customer dashboard, to be able to see what my profile looks like so I can either request that it be deleted or request modifications to make it more accurate. I'm wondering if you can talk through some of the security and design considerations that go into that, and what kinds of capabilities you have to make that easier to manage for people who are running Unomi?
[00:36:48] Unknown:
Again, Unomi is an API. It's not going to directly have a UI that the end user can use, so we are relying there on whatever the integrators are offering. But the integrators have the possibility to show the full profile to end users, should they choose to, and then the end user could make a request to say, okay, I want you to delete my profile. That's a very simple thing. And, actually, when you delete a profile, you can use the API to not only delete the profile, but also go and delete all the events associated with that profile. Or, as something that's acceptable in terms of the regulation, the user can choose to say, I don't need you to erase all the events, but I do need you to anonymize all of that. So those are some of the things. And Unomi was actually modified to collect less information than it used to initially.
Because, for example, in the GDPR, the IP address was defined as a personal identifier, and a lot of the information that Unomi was collecting stored the IP address. So we changed that so that it's no longer done by default. You can still do it if you really want to, but by default, the system will try to avoid that. So, again, it's really offering a toolbox that makes these things much easier to do. We have these discussions a lot about what is acceptable and what is not acceptable to do with profile data and customer data. I personally think that we should explain as much as possible to people what these systems do and how they're built, what they can get out of it, and what the company can get out of it. But the person doing the explaining is somebody you have to trust, and that person has to be able to trust the whole chain. And part of the thinking behind Unomi was making sure that the systems could be as trustworthy as possible. Then, based on that, the trust could go all the way up to the end user, so that they would be more willing to use the systems than not. So it's always difficult, because we're now seeing all these exaggerations
[00:39:04] Unknown:
of people using too much customer data and abusing it. So it's always a challenge there. In terms of actually being able to run Unomi: as it stands right now, you have to deploy the infrastructure, deploy the application, and build the integrations yourself, so it can potentially be a significant lift for people who want to just get started with it. And I'm wondering if you can talk through some of the operational characteristics of deploying and running Unomi, what you see as some of the potential for it being offered as a service, either by an individual company or by various cloud providers, and some of the ways that the specifications you're working on as part of the OASIS Foundation can be integrated into other manifestations of similar systems?
[00:39:50] Unknown:
Unomi has a Docker-compatible deployment system now. So you can easily use Docker to set up Elasticsearch containers and Unomi containers, and then deploy all that in a Kubernetes cluster and things like that. So in terms of actual deployment frameworks, we're using sort of what a lot of people are using today, so that shouldn't be too strange. Of course, when you're talking about what's built on top of that, that's another story. But before I go into that, I want to address the service provider question. Apache Unomi is completely independent. My company doesn't own it. We contribute to it quite a lot, but it's really an independent project. So if any of the big cloud providers, Amazon or Google or any others, wanted to provide Unomi as a service, there's nothing preventing them from doing that. And maybe there would even be an incentive at some point to do that, if Unomi becomes something that's interesting to enough people.
Now, of course, a lot of people will need additional features or services. That's where other companies can offer value on top of that, and that's one of the things that my company does. So, yeah, those are the different things that you can do, but I think that most people can use it, and most people already do. Yeah, I should mention that there is a Drupal integration also. Drupal is a completely different technology stack; it's built in PHP. And you already have plugins that make it possible to use it with Unomi.
[00:41:22] Unknown:
In terms of the work that you're doing with Unomi and how you're using it in your business, I'm wondering if you can talk through some of the ways that you've used it in different applications that you've built, or some of the capabilities that you're layering on top of it. Unomi has, you know, all the services that we talked about, but there are a few services that are not available in Unomi. And for example,
[00:41:43] Unknown:
one of the things is that we built on top, and we're still using the basic Unomi that everybody can deploy. So we build extensions to it. For example, we added what we call custom items, which make it possible to store other types of objects inside. And associated with that are recommendations. So, basically, you're saying, I want product recommendations, or any kind of recommendations. That way you can easily use the platform to offer that. And then the core business that we try to do is a strong integration of content: any kind of content that you manage, whether it's text or data files or web content, any kind of content and data.
So here, really, the idea is to make it extremely simple for, just as an example, digital marketers to be able to provide personalized experiences. And here we use Unomi as the main engine to do all the work, but it's tied in; there are, of course, UIs and everything to make it very easy for end users to build personalized experiences, to have those evolve over time, and to react in real time to what the users are doing. We also provide connectors between Unomi and all the different platforms I talked about, like CRMs and other types of data sources. So it's sort of a complete package where people can just come and purchase the whole thing. But Unomi is really a key component of what's going on behind the scenes.
[00:43:16] Unknown:
In your work of building Unomi, working with the community, and providing services on top of it, what are some of the most interesting or innovative or unexpected ways that you've seen it applied?
[00:43:26] Unknown:
Yeah. For me, one of the most surprising was definitely, as I talked about earlier, the tracking of people with addictions. That really came out of the blue. Suddenly, this person was talking to me and said, yeah, I'm working with this medical provider that is dealing with that, and they needed a very easy way to keep track, send reminders, and track what people are doing for these dependencies that they had. That felt quite good in a way, because not only was the project being used in a very innovative way, but it was actually serving, you know, health. That was really, really interesting.
Yeah, there are other users out there, but, unfortunately, I'm not aware of all of them. I know that there are some users in China, for example, but I have no idea what they're doing with it. I wish they would tell me, because it'd be interesting to hear back, but it's not an obligation. Anybody can use the system as they see fit, and they don't need to tell anybody about it. That's been one use case that's been very interesting. Apart from that, it's pretty standard usage that we see: people building personalization into either mobile applications or websites.
[00:44:39] Unknown:
In your experience of building the project, helping to grow it, and, again, interacting with the community around it, what are some of the most interesting or unexpected or challenging lessons that you've learned personally in the process? One of the lessons was that if you're building a project like that, that's basically API first,
[00:44:57] Unknown:
your documentation and your onboarding have to be very strong. And that's been a real challenge, because we haven't had many contributors directly get involved with that. So that's been challenging, because you have to put in a big effort in terms of not only building the tech, but really exposing its value and explaining how it works and things like that. It's a lot better now, but I think there's still a lot of room for improvement there. But, you know, as the source code is out there, a lot of people have been figuring it out on their own. I'm always surprised, but I think the project would be even more popular if it was easier to just learn, I would say. So that's been a very interesting thing. The other thing that was also a good lesson was, when you see these edge use cases, it's easy to dismiss them and say, do we really want to go there?
And as you're trying to build a community, you have to be very inclusive, and you have to make sure that you're giving people as much empowerment as they can get. That's the best way to build a community. But it's not easy to do that with a project that you started. It's sort of like your baby, and you have to let go. So that's also a challenge. But now it's really interesting, and we've had a lot of contributions over time. Another thing that's also challenging with projects like this is you might have these bursts of interest, where suddenly you have a lot of people who are interested, and then suddenly nothing happens for, like, 3 or 4 months. And then it happens again. So you don't deal with this the same way you would a regular, I would say, commercial project, where you're always very busy with things. Here, it can go in waves. So what's important is that as soon as there is interest, you make sure that that interest is actually being addressed and nurtured very quickly.
Because people might move on to the next thing, or they might get frustrated and decide to build their own, which can be very frustrating. I know of one case of somebody who was running into issues, and he started rebuilding Unomi on the side, his own kind of version of it. That's, for me, a waste of everybody's time in a way, because it's not a good idea. And we see a lot of examples in the open source world of these sorts of forks, of people going in different directions. And it's a shame, because you're splitting the community when you do that. So one of the things that's really important is to make sure that you're always enabling the community. That's a big, big focus, but it's also an effort. So it's not something that necessarily comes naturally.
[00:47:44] Unknown:
And for people who are interested in being able to have these personalization capabilities, build these aggregate customer profiles, and integrate them into their broader software platforms, what are the cases where Unomi is the wrong choice and they might be better served either building something custom in house or leaning on something like a RudderStack or a Segment?
[00:48:04] Unknown:
Well, I mean, there's no commercial company behind Unomi. My company offers support on a product that adds extensions on top of it, but in terms of support, there's no real commercial offering there, as a commercial product would have. So if that's a strong need, Unomi would not necessarily be interesting. If you're in a case where you're looking for something like a pure profile storage system, and you're not interested in things like personalization or the rule engine, I definitely don't think it's a very good way of using it. Now, you mentioned Segment and RudderStack. If you have those needs, I think you should seriously consider Unomi, because I think it definitely stands up in terms of providing the basics there.
Even though you will, of course, not have the whole UI and things like that, in terms of the engine, I think it does stand on its own.
[00:49:06] Unknown:
And as you continue to work with the Unomi system, what are some of the things that you have planned for the near to medium term, or any projects that you're particularly excited to dig into?
[00:49:16] Unknown:
Yeah, well, there's a lot of things. I mean, Unomi is a project that can grow in a lot of ways. We have basically 2 releases coming up. One is the 1.6 release; that'll be sort of the natural maintenance version of the stable branch. That one will feature things like a Groovy action API. I talked about being able to use Groovy to build your own actions. And along with that, the idea is we would like to start an open source collection of Groovy connectors and actions. So the idea is to make it very easy to develop, and also to feature community-built actions and connectors on the Unomi website. The idea here is really to leverage the community to be able to easily integrate.
Just to give an example, let's say you want to have an action that sends a message to Slack. You could have somebody just build that Groovy action that would do the request to send it to Slack. And then they could say, oh, I want to share this with the community, and they could just have it featured on the Unomi website. And then anybody could just pick that up and either use it directly or use it as a starting point to build their own custom connector, and hopefully share it as well. So that's one of the things that I'm hoping will happen. I'm personally really interested in that. And then we have the next version, Unomi 2. That one will address a lot of things we've seen from the experience of using Unomi in production. So there'll be some data model improvements so that we can handle large sets of data more efficiently.
There are also going to be some improved ways to guarantee data quality. As Unomi is a very open system, there are some places where you can feed in data in, let's say, less structured ways than some might need. So in Unomi 2, there'll be a way to say, okay, for example, I want to make sure that events have such and such a structure, and you can enforce things like that. That would also help with some of the difficulties around querying some data, because if you don't know what the data structure is, it can be challenging to actually use it. One of the biggest things is the GraphQL API. Right now, Unomi is based on a REST API, but the actual OASIS specification is a GraphQL specification.
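The kind of structural check described here for Unomi 2 can be sketched as follows. The required fields and their types are illustrative assumptions, not the actual schema rules the project will ship.

```python
# Sketch of enforcing a minimal event structure before accepting data,
# along the lines of the data-quality features described for Unomi 2.
# The required fields and their expected types below are assumptions.
REQUIRED_EVENT_FIELDS = {"eventType": str, "scope": str, "properties": dict}

def validate_event(event: dict) -> list:
    """Return a list of problems; an empty list means the event passes."""
    problems = []
    for field, expected in REQUIRED_EVENT_FIELDS.items():
        if field not in event:
            problems.append(f"missing field: {field}")
        elif not isinstance(event[field], expected):
            problems.append(
                f"wrong type for {field}: {type(event[field]).__name__}")
    return problems

ok = validate_event({"eventType": "view", "scope": "site", "properties": {}})
bad = validate_event({"eventType": 7})
print(ok)   # prints: []
print(bad)
```

Knowing every event satisfies such a contract is exactly what makes downstream querying easier, as the interview notes: you can only query a structure you can rely on.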
And so the GraphQL API implementation will be introduced with Unomi 2. And then, basically, after that release, there'll be a lot of work focusing on community feedback, of course, on what the community needs and wants, and also making the platform grow in terms of capabilities, as well as integrating more easily with other
[00:52:08] Unknown:
systems. Are there any other aspects of the Unomi project or the overall space of customer data platforms and customer personalization that we didn't discuss yet that you'd like to cover before we close out the show? It's more like something I would like to
[00:52:22] Unknown:
get back to. My strong belief is that it's okay to know some things about visitors if it can really help improve experiences. Just to give you an example: if you walk into a high end car dealership and you've been there once or twice, and the salesman comes up to you and says, hello, Mr. Huber, how are you doing today? How's it going with that car you bought? That's a very pleasant experience. That's something that we as humans value and actually look for. And it shows that, in some cases, knowing something about people and using that to help deliver experiences is not necessarily negative.
Now, unfortunately, the industry as a whole has gone way too far with this. There's always that example of the Target personalization, where I think a father received baby product mailings for his daughter before he even knew that she was pregnant, things like that. I'm not even sure if that's an urban legend or if it's actually true, but I think there's some truth to it. So I think the biggest focus is really trying to find that very delicate balance of what's okay and what's not okay. And Unomi can be deployed on premise. I've even done demos on a laptop that's not connected to the Internet, where I can show a personalized experience completely built locally, without sending any data to a third party.
I think that's a use case in data management that I keep in my head quite a lot, because I'm not saying that using cloud services is a bad idea, but I'm saying you have to be careful with these things, and you have to keep constant watch over them. And that's part of what the project is about.
[00:54:27] Unknown:
Alright. Well, for anybody who wants to get in touch with you, follow along with the work that you're doing, or get involved with the Unomi project, I'll have you add your preferred contact information to the show notes. And as a final question, I'd like to get your perspective on what you see as being the biggest gap in the tooling or technology that's available for data management today.
[00:54:44] Unknown:
As an end user, if that data concerns me or concerns things I've done, I would like to have some control over it. I would like to be able to know what's going on with it and, at the very least, be able to perform a request to delete it. Where things get pretty awkward nowadays is that even us developers are growing older, and as people pass away, that data legacy is becoming an issue. I think we're going to need ways and tools to help with that in a reasonable way. I'm not talking about the extremes anymore, but about managing that data. For example, I know from personal experience that it's way too hard to deal with data from a family member who has passed away and that's still out there on the Internet. I'm still getting recommendations to connect with my father-in-law on Facebook, and he's passed away. So I think there's definitely something missing there. And I don't believe the industry is deliberately not trying to address that, but I think it doesn't have a lot of visibility right now, and hopefully it's going to be addressed in the future. But in a broader way, it's all the issues around data management for end users and privacy in general.
[00:56:14] Unknown:
Alright. Well, thank you very much for taking the time today to join me and share the work that you're doing on Unomi. It's definitely a very interesting project, and it fills an interesting and necessary gap in the overall ecosystem for helping developers make their products more accessible and more pleasant for their customers to use. So I appreciate all the time and energy that you and the community have put into that, and I hope you enjoy the rest of your day. Yes, thank you, Tobias. It was a really interesting discussion with interesting questions. Thank you for listening. Don't forget to check out our other show, Podcast.__init__ at pythonpodcast.com, to learn about the Python language, its community, and the innovative ways it is being used.
And visit the site at dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show, then tell us about it. Email hosts@dataengineeringpodcast.com with your story. And to help other people find the show, please leave a review on iTunes and tell your friends and coworkers.
Introduction and Sponsor Messages
Interview with Serge Huber: Introduction and Background
Overview of Apache Unomi
Comparison with Other Customer Data Platforms
Challenges in Customer Data Aggregation
System Design and Architecture of Unomi
Integration with Existing Systems
Use Cases and User Experiences
Alternative Infrastructures and Solutions
Importance of Raw Data Access
Broader Integration into Data Infrastructure
Regulatory Compliance and Data Privacy
Customer Consent and Profile Management
Operational Characteristics and Deployment
Business Applications and Extensions
Innovative Use Cases
Lessons Learned and Community Building
When Unomi is the Wrong Choice
Future Plans for Unomi
Balancing Personalization and Privacy
Closing Remarks and Contact Information