Summary
The key to providing your users with excellent service is understanding them and offering a personalized experience. Unfortunately, many sites and applications take that to the extreme and collect too much information. To make it easier for developers to build customer profiles in a way that respects user privacy, Serge Huber helped create the Apache Unomi framework as an open source customer data platform. In this episode he explains how it can be used to build rich and useful profiles of your users, the system architecture that powers it, and some of the ways that it is being integrated into organizations' broader data ecosystems.
Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
- Struggling with broken pipelines? Stale dashboards? Missing data? If this resonates with you, you’re not alone. Data engineers struggling with unreliable data need look no further than Monte Carlo, the world’s first end-to-end, fully automated Data Observability Platform! In the same way that application performance monitoring ensures reliable software and keeps application downtime at bay, Monte Carlo solves the costly problem of broken data pipelines. Monte Carlo monitors and alerts for data issues across your data warehouses, data lakes, ETL, and business intelligence, reducing time to detection and resolution from weeks or days to just minutes. Start trusting your data with Monte Carlo today! Visit dataengineeringpodcast.com/montecarlo to learn more. The first 10 people to request a personalized product tour will receive an exclusive Monte Carlo Swag box.
- Are you bored with writing scripts to move data into SaaS tools like Salesforce, Marketo, or Facebook Ads? Hightouch is the easiest way to sync data into the platforms that your business teams rely on. The data you’re looking for is already in your data warehouse and BI tools. Connect your warehouse to Hightouch, paste a SQL query, and use their visual mapper to specify how data should appear in your SaaS systems. No more scripts, just SQL. Supercharge your business teams with customer data using Hightouch for Reverse ETL today. Get started for free at dataengineeringpodcast.com/hightouch.
- Your host is Tobias Macey and today I’m interviewing Serge Huber about Apache Unomi, an open source customer data platform designed to manage customer, lead, and visitor data and help personalize customer experiences
Interview
- Introduction
- How did you get involved in the area of data management?
- Can you describe what Unomi is and the story behind it?
- What are the goals and target use cases of Unomi?
- What are the aspects of collecting and aggregating profile information that present challenges to developers?
- How does the design of Unomi reduce that burden?
- How does the focus of Unomi compare to systems such as Segment/Rudderstack or Optimizely for collecting user interactions and applying personalization?
- How does Unomi fit in the architecture of an application or data infrastructure?
- Can you describe how Unomi itself is architected?
- How have the goals and design of the project changed or evolved since it started?
- What are some of the most complex or challenging engineering projects that you have worked through?
- Can you describe the workflow of using Unomi to manage a set of customer profiles?
- What are some examples of user experience customization that you can build with Unomi?
- What are some alternative architectures that you have seen to produce similar capabilities?
- One of the interesting features of Unomi is the end-user profile management. What are some of the system and developer challenges that are introduced by that capability? (e.g. constraints on data manipulation, security, privacy concerns, etc.)
- How did Unomi manage privacy concerns and the GDPR?
- How does Unomi help with the new third party data restrictions?
- Why is access to raw data so important?
- Could cloud providers offer Unomi as a service?
- How have you used Unomi in your own work?
- What are the most interesting, innovative, or unexpected ways that you have seen Unomi used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on Unomi?
- When is Unomi the wrong choice?
- What do you have planned for the future of Unomi?
Contact Info
- @sergehuber on Twitter
- @bhillou on Twitter
- sergehuber on GitHub
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
Links
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to the Data Engineering Podcast, the show about modern data management. When you're ready to build your next pipeline, or want to test out the projects you hear about on the show, you'll need somewhere to deploy it. So check out our friends over at Linode. With their managed Kubernetes platform, it's now even easier to deploy and scale your workflows or try out the latest Helm charts from tools like Pulsar, Pachyderm, and Dagster. With simple pricing, fast networking, object storage, and worldwide data centers, you've got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today.
That's L-I-N-O-D-E, and get a $100 credit to try out a Kubernetes cluster of your own. And don't forget to thank them for their continued support of this show. Are you bored with writing scripts to move data into SaaS tools like Salesforce, Marketo, or Facebook Ads? Hightouch is the easiest way to sync data into the platforms that your business teams rely on. The data you're looking for is already in your data warehouse and BI tools. Connect your warehouse to Hightouch, paste a SQL query, and use their visual mapper to specify how data should appear in your SaaS systems. No more scripts, just SQL. Supercharge your business teams with customer data using Hightouch for reverse ETL today. Get started for free at dataengineeringpodcast.com/hightouch.
Your host is Tobias Macey. And today, I'm interviewing Serge Huber about Apache Unomi, an open source customer data platform designed to manage customer, lead, and visitor data and help personalize customer experiences. So, Serge, can you start by introducing yourself? Yes. Thank you, Tobias.
[00:01:46] Unknown:
So I'm Serge Huber. I actually have two hats. I'm the CTO of a DXP company called Jahia, and I'm also the project management committee chair of the Apache Unomi project at the Apache Software Foundation. That's me in a nutshell. And do you remember how you first got involved in the area of data? Basically, we were a CMS vendor to start with at Jahia. Very quickly, we realized that a lot of people wanted to have data associated with what people were doing on our systems. So we started looking into that and trying to figure out what the best way would be to associate data and content.
We started looking at things that were out there to see what we could integrate with, or whether we should develop our own thing, and we didn't find anything really compelling, so that's sort of how I got started. And we started two projects. First, a specification around customer data exchange, which we did at the OASIS Open foundation. Initially, it was called the Context Server specification. So we were working around data, but we called it context because we were in the context of the web. And we were realizing that there was an emerging field called customer data.
So that's really how I came to it and got involved with this. The other idea was also to do things in an open fashion, not using closed platforms or closed tools.
[00:03:18] Unknown:
In terms of the Unomi project, I'm wondering if you can give a bit of an overview about what it is that you're building there, some of the story behind how it got started, and some of the early decisions that you had to make in terms of what the capabilities
[00:03:33] Unknown:
needed to be and how you wanted to architect it? So that actually ties in pretty well with what I was just saying. We were basically looking for a way to integrate data with our platform, but we were looking for open source tools, because we've always been based on open source. And we found a few things, but nothing that really fit the bill. So the idea was that we worked on the specification, and the specification was really about trying to figure out a standardized way to access data and, you know, push it in and pull it out. This was pretty abstract, and I was getting very confused by it, to be honest. So I started hacking around, and that became the basis for the Unomi project, which started as a project at our company, but very quickly we realized that it made more sense to do this in the open. And we were dealing with data that can be sensitive, that can be specific to visitors; I mean, you have people that are coming onto your systems, and you know more and more about them.
We were really early in the concerns of data privacy, data management, and how to actually do something in the most transparent way possible. And that's where the idea of doing this in the open, and then contributing it to the Apache Software Foundation, came in: we wanted full transparency on what's going on with the data. And the best way to do that is to have an independent project that shares the full source code of what's going on with that data. And, yeah, that's the basis of the Unomi project.
[00:05:08] Unknown:
As far as the overall aspect of customer data, there have been a few different systems that have branded themselves as customer data platforms, possibly the most well known being Segment, and then some of the followers on, like, RudderStack. And I'm wondering if you can just talk to some of the goals and target use cases that you're focusing on with Unomi and how that might compare to what the Segments and RudderStacks of the world are doing in terms of their system design and architecture?
[00:05:37] Unknown:
Yeah. That's a very good question. So, I mean, you know, we really started as a tool to fit our needs. And our needs were pretty basic. It was to have something that was capable of collecting, in real time, the event behavior of visitors that were coming into our system. And so, from that starting point, it's very similar to what Segment or RudderStack are doing. But the biggest difference was that we quickly expanded that to be agnostic to the incoming system. It doesn't need to be a web platform. It could be a native mobile application. It could be a CRM system. It could be anything that can talk to Unomi's API to provide context and information about who the user is, what he's doing, and how we can personalize his experience and provide better data sharing between the systems.
So very early on, Unomi became a platform to unify all the data that's inside of different platforms. And we really focused on having something based on an API first, so that it would be really easy for people to push things in and pull them out. The other thing is the data model, which is, I would say, more opinionated than most customer data platforms, though it's not very restrictive. It can do a lot of things, but it does have a few basic objects that make it pretty easy to understand what's going on. So we do have profiles. We do have events, sessions, and then, of course, any kind of other data you need. So, yeah, I think that's basically the difference, but there are similarities with those platforms as well. In terms of the
[00:07:31] Unknown:
types of information that you're collecting and some of the aggregations that you're doing for building the profile information, I'm wondering what are some of the challenges that developers often experience when trying to do this on their own without the support of a dedicated platform such as Unomi, and some of the ways that the design of Unomi helps to reduce that burden and allow developers to integrate that information more fluidly and more seamlessly into the applications that they're building, without having to spend a lot of cycles on building it all from scratch.
[00:08:05] Unknown:
Yeah. Again, that's a very important point, because, in the same way, we built Unomi to serve our needs so we wouldn't have to do this from scratch. I mean, we did start the project, but one of the ideas was really to help the platform improve with more people using it. And so the initial idea of Unomi was to make it flexible enough that it could address a lot of different use cases, and at the same time easy enough to understand and integrate with, so that it would be compelling in terms of not having to rebuild something from scratch for future needs. And this ties into two examples of usage of Unomi that really surprised me. One of them was a company that was already doing event collection and had their own way of collecting events, but then they realized that the way you could use Unomi to analyze those events and build real time personalization was more powerful. So they chose to do that, but Unomi had some problems with doing that, because they were doing something very strange with events. Most people consider events as immutable data, I mean, data that's timestamped and, you know, stored and never touched again. In their case, they were actually modifying the event data over time.
That was not a use case that was compatible with Unomi at the time, but they contributed an extension to the project that made it possible. And the idea of the project is to have that flexibility and to be able to adapt to use cases that might not have been anticipated, and that way you can address a lot more people. Another use case that was very interesting was people actually not doing this with customers, but with patients. So we had an application where people were using Unomi to track people that have addictions, you know, like addictions to drugs or stuff like that. And these are people you have to keep track of, otherwise they will relapse and things like that. So they were using Unomi in that way. And, again, it's something that I don't think people would have done if they had to start from scratch or build something based on another platform.
[00:10:17] Unknown:
As far as the architecture of Unomi, I know that in terms of the system requirements it's actually fairly lightweight, where a lot of systems such as RudderStack and Segment require you to have some sort of cloud data warehouse in place. Unomi instead orients itself around Elasticsearch as the core architectural element. And I'm wondering what the thinking was in terms of the design requirements, the system requirements, and the target user group for Unomi that influenced your decision to focus on Elasticsearch as that core storage location, and maybe what the technological landscape was at the time that you started it that also fed into that decision.
[00:11:00] Unknown:
Yeah. So that's a decision that was quite controversial when the project started. Because when we started the project, as I said, the early project was just something where I was trying out ideas. And using Elasticsearch as a storage system was, you know, dubious to say the least at the time. But, actually, over time it's proven quite reliable, and we have a lot of people using this in production now. Of course, Elasticsearch has greatly improved since the project started. But the objectives were always sort of the same. The system had to be highly scalable. The idea was really to make sure that we had targets. I mean, I can't say that it's always easy to hit those, but the idea I have in mind is always to be under 100 to 200 milliseconds in terms of full request time. And in order to make that scalable, that means that your whole platform has to be able to scale, so that you can always compensate for the extra load and not let bottlenecks be problematic.
So that's why this very simple architecture was put in place, because from previous experience, if you have, like, state management, node to node communication, these types of technical challenges, that makes scaling and node independence very difficult. And Elasticsearch did a pretty good job of implementing things like that. We were quite happy with that, and so it served the purpose of having a back end that was high performance and scalable. And then Unomi added on top of that the rule system that was needed to offer the real time personalization and segmentation.
[00:12:40] Unknown:
So digging more into Unomi itself, I'm wondering if you can talk to just the overall system design and the capabilities that it provides, how it might fit into an existing application architecture or somebody's data infrastructure, the designs and decisions that have gone into the API specification, and how you think about where to deploy it and when to use it?
[00:13:05] Unknown:
There are different parts to it. But if I take, let's say, the use case of integration with a web based system, where you would start is that we have endpoints in the API to collect events and to also track users or visitors. And so, basically, the main endpoint will be the one that's taking those things as input, keeping an identifier, and building a progressive profile of the person interacting with Unomi. At the same time, we have a system where you can actually send queries to Unomi to know if this user matches a specific set of conditions, and that's used for personalization.
So, basically, the basic use case is: is this user from a specific country? I mean, a really very simple condition. But you could also have more complex ones where you say, has this user come in the last x days and visited this page, and whatever? You can build really complex things. And that way you can then build personalization based on that, and you can access profile data at any time. So that's what I'd say would be the basic API for real time personalization, tracking, and event collection. Then, at the core of Unomi's functionality, is the rule engine.
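Before moving on to the rule engine, here is a rough sketch of what such a context request could look like from the application's side: one event describing what the visitor just did, plus a personalization question about the profile. The field names and condition types below are illustrative of Unomi's JSON style rather than copied from the official schema, so treat them as assumptions and check the project documentation for the exact shapes:

```python
# Sketch of a Unomi-style context request: report a page-view event and
# ask whether the visitor matches a personalization condition.
# Field names and condition types here are illustrative, not authoritative.
import json


def build_context_request(session_id: str, page_path: str) -> dict:
    """Assemble the JSON body for a hypothetical context endpoint call."""
    return {
        "sessionId": session_id,
        # Events describing what the visitor just did.
        "events": [
            {
                "eventType": "view",
                "scope": "mysite",  # illustrative scope name
                "properties": {"path": page_path},
            }
        ],
        # Ask the server: does this profile match these conditions?
        "personalizations": [
            {
                "id": "from-france",
                "strategy": "matching-first",
                "contents": [
                    {
                        "filters": [
                            {
                                "condition": {
                                    "type": "profilePropertyCondition",
                                    "parameterValues": {
                                        "propertyName": "properties.country",
                                        "comparisonOperator": "equals",
                                        "propertyValue": "France",
                                    },
                                }
                            }
                        ]
                    }
                ],
            }
        ],
    }


body = build_context_request("session-1234", "/pricing")
print(json.dumps(body, indent=2))
```

An application would POST a body like this to Unomi's context endpoint and use the returned profile data and matching personalization IDs to decide what to render.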
The rule engine is basically this: a rule is a very simple thing. You have a condition, and when that condition matches, you execute a set of actions. And conditions always react to events. So, for example, an event comes in: a user has clicked on something, or he has viewed a specific screen if you're in the case of a mobile application, or, in a support system, he has opened an issue; whatever the event is, you can associate it with a rule. And if that rule's condition matches the incoming event, you can then execute actions. Unomi has an extremely flexible system of conditions and actions.
You can build your own conditions that you deploy to Unomi, as well as your own actions. So Unomi comes with built-in conditions and actions, but you can provide your own. And this rule system is, you know, a very basic thing to explain, but it's incredibly flexible, and the rules are executed in real time. So as an event is coming in, it will immediately trigger rules, and those rules will perform actions in real time. So an action could be: update the user profile with a specific property. Or it could be: call a third party system, for example a CRM, see if the CRM has any information about this profile, then pull that data back and perform any action you want with it. So the rule system can serve as an orchestration system, and it can all happen dynamically.
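As a concrete illustration of the "when this condition, then these actions" shape described above, a rule could be expressed roughly as follows. The specific type names (`eventTypeCondition`, `setPropertyAction`) are my approximation of Unomi's built-in vocabulary, so verify them against the Unomi documentation before relying on them:

```python
# Sketch of a Unomi-style rule: when a "view" event arrives, set a
# property on the visitor's profile. Type names are illustrative.
import json

rule = {
    "metadata": {
        "id": "viewed-pricing-page",
        "name": "Viewed pricing page",
        "description": "Flag profiles that have viewed the pricing page",
    },
    # The condition reacts to incoming events.
    "condition": {
        "type": "eventTypeCondition",
        "parameterValues": {"eventTypeId": "view"},
    },
    # Actions run in real time whenever the condition matches.
    "actions": [
        {
            "type": "setPropertyAction",
            "parameterValues": {
                "setPropertyName": "properties.viewedPricing",
                "setPropertyValue": "true",
            },
        }
    ],
}

print(json.dumps(rule, indent=2))
```

A rule like this would be registered once through the rules API; from then on every matching event triggers its actions, which is what makes the engine usable as a lightweight orchestration layer.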
And, of course, other usual services provided by CDPs, such as segmentation, are also built in. There's also import, export, and all kinds of other services that you find in typical CDPs. Now, one thing I should mention is that there's a lot of confusion around the CDP term. I would describe Unomi more as a CDP engine than what a lot of people consider a CDP, because Unomi does not have a UI. It's an API based platform. If people are looking for a CDP as a full product that has a UI, that's not what Unomi is, at least not right now. Given the fact that it is API only and doesn't have that UI element to it, it's definitely
[00:17:06] Unknown:
very much a developer centric project. And I'm wondering how that focus has influenced some of the feature decisions and prioritization as far as engineering effort, and some of the limitations that it imposes in terms of where you might want to use it? That's where the community comes in, I would say.
[00:17:27] Unknown:
The nice thing about having a project at Apache is that it's very visible, I mean, in the software engineering world. So people will pretty quickly discover the project and then come with their use cases. So they've sort of done the work already at that point of saying, okay, does this look like it's in the field of my needs, and will it match or not? And then that's where they will start a discussion with the other community members and say, okay, can this work or not? Do we need anything that's not there? And, you know, that's something that works pretty well. One of the things that's pretty surprising about the community is that there's a huge silent community that we still want to talk to a little bit more, because we just had our first meetup, and a lot more people showed up at the meetup. We have over 150 people in our Slack channel that are constantly in the channel and that are mostly not very active, but they're clearly following the discussions. So there's clearly a community of people, developers, of course, that are using the system, that are interested in it, and that are contributing to their own abilities.
That's definitely something that's happening. Now, one of the challenges is that, Unomi being a Java based platform, there are a lot of people using Unomi as an API, so they don't need to know any Java to use the system. But if they want to contribute extensions, that's where it becomes a little bit more tricky if they don't have experience with Java. We are working on that, and we are, for example, adding in the new versions the possibility to build actions using other languages such as Groovy. I haven't seen somebody come to me and say, you know, I don't know how to use this. That's a rare scenario. But we do have a lot of people saying, okay, I have this usage.
Will that fit well or not? And in the case I described, with the people doing these mutations on events, that took a little bit of work, but it worked out in the end. In terms of the
[00:19:32] Unknown:
kind of use cases for Unomi, given the fact that you are building up these customer profiles, you could potentially use the information that's aggregated across these series of interactions to feed back into your application: to tailor the experience, maybe provide some recommendations to the user, or help to remember some of their preferences, maybe across a suite of products that exist within an organization. I'm wondering if you can just talk to some of the types of user experiences that can be powered by the Unomi platform, and ways that you might tap into that as an application developer to pull in maybe some of those recommendations, or use that profile information to enrich the context of the application that the user is interacting with?
[00:20:15] Unknown:
Again, I know the web case best, because that's one of the ones that I see the most often. But, for example, you could tie in any kind of information coming from behavior. So if you have the user interacting with the system and he's viewing content, or editing content, or putting stuff in a cart, the ecommerce scenario. The ecommerce scenario is interesting because there you have another system, which is the ecommerce system, and you could, for example, say: if you see this visitor coming back, and Unomi is building this progressive profile, you could have Unomi actually query the ecommerce system to say, okay, can you give me more information about, for example, previous purchases?
That way I could push some content to him, or recommendations, or whatever. In the case of a sales scenario, you could do the same thing; instead of an ecommerce back end, you could have a CRM such as Salesforce. Unomi does have a Salesforce plugin that's already in the project. And in the case of Salesforce, you can just say, okay, I've identified this person with this email. Do you have somebody in the system with that same email, so that I can augment the information that's inside of Unomi? For example, if there's any kind of indication from the CRM that this person is hesitant about a sale or something like that, then I could build a personalized experience for this person using Unomi that would be based on the information coming from the CRM.
Now, if we look at some of the scenarios from other users of Unomi, in the case of those people that were modifying events, what they're doing is they're actually in the business of building reviews on products. And so they're using Unomi to study the behavior of people on reviews: what they're doing, the type of language they're using in their reviews, and stuff like that. And that could be used for anything from detecting spam to actually suggesting more reviews based on the products that they buy, for example, if it's tied in with the ecommerce system. So you can really, with, I would say, a very small amount of work, tie into a lot of systems. Unomi is more dependent on external systems providing access than the opposite. Usually, Unomi is mostly an open book, and so you will be able to do whatever you want with the data that's in Unomi. But if you're trying to access data from other systems, you'll be more dependent on what those systems let you access.
[00:22:52] Unknown:
As far as the ways that people are maybe building up these different user experiences, if they're not already using Unomi, what are some of the alternative infrastructures or architectural and software patterns that they might lean on to be able to get a similar outcome for their end users?
[00:23:12] Unknown:
Well, we do see a lot of people building their own data solutions, if I can call them that, ranging anywhere from log analysis to behavior analysis. But I think that ship has sailed. Now, you mentioned RudderStack, you mentioned Segment; there are other players where, basically, instead of building the whole thing from scratch, you will look at reusing things. But I still think there aren't that many purely open source solutions that will help you. So you could, of course, also use traditional CDP vendors to address that. But then again, you will be dependent on what those CDP vendors give you access to, because some of them might be more open than others, and so it could be problematic.
One of the worst, to my knowledge, in terms of data access, is people using stuff like Google Analytics, because you have very little access to raw data there, or you have to pay, I mean, I don't know the exact prices, but you have to pay quite a large amount of money to get access to the raw data. So one of the things I tell people is: be careful what data you're giving to third parties that will sort of lock you into the platform. Because once you've gone in that direction, you kind of lose control of that data, and you become dependent on what that provider gives you access to, or how he monetizes it. So that can be a challenge. But, yeah, you don't have to start from scratch.
You know, we use Unomi ourselves, so we don't see that many other systems being implemented. Although, among our customers, we have seen people coming with other systems that they have right now, but most of them are not already on CDP platforms. They're more on CRMs or analytics platforms like Google Analytics, something like that. So CDP is still pretty new in terms of adoption.
[00:25:16] Unknown:
To your point about raw data access, and how it might be limited or nonexistent if you're using a third party service, I'm wondering what you see as some of the important capabilities that are enabled by having access to the actual raw events, to be able to maybe reprocess them, or add different logic, or do multiple different aggregations across them? I would say the sky's the limit, because
[00:25:40] Unknown:
we've identified some of those over the time that we've been working on it. But one of the biggest issues is privacy and regulation now. Especially in Europe with the GDPR, you have to have control over that data in some way, or you have to ask the user's consent. But the problem is that, basically, as soon as you lose control of that data flow, things can become very fishy. So that's one problem. The other problem is analytics, basically, where you want to do other types of analytics than what the platform offers by default. The way I explain that is: if you're using Google Analytics to do analytics, basically, every single person has the same analytics.
So you can't use that as a competitive advantage. And that's a big problem, because if you cannot use data freely to actually build a competitive advantage, then you're missing out on a lot. Some of the most successful companies now use data analysis and data processing as a way to be very competitive and ahead of the competition. I mean, even politicians do that. A lot of people think that part of the success of Donald Trump's election was due to the data processing and the data collection and stuff like that. So I think that not having access to raw data is really cutting people off from a lot of opportunities, and from being able to do whatever you can imagine with their data. And a lot of these cases might be things you identify at one specific point in time, but they might also come up a few years from now. And so maybe four or five years from now, you suddenly say, okay, we would really want access to this kind of event data or this kind of profile data.
And if the solution you've chosen doesn't give you that raw access, then you have problems. And, of course, the worst case scenario is a company going out of business, if you're relying on a big cloud provider. I mean, I don't see that as a problem with Google, but with others you could wonder, you know, if suddenly their platform disappears, what happens? So that's another case where it could be problematic. And from an end user point of view, the idea was really to be as transparent as possible. So having all the access, all the way from production to the storage of the data, and having that flow be completely open, makes it possible for anybody to review it and anybody to understand it, so that you can actually say, okay, I know what's happening with the data. We don't have to believe anybody; we can actually trace the whole thing. In terms of the broader integration
[00:28:27] Unknown:
of Unomi into somebody's data infrastructure: because everything is living in Elasticsearch, you can use it as an application developer, but a lot of other uses of that data might be in a broader data warehouse context, or people might want to do some of the reverse ETL or operational analytics aspect of pushing that data back into CRM systems or sales platforms, things like that. What are some of the patterns that you've seen in that regard for integrating Unomi into user applications, but also having it be a first class citizen within the broader data platform, with pipelines that feed out of Unomi into the data warehouse or other downstream systems?
[00:29:17] Unknown:
We've seen people try to do this, let's say, the traditional way, which is handling batches of data and basically saying, okay, I have this data either coming from Unomi, or I want to send it to Unomi from the data warehouse or data lake or whatever. And we try to go back to the basic use cases of what people are trying to achieve in the end, because it is possible: with Unomi, you can either talk to Unomi at the API level, or you could go directly to Elasticsearch and access that layer and do whatever you need there.
So you can say, I want to do these batches or these ETLs, either into Unomi or out of it, or into Elasticsearch or out of it. But I think Unomi really helps more when you can do whatever you need to do in, I would say, close to real time. And so a lot of the time we say, okay, if you need to have a copy of the data that Unomi is collecting, then either Unomi can send it directly, or in that case, we can look at batching scenarios. But if you're, for example, saying, we need something from the data warehouse or from another system, we would ask: is there any way that Unomi can access that as quickly as possible, in real time, as the user is interacting with it? And although that does challenge some people sometimes, because they're not necessarily used to working that way, in the end it provides experiences that are really compelling. As I mentioned, for example, in the cases of ecommerce or support, we could imagine scenarios like fraud detection, where you do have lots of data being processed, and Unomi will be able to pick that up and immediately react upon it. So I think those are some of the use cases that are the most compelling. But as I said, we also have people who are actually using Kafka to inject batches of events into Unomi, and then Unomi is just processing that as it comes in, in batches, and uses that to augment the profiles and things like that. So it's still possible, but I don't think that's where the biggest benefit comes from.
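To make the real-time event flow described here concrete, below is a minimal Python sketch of building an event payload and posting it to a Unomi server. The endpoint path, port, scope name, and field names are assumptions drawn from Unomi's documented REST event collector, not guarantees; verify them against your own deployment before relying on this.

```python
import json
from datetime import datetime, timezone

# Assumed default Unomi endpoint; adjust host/port/path for your deployment.
UNOMI_URL = "http://localhost:8181/eventcollector"

def build_event(profile_id: str, event_type: str, properties: dict) -> dict:
    """Assemble one event for a known profile, timestamped in UTC."""
    return {
        "events": [
            {
                "eventType": event_type,
                "scope": "example-site",  # illustrative scope name
                "profileId": profile_id,
                "timeStamp": datetime.now(timezone.utc).isoformat(),
                "properties": properties,
            }
        ]
    }

def send_event(payload: dict) -> None:
    """POST the payload to Unomi; needs the `requests` package and a live server."""
    import requests  # third-party; pip install requests
    requests.post(UNOMI_URL, json=payload, timeout=5).raise_for_status()

payload = build_event("profile-123", "productViewed", {"productId": "sku-42"})
print(json.dumps(payload, indent=2)[:60])
```

In the batching scenario the interview mentions, the same `build_event` shape could be produced by a Kafka consumer and flushed to Unomi in groups instead of one call per interaction.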
[00:31:44] Unknown:
In terms of being able to have ownership of the data and use this information in various ways, some of the other aspects that come into play, particularly when dealing with customer information, are these increasingly prevalent regulatory regimes: GDPR, CCPA, and some of the other privacy considerations that might be up and coming. And I'm wondering what are some of the ways that Unomi is designed to help manage some of those considerations, and some of the ways that people who are running Unomi can easily delete data for people who request that.
[00:32:20] Unknown:
So that was one of the nice things about having a project that's that independent and open: as the GDPR was coming into effect, you could actually see all the companies getting really nervous about many things. And for Unomi, it was pretty simple to directly add the things that are needed to, for example, anonymize data. So one of the things you can do inside of Unomi is say, okay, these properties are personal properties. So either they should not be stored, or they should be erased upon a request. In the API, you can directly say, for example, delete all my personal information. And Unomi actually went a little bit further than that: Unomi also has a built-in consent API.
So you can directly store consents inside of the profiles. In a way, it's like a consent aggregator, where you can use Unomi and say, okay: for example, if you have data that's being sent to third parties, you could then have inside the Unomi profile a consent that says, we have this consent, and it will allow us to send the data to the third party. Also, the consents were designed to fit with the requirements of the GDPR. I mention the GDPR because I focused on that one; since it was one of the early ones, it went pretty far in terms of specifying things. And one of the requirements, and I'm still wondering how many people are actually compliant with this, is that consent is not infinite.
Consent has, if I remember this correctly, a maximum duration of 2 years. I'd have to double check that. But the system that was developed builds that into the consent. For example, if you grant a consent for having mail sent, the consent would automatically have an expiration date 2 years from now. Things like that. So I'm not saying it's perfect. A lot of the APIs were done very early on, and they are more tools than opinionated frameworks. For example, consents will not directly trigger actions. So if you change a consent, you will actually need to set up a rule that says, if this consent is granted or not granted, then perform such an action.
But what that also means is that you can orchestrate consent changes. So for example, you could say, I have a consent for third parties. Somebody says, I want to erase my data from all these third parties; then you just need an action that will connect to all the third parties and say, you need to get rid of the data of this person. So it's not something that is set in stone. It offers tools, and I think it will help a lot of people deal with these regulations, because we've worked on being helpful for the GDPR, but more and more of these regulations are going to appear, because everybody's building some form of them.
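The two-year consent expiry discussed here can be sketched as follows. This is an illustrative Python model, not Unomi's actual consent schema; the field names (`typeIdentifier`, `status`, `statusDate`, `revokeDate`) are assumptions loosely modeled on its consent objects.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Roughly the 2-year maximum consent duration discussed in the interview.
CONSENT_MAX_DURATION = timedelta(days=2 * 365)

def grant_consent(type_id: str, granted_at: Optional[datetime] = None) -> dict:
    """Build a granted consent whose expiry is stamped at grant time."""
    granted_at = granted_at or datetime.now(timezone.utc)
    return {
        "typeIdentifier": type_id,  # e.g. "newsletter" or "thirdPartySharing"
        "status": "GRANTED",
        "statusDate": granted_at.isoformat(),
        # Consent is not infinite: record when it lapses automatically.
        "revokeDate": (granted_at + CONSENT_MAX_DURATION).isoformat(),
    }

consent = grant_consent("newsletter",
                        granted_at=datetime(2021, 6, 1, tzinfo=timezone.utc))
print(consent["revokeDate"])  # prints: 2023-06-01T00:00:00+00:00
```

As described above, the expiry itself would not trigger anything directly; a separate rule or action would watch for consents whose `revokeDate` has passed.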
[00:35:19] Unknown:
Struggling with broken pipelines, stale dashboards, missing data? If this resonates with you, you're not alone. Data engineers struggling with unreliable data need look no further than Monte Carlo, the world's first end-to-end, fully automated data observability platform. In the same way that application performance monitoring ensures reliable software and keeps application downtime at bay, Monte Carlo solves the costly problem of broken data pipelines. Monte Carlo monitors and alerts for data issues across your data warehouses, data lakes, ETL, and business intelligence, reducing time to detection and resolution from weeks or days to just minutes. Start trusting your data with Monte Carlo today. Go to dataengineeringpodcast.com/montecarlo to learn more. The first 10 people to request a personalized product tour will receive an exclusive Monte Carlo swag box.
In terms of the customer consent and the information that's managed: as end users of a lot of these platforms, it can often be very opaque what information is even held by these systems. And I know that one of the capabilities that's built into Unomi is the idea of a customer dashboard, to be able to see what my profile looks like so I can either request that it be deleted or request modifications to make it more accurate. I'm wondering if you can talk through some of the security and design considerations that go into that, and what kinds of capabilities you have to make that easier to manage for people who are running Unomi?
[00:36:48] Unknown:
Again, Unomi is an API. It's not going to directly have a UI that the end user can use, so we are relying there on whatever the integrators are offering. But the integrators have the possibility to show the full profile to end users, should they choose to, and then the end user could make a request to say, okay, I want you to delete my profile. That's a very simple thing. And, actually, when you delete a profile, you can use the API to not only delete the profile, but also go and delete all the events associated with that profile. Or, as something that's acceptable in terms of the regulation, the user can choose to say, I don't need you to erase all the events, but I do need you to anonymize all of that. So those are some of the things. And Unomi was actually modified to collect less information than it used to initially.
Because, for example, in the GDPR, the IP address was defined as a personal identifier, and a lot of the information that Unomi was collecting stored the IP address. So we changed that so that it's no longer done by default. You can still do it if you really want to, but by default, the system will try to avoid that. So, again, it's really offering a toolbox that makes these things much easier to do. We have these discussions a lot about what is acceptable and what is not acceptable to do with profile data and customer data. I personally think that we should explain as much as possible to people what these systems do and how they're built, what they can get out of it, and what the company can get out of it. But the person doing the explaining is somebody you have to trust, and that person has to be able to trust the whole chain. And part of the thinking behind Unomi was making sure that the systems could be as trustworthy as possible. Then, based on that, the trust could go all the way up to the end user, so that they would be more willing to use the systems than not. So it's always difficult, because we're now seeing all these exaggerations
[00:39:04] Unknown:
of people using too much customer data and abusing it. So it's always a challenge there. In terms of actually being able to run Unomi: as it stands right now, you have to deploy the infrastructure, deploy the application, and build the integrations yourself, so it can potentially be a significant lift for people who want to just get started with it. And I'm wondering if you can talk through some of the operational characteristics of deploying and running Unomi, what you see as some of the potential for it being offered as a service, either by an individual company or by various cloud providers, and some of the ways that the specifications you're working on as part of the OASIS Foundation can be integrated into other manifestations of similar systems?
[00:39:50] Unknown:
Unomi has a Docker-compatible deployment system now. So you can easily use Docker to set up Elasticsearch containers and Unomi containers, and then deploy all that in a Kubernetes cluster and things like that. So in terms of actual deployment frameworks, we're using sort of what a lot of people are using today, so that shouldn't be too strange. Of course, when you're talking about what's built on top of that, that's another story. But before I go into that, I want to address the service provider question. Apache Unomi is completely independent. My company doesn't own it. We contribute to it quite a lot, but it's really an independent project. So if any of the big cloud providers, Amazon or Google or any others, wanted to provide Unomi as a service, there's nothing preventing them from doing that. And maybe there would even be an incentive at some point to do that, if Unomi becomes something that's interesting to enough people.
Now, of course, a lot of people will need additional features or services. That's where other companies can offer value on top of that, and that's one of the things that my company does. So, yeah, those are the different things that you can do, but I think that most people can use it, and most people already do. Yeah, I should mention that there is a Drupal integration also. Drupal is a completely different technology stack; it's built in PHP. And you already have plugins that make it possible to use it with Unomi.
[00:41:22] Unknown:
In terms of the work that you're doing with Unomi and how you're using it in your business, I'm wondering if you can talk through some of the ways that you've used it in different applications that you've built, or some of the capabilities that you're layering on top of it. Unomi has, you know, all the services that we talked about, but there are a few services that are not available in Unomi. And for example,
[00:41:43] Unknown:
one of the things is that we built on top, and we're still using the basic Unomi that everybody can deploy. So we build extensions to it. For example, we added what we call custom items, which make it possible to store other types of objects inside. And associated with that are recommendations. So, basically, you're saying, I want product recommendations, or any kind of recommendations. That way you can easily use the platform to offer that. And then the core business that we try to do is a strong integration of content: any kind of content that you manage, whether it's text or data files or web content, any kind of content and data.
So here, really, the idea is to make it extremely simple for, just as an example, digital marketers to be able to provide personalized experiences. And here we use Unomi as the main engine to do all the work, but it's tied in; there are, of course, UIs and everything to make it very easy for end users to build personalized experiences, to have those evolve over time, and to react in real time to what the users are doing. We also provide connectors between Unomi and all the different platforms I talked about, like CRMs and other types of data sources. So it's sort of a complete package where people can just come and purchase the whole thing. But Unomi is really a key component of what's going on behind the scenes.
[00:43:16] Unknown:
In your work of building Unomi, working with the community, and providing services on top of it, what are some of the most interesting or innovative or unexpected ways that you've seen it applied?
[00:43:26] Unknown:
Yeah. For me, one of the most surprising was definitely, as I talked about earlier, the tracking of people with addictions. That really came out of the blue. Suddenly, this person was talking to me and said, yeah, I'm working with this medical provider that is dealing with that, and they needed a very easy way to keep track, send reminders, and track what people are doing for these dependencies that they had. That felt quite good in a way, because not only was the project being used in a very innovative way, but it was actually serving, you know, health. That was really, really interesting.
Yeah, there are other users out there, but, unfortunately, I'm not aware of all of them. I know that there are some users in China, for example, but I have no idea what they're doing with it. I wish they would tell me, because it'd be interesting to hear back, but it's not an obligation. Anybody can use the system as they see fit, and they don't need to tell anybody about it. That's been one use case that's been very interesting. Apart from that, it's pretty standard usage that we see: people building personalization into either mobile applications or websites.
[00:44:39] Unknown:
In your experience of building the project, helping to grow it, and, again, interacting with the community around it, what are some of the most interesting or unexpected or challenging lessons that you've learned personally in the process? One of the lessons was that if you're building a project like that, that's basically API first,
[00:44:57] Unknown:
your documentation and your onboarding have to be very strong. And that's been a real challenge, because we haven't had many contributors directly get involved with that. So that's been challenging, because you have to put in a big effort in terms of not only building the tech, but really exposing its value and explaining how it works and things like that. It's a lot better now, but I think there's still a lot of room for improvement there. But, you know, as the source code is out there, a lot of people have been figuring it out on their own. I'm always surprised, but I think the project would be even more popular if it was easier to just learn, I would say. So that's been a very interesting thing. The other thing that was also a good lesson was, when you see these edge use cases, it's easy to dismiss them and say, do we really want to go there?
And as you're trying to build a community, you have to be very inclusive, and you have to make sure that you're giving people as much empowerment as they can get. That's the best way to build a community. But it's not easy to do that with a project that you started. It's sort of like your baby, and you have to let go. So that's also a challenge. But now it's really interesting, and we've had a lot of contributions over time. Another thing that's also challenging with projects like this is you might have these bursts of interest, where suddenly you have a lot of people who are interested, and then suddenly nothing happens for, like, 3 or 4 months. And then it happens again. So you don't deal with this the same way you would a regular, I would say, commercial project, where you're always very busy with things. Here, it can go in waves. So what's important is that as soon as there is interest, you make sure that that interest is actually being addressed and nurtured very quickly.
Because people might move on to the next thing, or they might get frustrated and decide to build their own, which can be very frustrating. I know of one case of somebody who was running into issues, and he started rebuilding Unomi on the side, his own kind of version of it. That's, for me, a waste of everybody's time in a way, because it's not a good idea. And we see a lot of examples in the open source world of these sorts of forks, of people going in different directions. And it's a shame, because you're splitting the community when you do that. So one of the things that's really important is to make sure that you're always enabling the community. That's a big, big focus, but it's also an effort. So it's not something that necessarily comes naturally.
[00:47:44] Unknown:
And for people who are interested in being able to have these personalization capabilities, build these aggregate customer profiles, and integrate them into their broader software platforms, what are the cases where Unomi is the wrong choice and they might be better served either building something custom in house or leaning on something like a RudderStack or a Segment?
[00:48:04] Unknown:
Well, I mean, there's no commercial company behind Unomi. My company offers support on a product that adds extensions on top of it, but in terms of support, there's no real commercial offering there, as a commercial product would have. So if that's a strong need, Unomi would not necessarily be interesting. If you're in a case where you're looking for something like a pure profile storage system, and you're not interested in things like personalization or the rule engine, I definitely don't think it's a very good way of using it. Now, you mentioned Segment and RudderStack. If you have those needs, I think you should seriously consider Unomi, because I think it definitely stands up in terms of providing the basics there.
Even though you will, of course, not have the whole UI and things like that, in terms of the engine, I think it does stand on its own.
[00:49:06] Unknown:
And as you continue to work with the Unomi system, what are some of the things that you have planned for the near to medium term, or any projects that you're particularly excited to dig into?
[00:49:16] Unknown:
Yeah, well, there's a lot of things. I mean, Unomi is a project that can grow in a lot of ways. We have basically 2 releases coming up. One is the 1.6 release; that'll be sort of the natural maintenance version of the stable branch. That one will feature things like a Groovy action API. I talked about being able to use Groovy to build your own actions. And along with that, the idea is we would like to start an open source collection of Groovy connectors and actions. So the idea is to make it very easy to develop, and also to feature community-built actions and connectors on the Unomi website. The idea here is really to leverage the community to be able to easily integrate.
Just to give an example, let's say you want to have an action that sends a message to Slack. You could have somebody just build that Groovy action that would do the request to send it to Slack. And then they could say, oh, I want to share this with the community, and they could just have it featured on the Unomi website. And then anybody could just pick that up and either use it directly or use it as a starting point to build their own custom connector, and hopefully share it as well. So that's one of the things that I'm hoping will happen. I'm personally really interested in that. And then we have the next version, Unomi 2. That one will address a lot of things we've seen from the experience of using Unomi in production. So there'll be some data model improvements so that we can handle large sets of data more efficiently.
There are also going to be some improved ways to guarantee data quality. As Unomi is a very open system, there are some places where you can feed in data in, let's say, less structured ways than some might need. So in Unomi 2, there'll be a way to say, okay, for example, I want to make sure that events have such and such a structure, and you can enforce things like that. That would also help with some of the difficulties around querying some data, because if you don't know what the data structure is, it can be challenging to actually use it. One of the biggest things is the GraphQL API. Right now, Unomi is based on a REST API, but the actual OASIS specification is a GraphQL specification.
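The kind of structural check described here for Unomi 2 can be sketched as follows. The required fields and their types are illustrative assumptions, not the actual schema rules the project will ship.

```python
# Sketch of enforcing a minimal event structure before accepting data,
# along the lines of the data-quality features described for Unomi 2.
# The required fields and their expected types below are assumptions.
REQUIRED_EVENT_FIELDS = {"eventType": str, "scope": str, "properties": dict}

def validate_event(event: dict) -> list:
    """Return a list of problems; an empty list means the event passes."""
    problems = []
    for field, expected in REQUIRED_EVENT_FIELDS.items():
        if field not in event:
            problems.append(f"missing field: {field}")
        elif not isinstance(event[field], expected):
            problems.append(
                f"wrong type for {field}: {type(event[field]).__name__}")
    return problems

ok = validate_event({"eventType": "view", "scope": "site", "properties": {}})
bad = validate_event({"eventType": 7})
print(ok)   # prints: []
print(bad)
```

Knowing every event satisfies such a contract is exactly what makes downstream querying easier, as the interview notes: you can only query a structure you can rely on.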
And so the GraphQL API implementation will be introduced with Unomi 2. And then, basically, after that release, there'll be a lot of work focusing on community feedback, of course, on what the community needs and wants, and also making the platform grow in terms of capabilities, as well as integrating more easily with other
[00:52:08] Unknown:
systems. Are there any other aspects of the Unomi project or the overall space of customer data platforms and customer personalization that we didn't discuss yet that you'd like to cover before we close out the show? It's more like something I would like to
[00:52:22] Unknown:
get back to. My strong belief is that it's okay to know some things about visitors if it can really help improve experiences. Just to give you an example: if you walk into a high end car dealership and you've been there once or twice, and the salesman comes up to you and says, hello, Mr. Huber, how are you doing today? How's it going with that car you bought? That's a very pleasant experience. That's something that we as humans value and actually look for. And it shows that, in some cases, knowing something about people and using that to help deliver experiences is not necessarily negative.
Now, unfortunately, the industry as a whole has gone way too far with this. There's always that example of the Target personalization, where I think a father received baby product mailings for his daughter before he even knew that she was pregnant, things like that. I'm not even sure if that's an urban legend or if it's actually true, but I think there's some truth to it. So I think the biggest focus is really trying to find that very delicate balance of what's okay and what's not okay. And Unomi can be deployed on premise. I've even done demos on a laptop that's not connected to the Internet, where I can show a personalized experience completely built locally, without sending any data to a third party.
I think that's a use case in data management that I keep in my head quite a lot, because I'm not saying that using cloud services is a bad idea, but I'm saying you have to be careful with these things, and you have to keep constant watch over them. And that's part of what the project is about.
[00:54:27] Unknown:
Alright. Well, for anybody who wants to get in touch with you, follow along with the work that you're doing, or get involved with the Unomi project, I'll have you add your preferred contact information to the show notes. And as a final question, I'd like to get your perspective on what you see as being the biggest gap in the tooling or technology that's available for data management today.
[00:54:44] Unknown:
As an end user, if that data concerns me or concerns things I've done, I would like to have some control over it. I would like to be able to know what's going on with it and, at the very least, be able to perform a request to delete it. Where things get pretty awkward nowadays is that even us developers are growing older, and as people pass away, that data legacy is becoming an issue. I think we're going to need ways and tools to help with that in a reasonable way. I'm not talking about the extremes anymore, but about managing that data. For example, I know from personal experience that it's way too hard to deal with data from a family member who has passed away and that's still out there on the Internet. I'm still getting recommendations to connect with my father-in-law on Facebook, and he's passed away. So I think there's definitely something missing there. And I don't believe the industry is deliberately not trying to address that, but I think it doesn't have a lot of visibility right now, and hopefully it's going to be addressed in the future. But in a broader way, it's all the issues around data management for end users and privacy in general.
[00:56:14] Unknown:
Alright. Well, thank you very much for taking the time today to join me and share the work that you're doing on Unomi. It's definitely a very interesting project, and it fills an interesting and necessary gap in the overall ecosystem for helping developers make their products more accessible and more pleasant for their customers to use. So I appreciate all the time and energy that you and the community have put into that, and I hope you enjoy the rest of your day. Yes, thank you, Tobias. It was a really interesting discussion with interesting questions. Thank you for listening. Don't forget to check out our other show, Podcast.__init__ at pythonpodcast.com, to learn about the Python language, its community, and the innovative ways it is being used.
And visit the site at dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show, then tell us about it. Email hosts@dataengineeringpodcast.com with your story. And to help other people find the show, please leave a review on iTunes and tell your friends and coworkers.
Introduction and Sponsor Messages
Interview with Serge Huber: Introduction and Background
Overview of Apache Unomi
Comparison with Other Customer Data Platforms
Challenges in Customer Data Aggregation
System Design and Architecture of Unomi
Integration with Existing Systems
Use Cases and User Experiences
Alternative Infrastructures and Solutions
Importance of Raw Data Access
Broader Integration into Data Infrastructure
Regulatory Compliance and Data Privacy
Customer Consent and Profile Management
Operational Characteristics and Deployment
Business Applications and Extensions
Innovative Use Cases
Lessons Learned and Community Building
When Unomi is the Wrong Choice
Future Plans for Unomi
Balancing Personalization and Privacy
Closing Remarks and Contact Information