Summary
The promise of online services is that they will make your life easier in exchange for collecting data about you. The reality is that they use more information than you realize for purposes that are not what you intended. There have been many attempts to harness all of the data that you generate for gaining useful insights about yourself, but they are generally difficult to set up and manage or require software development experience. The team at Prifina has built a platform that allows users to create their own personal data cloud and install applications built by developers that power useful experiences while keeping you in full control. In this episode Markus Lampinen shares the goals and vision of the company, the technical aspects of making it a reality, and the future vision for how services can be designed to respect users' privacy while still providing compelling experiences.
Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
- Atlan is a collaborative workspace for data-driven teams, like GitHub for engineering or Figma for design teams. By acting as a virtual hub for data assets ranging from tables and dashboards to SQL snippets & code, Atlan enables teams to create a single source of truth for all their data assets, and collaborate across the modern data stack through deep integrations with tools like Snowflake, Slack, Looker and more. Go to dataengineeringpodcast.com/atlan today and sign up for a free trial. If you're a data engineering podcast listener, you get credits worth $3,000 on an annual subscription.
- Struggling with broken pipelines? Stale dashboards? Missing data? If this resonates with you, you're not alone. Data engineers struggling with unreliable data need look no further than Monte Carlo, the world's first end-to-end, fully automated Data Observability Platform! In the same way that application performance monitoring ensures reliable software and keeps application downtime at bay, Monte Carlo solves the costly problem of broken data pipelines. Monte Carlo monitors and alerts for data issues across your data warehouses, data lakes, ETL, and business intelligence, reducing time to detection and resolution from weeks or days to just minutes. Start trusting your data with Monte Carlo today! Visit dataengineeringpodcast.com/impact today to save your spot at IMPACT: The Data Observability Summit, a half-day virtual event featuring the first U.S. Chief Data Scientist, the founder of the Data Mesh, the creator of Apache Airflow, and more data pioneers spearheading some of the biggest movements in data. The first 50 to RSVP with this link will be entered to win an Oculus Quest 2 — Advanced All-In-One Virtual Reality Headset. RSVP today – you don't want to miss it!
- Your host is Tobias Macey and today I’m interviewing Markus Lampinen about Prifina, a platform for building applications powered by personal data that is under the user’s control
Interview
- Introduction
- How did you get involved in the area of data management?
- Can you describe what Prifina is and the story behind it?
- What are the primary goals of Prifina?
- There has been a lot of interest in the "quantified self" and different projects (many of them open source) which aim to aggregate all of a user's data into a single system for analysis and integration. What was lacking in the ecosystem that makes Prifina necessary/valuable?
- What are some of the personalized applications for this data that have been most compelling or that users are most interested in?
- What are the sources of complexity that you are facing when managing access/privacy of users' data?
- Can you describe the architecture of the platform that you are building?
- What are the technological/social/economic underpinnings that are necessary to make a platform like Prifina possible?
- What are the assumptions that you had when you first became involved in the project which have been challenged or invalidated as you worked through the implementation and began engaging with users and developers?
- How do you approach schema definition/management for developers to have a stable implementation target?
- How has that schema evolved as you introduced new data sources?
- What are the barriers that you and your users have to deal with when obtaining copies of their data for use with Prifina?
- What are the potential threats that you anticipate for users gaining and maintaining control of their own data?
- What are the untapped opportunities?
- What are the topics where you have had to invest the most in user education?
- What are the most interesting, innovative, or unexpected ways that you have seen Prifina used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on Prifina?
- When is Prifina the wrong choice?
- What do you have planned for the future of Prifina?
Contact Info
- @mmlampinen on Twitter
- mmlampinen on GitHub
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
Closing Announcements
- Thank you for listening! Don’t forget to check out our other show, Podcast.__init__ to learn about the Python language, its community, and the innovative ways it is being used.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
- To help other people find the show please leave a review on iTunes and tell your friends and co-workers
- Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat
Links
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to the Data Engineering Podcast, the show about modern data management. Have you ever woken up to a crisis because a number on a dashboard is broken and no one knows why? Or sent out frustrating Slack messages trying to find the right dataset? Or tried to understand what a column name means? Our friends at Atlan started out as a data team themselves and faced all this collaboration chaos. They started building Atlan as an internal tool for themselves. Atlan is a collaborative workspace for data driven teams, like GitHub for engineering or Figma for design teams. By acting as a virtual hub for data assets ranging from tables and dashboards to SQL snippets and code, Atlan enables teams to create a single source of truth for all of their data assets and collaborate across the modern data stack through deep integrations with tools like Snowflake, Slack, Looker, and more.
Go to dataengineeringpodcast.com/atlan today. That's a t l a n, and sign up for a free trial. If you're a data engineering podcast listener, you get credits worth $3,000 on an annual subscription. When you're ready to build your next pipeline and want to test out the projects you hear about on the show, you'll need somewhere to deploy it. So check out our friends over at Linode. With their managed Kubernetes platform, it's now even easier to deploy and scale your workflows or try out the latest Helm charts from tools like Pulsar, Pachyderm, and Dagster. With simple pricing, fast networking, object storage, and worldwide data centers, you've got everything you need to run a bulletproof data platform.
Go to dataengineeringpodcast.com/linode today. That's l i n o d e, and get a $100 credit to try out a Kubernetes cluster of your own. And don't forget to thank them for their continued support of this show. Your host is Tobias Macey. And today, I'm interviewing Markus Lampinen about Prifina, a platform for building applications powered by personal data that is under the user's control. So, Markus, can you start by introducing yourself? Thank you, Tobias, for having me, and hey, everyone. So my name is Markus Lampinen. I'm
[00:02:11] Unknown:
a product engineer at the heart of it. I worked at various companies in the past in digital rights management and fintech. And now I'm essentially building a developer focused platform for building on personal data, like Tobias said. The way that we came to this is that we look at the current ways of utilizing personal data as inefficient, because we all have so much data out there, but we're just not really making the most of it yet. So we're here to change that. And do you remember how you first got involved in data management? I guess I've been working with data in one way or another my entire life, starting with, for example, content rights. I mean, basically, what you're looking at is you're logging views and listens and plays around the world, and then you're tracing how those came to be through different types of distribution chains online.
In fintech, I started working with all sorts of different types of datasets that relate to financial marketplaces. But I think the thing that kinda led to Prifina was this realization that I'm a data geek myself about my own body and my activity and so on and so forth. And we just kinda looked at the explosion of all of the different smart devices that we have. Like, for example, a smart watch, a smart ring, smart clothing, shoes, and whatnot. And we started realizing that they don't really talk to one another. And just personally trying to kinda glue them together in different types of dashboards and different types of utilities, that was kind of one of those moments of realizing that maybe it's not just me trying to get all of this to actually work more or less cohesively, that maybe there's others out there. That was one of those moments for Prifina, kind of from a personal data point of view. But I think it was also, like, a personal thing for just how we as individuals can actually get more power out of this data that we have. But, you know, it's funny. Like, you know, looking back, you're able to glue together different types of bits and pieces of your history, but I'm not sure that ever really occurred to me at the time. But now it just makes perfect sense.
[00:04:13] Unknown:
And so that brings us more into what you're building at Prifina. So can you give a bit more of an overview about some of the specific goals of the business, some of the sort of technological focus that you're taking, and how the overall project came about? So starting from, like, let's say, a macro point of view, then
[00:04:31] Unknown:
essentially realizing that we all have so much data that we have access to and that we've been given the right to, like, for example, data in different types of services that we use, wearables and devices and whatnot, but it's sort of out of reach. Then we started looking at, like, you know, how do we expect the Internet and the data market overall to develop in 10 years? Is it really feasible that we only have a couple companies here in the US that practically dominate the entire data market? Or can we actually expect to see something that's a little bit more of an open market? From there, we started looking at, like, the wearables and different devices that we have and started thinking that, okay, if you think about it from an enterprise point of view, then companies, they have all of this tooling around data. They have their data lakes. They have their data warehouses. They have different types of unifiers and parsers and whatnot for actually utilizing this in different product lines. We, as individuals, we have all sorts of different data locked away in different silos, but we don't have anything. We don't have any piping. We don't have any infrastructure.
So that was really where we started looking at it: How can we actually get this data out of its silos? How can we get it into a state of flow? Because one of the basic hypotheses that we have is that if you can actually unlock data and get it to a state of flow, then it actually does increase the overall utility and the overall value, as long as it's essentially not static. So that's where we kinda started. And then what this means is that we have all of these sensors. Like, all of the smart devices that we have, they're just glorified sensors at the end of the day. So we need to get the data out, and that's what we started building. So we started building connectors to all of these different types of data silos. So primarily APIs, but then there are different types of more clunky data sources such as file downloads. Think of, like, Google Takeout, which has really rich data, but it's behind an archive file.
We started writing those connectors, and then we started writing all sorts of different types of parsers to get that into a uniform data model. But then we ran across this issue that, okay, we need to put this somewhere. And rather than essentially put it under yet another company's umbrella, which in this case would have been Prifina, we realized that it's not ours. We don't want to have it. We want the individuals themselves to have it. So then what we started doing is we started programmatically creating personal clouds.
So initially, AWS instances in our users' names, and that meant that we could essentially take the data that we connected to and then map it into their own S3 bucket in their own AWS account. So that means that it's under their own name. And if they say, hey, Markus, this was a cute project, but I wanna go away, then they can just sever that connection. They can take that cloud and take all the data that's in it with them. And they, by the way, keep that private key. So we as Prifina would have no way of actually accessing that data to begin with. Alright? So now we're at the stage that we have these connectors, and we have the parsers, and we have a way of unifying some of these devices into a data model. We also have this storage, which is then this personally held cloud instance.
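To make that concrete, here is a minimal sketch of what mapping parsed device data into a user-owned S3 bucket could look like. Prifina has not published its internals, so every name here (the record shape, the bucket layout, the credential handoff) is an illustrative assumption rather than their actual code:

```typescript
// Hypothetical sketch only: the S3 client is constructed with credentials
// scoped to the user's own AWS account, so the platform never holds the data.
import { S3Client, CreateBucketCommand, PutObjectCommand } from "@aws-sdk/client-s3";

interface ParsedRecord {
  source: string;    // e.g. "oura" or "fitbit" -- assumed source identifiers
  timestamp: string; // normalized ISO-8601 timestamp
  payload: Record<string, unknown>;
}

async function writeToPersonalCloud(
  userScopedClient: S3Client, // authenticated against the user's account
  bucket: string,
  records: ParsedRecord[]
): Promise<void> {
  // Create the user's bucket on first run; ignore "already owned" errors.
  try {
    await userScopedClient.send(new CreateBucketCommand({ Bucket: bucket }));
  } catch (err) {
    if ((err as Error).name !== "BucketAlreadyOwnedByYou") throw err;
  }
  // Land each parsed record under a per-source prefix in the user's bucket.
  for (const record of records) {
    await userScopedClient.send(
      new PutObjectCommand({
        Bucket: bucket,
        Key: `${record.source}/${record.timestamp}.json`,
        Body: JSON.stringify(record),
        ContentType: "application/json",
      })
    );
  }
}
```

The point of the design, as described in the episode, is that severing access is as simple as revoking the credentials behind that client.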
Of course, that cloud is not just the S3 bucket. It's the entire instance. So not only can you store data there, you can actually run apps as well in this distributed cloud network, just as a caveat. But then essentially, the third thing that we started thinking is, okay, now we've got this data. We've got it into a ready to go format, more or less. You guys know how much work that is. So that's an ongoing effort, a herculean task if you will. It's also open source, because it's something that we realized that the over 20,000 developers that we've worked with, they always end up adding their own data sources and their own parsers and whatnot, different types of kits for how they want to utilize certain types of datasets. But then the third thing that we really needed to do was, okay, now what? And that was essentially the framework for how to actually build the apps.
So thinking about what I just laid out, the connectors and then that personal environment for storing the data, we created essentially this framework where we have a container app in React, which then houses different apps within that one app. So, basically, it houses React components within this React app. And that means that, for example, in a desktop environment, we can run these apps in a way that is completely serverless. We can render these apps as remotely rendered dynamic components, and then that also means that they end up connecting to the user's own personal data instance. So in that type of setup, the app is delivered without any type of server, and it's delivered without any type of a log, which means that the actual processing ends up on the user's device, and the usage ends up being completely local, so to speak, or local with air quotes, because part of it is going to the user's own cloud.
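As a rough illustration of that container pattern, here is a minimal React sketch, assuming a lazily loaded app component and a prop carrying the user's personal data endpoint. In a real remote-rendering setup the component bundle would be fetched at runtime; a plain dynamic import stands in for that here, and all names are invented:

```tsx
// Hypothetical sketch of a container app that mounts a third-party "app"
// as a dynamically loaded React component.
import React, { Suspense, lazy } from "react";

// A plain dynamic import stands in for fetching a remote component bundle.
const SleepDashboard = lazy(() => import("./apps/SleepDashboard"));

export function ContainerApp({ userCloudUrl }: { userCloudUrl: string }) {
  return (
    <Suspense fallback={<p>Loading app…</p>}>
      {/* The inner app only ever talks to the user's own personal cloud. */}
      <SleepDashboard dataEndpoint={userCloudUrl} />
    </Suspense>
  );
}
```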
Well, the processing is on the device. So those are kind of the three elements. And we talk about, like, a personal data engine, or then a data trinity, as we're talking about these. On one side, you have the data itself. On the second side, you have the individual's control. And then on the third side, you have essentially this environment for actually building these apps that utilize this framework. But if you think about this from a bigger picture point of view, then ultimately, at the end of the day, what we're building is an open market. So we've got this personal data engine framework for building these apps on personal data that the users get to keep. But at the end of the day, what we're really saying is that this is a framework where anybody can build on personal data, that you do not have to be an Internet giant. You don't have to have 2,000,000 users, but you can directly build an application and offer it to end users that find it valuable in a very, very simple way. And this kind of traces back to how we got started in this. We really started thinking that, okay, in 10 years, is it really a future we want where essentially most of these apps are just built or owned by a couple companies?
Or can we really open up that data market? And I think by being able to create a simple way of getting started for developers, like, building an app on this framework, it's not a lot of effort, because in a way, the users come with their own back ends. So you're just building the front end of the app and then connecting to these data connectors. But in a way, if you can really open that up, then that's one of the things that we as a team are very, very passionate about: that we could not only open up the data market to the extent that anybody can build, but we might actually be able to open up the consumer market as well. You can get different types of applications, not just the ones that, for example, the hardware manufacturers build, but maybe hundreds of applications that are built on the hardware's data. But at the same time, they're built by a community of developers.
So that's a little bit of how we got started. And we've been working on this for a couple of years now. And I would say, overall, it started off as a very high flying vision, but now it's coming down to a very, very practical stage where we have a number of developers and companies building all sorts of different types of apps. And now we're essentially just looking at, okay, which are the things that resonate the most, and how can we add more of those, not just data sources, but also data libraries, that just make it
[00:12:01] Unknown:
incredibly easy to get started as a developer coming into building a personal data app. Yeah. Definitely a lot of interesting things to dig into there, and we'll start picking at it as we go through. But one of the first things that I think is worth discussing is that there has been a lot of interest, particularly in recent years, with the increased availability of things like wearables and, you know, the cell phones that we all carry with us that track interesting information, and the overall idea of the sort of quantified self: being able to use all the data that we're generating and collecting to be able to understand more about ourselves and our daily habits and how we might, you know, live a more fulfilling or healthy lifestyle.
And so there have been a number of projects that have also aimed to do some of the sort of similar things to what you're building with Prifina, of collecting all the different data sources into a single location, usually on the end user's laptop, and then different scripts or visualizations that you might run across all this information. But most of the ones that I've run into are usually intended to be used by somebody who has a fair amount of technical acumen to be able to figure out how to get this all installed, downloaded, integrated, and build their own scripts and applications on top of it. And I'm wondering what you see as some of the elements in the sort of quantified self movement and these different projects and applications that were lacking in sort of the overall ecosystem, some of the gaps that you're looking to fill in, and some of the sort of user experience improvements that you're looking to provide with what you're building at Prifina? That's a fantastic question, and there's also many layers to that. I think you're right in stating that the quantified self and kind of playing with these data silos, it is quite limited to those that have technical proficiency,
[00:13:45] Unknown:
but it's also not just software. I would say it's also the data, that it's both of those. Like, you have lots of brilliant software engineers that don't have a data background. And working with something like a 23andMe, what is it, like, 11 gigabytes of genome marker data? That's not trivial. Like, how do you get something out of that? That requires almost, you know, a background in working with data, or at least a number of years of experience in working with it. So I think on that side, you're absolutely right. So in a way, I think one of the things that's been lacking is that it's been incredibly high friction. Like, if you wanted to build a dashboard to combine, like, your Peloton bike and your smart scale and different things like that, then, I mean, that's a lot of work. Because a lot of the things that we're doing, to be honest, around the data layer itself, it's not trivial.
We have to do very silly things. Like, we have to essentially look at how we standardize the different data sources. Like, you have some wearable that has measurement intervals in 50 milliseconds and one that has measurement intervals in 1 second. I mean, those are not the same, and you cannot treat them the same. So you have to make some type of an approximation in the process without losing the data integrity. And then we have different things that, you know, go into more of the enrichment category. We spend a lot of time adding timestamps to different types of datasets, because the timestamps themselves might be nonuniform, or they might be missing, or they might not even follow the right type of a standard.
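To illustrate the kind of standardization being described, here is a minimal sketch of downsampling a 50 ms stream into 1-second buckets so it can sit next to a device that only reports once per second. The field names and the choice of a simple mean are assumptions for illustration, not Prifina's actual approach:

```typescript
// Sketch of interval standardization: average raw samples into 1-second
// buckets. Keeping min/max alongside the mean would preserve more
// integrity; a plain mean keeps the example short.
interface Sample {
  timestamp: number; // epoch milliseconds
  value: number;     // e.g. heart rate in bpm
}

function resampleToSeconds(samples: Sample[]): Sample[] {
  const buckets = new Map<number, number[]>();
  for (const s of samples) {
    // Truncate each timestamp to the start of its second.
    const bucket = Math.floor(s.timestamp / 1000) * 1000;
    const values = buckets.get(bucket) ?? [];
    values.push(s.value);
    buckets.set(bucket, values);
  }
  // Emit one averaged sample per second, in chronological order.
  return [...buckets.entries()]
    .sort(([a], [b]) => a - b)
    .map(([timestamp, values]) => ({
      timestamp,
      value: values.reduce((sum, v) => sum + v, 0) / values.length,
    }));
}
```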
So a lot of these types of things are incredibly high friction, even for somebody that has, you know, the technical capability of actually working with these datasets. I mean, that's a lot of wasted effort, to be honest. So that's something that we can give right out of the gate as a toolkit. Of course, we also have the raw data so that you can use the unaltered, completely original source data. But at the same time, if you want to use something that combines, let's say, 15 devices, then that should save you a bunch of time.
But then, kinda going beyond that, not every end user that could have utility out of these apps is a builder. And I think that that's a very key, let's say, realization: at the end of the day, there are probably a couple orders of magnitude more potential users than there are builders of these apps. And most of these folks that have built different types of dashboards, and that's myself included, they're not really reaching anybody else. We're kinda building them on a personal need basis. And, yes, you might share something via GitHub, but that's not really an end user type of a channel. So I think this kind of idea of an end user market is really missing. If you wanted to use, let's say, somebody's combination of an Oura smart ring to measure sleep with their Peloton bike and their smart scale. For example, some of the smart scales, they also measure air quality. And if you have people in your bedroom, then that's a fantastic app already to kinda look at, like, your holistic sleep and what impacts it and so on and so forth. That would be something that I would guess a lot of folks have an interest in, but we haven't really had that type of a channel. And that's part of why we see that this is not just a, let's say, a developer and enterprise problem. This is also an end user problem. In the current market, if the developers and the enterprises, the builders of these apps, quantified self or not, can't get to the end user before they have millions of users, then that's a lot of things that are, you know, left on the table, so to speak. Because especially in quantified self, like, there's hundreds of different types of micro use cases, if you will. Just kind of looking at different things that I've seen built. It's funny. You start kinda looking at them, and you start thinking, you know, how many of these are universal and how many of them are, like, you know, very, very specific individualized use cases.
But I would argue that most of them are actually rather universal, but they just don't really find their audience, because the audience doesn't know that there is a way to actually find these. So those are kind of the two aspects. One is that the cost of building has been incredibly high in terms of time and effort overall. And then the second one is just the accessibility of these solutions: we don't really have any way of, you know, getting these gadgets and widgets and dashboards to that stage where, you know, they would actually meet the demand, where people know that there is such an option, for them to find those types of tools for their daily lives. Maybe as a third thing about quantified self, it's also been rather primitive, to be honest, if I can say that. I was listening to somebody, I think it was during South by Southwest, talking about when we actually get our personal AIs.
And this is, you know, a rather old topic, but it is one that I've also been wondering about. And it seems like maybe the health and wellness area is the first one that actually has that type of a promise, that you could go beyond just widgets and dashboards and reactive things to maybe some type of end user agency. So some type of empowerment, where, let's say that you have a goal of running a marathon in February, or maybe you have a big game coming up in 3 weeks, then rather than just looking at, like, you know, how you slept last night and how you're doing today, maybe those devices could optimize your performance for that game in 3 weeks, for example. We don't have anything like that now. And I think that that's certainly missing, that these devices need to go a little bit beyond where they are now. But I think it links back to those first two things: unless we actually have the market functioning, it's very hard to build those types of things right away, because, obviously, you need to train them and you need to test before you can actually get into, let's say, longer term predictive things.
[00:19:38] Unknown:
Continuing on the subject of some of the potential applications for this data, once it is all collected and aggregated and cleaned up, what are some of the
[00:19:48] Unknown:
applications that have been either most compelling, or some of the ideas that have been most interesting, that you've either seen brought to fruition based on what you're building at Prifina, or that have come up in conversations with developers and end users of the system that you're building? Just kind of starting back, one of the things that we first did is that we just opened the kimono for all of these developers, and we said, hey, here's a lot of these different connectors and libraries, build what you want. Like, go for it. Just go nuts. And we got lots of really, really interesting things and a lot of data from that process.
So one of the things that we noticed first is that there was a lot, especially a year and a half ago, and probably still true today, there's a lot of interest in social media data, like the data that Facebook has on you, and Google, and so on and so forth. And then, you know, following that, a lot of developers and data scientists would essentially take those datasets, and they would start building applications. But, actually, it ended up in an odd state, because it turns out there's not a ton of things that you can build on them. Because as far as datasets go, I mean, it's like yearbooks. It's like photo books, locations, check ins, and private messages, which is not maybe the most interesting thing to build on, or the best dataset overall.
So we did a lot of exploration with the community around those types of datasets. We looked at some other things too, like, for example, ride sharing and, like, the gig economy datasets and so on and so forth. But one thing that really started emerging was this notion of, like, what do we as individuals really care about? And it turns out, at least from our community, that it's really our own well-being, our own health, our own stress, our own sleep, our own productivity, and these types of things. Now, we kind of frame health and wellness in the broadest possible sense, that if you think about yourself as an individual, then what actually impacts your state of mind? What impacts your sleep? It's not just your heart rate. It's, like, everything in your life. It's your stressors. It's the things that you do, the things that you watch, the things that you read, and so on and so forth. So there's a lot of datasets that go into those. But we did see, and we have gone deeper into this, that health and wellness overall is something where, if you can build an app that helps an individual sleep better, for example, that has incredible value for that one individual.
I mean, it doesn't have to have a million users, if the first individual can just kind of plug in their own data and get bespoke recommendations, for example, for how they optimize how they feel the next day. So these types of things seem to resonate quite a lot. But then, for essentially what type of data you plug in to, for example, optimize sleep, that's something where the developer creativity comes in. Because somebody built an application that uses Shazam in terms of looking at what is, like, the ambient noise and ambient music that you listen to or that you hear but don't consciously choose, and how that impacts, for example, your stress level as measured by the HRV factor in different types of wearables, or your heart rate, for example.
There's folks that have taken environmental allergens, so things like pollution and noise pollution and different types of things around their neighborhoods, tied them to GPS locations and timestamps, and then looked at, like, what was your heart rate when you walked around these pollutants, for example, or in these neighborhoods. And based on that, they can tell you that if you're taking an evening walk to destress, then maybe avoid these neighborhoods, maybe go here next time. So it's a lot of things like these that are incredibly personal, but it's interesting just to reflect on that hyper localization in a way.
It is incredibly personal, and it is incredibly individualistic. What actually stresses me or, you know, increases my sense of restfulness, for example, it's not gonna be the same for the next person. But at the same time, the application itself and how you essentially derive those insights, that can be universal. It's just that when the individual is powering it directly with their data, the end experience ends up being very personal. And this is, actually, to your question, in a way, I think about it as, like, keys to a car: you're basically taking the keys, which is your data, and then you're just turning the app on with those keys. And then that app can deliver a very personal experience to you, and it can help you essentially in whatever goal you're trying to achieve. And then once you're done, you remove the keys and you remove the data. So there's almost like this separation of apps and data, that there's no reason in our model for the app to retain the data, because it comes with you, and then you're just plugging it in to different types of apps that bring you value.
At the end of the day, it's no different than any other type of an app. It's just that the core underlying architecture is flipped, meaning that you and your data are not going to the app. It's the app that's coming to you and your data. So in a way, whenever you're using the application itself, it is actually you as an individual using the app, or your device using the app. The data is not going anywhere. Whereas the app itself would be pushed to you, into your device, and then it would just run there. From a user's point of view, hopefully, the app is better, and most likely the, you know, end user doesn't really think about this. But one of the things that we have from an end user experience point of view is that, if you're given the choice, would you rather send your data to a third party company's server, or would you just retain it yourself?
Then our hypothesis is that, you know, unless the user is getting some very distinct value, why wouldn't they just keep the data themselves? It's not necessarily even a privacy point; it's also a value delivery point: if you can actually deliver greater value while retaining your data, and the argument is that you can, because the individual ends up with the richest possible dataset, then there isn't really a scenario for needing to share it, at least not with everybody.
[00:26:05] Unknown:
Struggling with broken pipelines, stale dashboards, missing data? If this resonates with you, you're not alone. Data engineers struggling with unreliable data need look no further than Monte Carlo, the world's first end to end, fully automated data observability platform. In the same way that application performance monitoring ensures reliable software and keeps application downtime at bay, Monte Carlo solves the costly problem of broken data pipelines. Monte Carlo monitors and alerts for data issues across your data warehouses, data lakes, ETL, and business intelligence, reducing the time to detection and resolution from weeks or days to just minutes. Start trusting your data with Monte Carlo today. Visit dataengineeringpodcast.com/impact today to save your spot at Impact, the Data Observability Summit, a half day virtual event featuring the first US Chief Data Scientist, the founder of the Data Mesh, the creator of Apache Airflow, and more data pioneers spearheading some of the biggest movements in data.
The first 50 people who RSVP will be entered to win an Oculus Quest 2. Digging into the technical challenges and some of the systems that you're using to build out the platform at Prifina, I'm wondering if you can give a bit of an overview of the system architecture and some of the particular complexities that you're facing in dealing with the sort of sourcing and collection of the data, and the data modeling to be able to provide a useful sort of unified, at least, core schema that enables developers to be able to join across all these disparate data sources, with their own ideas about how things should be modeled and recorded?
[00:27:46] Unknown:
So that's definitely a big task, and it's also not one that one company can just do. So this is why that core data schema is also open source for us, so that developers can come in and say, hey, I really love, you know, this data connector, but I've created this really clever thing that combines Fitbit with my air quality data. Can I use this air quality data connector? And then it's, like, sure, you know, just fork it over, and we'll add it in. Because if somebody else finds utility from something that somebody else has built, then fantastic. So that's part of essentially the community effort. Not only is it, you know, more of an actual community, but it's also transparent. That data layer itself, it's not a black box.
But it is not without its challenges. Like, we're using GraphQL there, and we think that GraphQL is a fantastic new tool, or new-ish tool, for interacting with data overall, but the sources themselves, and the fact that they are in different types of formats and silos, that doesn't change by virtue of using GraphQL. I think one of the challenges that we have is that, you know, some companies are very API first, and they have very great ways of using OAuth to get into their API endpoints and, you know, retrieving data or allowing the user to retrieve their data. Some of them have, let's say, less structured datasets. And especially some of these datasets that are coming in as data archives, they are still a huge issue in the industry overall.
I would also argue that they are probably a matter of time, because I read in a study that they cost about $1,400 per request to actually process. And I can definitely see that, because they're typically very clunky, and especially for the large companies that deal with them, that data may not be the most trivial to retrieve, even for them to begin with. But those are things that we're optimistic about, that we might be able to get more structure around the data, and then be able to retrieve it in a more dynamic fashion overall. The second thing that we have is around the enrichment of this layer. I wouldn't say that's necessarily a challenge, but it is a huge opportunity that we see. That, for example, instead of every single developer doing their own computation for, you know, coming up with a mean heart rate or, you know, the average steps per month or something like that, we just add these types of preprocessed libraries and different types of things that the developers can just pick up, and then they know essentially the way that they work. And they can kinda have a quick start in terms of building their app, so that we can take building the app from days to hours, as our vision goes. Because at the end of the day, the way that we look at this is, it's a search problem. It's a marketplace search problem, and there's two elements to this really taking off. One is that the cost of building has to be absolutely minimal. And the second one is that, essentially, the apps have to be, you know, the right ones, so to speak, the ones that the end users really resonate with.
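As a hedged illustration of what calling one of those preprocessed libraries through the GraphQL layer might look like, here is a sketch where a developer asks for derived metrics instead of recomputing them from raw samples. The schema, field names, and auth header are invented for the example; Prifina's actual schema may differ:

```typescript
// Hypothetical GraphQL query against the user's personal data endpoint,
// requesting precomputed summaries rather than raw samples.
const QUERY = /* GraphQL */ `
  query MonthlyAverages($from: String!, $to: String!) {
    heartRate { mean(from: $from, to: $to) }
    steps { dailyAverage(from: $from, to: $to) }
  }
`;

async function fetchDerivedMetrics(endpoint: string, token: string) {
  const res = await fetch(endpoint, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${token}`, // assumed auth scheme
    },
    body: JSON.stringify({
      query: QUERY,
      variables: { from: "2021-09-01", to: "2021-09-30" },
    }),
  });
  const { data } = await res.json();
  return data; // e.g. { heartRate: { mean: 62 }, steps: { dailyAverage: 8200 } }
}
```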
So those are kind of the things that we have on the data modeling side. There are some other things that we've tested out, like being able to, for example, organize the data into an Athena table on AWS so that you can run different types of SQL syntax on it, and things like these. I wouldn't say that they're necessarily issues; it's rather more about how we can make the experience of data scientists and people that have built on data familiar in this environment as well, so that they can do the things that they're used to doing, more or less.
But maybe the biggest thing is really around just thinking about the way that we organize data overall and kind of getting your head wrapped around this. So if you think about, like, the typical, with air quotes, data science problem, or the problem setting, you'll have a big population and then you'll have a very narrow set of data. Like, you'll have a big population, and then across that population, you'll be looking at, like, heart rate data in some type of a context. Let's say a million users, and then heart rate data, and then you're filtering it for socioeconomics or age or whatever.
In our model, that's not the case. In our model, basically, what you're building on is one person's data, but that data is incredibly rich. So you have one person's data, but you have, like, you know, maybe their heart rate data for 9 years, in that sense, measured in, like, half-second intervals. So you can't approach it in the exact same way, or you can within certain types of caveats. But, you know, you have to kind of understand the architecture of this environment first to begin with. Because this is also something that, if you start thinking about, like, segmenting or clustering or filtering, then if you have one person's data, you can't filter it by age, for example. Like, that doesn't make any sense. So that's why things like the timestamps and the metadata that we have become absolutely critical, because you can filter the data, for example, to data that was updated between 1 and 2 PM on Sunday the 14th, or something like that. And then you can look at, okay, during that point in time, at that same timestamp, the user was at this GPS location.
These are, you know, their vitals, for example, from the wearables. This is their activity log from their wearables. This is the music that they're listening to at this point. And then based on that, you could already build amazing experiences. But it's sort of thinking about this in almost like a reverse way: instead of having, like, a huge population with narrow data, you have individual populations with very rich data. Now, it's not that black and white, and it's not that flipped, because then, of course, we have different libraries where you can build and you can test out and train your models in our sandbox environment, some synthetic libraries and some, you know, stripped actual data that you can utilize. But then when you actually do deploy it, then, I mean, that is the way that the app works. In order for the user to be able to power it, like that keys to a car analogy, your app is gonna be looking at that user's data, as opposed to looking at, you know, a million users and then segmenting a population and then matching them to that. It's rather gonna be looking at, okay, I have these types of connectors, and then the user comes in and turns it on. And then they populate the experience with, you know, their own data cloud, so to speak.
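One way to picture that "one person, rich data" access pattern is aligning a single user's streams on timestamps rather than filtering a population. This sketch pairs each vitals reading with the nearest location fix inside a tolerance window; the types and field names are hypothetical stand-ins for whatever the connectors actually emit:

```typescript
// Hypothetical time-alignment of two personal data streams.
interface Vitals { timestamp: number; heartRate: number }          // epoch ms
interface Location { timestamp: number; lat: number; lon: number } // epoch ms

// Pair each vitals reading with the closest location fix within the
// tolerance window, yielding "heart rate at this place" rows.
function alignByTime(
  vitals: Vitals[],
  locations: Location[],
  toleranceMs = 60_000
): Array<{ vitals: Vitals; location: Location }> {
  if (locations.length === 0) return [];
  return vitals.flatMap((v) => {
    const nearest = locations.reduce((best, loc) =>
      Math.abs(loc.timestamp - v.timestamp) < Math.abs(best.timestamp - v.timestamp)
        ? loc
        : best
    );
    return Math.abs(nearest.timestamp - v.timestamp) <= toleranceMs
      ? [{ vitals: v, location: nearest }]
      : [];
  });
}
```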
This is also one of those analogies that we like: thinking about this way of architecting the data problem, what we're basically saying is that instead of having one organizational data lake, you end up with personal data lakes that individuals have. And then building on those personal data lakes, or data clouds, if you will, just requires thinking about the problem slightly differently.
[00:34:40] Unknown:
And continuing on the idea of these being individualized data lakes, one of the ongoing challenges in the overall data ecosystem is the data collection and integration aspect. And particularly for these applications that are being used to reflect back to the user information about themselves and their habits, in ways that they might, you know, augment or improve them, a lot of what they're going to be looking for is ongoing access to the data as it's being generated. And so there's this sort of update cycle and, you know, in some cases, potentially near real time aspects. And, you know, there are some platforms where that's easy because they have APIs. You can just hit the API periodically to update your data. And then there are systems like Google Takeout, where you hit it, and it gives you all of your data for, you know, either all of the time that they have it for, or for a particular block of history, but it's very unwieldy.
And so then you either have to try and figure out how do I do incremental updates based on what I already have and say, you know, I only wanna pull this chunk of data, so maybe I can get it in a more timely fashion. Or do I just do, you know, the complete batch style of I just take everything and override everything that I already had with this new data, and then it becomes the user's responsibility to go in periodically and say, okay, give me all my data now so that I can put it over here. And then now I can see all of my data up to whenever they started the export, but then I have to figure out how to stay up to date with the information as it's being generated.
[00:36:22] Unknown:
And in a way, this is all part of the piping. Like, when I talk about data connectors and parsers and, you know, all of these different libraries that we have, this is exactly it. So for the end user, this has to be incredibly simple, and this has to be taken care of for them, because at the end of the day, they wanna use the app. They don't wanna think about their data. Like, yes, some are data nerds like myself and like playing with the data, but we have to realize that most are not. Most just want essentially the value out of the application. So it has to be incredibly simple. And then for the developer, it just has to work. Like, it just has to be refreshed, and it has to work. So there's a couple ways that we do this. So, like you said, the APIs are rather simple. So we sync them either periodically, let's say, once a day, depending on the data source, or we allow the app to trigger syncing them, if the data source, or the attribute itself that they're looking for, is something that, you know, can be retrieved rather quickly. But in our connection, it's the app that's pinging the user's endpoint in their own cloud, and then the connector goes off there in their own cloud. The data comes into their own cloud, and then after that, the application can read it. So it's not actually the app getting the data, so to speak, and certainly not the app developer, in that regard.
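Here is a minimal sketch of that app-triggered sync flow, assuming a connector running as a Lambda inside the user's own account: the app pings an endpoint in the user's cloud, the connector pulls fresh records from the source API into the user's bucket, and only then does the app read them. The environment variable names and the source API URL are invented for the example:

```typescript
// Hypothetical connector Lambda running inside the user's own AWS account.
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({}); // credentials come from the user's environment

export async function syncHandler(event: { source: string }) {
  // 1. Pull the latest records from the upstream API. The access token
  //    lives in the user's cloud, never with the app developer.
  const res = await fetch("https://api.example-wearable.com/v1/latest", {
    headers: { Authorization: `Bearer ${process.env.SOURCE_TOKEN}` },
  });
  const records = await res.json();

  // 2. Land them in the user's own bucket; the app reads from there next.
  await s3.send(
    new PutObjectCommand({
      Bucket: process.env.PERSONAL_BUCKET!,
      Key: `${event.source}/${Date.now()}.json`,
      Body: JSON.stringify(records),
      ContentType: "application/json",
    })
  );
  return { synced: true, source: event.source };
}
```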
But yeah. So we essentially do that. We have the entire system for how we update: which things we actually do write, which things we reject, do we, you know, keep every single batch bulk download, or do we just update the things that have been updated? So we've solved those kinds of piping issues. But at the end of the day, you're also right that some of these things, like a Google Takeout, they are difficult. They're clunky. Google Takeout is probably on the better end of the data archives that you can get, because it comes in a couple hours, and I think it's in different types of CSVs in a folder structure that's rather simple to read. For some of the streaming platforms, it might take 30 days to get that data. So, I mean, that's like a huge onboarding issue: if you have an app that's built on Netflix data, for example, and the user has to wait 30 days to actually use that, that's a no go. So then, essentially, we have this. I mean, granted, you only have that problem once for one app, and then after that, you have the data there, and then you can just plug it in to different types of sources. But that's kind of the area where I'm hoping that we see a little bit more structure, and I'm quite optimistic that we will. And then, I mean, at the end of the day, I don't think anybody really likes sending zip files. I think using a structured channel will, anyway, hopefully be everybody's preference going forward.
So we do that, but in a way, if you think about it, that's mainly an onboarding issue. Once you've got the one experience and you've plugged in one dataset, then, I mean, you can use that same dataset for the next thing and the next thing. Because once it's there in this one place in our environment, that is, the user's own cloud, then after that, they have the option of using that for whatever utility they've got, like plugging it into different types of sources. I think one of the things that we realize is that this has to be quite experience driven, in the sense that we have maybe about 15% or 20% of our community that is very into the data, and I expect that as we get more and more of a community, that percentage is gonna get lower, because most of the world's population is into the experiences rather than the data itself.
So we can't really kind of approach it from a data first point of view. We have to approach it from, like, you know, there's an app that is helping you sleep better, for example. And then the app tells you, hey, we're using Prifina so that you can manage your data and get value from it. We're gonna need you to connect your Fitbit data or whatever. And then it just essentially triggers a connection. And, you know, if that connection is right away available, if you already have it, then you just click go, and then you just power it. If it's not, then it has to sync in the background. It's a mechanical problem at the end of the day, but this is one of those things that we're hoping to solve for all of the developers, so that they don't need to worry about this, that this is something that we can provide, or their user can provide with our tools, and then they can essentially just plug these in with the freshest possible data.
This is one of those things also that you mentioned earlier about some of the approaches, especially in quantified self, when you're storing stuff on your own machine. Yes, hardware has developed, and the storage has developed on your machine. But some of these datasets, they're not trivial in size. They're actually quite large. So storing hundreds of gigabytes of data on your local machine, let alone your phone, that's just a no go; that doesn't really work. So that's where essentially this idea of having it in the cloud comes in. That's just our solution to the problem, because you need to have it somewhere, and that seems to be the most feasible place. And then at the same time, you need to have some of it in your local storage, but not all of it. Depending on what type of apps you're running, you don't need to store all of your Google timeline history in your local storage. So things like that would just be uneconomical in the long term.
Mobile is even more difficult, because it's not just storage. It's also, for example, like, the data transfers between the phone and wherever you're sending it to or, you know, sending it from. That introduces more lag, and it also introduces costs, because you're not always on Wi-Fi. Those are some of the things that I see as very much our role. Those are the things that we should be able to solve for the developers. But this is also where that community aspect comes in, that developers can tell us, hey, I have this big data archive that I want the user to bring, but, you know, in the onboarding, I don't need all of it. I can take this small snippet to provide the first experience, and then the user will be more incentivized to bring the whole thing, because they've already gotten some value. So delivering that value for the user as early on as possible, that's absolutely crucial in these apps. Because if the user knows that the app is valuable and they wanna use it, then after that, it's a very easy effort for the user to go in and just bring in the datasets that they need, because then they know that the value is there. And then after that, once they've done that once, it's there for them to use. Digging more into the cloud aspect of it, you mentioned that you're creating individual accounts and instances
[00:42:45] Unknown:
and S3 storage locations for the data for the end user. I'm wondering what are some of the other technologies that you're using for managing the sort of storage and querying and access to the data, and also for being able to power these different applications that are getting deployed into the end user's personal cloud? How do you approach the sort of polyglot nature of the developer ecosystem so that you're not constraining the potential of, you know, new experiences
[00:43:15] Unknown:
to people who are familiar or comfortable with a specific language or runtime? I think the reality is that we also have to focus from our side. We have to start somewhere, and then we'll kind of broaden out from there. So that's where we focused on React, just using the entire React environment, and AWS Lambda, to its absolute extremes. So that's how we're able to do essentially the serverless rendering of these apps. We can do the apps in apps, or, you know, React components in a React app, as, like, the apps that end up being deployed for the end user. And I think, yes, we want to expand that over time, so that it's not just React apps or React that we're using, but also that people are able to write applications in different environments. Now, mobile is, of course, different.
In the mobile environment, the architecture is also slightly different. But I think some of the other things, like, we've talked about, let's say, these traditional apps, and I think one of the things that I also wanna highlight is this notion of headless apps. Because if you think about the user's environment, we have essentially the user's AWS instance and all of the AWS services that they have in that environment, because AWS has so many things that you can utilize there in that environment that you can write apps for. So one of the things that we've seen actually be quite powerful is this notion of building background apps. The apps that, you know, they don't run in your mobile and they don't run in your browser, but they actually run as Lambdas in the user's own cloud. And then they look out for, for example, certain types of triggers.
So what that would mean is that you have essentially the connectors running in the cloud, and the connectors, which are basically just Node scripts and Lambdas, bringing in different types of data from different types of APIs. But then that type of headless app can essentially monitor something. Just as an example, a very crude example, is looking at, say, the forecast for your home and looking at package deliveries that you get, and then saying, hey, you know, there's a package outside and it's raining. And that would be something that it's telling you as a text message, for example, or some type of, you know, notification that you get through some medium that you use. And, actually, these types of headless apps, we've only really started exploring them now, but we realized that for a lot of these things, like talking about this personal AI, so to speak, a lot of these things could actually work in this type of a headless environment that is almost always working for you.
And it's kind of there in the background. It's looking at changes in the data that you generate as it comes in. So that could be, like, stressors. That could be, for example, like, I don't know, watching your fatigue levels, and then kind of informing you about, like, you know, your own body's behavior if you're not noticing it yourself, as an example. And I think these types of things could actually be, when we're talking about that personal AI, they could be, like, that nascent form of a personal AI, in that it's effectively something that you run yourself in your own environment, something that's actually running on your data, and something that's actually, you know, effectively running on your behalf for your benefit, so to speak. So it's not running on behalf of, you know, a large corporation that's optimizing something for you that they're selling, but it's rather essentially just looking out for certain triggers in your own behavior for your own benefit.
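A hedged sketch of the package-in-the-rain example above: a scheduled Lambda in the user's own cloud reads the outputs of two connectors and sends a notification when both conditions line up. The two readers are stubbed stand-ins (in practice they would read the connectors' latest output from the user's bucket), and the SNS topic is an assumption about the notification channel:

```typescript
// Hypothetical headless app: runs on a schedule (e.g. an EventBridge rule)
// inside the user's own account, with no UI at all.
import { SNSClient, PublishCommand } from "@aws-sdk/client-sns";

const sns = new SNSClient({});

// Stand-ins for reading the weather and delivery connectors' latest output.
async function readForecast(): Promise<{ rainingSoon: boolean }> {
  return { rainingSoon: true }; // stubbed for the sketch
}
async function readDeliveries(): Promise<{ packageAtDoor: boolean }> {
  return { packageAtDoor: true }; // stubbed for the sketch
}

export async function handler(): Promise<void> {
  const [forecast, deliveries] = await Promise.all([
    readForecast(),
    readDeliveries(),
  ]);
  // Only notify when both triggers fire: rain is coming and a package is out.
  if (forecast.rainingSoon && deliveries.packageAtDoor) {
    await sns.send(
      new PublishCommand({
        TopicArn: process.env.ALERT_TOPIC_ARN, // assumed notification channel
        Message: "There's a package outside and rain is on the way.",
      })
    );
  }
}
```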
It would be, at least in my head, a rather linear track from there to start thinking about how they could become, let's say, more powerful, in a way that they could, let's say, understand some of your goals, for example, in 3 weeks, 3 months, or whatever. And they could tell you about conducive behavior, or different changes that impact your ability to reach that type of goal. And now, I'm not talking about, like, saving money for retirement or anything like that. But thinking about, for example, say that you have some type of, I don't know, corporate retreat that you wanna be well rested for, or you have some exam that you wanna be well rested for, or you have a game that you want to train for and optimize your fitness for. You know, just an algorithm in your own environment that's looking out for different types of changes and trying to optimize for them, I think that's pretty cool. And that's one of those things that we're really starting to kinda get into, and we're opening that as a possibility for the developers as well, that it's not just essentially the user installing apps, but you could actually deploy some of these Lambda apps, effectively, or Lambda scripts, into the user's environment, and then they can have them run there without effectively seeing them.
But in this environment overall, if you start thinking about it, you have all of those tools that AWS has. Of course, we're not automatically putting everything to use, but we could. Like, AWS has really, really cool OCR tools in their cloud. So you could use OCR for, for example, scanning certain types of pictures that the user uploads, or tagging them, or whatever you might actually come up with doing; I actually think we're already doing that. But just thinking about the technological environment that AWS already provides out of the box in every single instance that they create: that's all there, and if you have a use case for those, we can activate them. We're not necessarily going to activate them for every single user if they're not using those apps, but that's something that we can do.
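As one hedged illustration of that out-of-the-box tooling, here is a small sketch using Amazon Textract, one of AWS's OCR services, to pull text out of an image a user has uploaded to their own bucket. The bucket and key names are made up for the example, and this is not necessarily the specific service Prifina uses.

```typescript
// Sketch: extract lines of text from a user-uploaded image in S3 using
// Amazon Textract. Bucket and key names are illustrative placeholders.
import {
  TextractClient,
  DetectDocumentTextCommand,
} from "@aws-sdk/client-textract";

const textract = new TextractClient({});

async function extractText(bucket: string, key: string): Promise<string[]> {
  const result = await textract.send(
    new DetectDocumentTextCommand({
      Document: { S3Object: { Bucket: bucket, Name: key } },
    })
  );
  // Keep only full lines of detected text, dropping individual word blocks.
  return (result.Blocks ?? [])
    .filter((b) => b.BlockType === "LINE" && b.Text)
    .map((b) => b.Text as string);
}

// Usage (illustrative names): extractText("users-data-cloud", "uploads/receipt.jpg")
```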
And then, I mean, that's the environment that some of these apps run in, or else they run in that React environment for now. Yes, later we'll have to expand from React to other things. But for starting out, I think React is something where there's quite a lot of experimentation in the ecosystem, and there's experimentation in the GraphQL ecosystem as well. Those are both things that are growing quite nicely and developing quite a lot, so I'm not seeing a huge limitation there at this stage. But at the same time, certainly, we want to make that more open. I should also mention that, at the end of the day, we're not a data storage company. We're using AWS as the environment because it's best in class here, but we are also effectively testing out a couple of things around blockchain in terms of distributed data storage.
Those don't have the entire AWS infrastructure, but that would be essentially the user's choice in terms of where they want their data to be. And then certainly in the future, it could be not just AWS: it could be somebody who likes to use Google Cloud or something else, and that would be, at the end of the day, the end user's choice. But for now, AWS seems to be working fine. Those are things that are very easy to see on the roadmap, that we want to give more optionality for the end users as well as the developers if they have certain requirements for their use cases. That's where we're listening to the community, and we're working together with them to find the right balance for whatever they're building.
[00:50:05] Unknown:
In terms of user education and just generally raising awareness of the project that you're building, what are some of the cognitive barriers that you've had to overcome as you're trying to help people understand the utility of what you're building and the potential for what it can ultimately be used for? And also, what are some of the threats that you see to the model of being able to access all these different data sources, given the amount of value that is captured by the larger tech companies by holding all of the users' data? I think the biggest thing is really just raising awareness of their data with the end users to begin with.
[00:50:44] Unknown:
Because I think it's only in the last couple of years that we've started realizing how much data we actually have. Being able to empower individual users: this is actually your data. It's not somebody else's. It's yours. And you have access to it, and you have the right to it, so you can actually get it, and you can and should use it. You know, everybody else is using it too, so why not you as well? That's one thing, just raising the awareness. Now, granted, this is rather hidden in the UX of our product, in that all of this is rather managed for you.
But it is something that we see as a huge opportunity in terms of really being able to empower the individual with their data, so that they understand that this is now a model where it's actually them utilizing their data in a way that they want to. It's not something that's happening to them, but something that they're proactive with. As far as the broader ecosystem, I think this is one of those macro things that's going in one direction, and it's not linked to one company or one industry or anything like that. It's already moving toward a very, very clear model where it's not just exploitation of all data at all costs; that's not the prevailing model anymore. There are risks associated with utilizing data. There are data breaches. There are different sanctions for unauthorized use of data. So we're already going into a more opt-in type of framework.
I'm originally European, and I travel to Europe every summer, and I oftentimes make this half joke that going to Europe with GDPR means that the Internet sucks, because you have so many pop-ups and things that you have to answer. And, you know, I don't have a law degree, so I have no idea how to answer those; granted, the lawyers themselves don't have any better idea either. So thinking about frameworks like GDPR, and CCPA, which we have here in California: those types of cookie notifications can't really be the end goal. We have to be better than that. So it's thinking about what actually comes beyond GDPR. Like, what is the next step?
And in a way, we're also seeing that processing at the edge, like, for example, the ebbs and flows of Google's FLoC project and everything there. And I'm not an advertising guy, so for me, it is far more interesting not just to serve advertising use cases at the edge, but to actually run entire apps and AIs and so on and so forth at the edge, in the way that we're trying to do. So in a way, I think we have this trend that's already going in one direction. And then, essentially, flip it to the inverse and say that maybe in the future it's not just three companies that hold all of the world's data. Maybe it's one.
And that just seems ridiculous; it seems like a completely unrealistic assumption that we would move toward more data centralization or data monopolies. That's not a trivial conversation either, by the way, the concept of data monopolies, because data as a business is a rather new thing from a legal point of view. That's why there's so much back and forth in the discussions around who actually is a data monopoly and whether something should happen there. But I think the direction is very clear. We have some of these use cases being built by some of our larger clients and larger partners, and oftentimes they cannot themselves hoard more data: if they did, their regulators would come after them, and they would face incredible backlash for, you know, starting to collect users' heart rates and things like that. So they see that allowing their customers to actually utilize their data themselves for some type of value, which may be linked back to the company itself, is an interesting alternative.
For example, take an insurance company, let's say a life insurance company, and all of the wearable data that their underwritten population has. Obviously, that would be fantastically useful for them, but they can't use it because it's an ethical and legal minefield. But then the notion is: could they essentially empower those individuals to live healthier lives, so that their underwriting would get cheaper because now their population is healthier? If the insurer themselves isn't actually collecting or utilizing that data in any way, if it stays with their population, then models like that tend to hold a lot of promise, because it's the end user's choice, it is their decision what they do based on how they use their own data, and there's no sharing. I think that's a viable alternative in many of these use cases.
Now, I'm a huge believer in an open ecosystem, so one of the things that I look at is just the proliferation of open source technologies and developers overall. If you can empower developers to build better experiences for their end users, then that can become like a free market in itself. So I'm not so focused on what the data giants themselves are doing. I'm much more interested in what these developers do if you give them effectively all of this infrastructure on a silver platter and just say, you know, build whatever your users find valuable, or build whatever makes your users happy. I think that's one of those things that we'll see a ton of really cool innovation and use cases from.
[00:56:18] Unknown:
And then in terms of the ways that the Prifina platform is being used, what are some of the most interesting or unexpected applications that you've seen built on it?
[00:56:32] Unknown:
There's so many, because of just allowing that free rein for the developers. I mentioned that example of ambient music and looking at how that relates to your heart rate and your stressors. Somebody has done the reverse, using your heart rate to build Spotify playlists, for example, or to build movie recommendations. Speaking of movies, you could, for example, do movie engineering based on heart rate data, which is both a terrifying and super interesting thought: creating horror movies based on the reactions in your target audience's heart rate. Then there are things that relate to IoT in your home. For example, here in California, we have these horrible wildfires.
So when the air quality deteriorates, you can get different types of advice from your devices saying, hey, you typically do an afternoon run, but today it looks like it's going to be pretty bad, so why don't you time it a little bit earlier? And then there are some things that come from more enterprise use cases. There's a company that's doing a driver fatigue type of app, which is a headless app that just measures, based on heart rate data, whether the driver is fit to drive or not. And then it just gives them a score: on a scale from 1 to 10, you are a 7, and 7 means that you're okay, but be slightly careful. So it's sort of like a readiness score, whereas a 1 or 2 would say, hey, take a nap before you go, or something like that. But I think some of the interesting characteristics of the use cases are really those where there is some type of broader benefit, like driver safety in this example.
But then, in order to do that, the actual usage of data should be at the end user's side. It should be the end user actually getting the information, getting the responsibility, and also making the decision about what they do based on this information. I think some of those edge use cases, really processing at the edge nodes, are fantastic, not just from a technical point of view but also from a business practice point of view. In a way, what you're doing is equipping the end user to take the next best action based on certain information, but you're also accepting that you're not going to see it, that it's going to be the end user's choice, and it's going to be their responsibility for whatever they do with that information.
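To sketch the driver-fatigue idea in code: below is a toy readiness-score function mapping recent heart-rate samples onto the 1-to-10 scale described above. The scoring heuristic is invented purely for illustration; it is not the actual company's algorithm.

```typescript
// A toy sketch of the driver-readiness idea: map recent heart-rate samples
// onto a 1-to-10 score. The heuristic is invented for illustration only.
interface HeartRateSample {
  timestamp: number; // epoch milliseconds
  bpm: number;
}

function readinessScore(samples: HeartRateSample[], restingBpm: number): number {
  if (samples.length === 0) return 5; // no data: return a neutral default

  const avgBpm = samples.reduce((sum, s) => sum + s.bpm, 0) / samples.length;

  // Crude heuristic: the further the recent average climbs above the user's
  // resting rate, the more fatigued or stressed we assume they are.
  const drift = Math.max(0, (avgBpm - restingBpm) / restingBpm);
  const score = Math.round(10 - drift * 20);
  return Math.min(10, Math.max(1, score));
}

// Example: an average roughly 10% above resting yields an 8, "okay, stay alert".
```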
But, to be fair, I look at this as a very early marketplace. If we can really give developers the ability to build anything they want, then we are in the early innings. And I think the realization has to be that these first apps are just examples, and we'll get to cooler things in the second and third and fourth iterations. That's where making it super easy to build comes in: it has to be seamless to get started, but also seamless to iterate on, based on the real world data that the users bring, not just from their devices but also, obviously, from their own usage. Here's actually a fun data point: we've been talking about the data sources and the data connectors, but there's actually a lot of runtime data in apps as well. And that's something you can actually use, because in our model the user stores that runtime data from the apps that they use in their own cloud as well. So they end up with different types of usage data. Obviously, we don't have a lot of data yet in terms of how this gets utilized, but it's something we see as quite unique to our model: it's not just your app's runtime data that's there, it's all of the apps that they use. So that also democratizes the apps' built-in data and how it could be utilized.
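For a rough picture of what that could look like mechanically, here is a sketch of appending an app usage event to the user's own bucket. The event shape, bucket, and key layout are all illustrative assumptions, not Prifina's actual schema.

```typescript
// Sketch: store an app's runtime event in the user's own S3 bucket, so the
// usage data stays with the user. All names and shapes are hypothetical.
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({});

interface UsageEvent {
  appId: string;
  event: string; // e.g. "widget_opened"
  occurredAt: string; // ISO-8601 timestamp
}

async function recordEvent(bucket: string, e: UsageEvent): Promise<void> {
  await s3.send(
    new PutObjectCommand({
      Bucket: bucket, // the user's own bucket, not a central one
      Key: `runtime/${e.appId}/${e.occurredAt}.json`,
      Body: JSON.stringify(e),
      ContentType: "application/json",
    })
  );
}
```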
[01:00:24] Unknown:
And in your experience of building out the Prifina platform and the business, what are some of the most interesting or unexpected or challenging lessons that you've learned in the process? I think one of the lessons that we learned very early on: we were very cautious about how users perceive
[01:00:39] Unknown:
the notion of utilizing their data, and all these intricate discussions around privacy and security and all of this. I think the realization is that end users are incredibly utilitarian: if they get value, they will just click, click, click until they get whatever they're after, for good and bad. I mean, you wish that they would maybe be a little bit more cautious with some use cases. But I think, at the end of the day, that's also an opportunity for us, and in a way our role: we just have to make those defaults and those best practices as user friendly and beneficial to the end users as possible, so that they can behave in that type of fashion.
I think that's one thing. But maybe the second thing, one of the greatest realizations we've had, is that there isn't really a problem around access to data. There's so much data that individuals have, and getting it is actually not that difficult. The hard part is getting it into the right form, into a shape that can actually be built on right away. For any data engineer, this is not news: actually cleaning the data and getting it into a usable format is incredibly tedious, and it's needed in everything.
But just thinking about how much time we, as Prifina, spend on that, and then thinking that every single developer and engineer spends their own time doing the same if they're writing their own scripts, the sheer scale of how much friction there actually is: that was the core hypothesis that we had, but it's just incredible seeing that it is a nontrivial challenge actually getting this data into the right type of format. Those are a couple of things that come to mind right off the bat.
[01:02:26] Unknown:
And so for people who are interested in being able to make use of their own data that they're generating,
[01:02:33] Unknown:
what are the cases where Prifina is the wrong choice, and they might be better suited by building it themselves or using one of these open source projects and running it on their laptop? If it's only a use case for you, if it's something that nobody else has any interest in and you're completely certain that it's just something that you care about, then why would you build it on a platform? Although I could arguably say that it might be quicker to build it on Prifina anyway, even if it's just for you. But if it's something that you want to have on your phone or laptop, and you don't want to utilize any of the other elements of this larger infrastructure that we've built, then maybe there it doesn't really matter. One of the things that we see in the market is this entire discussion around data monetization, like how you could actually sell your data and make money. This is not the right framework for that, and generally that's not really something that we believe in, because our argument is that the intrinsic value of using applications on your data is going to be orders of magnitude greater than the monetary value of selling your data.
We've seen that selling-your-data discussion come and go cyclically in the data market over the last couple of decades, and it's just incredibly complicated. Like, how do you value your data? And in all of the models that we've seen, the end user experience just isn't that impressive, because if you're actually selling your data or you're getting paid to watch ads, you're making something like $15 a month, and that might even be overreaching; it might be less than that, just to do something that you inherently don't want to do. To be honest, that's not really a feature that we think is sustainable or even interesting for the end users. Whereas if you can get an application that actually helps your everyday life, it might be a free application, it might even be an application that you end up paying for, but the value should be far greater than essentially trying to monetize your data in some transactional format.
[01:04:30] Unknown:
And as you continue to build out the Prifina platform and bring on more developers and more end users, what are some of the things that you have planned for the near to medium term, or any projects that you're particularly excited for? So what we're doing now is really trying to create as many really cool examples for the developers as we can: example apps, example widgets, different example
[01:04:53] Unknown:
libraries, datasets, utilities, you know, all of the above, so that you can come in and have, from A to Z, different types of things that you can start playing with right away. There's a data playground where you can interact with the data model directly. You can query it; you can check how you can combine different types of objects together. And we have the app market, which is where you essentially find different types of apps as the end user. Then we have the equivalent for the data developer: we basically have this data directory where you, as a developer, can go in and look at the different objects that are out there and the different types of attributes within those objects. You can look at different types of libraries, and then you can have those combined in different ways as, like, a suggested brainstorming session for what you could do with a given dataset.
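For a flavor of what querying such a data model might look like from a playground, here is an illustrative GraphQL query wrapped in a small fetch helper. The schema, field names, and endpoint are hypothetical, invented to show how two data objects might be combined; they are not Prifina's actual API.

```typescript
// An illustrative query combining two hypothetical personal-data objects
// (sleep and heart rate) for one day. Schema and endpoint are made up.
const query = /* GraphQL */ `
  query SleepVsHeartRate($date: String!) {
    sleepSummary(date: $date) {
      totalMinutes
      efficiency
    }
    heartRate(date: $date) {
      restingBpm
      samples { timestamp bpm }
    }
  }
`;

async function fetchDay(endpoint: string, date: string): Promise<unknown> {
  // Plain POST against a GraphQL endpoint; the endpoint URL is a placeholder.
  const res = await fetch(endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ query, variables: { date } }),
  });
  return res.json();
}
```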
So those are things that we're quite excited about, actually making, for example, that latter one live. It's not live yet, but it's something we want to use to boost that creativity even further. And I would say that just going deeper into the examples and having more of those showcases visible, particularly for the developers, then I'm really looking forward to that creativity coming in. For example, I might build an app that creates a playlist based on GPS data points, and then a developer might come in and say, oh, that's silly, I'll do one better, and create a better one. I think that's the most interesting dynamic that you start having: an open market where these builders essentially compete to deliver the best value for the end user. And that's really what we want at the end of the day. The end users choose the applications with the biggest utility, and that's where, ultimately, the entire ecosystem wins.
We also have some developer competitions that we're doing in the near term. I think we'll announce the dates for them soon, and it's really about taking this open consumer data market and putting it on steroids: just giving the tools to all the developers and saying, hey, you have a couple of days, build anything that you like, and the winner that has the most users interested and signed up gets x, y, and z. I think that's one of those things I'm really interested in seeing, because just looking at what's come out of previous hackathons and developer competitions that I've seen, that's something that always ends up amazing. So those are a couple of things right off the bat, but what I'm most excited about is really what the developers will do. Just giving them more of that power to build, then seeing what they build and being able to showcase it: that's the biggest priority that we've got.
[01:07:25] Unknown:
Are there any other aspects of the Prifina platform, the technology that you're building there, and the overall ecosystem for applications driven by users' personal data that we didn't discuss yet that you'd like to cover before we close out the show? Yeah, I think we probably covered quite a bit, so I'm not really worried about that.
[01:07:43] Unknown:
Maybe just as a note: we have a bunch of different things on our website at prifina.com, and you can find our Slack from there. And you can ping me directly if there's something really cool that you're thinking about building. One of the interesting things, if you haven't already played around with the datasets that you've got: we have a public page called Your Data on the Prifina website, and it's really intended as a quick-start place where, without any commitment to our projects and without even having to register, you can just essentially download some of the datasets that you've got yourself in different accounts. We've made it intentionally very cross platform, so there's all sorts of different things. You can download your Spotify data, your Peloton data, your Fitbit data, your Netflix data, and so on and so forth. So I would encourage everybody to just think about what datasets you actually have, or what you could build on them. And if you come across something super cool, then our job is to help you make it happen.
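As a tiny example of building on one of those downloads, here is a sketch that tallies listening time per artist from a Spotify streaming history export. The file name and field names follow Spotify's export format as I understand it, so treat them as assumptions to verify against your own download.

```typescript
// Sketch: top artists by hours listened, from a downloaded Spotify export.
// File and field names (StreamingHistory0.json, artistName, msPlayed) are
// assumptions based on the export format; check your own files.
import { readFileSync } from "node:fs";

interface Play {
  artistName: string;
  msPlayed: number;
}

const plays: Play[] = JSON.parse(
  readFileSync("StreamingHistory0.json", "utf8")
);

// Sum milliseconds played per artist.
const byArtist = new Map<string, number>();
for (const p of plays) {
  byArtist.set(p.artistName, (byArtist.get(p.artistName) ?? 0) + p.msPlayed);
}

// Print the top five artists by hours listened.
[...byArtist.entries()]
  .sort((a, b) => b[1] - a[1])
  .slice(0, 5)
  .forEach(([artist, ms]) =>
    console.log(`${artist}: ${(ms / 3_600_000).toFixed(1)} h`)
  );
```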
So that's something where I'm always interested in discussions around, hey, could we do this? Especially if you think it's impossible, that gives you three extra points, because impossible things are always fun to try and see if we can actually work something out. But beyond that, I think the biggest thing for me is really that our job is to take care of the piping and allow anybody to build cool applications, and that's where we need the community's help. What are some of the coolest applications that, for whatever reason, we haven't seen yet in the market, and how can we actually build those? Well, for anybody who wants to get in touch with you and follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And as a final question, I'd just like to get your perspective on what you see as being the biggest gap in the tooling or technology that's available for data management today. I think it's just data sources. We don't have all the data sources yet. That's something that's just going to take time. The good news is that we have contributor guidelines, and this is open source.
So if you look at our data model and you're like, you know, this is really cool, but I've got this device or this data source that I want to use, then there are contributor guidelines, and you can just send us whatever types of things you want to use. And odds are we're going to add those, because at the end of the day, we want to cover health and wellness in the broadest sense possible, and that will include all the devices that individuals use, not just the ones that we're prioritizing. As a small team, we're going in a systematic yet sometimes eccentric order of priority. But we do want to make sure that we have that critical mass in the data layer, so that it's not just essentially x amount of data sources, but rather that you can start any type of project from there. So that would probably be the biggest gap that we've got. It's an addressable gap, but it's also one that is very collaborative, because at the end of the day, if there is higher demand for some types of data sources than others, then we're happy to add those. We want that data to flow, not just have a connector that exists but nobody's using. Thank you very much for taking the time today to join me and share the work that you're doing at Prifina. It's definitely a very interesting platform and an interesting approach to a problem that has been
[01:10:54] Unknown:
present, and that people have attempted to solve in a number of different ways, but I'm definitely excited to see what you're building and the applications that are being developed on the system that you're providing. So I appreciate all of the time and effort you're putting into making people's data usable and valuable for themselves. I definitely appreciate that, and I hope you enjoy the rest of your day. Yeah. Thank you, Tobias, for having me, and thank you guys overall. This has been great, and thank you so much for the chat. Thank you for listening. Don't forget to check out our other show, Podcast.__init__ at pythonpodcast.com, to learn about the Python language, its community, and the innovative ways it is being used.
And visit the site at dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show, then tell us about it. Email hosts@dataengineeringpodcast.com with your story. And to help other people find the show, please leave a review on iTunes and tell your
[01:11:56] Unknown:
friends and coworkers.
Introduction and Sponsor Messages
Interview with Markus Lampinen Begins
Markus Lampinen's Background and Journey
Overview of Prifina and Its Goals
Challenges in Quantified Self Movement
Interesting Applications of Personal Data
System Architecture and Data Modeling
Data Collection and Integration Challenges
Technologies and Developer Ecosystem
User Education and Market Dynamics
Potential Applications and Use Cases
Lessons Learned and Challenges
When Prifina Might Not Be the Right Choice
Future Plans and Exciting Projects
Closing Remarks and Contact Information