Delivering Your Personal Data Cloud With Prifina - Episode 225

Summary

The promise of online services is that they will make your life easier in exchange for collecting data about you. The reality is that they use more information than you realize for purposes that are not what you intended. There have been many attempts to harness all of the data that you generate for gaining useful insights about yourself, but they are generally difficult to set up and manage or require software development experience. The team at Prifina have built a platform that allows users to create their own personal data cloud and install applications built by developers that power useful experiences while keeping you in full control. In this episode, Markus Lampinen shares the goals and vision of the company, the technical aspects of making it a reality, and the future vision for how services can be designed to respect users’ privacy while still providing compelling experiences.

Struggling with broken pipelines? Stale dashboards? Missing data?

If this resonates with you, you’re not alone. Data engineers struggling with unreliable data need look no further than Monte Carlo, the world’s first end-to-end, fully automated Data Observability Platform!

In the same way that application performance monitoring ensures reliable software and keeps application downtime at bay, Monte Carlo solves the costly problem of broken data pipelines. Monte Carlo monitors and alerts for data issues across your data warehouses, data lakes, ETL, and business intelligence, reducing time to detection and resolution from weeks or days to just minutes. Start trusting your data with Monte Carlo today!

Visit dataengineeringpodcast.com/impact today to save your spot at IMPACT: The Data Observability Summit, a half-day virtual event featuring the first U.S. Chief Data Scientist, founder of the Data Mesh, Creator of Apache Airflow, and more data pioneers. The first 50 to RSVP with this link will be entered to win an Oculus Quest 2 — Advanced All-In-One Virtual Reality Headset. RSVP today – you don’t want to miss it!

 


Have you ever woken up to a crisis because a number on a dashboard is broken and no one knows why? Or sent out frustrating Slack messages trying to find the right data set? Or tried to understand what a column name means?

Our friends at Atlan started out as a data team, faced all this collaboration chaos firsthand, and began building Atlan as an internal tool for their own use. Atlan is a collaborative workspace for data-driven teams, like GitHub for engineering or Figma for design teams. By acting as a virtual hub for data assets ranging from tables and dashboards to SQL snippets & code, Atlan enables teams to create a single source of truth for all their data assets, and collaborate across the modern data stack through deep integrations with tools like Snowflake, Slack, Looker and more.

Go to dataengineeringpodcast.com/atlan and sign up for a free trial. If you’re a data engineering podcast listener, you get credits worth $3000 on an annual subscription.


Your data platform needs to be scalable, fault tolerant, and performant, which means that you need the same from your cloud provider. Linode has been powering production systems for over 17 years, and now they’ve launched a fully managed Kubernetes platform. With the combined power of the Kubernetes engine for flexible and scalable deployments, and features like dedicated CPU instances, GPU instances, and object storage you’ve got everything you need to build a bulletproof data pipeline. If you go to dataengineeringpodcast.com/linode today you’ll even get a $100 credit to use on building your own cluster, or object storage, or reliable backups, or… And while you’re there don’t forget to thank them for being a long-time supporter of the Data Engineering Podcast!


Announcements

  • Hello and welcome to the Data Engineering Podcast, the show about modern data management
  • When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
  • Atlan is a collaborative workspace for data-driven teams, like GitHub for engineering or Figma for design teams. By acting as a virtual hub for data assets ranging from tables and dashboards to SQL snippets & code, Atlan enables teams to create a single source of truth for all their data assets, and collaborate across the modern data stack through deep integrations with tools like Snowflake, Slack, Looker and more. Go to dataengineeringpodcast.com/atlan today and sign up for a free trial. If you’re a data engineering podcast listener, you get credits worth $3000 on an annual subscription.
  • Struggling with broken pipelines? Stale dashboards? Missing data? If this resonates with you, you’re not alone. Data engineers struggling with unreliable data need look no further than Monte Carlo, the world’s first end-to-end, fully automated Data Observability Platform! In the same way that application performance monitoring ensures reliable software and keeps application downtime at bay, Monte Carlo solves the costly problem of broken data pipelines. Monte Carlo monitors and alerts for data issues across your data warehouses, data lakes, ETL, and business intelligence, reducing time to detection and resolution from weeks or days to just minutes. Start trusting your data with Monte Carlo today! Visit dataengineeringpodcast.com/impact today to save your spot at IMPACT: The Data Observability Summit, a half-day virtual event featuring the first U.S. Chief Data Scientist, founder of the Data Mesh, Creator of Apache Airflow, and more data pioneers spearheading some of the biggest movements in data. The first 50 to RSVP with this link will be entered to win an Oculus Quest 2 — Advanced All-In-One Virtual Reality Headset. RSVP today – you don’t want to miss it!
  • Your host is Tobias Macey and today I’m interviewing Markus Lampinen about Prifina, a platform for building applications powered by personal data that is under the user’s control

Interview

  • Introduction
  • How did you get involved in the area of data management?
  • Can you describe what Prifina is and the story behind it?
    • What are the primary goals of Prifina?
  • There has been a lot of interest in the "quantified self" and different projects (many of which are open source) which aim to aggregate all of a user’s data into a single system for analysis and integration. What was lacking in the ecosystem that makes Prifina necessary/valuable?
  • What are some of the personalized applications for this data that have been most compelling or that users are most interested in?
  • What are the sources of complexity that you are facing when managing access/privacy of users’ data?
  • Can you describe the architecture of the platform that you are building?
    • What are the technological/social/economic underpinnings that are necessary to make a platform like Prifina possible?
    • What are the assumptions that you had when you first became involved in the project which have been challenged or invalidated as you worked through the implementation and began engaging with users and developers?
  • How do you approach schema definition/management for developers to have a stable implementation target?
    • How has that schema evolved as you introduced new data sources?
  • What are the barriers that you and your users have to deal with when obtaining copies of their data for use with Prifina?
  • What are the potential threats that you anticipate for users gaining and maintaining control of their own data?
    • What are the untapped opportunities?
  • What are the topics where you have had to invest the most in user education?
  • What are the most interesting, innovative, or unexpected ways that you have seen Prifina used?
  • What are the most interesting, unexpected, or challenging lessons that you have learned while working on Prifina?
  • When is Prifina the wrong choice?
  • What do you have planned for the future of Prifina?

Contact Info

Parting Question

  • From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

  • Thank you for listening! Don’t forget to check out our other show, Podcast.__init__, to learn about the Python language, its community, and the innovative ways it is being used.
  • Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
  • If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
  • To help other people find the show, please leave a review on iTunes and tell your friends and co-workers.
  • Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat

Links

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
