Summary
The proliferation of sensors and GPS devices has dramatically increased the number of applications for spatial data, and the need for scalable geospatial analytics. In order to reduce the friction involved in aggregating disparate data sets that share geographic similarities, the Unfolded team built a platform that supports working across raster, vector, and tabular data in a single system. In this episode Isaac Brodsky explains how the Unfolded platform is architected, their experience joining the team at Foursquare, and how you can start using it for analyzing your spatial data today.
Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their new managed database service you can launch a production ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don’t forget to thank them for their continued support of this show!
- Atlan is the metadata hub for your data ecosystem. Instead of locking your metadata into a new silo, unleash its transformative potential with Atlan’s active metadata capabilities. Push information about data freshness and quality to your business intelligence, automatically scale up and down your warehouse based on usage patterns, and let the bots answer those questions in Slack so that the humans can focus on delivering real value. Go to dataengineeringpodcast.com/atlan today to learn more about how Atlan’s active metadata platform is helping pioneering data teams like Postman, Plaid, WeWork & Unilever achieve extraordinary things with metadata and escape the chaos.
- Modern data teams are dealing with a lot of complexity in their data pipelines and analytical code. Monitoring data quality, tracing incidents, and testing changes can be daunting and often takes hours to days or even weeks. By the time errors have made their way into production, it’s often too late and damage is done. Datafold built automated regression testing to help data and analytics engineers deal with data quality in their pull requests. Datafold shows how a change in SQL code affects your data, both on a statistical level and down to individual rows and values before it gets merged to production. No more shipping and praying, you can now know exactly what will change in your database! Datafold integrates with all major data warehouses as well as frameworks such as Airflow & dbt and seamlessly plugs into CI workflows. Visit dataengineeringpodcast.com/datafold today to book a demo with Datafold.
- Unstruk is the DataOps platform for your unstructured data. The options for ingesting, organizing, and curating unstructured files are complex, expensive, and bespoke. Unstruk Data is changing that equation with their platform approach to manage your unstructured assets. Built to handle all of your real-world data, from videos and images, to 3d point clouds and geospatial records, to industry specific file formats, Unstruk streamlines your workflow by converting human hours into machine minutes, and automatically alerting you to insights found in your dark data. Unstruk handles data versioning, lineage tracking, duplicate detection, consistency validation, as well as enrichment through sources including machine learning models, 3rd party data, and web APIs. Go to dataengineeringpodcast.com/unstruk today to transform your messy collection of unstructured data files into actionable assets that power your business.
- Your host is Tobias Macey and today I’m interviewing Isaac Brodsky about Foursquare’s Unfolded platform for working with spatial data
Interview
- Introduction
- How did you get involved in the area of data management?
- Can you describe what the Unfolded platform is and the story behind it?
- What are some of the core challenges of working with spatial data?
- What are some of the sources that organizations rely on for collecting or generating those data sets?
- What are the capabilities that the Unfolded platform offers for spatial analytics?
- What use cases are you primarily focused on supporting?
- What (if any) are the datasets or analyses that you are consciously not investing in supporting?
- Can you describe how the Unfolded platform is implemented?
- How have the design and goals shifted or evolved since you started working on Unfolded?
- What are the new constraints or opportunities that are available after the merger with Foursquare?
- Can you describe a typical workflow for someone using Unfolded to manage their spatial information and build an analysis on top of it?
- What are some of the data modeling considerations that are necessary when populating a custom data set with Unfolded?
- What are some of the techniques that you needed to build to allow for loading large data sets into a user’s browser while maintaining sufficient performance?
- What are the most interesting, innovative, or unexpected ways that you have seen Unfolded used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on Unfolded?
- When is Unfolded the wrong choice?
- What do you have planned for the future of Unfolded?
Contact Info
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
Closing Announcements
- Thank you for listening! Don’t forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
- To help other people find the show please leave a review on Apple Podcasts and tell your friends and co-workers
Links
- Unfolded Platform
- H3 Hexagonal Map Tiles Library
- Carto
- Mapbox
- Open Street Map
- Raster Files
- Hex Tiles
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Sponsored By:
- Unstruk: ![Unstruck Data](https://files.fireside.fm/file/fireside-uploads/images/c/c6161a3f-a67b-48ef-b087-52f1f1573292/J3_WeYmj.png) Unstruk Data offers an API-driven solution to simplify the process of transforming unstructured data files into actionable intelligence about real-world assets without writing a line of code – putting insights generated from this data at enterprise teams’ fingertips. The company was founded in 2021 by Kirk Marple after his tenure as CTO of Kespry. Kirk possesses extensive industry knowledge including over 25 years of experience building and architecting scalable SaaS platforms and applications, prior successful startup exits, and deep unstructured and perception data experience. Unstruk investors include 8VC, Preface Ventures, Valia Ventures, Shell Ventures and Stage Venture Partners. Go to [dataengineeringpodcast.com/unstruk](https://www.dataengineeringpodcast.com/unstruk) today to transform your messy collection of unstructured data files into actionable assets that power your business!
Hello, and welcome to the Data Engineering Podcast, the show about modern data management. Atlan is the metadata hub for your data ecosystem. Instead of locking your metadata into a new silo, unleash its transformative potential with Atlan's active metadata capabilities. Push information about data freshness and quality to your business intelligence, automatically scale up and down your warehouse based on usage patterns, and let the bots answer those questions in Slack so that the humans can focus on delivering real value. Go to dataengineeringpodcast.com/atlan today, that's A-T-L-A-N, to learn more about how Atlan's active metadata platform is helping pioneering data teams like Postman, Plaid, WeWork, and Unilever achieve extraordinary things with metadata.
When you're ready to build your next pipeline or want to test out the projects you hear about on the show, you'll need somewhere to deploy it. So check out our friends at Linode. With their new managed database service, you can launch a production ready MySQL, Postgres, or MongoDB cluster in minutes with automated backups, 40 gigabit connections from your application hosts, and high throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don't forget to thank them for their continued support of this show. Your host is Tobias Macey. And today, I'm interviewing Isaac Brodsky about Foursquare's Unfolded platform for working with spatial data. So, Isaac, can you start by introducing yourself?
[00:01:43] Unknown:
Thank you for having me on the podcast today. I'm Isaac Brodsky. I'm a software engineer with Foursquare. I've been working on Unfolded for a couple of years now. Unfolded started as an independent startup at the end of 2019, and about a year ago it joined Foursquare.
[00:01:55] Unknown:
Do you remember how you first got started working in data?
[00:01:59] Unknown:
Absolutely. For me, that started a little bit out of college. Before that, I think most of the data that I had worked with was, like, some small MySQL databases or something like that, so not big data. Out of college, though, I joined Uber and worked on their marketplace data teams. This was quite a big shift for me, because I went from working on small data problems and smaller applications to data applications with very, very large volumes, since they covered all of the supply and demand indications in the Uber marketplace.
So I started there, working on one aspect of that system, one of the query systems. Over the next, I wanna say, roughly four and a half years at Uber, what I was working on expanded to other parts of the data ecosystem. I started from query systems, moved into storage and database systems, and then into a bit of a niche area around a system Uber built called H3, which is a spatial indexing system, a hexagonal grid system. That was something that we were subsequently able to open source a few years after that.
[00:03:19] Unknown:
And so you mentioned a little bit about what the Unfolded platform is. I'm wondering if you can give a bit more of the story behind how it came to be and why this is a problem area that you're interested in and motivated to continue working in?
[00:03:33] Unknown:
Absolutely. Unfolded came out of the experience that my Unfolded cofounders and I had at Uber. We were there working on geospatial data in all different forms: me on the back end and indexing side, a couple of my cofounders on the visualization and front end side, and one of them on the data science side of things. We spent several years just thinking, breathing, and living geospatial data, and we had a lot of ideas for where we could take geospatial data next and how we could continue developing the open source software that we had all been involved with at Uber. That led us to start Unfolded at the end of 2019 with the idea of taking how we work with geospatial data, and how we evolve that open source ecosystem, to the next level.
That was really the founding premise of Unfolded. Over the next year and a half or so, we built up this platform, which we call Unfolded Studio or the Unfolded platform, released it publicly, and built up geospatial data tooling: something where you can bring your data, work with it in Unfolded Studio, view it very performantly directly in your browser, and hopefully get some useful insight out of it and make a business decision based on that data. We built up that product over that year and a half.
Like I said, roughly a year ago we joined Foursquare. To some extent that's changed things a little, now that we have this broader geospatial organization around us. But to some extent it's also kept things the same, because there's still a very strong focus on geospatial data, and there's been a great opportunity to continue developing the Unfolded Studio platform.
[00:05:31] Unknown:
In terms of working with geospatial data and building analytical, mapping, or visualization use cases around it, what are some of the core challenges that people often face when they try to engage with that type of information and use it for various purposes?
[00:05:49] Unknown:
Absolutely. I think there are a few main challenges that I see come up again and again. One is that the scale of geospatial data can be really significant, and that can be a big impediment to getting started with it. To take one example, think about satellite data, like Landsat, or other raster data; that's what we call remote sensing data in geospatial parlance. These datasets can be absolutely massive, on the scale of terabytes or hundreds of terabytes. Working with this data, even with big data tools, can be quite a challenge.
Even basic things that you want to do with this data, like visualizing it, can be challenging. Satellite data is an extreme example, but even for a lot of companies that have geospatial data in some form, it can be a challenge just to visualize it and see, at a very basic level, what you actually have. That can be because the data is large, or just because it's not in a form that is readily viewable or plottable on a map. An example would be a company that has, let's say, a customer list with ZIP codes or addresses. There's a geospatial aspect to it, since ZIP codes correspond to places in the real world.
But unless you're really familiar with how to use ZIP codes, it can be difficult to actually plot them on a map and see whether the dataset makes sense. Are these ZIP codes all in the same state or metro region? What are the outliers? This is something that Unfolded Studio does pretty well, since it lets you just click a button and bring in the geospatial aspect of your data quite easily. Another area that I think is important when working with geospatial data is understanding where that data came from and what its geospatial implications are.
A good example is whether the data came from satellite imagery, a GPS log, or a state boundary database. All of these behave a little differently, and I think you have to be at least somewhat cognizant of where the data came from and what that means for it. For example, with GPS log data, we usually think of GPS as simply giving the location you're at, exactly where you are on the earth. The reality, of course, is that there is always some error attached to that data. If you're in a dense downtown area, radio signals are bouncing around, and GPS just might not know where you are very well. That's something you have to be cognizant of when working with geospatial data: however much you want to understand the data and what's happening in the real world, there are always limits to that.
[00:09:08] Unknown:
As far as the other interesting elements of the geospatial sources you're working with, a lot of times there's also a time element attached that you need to factor into how you build your analysis or use that data. For instance, if your satellite imagery is from 30 years ago, it's probably not going to do you much good if you're trying to pick a building site right now, because there's a good chance somebody has already taken that site since the imagery was captured. I'm wondering how that chronological, time element factors into some of those geospatial data challenges.
[00:09:45] Unknown:
Absolutely. It's a really good point that the time dimension and the time domain can be a really challenging aspect of it. Like you said, there's always this push for having the most recent, up-to-date data, and the example you gave is spot on: all of these use cases really do need up-to-date data. There's another aspect as well, which is that the time dimension is very challenging when you have things that vary over time. If you want to look at, let's say, demographics or patterns of business transactions, those vary over time.
For non-geospatial data, I think we have somewhat more tractable tools for this, because you can plot it on a chart and see the trend over time. For geospatial data this becomes quite challenging, because not only do you have the time dimension, which on its own means a much larger dataset, but you have it across geospatial dimensions. So you're multiplying it by perhaps two more dimensions, or in the worst case even three. That leads to an explosion in the amount of data you need to work with and, if you're in Unfolded Studio, for example, the amount we need to load into the browser.
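To make that multiplication concrete, here is a back-of-the-envelope sketch, assuming a dense layout with one stored value per grid cell per time step. The cell and time-step counts are hypothetical, not figures from Unfolded.

```python
def spatiotemporal_row_count(num_cells: int, num_time_steps: int, num_levels: int = 1) -> int:
    """Rows needed to hold one value per (cell, time step, vertical level).

    Assumes a dense layout; sparse data would need fewer rows.
    """
    return num_cells * num_time_steps * num_levels


# Hypothetical example: 100,000 grid cells, a single snapshot versus hourly for one week.
snapshot_rows = spatiotemporal_row_count(100_000, 1)
hourly_week_rows = spatiotemporal_row_count(100_000, 24 * 7)
print(snapshot_rows, hourly_week_rows)  # 100000 16800000
```

Each additional dimension multiplies the row count rather than adding to it, which is why a week of hourly data is 168 times larger than a single spatial snapshot.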
In some cases this leads to a situation where people have geotemporal data, this combination of spatial and temporal data, but they can't even visualize it or work with it, because they don't have a tool they can even load it into for viewing or analysis. So they look at very constrained slices, which I think is a very limiting way of looking at and working with the data.
[00:11:33] Unknown:
As far as the existing tooling for working with this type of information, I know that there are other projects such as OpenStreetMap, and products like Mapbox and Carto that are very focused on geospatial analytics use cases. I'm curious what the state of the geospatial ecosystem looks like and what was missing that made the Unfolded platform a useful and valuable entry to that ecosystem.
[00:12:08] Unknown:
Looking back a few years, what geospatial meant to a lot of people was more of this cartographic and, I would say, GIS use case. This is where we're doing things like plotting boundaries and plotting locations on maps, and it's a very large industry, of course, and a very important one. A lot of utility companies and governments need to know exactly where things are in the real world. I think that's what GIS meant to a lot of people when they heard terms like geospatial. More recently, though, there's been this move to bring it to big data systems.
The advent of technologies like H3, and S2 before that, has made it possible to have big data systems that contain analytical geospatial and geotemporal data. At the same time, we only had a lot of the big data tools around that: notebooks, Spark, all these tools we were using to process this data. But these are very low level tools for this kind of work. As somebody who wants to get insights out of a geospatial dataset, you would need to go into these low level tools and figure out exactly what your query is and exactly how to get your data out, and you're not working with it in a spatial-first, visual-first way.
More recently, like I said, there's been this shift to analytics data. One of the things we noticed at the start of Unfolded was a need to bring these large analytics datasets into the browser for analysis. We wanted to work with large analytics datasets because we feel they have a tremendous amount of value, but they were really difficult to work with. That led to the creation of technologies like hex tiles, which allow us to bring the appropriate set of analytics data into the browser at the right time and in a format that we can use for analytics.
And it led to the focus of the product being this browser-based application, Unfolded Studio, to allow quick iteration on the data and to allow actually visualizing the geospatial data you're working with.
[00:14:41] Unknown:
As far as the actual H3 library and the hex tiles format, I'm wondering what are some of the types of metadata and data modeling elements that go into the information contained within one of those tiles or cells, or however you want to think about the individual unit of information that you're pulling into the browser and exposing for analysis and manipulation?
[00:15:03] Unknown:
When we think about modeling hexagon data and hex tile data, it's a bit of a different way of modeling data than we normally think about. With points or polygons, we have a more immediate sense of what they might represent. A hexagon is a more abstracted way of working with the data. When we talk about a hexagon's worth of data, we're talking about a hexagon-shaped grid area on the earth and associating some attributes with it. Take, for example, the Unfolded data catalog that we have of freely available datasets that people can bring into their analytics.
Some of them include things like census demographics, land use and land cover status, and human improvement status. They also include things like whether there are roads there, and you can imagine a lot of other datasets you could bring in, like political and administrative boundaries: which state, which country, or which city this is. Like I said, though, it's a somewhat abstract way of working with the data, because you're associating a property or attribute, for example the population of an area, with a hexagonal cell.
When you do this, you're saying that this is the amount of population you're going to attribute to that particular cell. So we're going from the geometry as the census defines it, which is roughly block areas, into this hexagonal grid. There's a natural question of why we want to bring it into a hexagonal grid, and I think there are a couple of reasons this way of modeling the data is advantageous. The first is that it puts everything in a common grid system.
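One common way to attribute a census total to grid cells is area-weighted apportionment, which assumes the population is spread evenly within each block. The episode doesn't say which method Unfolded uses, so this is a generic sketch; the block IDs, cell IDs, and overlap areas are hypothetical, and computing the polygon overlaps is assumed to happen elsewhere (for example, in a GIS library).

```python
from collections import defaultdict


def apportion_by_area(source_totals, overlaps):
    """Spread each source polygon's total over grid cells in proportion to overlap area.

    source_totals: {source_id: total}, e.g. population per census block
    overlaps: iterable of (source_id, cell_id, overlap_area) tuples, assumed precomputed
    """
    overlaps = list(overlaps)
    # Total overlapping area per source polygon, used to turn areas into shares.
    area_per_source = defaultdict(float)
    for source_id, _cell_id, area in overlaps:
        area_per_source[source_id] += area

    cell_totals = defaultdict(float)
    for source_id, cell_id, area in overlaps:
        total_area = area_per_source[source_id]
        if total_area == 0:
            continue  # degenerate polygon with no measurable overlap
        cell_totals[cell_id] += source_totals[source_id] * (area / total_area)
    return dict(cell_totals)


blocks = {"block_a": 1200.0, "block_b": 300.0}  # hypothetical census blocks
overlaps = [
    ("block_a", "cell_1", 3.0),
    ("block_a", "cell_2", 1.0),
    ("block_b", "cell_2", 2.0),
]
print(apportion_by_area(blocks, overlaps))  # {'cell_1': 900.0, 'cell_2': 600.0}
```

Normalizing by the summed overlap area, rather than each block's full area, keeps every block's total intact, so the grid still sums to the original census population.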
It makes it tremendously easier to bring together different data sources. For example, we can bring census data, GPS log data, remote sensing information, and so on into the same grid system and join based on that. That gives us a clearly defined join between these different layers of data, so data from very diverse sources, for example census data and remote sensing data, can come together in the same analysis. It's of course possible to do that with other methods, but this one has the unique advantage of a common join key, and you can then add on additional layers of information as you see fit. Another advantage of projecting things into a common grid system is that you then have this common grid system to reason in.
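The common-join-key idea can be sketched without any geospatial library: map each raw record to a grid cell ID, aggregate per cell, then join layers on that ID. For a self-contained example this uses a hypothetical square latitude/longitude grid as a stand-in for H3 hexagons (square degree cells are not equal-area, which is one of the things H3 improves on). The cell size, coordinates, and population value are all hypothetical.

```python
import math
from collections import defaultdict

CELL_SIZE_DEG = 0.01  # hypothetical cell size; H3 would use a resolution level instead


def cell_key(lat: float, lng: float, size: float = CELL_SIZE_DEG) -> tuple:
    """Hypothetical square-grid index standing in for an H3 cell ID."""
    return (math.floor(lat / size), math.floor(lng / size))


def count_points_per_cell(points):
    """Aggregate raw (lat, lng) points, e.g. GPS pings, into per-cell counts."""
    counts = defaultdict(int)
    for lat, lng in points:
        counts[cell_key(lat, lng)] += 1
    return dict(counts)


def join_on_cell(left: dict, right: dict) -> dict:
    """Inner join two per-cell layers on their shared cell key."""
    return {cell: (left[cell], right[cell]) for cell in left.keys() & right.keys()}


gps_pings = [(37.7749, -122.4194), (37.7751, -122.4190), (37.8044, -122.2712)]
census_population = {cell_key(37.7750, -122.4192): 5200}  # hypothetical value

joined = join_on_cell(count_points_per_cell(gps_pings), census_population)
print(joined)
```

With real H3 cells, `cell_key` would be replaced by the h3 library's indexing function (named `latlng_to_cell` in h3-py v4 and `geo_to_h3` in v3); the aggregation and join steps stay the same.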
My background is more in computer science and software engineering, so to me it's really comforting to have this grid system, because I can define how I want to work with my data in terms of the grid, a nicely abstracted mathematical construct over the data. Rather than worrying about exactly where these points are, whether one point relates to another, and how exactly that happens, I can think about it as a grid and work with the data in grid form. That lets me step back from some of the really low level details of the data and get more of a high level sense of what's going on in my dataset and the interesting analytics and conclusions I can draw from it.
[00:19:02] Unknown:
Modern data teams are dealing with a lot of complexity in their data pipelines and analytical code. Monitoring data quality, tracing incidents, and testing changes can be daunting and often takes hours to days or even weeks. By the time errors have made their way into production, it's often too late and the damage is done. Datafold built automated regression testing to help data and analytics engineers deal with data quality in their pull requests. Datafold shows how a change in SQL code affects your data, both on a statistical level and down to individual rows and values before it gets merged to production. No more shipping and praying. You can now know exactly what will change in your database.
Datafold integrates with all major data warehouses as well as frameworks such as Airflow and dbt and seamlessly plugs into CI workflows. Visit dataengineeringpodcast.com/datafold today to book a demo with Datafold. And so now that you have the information in this common grid system, you have a way to easily abstract across multiple sources of data and put them into a common working space to join across them. What are some of the additional capabilities that you built on top of that in the Unfolded platform, the primary use cases you were driving toward with the initial build-out, and some of the additional use cases you're looking to add as you continue to develop the product?
[00:20:26] Unknown:
Absolutely. This is low level infrastructure; it's the foundation we need for these really interesting geospatial analytics. Just this capability of unifying your geospatial data is that foundation. Like you said, now that we have it, we're building out analytics capabilities on top. The basic form of this is that you can bring your datasets together, and the immediate next step is to do something with them. A good example of this, I think, is site selection, which is tremendously important in retail. For anybody who is operating, or wants to operate, or is in any way thinking about retail, the location of that retail establishment is of real importance.
Working with geospatial data gives you the opportunity to see whether this is a good location to open a new store or to invest in one of your stores. Having a strong foundation for bringing different datasets together makes that a stronger analysis, because you can bring in, let's say, demographic information along with transaction information, or maybe footfall, the patterns of traffic in that area. On top of that, you can do a simple analysis where you weight these variables in different ways to come up with a score for whether you want to select a site, and then build more advanced reporting and analytics capabilities to communicate your results to other stakeholders in your business.
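The simple weighted analysis described above can be sketched as a weighted sum of per-cell layers. This assumes the layers are already joined on a common grid key, uses min-max normalization so layers with different units are comparable, and uses hypothetical cell IDs, values, and weights; it is not Unfolded's scoring method.

```python
def min_max_normalize(values: dict) -> dict:
    """Rescale a layer's values to [0, 1]; a constant layer maps to 0."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    return {k: ((v - lo) / span if span else 0.0) for k, v in values.items()}


def site_scores(layers: dict, weights: dict) -> dict:
    """Weighted sum of normalized layers for cells present in every layer."""
    normalized = {name: min_max_normalize(vals) for name, vals in layers.items()}
    common_cells = set.intersection(*(set(vals) for vals in layers.values()))
    return {
        cell: sum(weights[name] * normalized[name][cell] for name in layers)
        for cell in common_cells
    }


# Hypothetical per-cell layers, already joined on a common grid key.
layers = {
    "population": {"c1": 1000, "c2": 4000, "c3": 2500},
    "footfall": {"c1": 50, "c2": 10, "c3": 90},
}
weights = {"population": 0.6, "footfall": 0.4}

scores = site_scores(layers, weights)
best_cell = max(scores, key=scores.get)
print(best_cell)  # c3
```

Changing the weights is how an analyst would express which variable matters more for a given business; cell `c2` has the most people but the least footfall, so it loses to `c3` under these weights.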
And you can think of even more advanced analytics use cases to build on top of this, using more involved approaches such as machine learning or other advanced statistical methods.
[00:22:24] Unknown:
And as far as the machine learning element, you mentioned that the Unfolded platform has browser-focused capabilities, but I also know that it has a Python SDK. I'm wondering if you can talk to some of the ways that you can factor those geospatial elements into an existing machine learning workflow that an organization might already have set up.
[00:22:47] Unknown:
I think there are two aspects to that. One is, as you mentioned, the browser-based workflow. As of a few years ago, everybody would have heard "browser-based" and said, okay, it's less capable, there aren't as many APIs available, and you can't access some of the cool advancements in hardware acceleration. That has completely inverted as of today, or maybe as of a couple of years ago, because browsers are now able to access graphics hardware and graphics acceleration.
So there's been a tremendous proliferation of tooling that works with browsers and takes advantage of modern capabilities. That's one area where I don't have something I can point to and say, okay, we can run this ML pipeline in the browser today, but I think there's now a very clear path to getting there. The other aspect you mentioned is Unfolded as a platform for managing your data, which is of course the SDKs and APIs that we build around the Unfolded platform. That's an important part of Unfolded as well, because it's one thing to be able to put your data on a map, but it's another thing to be able to, as you said, manage it and bring it into other workflows in your organization or your use case.
For that, we've built a set of SDKs and APIs that let you connect the Unfolded platform to your current data sources, so that you can simply run a query and get your data onto a map, as well as move your data in and out of the Unfolded cloud. If you want to do that via a browser, that's great. But if you want to do the same operation via an API or an SDK, for example in Python, that's also readily available to you. This means that the same tool you use for visualizing your data and for your human-driven analytics, the part where you're actually at the mouse and keyboard looking at the data, can be brought into data pipelines or automated analysis, where you plug in the Python SDK instead and access those datasets.
I mentioned accessing hex tile data for census, for example, and you can also bring that in through the SDK. So you're not just looking at it in a browser; you can also use it in, for example, an ML model where you want it as one of your variables, that is, as a feature for your machine learning use case.
[00:25:31] Unknown:
Now digging more into the unfolded platform, can you talk through some of the overall implementation approach and the architecture that it is built around to be able to power these geospatial use cases and also being able to be performant when you are working in that browser environment?
[00:25:49] Unknown:
Absolutely. There are a couple of aspects to it. I think the driving idea we had when we started, and one of the things that really allowed us to scale globally, is a hybrid front end/back end system. It was really important to recognize both the strengths and the weaknesses of each side when working with geospatial data. Browsers are tremendously fast and performant at working with data that they can hold in memory, but they have the weakness of not being able to load an extremely large dataset directly.
This was the guiding principle we started with, and we built up from there. We took the strengths of browser implementations, where we can bring in data, work with it very quickly, and give very quick feedback to a user. We use technologies like WebGL and WebGPU, which use the graphics hardware on somebody's laptop or desktop computer, and now even on phones, of course, to accelerate how we visualize this data. But we also pair that with a cloud back end where we're able to work with larger datasets. An example could be somebody who uploads a dataset to the Unfolded platform that is too large to comfortably load into a browser.
They can upload that to our cloud platform, which runs on standard cloud services, and we can put it through a process called hex tiling. That's essentially a data preparation job that runs in our cloud environment and preprocesses the data into a form where it can be served piece by piece, tile by tile, into the browser for analytics. Having this combination of a front end and a back end architecture was, I think, really important to our ability to work with a variety of customer datasets.
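As a rough illustration of the preprocess-once, serve-by-tile idea, here is a minimal Python sketch. The transcript does not describe the layout of Unfolded's hex tiles, so this swaps in standard square Web Mercator XYZ ("slippy map") tiles instead; `tile_for`, `build_tiles`, and `tiles_in_view` are hypothetical names, not part of any Unfolded API.

```python
import math
from collections import defaultdict

# Web Mercator is undefined at the poles; tiles are conventionally clipped here.
MAX_LAT = 85.05112878


def tile_for(lon, lat, zoom):
    """Return the (x, y) index of the XYZ tile containing a lon/lat point."""
    n = 2 ** zoom  # the world is n x n tiles at this zoom
    lat = max(-MAX_LAT, min(MAX_LAT, lat))
    lat_rad = math.radians(lat)
    x = math.floor((lon + 180.0) / 360.0 * n)
    y = math.floor((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points exactly on the east edge or at the latitude limit stay in range.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def build_tiles(points, zoom):
    """Offline (cloud) step: bucket (lon, lat, value) rows by the tile they fall in."""
    tiles = defaultdict(list)
    for lon, lat, value in points:
        x, y = tile_for(lon, lat, zoom)
        tiles[(zoom, x, y)].append((lon, lat, value))
    return dict(tiles)


def tiles_in_view(tiles, zoom, min_lon, min_lat, max_lon, max_lat):
    """Client step: keys of prebuilt tiles overlapping a viewport bounding box."""
    x0, y0 = tile_for(min_lon, max_lat, zoom)  # top-left corner (y grows southward)
    x1, y1 = tile_for(max_lon, min_lat, zoom)  # bottom-right corner
    return sorted(
        key for key in tiles
        if key[0] == zoom and x0 <= key[1] <= x1 and y0 <= key[2] <= y1
    )


points = [(-122.40, 37.80, 1), (-122.41, 37.79, 2), (139.70, 35.70, 3)]
tiles = build_tiles(points, zoom=2)
print(tiles_in_view(tiles, 2, -130.0, 30.0, -110.0, 45.0))  # [(2, 0, 1)]
```

The preprocessing runs once over the full dataset; a client then requests only the tiles overlapping its current viewport, which keeps the full dataset out of browser memory. This sketch ignores viewports that cross the antimeridian.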
[00:27:54] Unknown:
Now that you've merged with the Foursquare team, I'm wondering, what are some of the additional opportunities and new constraints that are provided by being part of that larger organization and some of the ways that that influences the ways that you think about the goals and the specific areas of focus for the platform as it continues to grow and evolve?
[00:28:16] Unknown:
Joining Foursquare has really expanded our ability to work with some of the large and interesting datasets that we want to use to power these analytics. We've been able to release more datasets as part of our data catalog, and we're always working on more data, both open data and proprietary Foursquare datasets that we can deliver to customers through the platform. I think that's been a great advantage of joining Foursquare: the company has an overall focus on geospatial data, which goes together tremendously well with a geospatial data tool, because, of course, you need geospatial data available to you. You asked about constraints, and obviously, joining a company always brings its own new challenges and rewards and ways of working with people. But at the same time, over the last year or so that we've been with Foursquare, I've really felt that we've been able to continue developing the Unfolded Studio product in a direction that makes sense to us as people who are excited about geospatial technologies. So I'm tremendously excited about what we've been developing and releasing over the last year, and to me, it's an extension of what we were building with Unfolded before.
If you look at our release notes on the Unfolded website, you'll see that continuing cadence of releases, features, and updates to our platform. So I don't see a huge set of new constraints being added. With Foursquare being independent and already geospatial focused, I think there's a very good match with what Unfolded has been trying to build.
[00:30:12] Unknown:
And so for somebody who is interested in building analytics around geospatial information, either assets that they already have or assets from your existing data catalog, can you talk through the overall workflow, from "I have an idea, this is something that I want to figure out or build a project around," through using the Unfolded platform and its various capabilities, to actually delivering that in a production context?
[00:30:44] Unknown:
I would bring in Unfolded really early on in that journey of understanding your datasets, what you can do with them, and productionizing that. The reason is that I think it's really important to see that data on the map as early as possible and see what the distribution of the data is. As I'm sure your listeners know, the quality of the data and what the underlying data actually is matter tremendously when we're building data products or building things around data. You really have to know what is happening in that data, where it came from, and what it represents in order to do anything useful with it.
And I think getting a good understanding of that really starts with visualizing and exploring it and understanding what is in that dataset. So I would say that workflow starts with bringing the data in, just dragging and dropping it into the Unfolded Studio product, seeing it on a map, and seeing what the distribution of the data is. You might learn something surprising even at that point, like, oh, maybe I don't have data from a certain state or a certain country when I thought I did. Once you've done this exploration phase and understand what's in your dataset, I think there's an iteration phase of trying hypotheses and analytics on top of this dataset, seeing what you can learn from it and what the actual conclusions are. This is a loop of visualizing, analyzing, trying to derive different statistics, and seeing what conclusions you can actually draw from the data. This can all happen in the Unfolded Studio product, by looking at different aspects of the dataset and projecting different metrics or different analytics from it.
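The "maybe I don't have data from a certain state" moment can also be reproduced as a quick script alongside the visual check. A minimal sketch, assuming records are plain dicts with a hypothetical `state` field and that you supply the list of regions you expect to see; this is not a feature of Unfolded Studio.

```python
from collections import Counter


def coverage_report(records, expected_regions, region_key="state"):
    """Count rows per region, flagging rows with no region and expected regions with no rows."""
    counts = Counter(row.get(region_key) for row in records)
    unlabeled = counts.pop(None, 0)  # rows missing the region field entirely
    missing = sorted(set(expected_regions) - set(counts))
    return dict(counts), unlabeled, missing


rows = [{"state": "CA"}, {"state": "CA"}, {"state": "NY"}, {"city": "Austin"}]
counts, unlabeled, missing = coverage_report(rows, ["CA", "NY", "TX"])
print(counts, unlabeled, missing)  # {'CA': 2, 'NY': 1} 1 ['TX']
```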
The next step in this workflow, I think, is determining how to share these results and communicate them to stakeholders and others you need to present to or convince. This could be reporting or charting that you add on top of the geospatial form of the data in order to make it clear what is going on in that dataset. The stage after that is really sharing it. If you're working with an open dataset or public data, that could be publishing your map, or it could be sharing your map with your stakeholders. It could be embedding that map into an application or dashboard, where you take the Unfolded Studio map or work with one of the SDKs we mentioned earlier to put it into your own website or dashboard, so you can see the map in the appropriate storytelling or reporting context, alongside the other data points and numbers that help bring that map into context.
[00:34:00] Unknown:
Once somebody has the information in Unfolded Studio and is analyzing it and working with it, what are some of the additional libraries or tools that they might want to bring alongside it? For instance, they might want to juxtapose the map they're building in the studio with a Leaflet map or an OpenStreetMap, to be able to say, okay, these are all of the political features that I'm building up from the census information I have, and I want to overlay that on top of the real-world geographical map that I'm used to looking at. Or what are some of the other ways they might think about working with that data and bringing understandable context for themselves and for the people they're presenting it to?
[00:34:44] Unknown:
What you mentioned, I think you can do in Unfolded Studio as it is, without going out to other mapping providers, or you can bring those data providers into Unfolded Studio. I often talk about, okay, I'm gonna drag and drop a file into Unfolded Studio and view it on a map, papering over a bunch of other capabilities in the Studio product for bringing datasets in. I talked about bringing files in, and I talked about bringing data in as the hex tile format we've been working on. But Unfolded Studio also has a number of other ways of bringing datasets in, ranging from data connectors, or data from your enterprise data warehouse, let's say, in Snowflake.
It could range from external files that are brought into the browser via a URL. And it could also include data coming from sources like vector maps or raster maps, which bring in the natural context, what we think of as the mapping context around the data. That could be, for example, the boundaries of states or cities, or it could come from raster data: I'm thinking of something like where the land is or where farmland is, brought into Unfolded Studio.
So a lot of that can be brought into Unfolded Studio, and I think you gave a lot of good examples of different types of data that you'd want to use to put your map into context. I would say there's one other aspect to this, which is other tooling that you want available alongside it. This gets us back to the SDKs I mentioned before, for example, the Python SDK. Maybe you want to work with this in a notebook environment, so you have both Studio and its visual tools for working with the data, alongside some of the more traditional data science tools you might be familiar with, in Python and Pandas or Spark or something like that, with the data available alongside them. If you're familiar with those tools, that can be a really powerful combination.
[00:36:58] Unknown:
Unstruk is the DataOps platform for your unstructured data. The options for ingesting, organizing, and curating unstructured files are complex, expensive, and bespoke. Unstruk Data is changing that equation with their platform approach to managing your unstructured assets. Built to handle all of your real-world data, from videos and images to 3D point clouds, geospatial records, and industry-specific file formats, Unstruk streamlines your workflow by converting human hours into machine minutes and automatically alerting you to insights found in your dark data. Unstruk handles data versioning, lineage tracking, duplicate detection, and consistency validation, as well as enrichment through sources including machine learning models, third-party data, and web APIs. Go to dataengineeringpodcast.com/unstruck today and transform your messy collection of unstructured data files into actionable assets that power your business.
Now going back to the data modeling question, but from the other side: as you are preparing your data to load into Unfolded, or to be analyzed in that environment, what are some of the considerations that go into deciding what attributes the data needs and what structure it needs to be in? How do I think about partitioning or segmenting the data to make it performant, making it easy to join across different data sources, ensuring the chunking is the right size, and all of those kinds of considerations that go into working effectively with spatial information?
[00:38:37] Unknown:
My view is that hopefully you have to make very few of those modeling choices. That's really the thing I would love to see come out of our development on this product. For example, say you have customer data with associated ZIP codes. In that case, you have a geospatial aspect, which is your ZIP codes. But unless you have figured out exactly how to map ZIP codes onto geospatial areas, which is itself a little shaky (that's its own rant I could get into), you're not gonna have those coordinates. What I would love to see, and I'm happy to say we've made some progress on this in Unfolded Studio, is that you can just bring that data in and the product walks you through how to get it from whatever form you have it in into a state that's visualizable on a map. For ZIP codes, we can provide the coordinates, boundaries, or geometry to you, so it's just a click of a button and you've solved the problem of how to go from your identifier to the actual shape behind it.
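Going from an identifier to a location is, at its core, a lookup join. A minimal sketch, assuming you have some table mapping ZIP codes to a representative point; the `zip_lookup` dict below is hypothetical with placeholder coordinates, and this is not how Unfolded Studio implements the feature. Reducing a ZIP code to a single point is itself a simplification, for the reasons mentioned above.

```python
def attach_zip_points(customers, zip_lookup):
    """Attach a representative (lat, lng) to each record via its ZIP code.

    Returns (matched, unmatched) so unresolved codes can be reviewed instead of
    silently dropped.
    """
    matched, unmatched = [], []
    for row in customers:
        # ZIPs stored as integers lose leading zeros (02139 -> 2139); restore them.
        code = str(row.get("zip", "")).strip().zfill(5)
        point = zip_lookup.get(code)
        if point is None:
            unmatched.append(row)
        else:
            lat, lng = point
            matched.append({**row, "lat": lat, "lng": lng})
    return matched, unmatched


# Placeholder lookup: these coordinates are made up, not real ZIP centroids.
lookup = {"01234": (10.0, 20.0)}
customers = [{"id": 1, "zip": 1234}, {"id": 2, "zip": "99999"}]
matched, unmatched = attach_zip_points(customers, lookup)
print(len(matched), "matched,", len(unmatched), "unmatched")
```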
So my overall goal for all of these things is that these issues of partitioning or of having the appropriate identifiers go away and are solved by the tool, because I think for a lot of data professionals working with this data, that isn't their primary concern. Their primary concern is usually a business one: what decision can I make from this data, not how do I handle these low-level details. That being said, I do have a lot of opinions about how that data should be modeled and what some effective ways of doing that are.
One methodology I'm really quite optimistic about is using H3. Obviously, that's a project I've been involved with for a number of years now, so it kind of makes sense that I'm gonna say this is a good way of working with the data. But I do think it's a really strong abstraction and a really strong way of working with data: project it into a grid system and do analytics on top of that. So that's one of the guiding principles we use for guiding our users in getting their data ready to work with on a map and in that geospatial way.
[00:41:12] Unknown:
This might be a bit of an aside, but with the H3 library, I know there are different hierarchies you can get into as far as the scale of the region you're working with. I'm curious if there are any considerations or challenges that users need to think about as they try to project into different scales of region and then join across those scales, where they might need to resegment the data as they go from, say, a 10-mile scale down to a 1-mile scale.
[00:41:44] Unknown:
As we work with this H3-indexed data, like you mentioned, it is a hierarchical grid system. A cell, or hexagonal area, at a finer resolution has a logical and, to some extent, spatial relationship to a larger cell at a coarser resolution. This is a question that comes up quite often, because we have this choice of what resolution to put the data at, and there's a natural question of, okay, which resolution should I pick? I don't think there's a hard and fast rule you can apply for which resolution is appropriate for your dataset, though I think you can come up with some good statistical and data science approaches to choosing it.
I would say the best way I have to explain how to choose that resolution is that it really depends on the use case of the data you're working with. At Uber, we were working with data that is essentially cars moving around cities; that's the general primitive Uber has to deal with. In that case, I think it made sense that Uber chose a resolution geared roughly to a city block, which is around resolution 9 in the H3 hierarchy. That gives a grid that maps really well onto the business problem Uber is dealing with in its marketplace.
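H3 has a fixed cell count at every resolution: 122 cells at resolution 0 and 2 + 120·7^r at resolution r. That means average cell areas can be computed without the library. A minimal sketch: dividing the Earth's surface area by the cell count gives roughly 0.105 km² per cell at resolution 9 and roughly 307 m² at resolution 12. These are averages; individual H3 cells vary in size with location. `coarsest_res_for_area` is a hypothetical helper, not an H3 function.

```python
# Approximate surface area of the WGS84 ellipsoid, in km^2.
EARTH_AREA_KM2 = 510_065_621.7


def num_cells(res):
    """Total H3 cells at a resolution: 122 base cells, each finer level ~7x as many."""
    return 2 + 120 * 7 ** res


def avg_cell_area_km2(res):
    """Average cell area at a resolution (individual cells vary around this)."""
    return EARTH_AREA_KM2 / num_cells(res)


def coarsest_res_for_area(target_km2):
    """Coarsest H3 resolution (0-15) whose average cell area is <= target_km2."""
    for res in range(16):
        if avg_cell_area_km2(res) <= target_km2:
            return res
    return 15  # target is smaller than even the finest cells


for res in (9, 12):
    print(res, num_cells(res), avg_cell_area_km2(res))
print(coarsest_res_for_area(0.11))
```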
As I look at finer-grained data, which might be more like footfall data or traffic to individual stores or retail locations, sometimes I'm looking at finer resolutions, more like 12, 13, or 14, because this is happening on an even finer scale, with smaller cells than the Uber use case. This involves something more like somebody walking along or crossing the street, where which side of the street you're on has a tremendous impact. For a car, of course, which side of the street you're on doesn't have quite so much impact, so you may not need that level of precision or resolution in your analysis.
In the other direction, if I have something that's not about cars or people moving around cities but maybe about movement between cities or regions, then an even coarser resolution is probably gonna make the most sense. So I think this use-case-informed way of choosing a resolution is probably the best approach we have right now. I would add one other thing, which is that it is okay for the resolution to be a little coarser than you might think. I think people have a natural tendency to want to go for something really, really high resolution.
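The advice to stay a little coarser can be turned into a rough rule of thumb: avoid cells much smaller than your positional error. A minimal sketch under stated assumptions: edge lengths are approximated from H3's average cell area as if cells were regular hexagons, and the factor-of-two `safety` margin is an arbitrary illustrative choice, not an H3 or Unfolded recommendation. With an assumed 10 m location error it suggests resolution 11 or coarser.

```python
import math

EARTH_AREA_KM2 = 510_065_621.7  # approx. WGS84 surface area, km^2


def approx_edge_m(res):
    """Edge (m) of a regular hexagon with the H3 resolution's average cell area.

    Rough: real H3 cells are distorted, so actual edge lengths vary.
    """
    area_m2 = EARTH_AREA_KM2 / (2 + 120 * 7 ** res) * 1e6
    # Regular hexagon: area = (3 * sqrt(3) / 2) * edge^2
    return math.sqrt(2 * area_m2 / (3 * math.sqrt(3)))


def finest_res_for_error(error_m, safety=2.0):
    """Finest resolution whose approximate edge is still >= safety * error_m."""
    best = 0
    for res in range(16):
        if approx_edge_m(res) >= safety * error_m:
            best = res  # edges shrink as res grows, so keep the last one that fits
    return best


print(finest_res_for_error(10.0))  # 11
```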
You could, of course, just choose the highest resolution available to you in the library. But as we know, with GPS data or whatever data source you're getting it from, there is going to be some error and some uncertainty introduced into that data. I think we do ourselves a disservice if we treat it as extremely high-precision data when that's not actually the case. In terms of data collection, it might not be that precise, and in terms of data analytics, it might not make sense to look at it on that scale. It might make sense to assume there's a little more error and account for that in your analytics, rather than pretending the data is more specific than it really is.
[00:45:27] Unknown:
Yeah. And one more point in this direction: another thing I'm thinking about for picking the appropriate resolution is when you're dealing with more statistical measures, like population density or the number of houses within a given spatial area. If you choose a resolution that's too coarse, apply all of the information to that region, and then try to zoom in further, you have to figure out how to redistribute that calculation at the finer resolution, and you're gonna end up losing a lot of precision. But there's also still the issue that if you go too fine, you're going to spend all of your time doing data entry and figuring out how to allocate these statistics to fine, detailed areas. So it's about finding the appropriate balance for the level of statistical uncertainty you're willing to accept at finer resolutions.
[00:46:31] Unknown:
And the hex tile system that we're working with is not only an H3-based hexagonal system for working with the data, it's also a hierarchical one. I think that really helps with the use case you're mentioning, where maybe you have demographic or census information that you want to join into your analysis and you might need it at different resolutions. Census data, I think, is a really good example of this, because there are methods we can use for disaggregating the data, projecting it to a finer resolution and saying, okay, these are the areas within a census block that have population and that we want to attribute population to, as well as for aggregating that data to coarser cells. So if you want to look at it on a really fine level, we have that data.
And getting access to the coarser data is just a matter of zooming out and having the application load the coarser form of the data, which is just aggregated up from the finer data, and having the choice of which resolution you want to bring into your analysis.
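Aggregating up and disaggregating down a hierarchy can be sketched in a few lines. This is a minimal illustration, not the hex tile system's implementation: real H3 cell ids need the H3 library's parent/child functions, so the sketch uses a hypothetical `toy_parent` scheme, and the disaggregation weights (for example, which child cells contain housing) are assumed inputs. Summing only works for additive quantities such as counts or population; averages and rates need weighted means instead.

```python
from collections import defaultdict


def toy_parent(cell):
    """Hypothetical stand-in for a real parent lookup such as H3's: in this toy
    scheme a child id is its parent's id plus one trailing character."""
    return cell[:-1]


def aggregate_up(values, parent=toy_parent):
    """Sum an additive quantity (counts, population) from child cells into parents."""
    totals = defaultdict(float)
    for cell, value in values.items():
        totals[parent(cell)] += value
    return dict(totals)


def disaggregate_down(parent_totals, children, weights):
    """Split each parent's total across its children in proportion to weights
    (a dasymetric-style split); falls back to an even split if all weights are 0."""
    out = {}
    for p, total in parent_totals.items():
        kids = children[p]
        w = [weights.get(c, 0.0) for c in kids]
        w_sum = sum(w)
        for c, wi in zip(kids, w):
            out[c] = total * (wi / w_sum if w_sum > 0 else 1.0 / len(kids))
    return out


# Population of 100 in parent "a"; only a1 and a2 contain housing (weights 3:1).
fine = disaggregate_down({"a": 100.0}, {"a": ["a1", "a2", "a3"]}, {"a1": 3, "a2": 1})
print(fine)                # {'a1': 75.0, 'a2': 25.0, 'a3': 0.0}
print(aggregate_up(fine))  # {'a': 100.0}
```

Because the split preserves each parent's total, aggregating the disaggregated values reproduces the original coarse figures.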
[00:47:39] Unknown:
As you're talking to people who are first encountering Unfolded Studio and the Unfolded platform, I'm curious if there are any conceptual challenges they run into as they try to figure out how to think about the problem they're solving and which features and capabilities are most useful for their specific application. What are some of the elements of customer education and messaging around the system you've built that you run into, as far as helping them understand what the capabilities are, how they compose together, and how they integrate with the broader ecosystem of their data platform and the information they're trying to work with?
[00:48:24] Unknown:
I think customer education is a really important aspect of this, and that's something I've definitely learned over my experience with Unfolded and even before that at Uber. We are presenting, I think, a newer way of working with geospatial data, and that takes some getting used to. We do have some capabilities that are not fully oriented around the H3 grid system, but that's where a lot of our unique and strongest capability comes from. And like you said, that does require education and training resources so that people can understand what that data looks like, what they can do with it, and why it might be a useful way of working with their data. I think one of the main educational questions we get is around this hexagonal grid system and this way of working with analytics data. I usually recommend that people play with it a little bit: project some of their data into it, understand the relationship between the underlying data and the grid system, and understand some of the things you can do with your data in the grid, in order to appreciate why you'd want to go through the modeling problem we were talking about of getting your data into the grid. Once you have that data in the grid and understand, okay, I can do this analytic on it, I can do this join on it, you understand what's on the other side of that modeling mountain, so to speak, and that helps give you the impetus for getting through it.
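Part of the payoff on the other side of that modeling mountain is that, once two datasets are indexed to the same cells at the same resolution, a spatial join reduces to an ordinary key join. A minimal sketch with hypothetical cell ids and attribute names; it assumes both inputs were already assigned to cells by whatever grid library you use.

```python
def join_on_cells(left, right):
    """Inner join of two {cell_id: attributes} tables indexed to the same grid
    at the same resolution; cells present in only one table are dropped."""
    return {cell: {**left[cell], **right[cell]} for cell in left.keys() & right.keys()}


# Hypothetical cell ids and values, for illustration only.
stores = {"cell_1": {"stores": 2}, "cell_2": {"stores": 1}}
people = {"cell_1": {"population": 4000}, "cell_3": {"population": 1500}}

joined = join_on_cells(stores, people)
for cell, row in joined.items():
    row["stores_per_1000"] = 1000 * row["stores"] / row["population"]
print(joined)  # {'cell_1': {'stores': 2, 'population': 4000, 'stores_per_1000': 0.5}}
```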
The other customer education matter with Unfolded Studio is that there are a lot of companies and use cases that have geospatial data in some form, but they don't necessarily realize that it's geospatial or geotemporal, or that they can work with it in this way and get useful analytics out of it. They might already have customer lists or another geospatial dataset, like a list of retail locations, but they don't necessarily see how they can actually plot that on a map. Maybe they've done something that's kind of a vanity project of just seeing what it looks like on a map and whether it looks cool enough to share with some people. But they haven't seen their data presented in a really geospatial-first way that makes them aware they can actually make decisions based on it. Maybe they don't have the associated datasets, like transactions or demographics, that are needed to enrich that analysis, or maybe they just haven't looked at that data in a geospatial- or geotemporal-first way.
[00:51:11] Unknown:
As you have been building the Unfolded platform and working with end users, what are some of the most interesting or innovative or unexpected ways that you've seen it used?
[00:51:27] Unknown:
So there are some ways that I kind of expected. Of course, coming from the mobility sector and Uber, I always expected mobility and transport, things around people or vehicles moving around. And coming from this business background, you also think about retail and restaurants and things like that, where there's a very clear business connection between the location and the use case. The things that were more surprising to me were less the business use cases and more the humanities uses. One that I can think of off the top of my head was with a partner, Kontur: their global fires map, which was an early project using our Hex Tile technology.
This was technically challenging, because we needed to make this map over the entire Earth and it needed to be temporal as well. But it was also really educational. Being in California, I'm aware that wildfires are a serious issue, but you don't see that in the context of the entire Earth; you don't see the pattern of wildfires in, let's say, the United States versus somewhere in Asia or Africa or another part of the world. So this was exciting and different to me because it served as an educational tool, something where I can flip that map back and forth and see how wildfires have developed in Africa over the last 12 or 13 months.
Another use case that I found extremely unexpected was around the COVID pandemic. Unfolded, starting around the tail end of 2019, was in the right place to see the COVID pandemic and the response to it unfold, especially over that first year of the initial response. At the time, I recall thinking very pessimistically about how retail establishments were going to do, how businesses were going to respond, how people were going to respond and change their habits or patterns, and which areas were being affected.
And over the next year or year and a half, I recall seeing a number of really interesting maps and visualizations come out of that, about things like: what are the effects of the COVID pandemic, especially in the United States? What correlates or does not correlate with the effects of COVID, or with various measures like social isolation? They helped in understanding the impact as well as what goes along with the response to the pandemic. I thought it was a really unexpected use case, and also a really educational one that I felt I learned some really interesting things from.
[00:54:23] Unknown:
And in your experience of building the platform, growing it, going through an acquisition, and now continuing to evolve the product, what are some of the most interesting or unexpected or challenging lessons that you've learned in the process?
[00:54:35] Unknown:
Going through that process over the last few years has been a really interesting and educational experience. I feel tremendously grateful that I was able to start a company and go through that process. My thought process going into it was that I had seen part of the industry and part of the engineering career over my prior few years at Uber, but starting a company and building up a product provides a wonderful educational experience in other aspects of professional development and of building a business.
That was the idea I went in with. I was very lucky to be in a situation where I could say, I'm gonna give this a shot and learn whatever I can from it, going in with the understanding, of course, that whenever you start a company, there's a significant risk that it's not gonna work out. Going into it with that mindset and approach was, I think, really personally enriching. At Unfolded, I had the opportunity not only to do engineering (backend engineering, data engineering, the kinds of things I had previously been doing) but also to work more on the business operations side and the day-to-day tasks needed to actually keep a business running, like having benefits programs. To a technical audience, that can very easily sound kind of boring, but at the same time, it's really important to understand how the business actually operates and gets things done.
And coming at it from that perspective, I think that's something a lot of technical audiences should find really exciting and interesting, because it's understanding how a different part of the system works, in addition to the part we look at as technologists building a system. How does the rest of the organization get things done? What are the concerns they need to deal with in order to have financial reports or benefits or things like that? To me, one of the most enriching aspects was not technological but the other aspects of running the business.
[00:56:59] Unknown:
And so for anybody who is interested in working with geospatial and geotemporal data, what are the cases where unfolded is the wrong choice and they might be better suited with some open source libraries or a different commercial platform or, you know, maybe going a different direction and deciding that maybe the geospatial aspect of their project isn't actually necessary?
[00:57:20] Unknown:
A few things come to mind. One, coming back to what I said about the history of the geospatial industry, I think there are cases where it's not about geospatial analytics or business analytics, but more about what I would call the GIS or cartographic use case. These are cases where the exact position of coordinates is of supreme importance, for example, plotting parcels or plotting utility locations. These are, of course, really important use cases, but they have a different set of concerns than what we've been talking about today. What we've been talking about has been much more around business analytics and deriving approximate locations, like where you might want to open a store.
And whether that store is in, you know, 1 corner or a storefront down from it might have secondary importance to, you know, what is the overall traffic in that in that neighborhood. In these use cases, which are much more cartographic, I think that users are gonna be served better by other tools because it's not a focus of the unfolded platform. It's not kind of the background that we're coming from, and it's not the way that we're approaching the product. Another area that I think, you know, we're we're relatively early on on is when companies have really significant data infrastructure built up, they have a way of working with that data, which is not going to integrate well with a third party tool.
So companies that are extremely invested into their own way of processing the data or their own set of queries really heavily built up, they're gonna have a harder time integrating with a third party vendor and a third party solution. That's not to say that it's not possible to do so. You can still bring unfolded in as an SDK and use it for, let's say, for a visualization or for, you know, part of your analytics workflow. But if that's not the analytics workflow that your data scientists or your data analysts are gonna be comfortable with, then I think you need to be cognizant of which part of that workflow you're bringing it folded into. And if that's something like visualization or kind of reporting, that might work well. If that's something where you're gonna try to move them from a set of tools that they're really familiar with and they're really comfortable with to a new set of tools, that's something that you have to be a little bit more careful with and provide. You know, you have to get a lot more certainty that's going to work out.
[00:59:55] Unknown:
So as you continue to build and grow the Unfolded platform, what are some of the things you have planned for the near to medium term, or any particular problem areas that you're excited to work on?
[01:00:06] Unknown:
So in the near to medium term, I think we wanna continue on a few of the themes that we've been developing over the last year or so. One is bringing in all the data that somebody needs in order to do their analytics. This looks like developing additional data connectors and database connectors so that the product slots into the workflows that people have today, and so that the relevant data they have can be brought into the product in a way that can actually be used for analytics. Today, if we want to bring in, let's say, a hex tile dataset, we have some catalog datasets, and we have some capability for bringing customer datasets into that format. I'd like to see that develop to the point where a lot of the datasets that we work with can be transparently brought in in that format and transparently joined into your analytics.
Access to data is an important one, and making the product accessible to more users who don't have that geospatial background is, I think, also a really important one. So something where you can bring in your data and have it displayed on a map without needing an in-depth understanding of how geospatial and geotemporal data should work, or does work, because that's something we handle for you, is an important line of development that we're working on as well. I think it's really important for us to continue developing the capabilities for users to bring their data in, while we handle a lot of these technical details under the hood as a product.
So being able to bring in an even wider variety of data, which maybe is not gonna have latitude and longitude points associated with it but might have other implicit geospatial or geotemporal aspects, and bringing that into the product is, I think, another secondary focus. And then the third focus that I mentioned is building out these analytics capabilities even more. I mentioned some things that we wanted to do a little bit earlier on in terms of more advanced analytics and advanced ML use cases. I think that there's a tremendous opportunity for us to continue developing specialized analytics for different use cases. For example, in retail or QSR, where they have site selection and anomaly detection and things like this, developing these as packaged analytics that they can run on their datasets is, I think, a really important focus area for us going forward.
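To make the idea of transparently joining datasets in a hex tile format a bit more concrete, here is a minimal Python sketch. It assumes two hypothetical point datasets (foot traffic and store sales) and, to stay dependency-free, uses a simple square latitude/longitude grid as a stand-in for H3 hexagons. The names `CELL_DEG`, `cell_id`, `aggregate`, and `join_on_cells` are illustrative only and are not part of any Unfolded API. The point is that once both datasets are keyed by the same cell index, the join reduces to an ordinary key match.

```python
import math
from collections import defaultdict

# Hypothetical cell size in degrees; a stand-in for choosing an H3 resolution.
CELL_DEG = 0.01


def cell_id(lat: float, lng: float, size: float = CELL_DEG) -> tuple:
    """Map a point to a square grid cell (illustrative stand-in for an H3 index)."""
    return (math.floor(lat / size), math.floor(lng / size))


def aggregate(points, size: float = CELL_DEG) -> dict:
    """Sum values per cell; `points` is an iterable of (lat, lng, value) tuples."""
    totals = defaultdict(float)
    for lat, lng, value in points:
        totals[cell_id(lat, lng, size)] += value
    return dict(totals)


def join_on_cells(left: dict, right: dict) -> dict:
    """Inner-join two cell -> value mappings on the cells they share."""
    return {cell: (left[cell], right[cell]) for cell in left.keys() & right.keys()}


# Hypothetical foot traffic counts and store sales, both as (lat, lng, value).
foot_traffic = aggregate([
    (40.7128, -74.0060, 120),
    (40.7131, -74.0055, 80),
    (40.7306, -73.9352, 50),
])
sales = aggregate([(40.7129, -74.0059, 1000.0)])

print(join_on_cells(foot_traffic, sales))
```

With a real H3 library you would replace `cell_id` with an H3 indexing call at a chosen resolution; the aggregate-then-join structure stays the same. Unlike H3's roughly uniform hexagons, these degree-based squares shrink in area away from the equator, which is one reason a purpose-built hierarchical grid is preferable in practice.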
[01:02:52] Unknown:
Are there any other aspects of the Unfolded platform or the use cases for geospatial analytics or the overall ecosystem around that that we didn't discuss yet that you'd like to cover before we close out the show?
[01:03:08] Unknown:
I think we covered quite a lot of the really interesting aspects of the ecosystem that we've been working on. You know, the workflow aspects of how do I get my data in; some of the analytics aspects of, okay, I have geospatial data, how should I model it, how should I make decisions using it, what kind of methodology, is that H3 or another methodology for working with that data; and then how do I communicate that out. Those are the things we've touched on today, and I think this is the core loop of working with geospatial data. So I think we've touched on the main points.
[01:03:42] Unknown:
Well, for anybody who wants to get in touch with you and follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And as a final question, I'd like to get your perspective on what you see as being the biggest gap in the tooling or technology that's available for data management today.
[01:03:57] Unknown:
I think there are a couple of gaps that I would identify right now. One is around data cataloging tooling and platforms for managing the datasets that companies have. This is something we've been working on at Unfolded in terms of our dashboard product and how we manage datasets that customers are uploading to our system or working with. I think this is relatively early on and is seeing continued investment, but as an industry, we don't have as many standard or canonical tools that we would slot in as data catalog and data management applications.
The other area where I'm very optimistic to see developments, or would really love to see developments is maybe another way of putting it, is around managing data quality and understanding what is going into a dataset. Is data flowing through our pipelines the way that we expect it to? What are the statistical attributes of the datasets? I've seen some interesting developments in the last year or so in terms of how people approach exploring datasets and how people look at the quality of datasets. I really wanna come back to this point: a lot of the data tooling that we work with as data professionals is really only as good as the quality of the data and the strength of the data that we're putting into it. If we're making data driven decisions, but we're putting in low quality data or data that we don't have confidence in, I find it very difficult to believe that we're gonna have confidence in the results.
And so I think a focus on how do we evaluate the quality and how do we track the quality of the datasets that we're working with is of very great importance to us as data professionals.
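As a rough illustration of tracking the statistical attributes of a dataset, here is a minimal Python sketch of a column profile plus a couple of expectation checks. The thresholds (a 5% maximum null rate and a latitude range of -90 to 90) and the function names `profile` and `check` are assumptions chosen for illustration; this is not a description of any particular data quality tool.

```python
import math
import statistics


def profile(values: list) -> dict:
    """Summarize a numeric column: row count, null rate, min, max, and mean."""
    n = len(values)
    # Treat None and float NaN as missing values.
    present = [
        v for v in values
        if v is not None and not (isinstance(v, float) and math.isnan(v))
    ]
    return {
        "rows": n,
        "null_rate": (n - len(present)) / n if n else 0.0,
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "mean": statistics.fmean(present) if present else None,
    }


def check(stats: dict, max_null_rate: float = 0.05,
          valid_range: tuple = (-90.0, 90.0)) -> list:
    """Return human-readable problems; an empty list means the column passed."""
    problems = []
    if stats["null_rate"] > max_null_rate:
        problems.append(
            f"null rate {stats['null_rate']:.1%} exceeds {max_null_rate:.1%}"
        )
    lo, hi = valid_range
    if stats["min"] is not None and (stats["min"] < lo or stats["max"] > hi):
        problems.append(f"values outside expected range [{lo}, {hi}]")
    return problems


# Example: a latitude column with one missing value and one impossible value.
latitudes = [40.71, 41.20, None, 95.0]
for problem in check(profile(latitudes)):
    print(problem)
```

Running checks like these on every pipeline run, and keeping the profiles over time, is one simple way to notice when data stops flowing the way you expect.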
[01:05:54] Unknown:
Alright. Well, thank you very much for taking the time today to join me and share the work that you've been doing with the Unfolded platform. It's definitely a very interesting problem space, and it's always great getting to catch up on what's happening in the world of geospatial analytics. So I appreciate all of the time and energy that you and your team are putting into that, and I hope you enjoy the rest of your day. Thank you so much for having me on; I really appreciated talking with you today.
[01:06:19] Unknown:
Thank you for listening. Don't forget to check out our other shows, the Data Engineering Podcast, which covers the latest on modern data management, and the Machine Learning Podcast, which helps you go from idea to production with machine learning. Visit the site at pythonpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you learned something or tried out a project from the show, then tell us about it. Email hosts@pythonpodcast.com with your story. And to help other people find the show, please leave a review on Apple Podcasts and tell your friends and coworkers.
Introduction and Sponsor Messages
Interview with Isaac Brodsky: Introduction and Background
The Unfolded Platform: Origins and Development
Challenges in Working with Geospatial Data
Geospatial Data Tools and Ecosystem
Capabilities and Use Cases of the Unfolded Platform
Technical Implementation and Architecture
Impact of Foursquare Acquisition
Workflow for Building Geospatial Analytics
Data Modeling and Preparation for Geospatial Analysis
Customer Education and Conceptual Challenges
Interesting and Unexpected Use Cases
Lessons Learned from Building the Platform
When Unfolded is Not the Right Choice
Future Plans and Focus Areas
Closing Remarks and Contact Information