Knowledge graphs are a data resource that can answer questions beyond the scope of traditional data analytics. By organizing and storing data to emphasize the relationship between entities, we can discover the complex connections between multiple sources of information. In this episode John Maiden talks about how Cherre builds knowledge graphs that provide powerful insights for their customers and the engineering challenges of building a scalable graph. If you’re wondering how to extract additional business value from existing data, this episode will provide a way to expand your data resources.
Your data platform needs to be scalable, fault tolerant, and performant, which means that you need the same from your cloud provider. Linode has been powering production systems for over 17 years, and now they’ve launched a fully managed Kubernetes platform. With the combined power of the Kubernetes engine for flexible and scalable deployments, and features like dedicated CPU instances, GPU instances, and object storage you’ve got everything you need to build a bulletproof data pipeline. If you go to dataengineeringpodcast.com/linode today you’ll even get a $100 credit to use on building your own cluster, or object storage, or reliable backups, or… And while you’re there don’t forget to thank them for being a long-time supporter of the Data Engineering Podcast!
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, a 40Gbit public network, fast object storage, and a brand new managed Kubernetes platform, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. And for your machine learning workloads, they’ve got dedicated CPU and GPU instances. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!
- You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data management. For even more opportunities to meet, listen, and learn from your peers, you don’t want to miss out on great conferences. We have partnered with organizations such as ODSC and Data Council. Upcoming events include ODSC East, which has gone virtual starting April 16th. Go to dataengineeringpodcast.com/conferences to learn more about these and other events, and take advantage of our partner discounts to save money when you register today.
- Your host is Tobias Macey and today I’m interviewing John Maiden about how Cherre is building and using a knowledge graph of commercial real estate information
- How did you get involved in the area of data management?
- Can you start by describing what Cherre is and the role that data plays in the business?
- What are the benefits of a knowledge graph for making real estate investment decisions?
- What are the main ways that you and your customers are using the knowledge graph?
- What are some of the challenges that you face in providing a usable interface for end-users to query the graph?
- What technology are you using for storing and processing the graph?
- What challenges do you face in scaling the complexity and analysis of the graph?
- What are the main sources of data for the knowledge graph?
- What are some of the ways that messiness manifests in the data that you are using to populate the graph?
- How are you managing cleaning of the data and how do you identify and process records that can’t be coerced into the desired structure?
- How do you handle missing attributes or extra attributes in a given record?
- How did you approach the process of determining an effective taxonomy for records in the graph?
- What is involved in performing entity extraction on your data?
- What are some of the most interesting or unexpected questions that you have been able to ask and answer with the graph?
- What are some of the most interesting/unexpected/challenging lessons that you have learned in the process of working with this data?
- What are some of the near and medium term improvements that you have planned for your knowledge graph?
- What advice do you have for anyone who is interested in building a knowledge graph of their own?
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
- Thank you for listening! Don’t forget to check out our other show, Podcast.__init__, to learn about the Python language, its community, and the innovative ways it is being used.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you’ve learned something or tried out a project from the show then tell us about it! Email firstname.lastname@example.org with your story.
- To help other people find the show please leave a review on iTunes and tell your friends and co-workers
- Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat
- Commercial Real Estate
- Knowledge Graph
- RDF Triple
- Google BigQuery
- Apache Spark
- Entity Extraction/Named Entity Recognition
- Spark Graph Frames
- Graph Embeddings