A Practical Introduction To Graph Data Applications - Episode 144

Summary

Finding connections between data and the entities that they represent is a complex problem. Graph data models and the applications built on top of them are well suited to representing relationships and finding emergent structures in your information. In this episode Denise Gosnell and Matthias Broecheler discuss their recent book, The Practitioner’s Guide To Graph Data. They cover the fundamental principles that you need to know about graph structures, the current state of graph support in database engines, tooling, and query languages, and useful tips on potential pitfalls when putting graph applications into production. This was an informative and enlightening conversation with two experts on graph data applications that will help you start on the right track in your own projects.

Your data platform needs to be scalable, fault tolerant, and performant, which means that you need the same from your cloud provider. Linode has been powering production systems for over 17 years, and now they’ve launched a fully managed Kubernetes platform. With the combined power of the Kubernetes engine for flexible and scalable deployments, and features like dedicated CPU instances, GPU instances, and object storage, you’ve got everything you need to build a bulletproof data pipeline. If you go to dataengineeringpodcast.com/linode today you’ll even get a $100 credit to use on building your own cluster, or object storage, or reliable backups, or… And while you’re there don’t forget to thank them for being a long-time supporter of the Data Engineering Podcast!


Datadog is a SaaS-based monitoring and analytics platform for cloud-scale infrastructure, applications, logs, and more. Datadog delivers complete visibility into the performance of modern applications in one place through its fully unified platform, which improves cross-team collaboration, accelerates development cycles, and reduces operational and development costs.

Try Datadog in your environment with a free 14-day trial—and get a complimentary t-shirt if you install the agent. Go to datadog.com/dataengineeringpodcast to get started!

Announcements

  • Hello and welcome to the Data Engineering Podcast, the show about modern data management
  • What are the pieces of advice that you wish you had received early in your career of data engineering? If you hand a book to a new data engineer, what wisdom would you add to it? I’m working with O’Reilly on a project to collect the 97 things that every data engineer should know, and I need your help. Go to dataengineeringpodcast.com/97things to add your voice and share your hard-earned expertise.
  • When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
  • Today’s episode of the Data Engineering Podcast is sponsored by Datadog, a SaaS-based monitoring and analytics platform for cloud-scale infrastructure, applications, logs, and more. Datadog uses machine-learning-based algorithms to detect errors and anomalies across your entire stack, which reduces the time it takes to detect and address outages and helps promote collaboration between Data Engineering, Operations, and the rest of the company. Go to dataengineeringpodcast.com/datadog today to start your free 14-day trial. If you start a trial and install Datadog’s agent, Datadog will send you a free T-shirt.
  • You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data platforms. For more opportunities to stay up to date, gain new skills, and learn from your peers there are a growing number of virtual events that you can attend from the comfort and safety of your home. Go to dataengineeringpodcast.com/conferences to check out the upcoming events being offered by our partners and get registered today!
  • Your host is Tobias Macey and today I’m interviewing Denise Gosnell and Matthias Broecheler about their recently published book, The Practitioner’s Guide To Graph Data

Interview

  • Introduction
  • How did you get involved in the area of data management?
  • Can you start by explaining what your goals are for the Practitioner’s Guide To Graph Data?
    • What was your motivation for writing a book to address this topic?
  • What do you see as the driving force behind the growing popularity of graph technologies in recent years?
  • What are some of the common use cases/applications of graph data and graph traversal algorithms?
    • What are the core elements of graph thinking that data teams need to be aware of to be effective in identifying those cases in their existing systems?
  • What are the fundamental principles of graph technologies that data engineers should be familiar with?
    • What are the core modeling principles that they need to know for designing schemas in a graph database?
  • Beyond databases, what are some of the other components of the data stack that can or should handle graphs natively?
  • Do you typically use a graph database as the primary or complementary data store?
  • What are some of the common challenges that you see when bringing graph applications to production?
  • What have you found to be some of the common points of confusion or error prone aspects of implementing and maintaining graph oriented applications?
  • When it comes to the specific technologies of different graph databases, what are some of the edge cases/variances in the interfaces or modeling capabilities that they present?
    • How does the variation in query languages impact the overall adoption of these technologies?
      • What are your thoughts on the recent effort to standardize GQL (Graph Query Language) as an ISO/IEC specification?
  • What are some of the scaling challenges that exist for graph data engines?
  • What are the ongoing developments/improvements/trends in graph technology that you are most excited about?
    • What are some of the shortcomings in existing technology/ecosystem for graph applications that you would like to see addressed?
  • What are some of the cases where a graph is the wrong abstraction for a data project?
  • What are some of the other resources that you recommend for anyone who wants to learn more about the various aspects of graph data?

Parting Question

  • From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

  • Thank you for listening! Don’t forget to check out our other show, Podcast.__init__ to learn about the Python language, its community, and the innovative ways it is being used.
  • Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
  • If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
  • To help other people find the show please leave a review on iTunes and tell your friends and co-workers
  • Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat

Links

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
