Summary
Many of the events, ideas, and objects that we try to represent through data have a high degree of connectivity in the real world. These connections are best represented and analyzed as graphs, allowing efficient and accurate analysis of the relationships they encode. TigerGraph is a leading database that offers a highly scalable and performant native graph engine for powering graph analytics and machine learning. In this episode Jon Herke shares how TigerGraph customers are taking advantage of those capabilities to achieve meaningful discoveries in their fields, the utilities that it provides for modeling and managing your connected data, and some of his own experiences working with the platform before joining the company.
Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform, it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
- This episode is brought to you by Acryl Data, the company behind DataHub, the leading developer-friendly data catalog for the modern data stack. Open Source DataHub is running in production at several companies like Peloton, Optum, Udemy, Zynga and others. Acryl Data provides DataHub as an easy to consume SaaS product which has been adopted by several companies. Sign up for the SaaS product at dataengineeringpodcast.com/acryl
- RudderStack helps you build a customer data platform on your warehouse or data lake. Instead of trapping data in a black box, they enable you to easily collect customer data from the entire stack and build an identity graph on your warehouse, giving you full visibility and control. Their SDKs make event streaming from any app or website easy, and their state-of-the-art reverse ETL pipelines enable you to send enriched data to any cloud tool. Sign up free… or just get the free t-shirt for being a listener of the Data Engineering Podcast at dataengineeringpodcast.com/rudder.
- Struggling with broken pipelines? Stale dashboards? Missing data? If this resonates with you, you’re not alone. Data engineers struggling with unreliable data need look no further than Monte Carlo, the leading end-to-end Data Observability Platform! Trusted by the data teams at Fox, JetBlue, and PagerDuty, Monte Carlo solves the costly problem of broken data pipelines. Monte Carlo monitors and alerts for data issues across your data warehouses, data lakes, dbt models, Airflow jobs, and business intelligence tools, reducing time to detection and resolution from weeks to just minutes. Monte Carlo also gives you a holistic picture of data health with automatic, end-to-end lineage from ingestion to the BI layer directly out of the box. Start trusting your data with Monte Carlo today! Visit http://www.dataengineeringpodcast.com/montecarlo?utm_source=rss&utm_medium=rss to learn more.
- Your host is Tobias Macey and today I’m interviewing Jon Herke about TigerGraph, a distributed native graph database
Interview
- Introduction
- How did you get involved in the area of data management?
- Can you describe what TigerGraph is and the story behind it?
- What are some of the core use cases that you are focused on supporting?
- How has TigerGraph changed over the past 4 years since I spoke with Todd Blaschka at the Open Data Science Conference?
- How has the ecosystem of graph databases changed in usage and design in recent years?
- What are some of the persistent areas of confusion or misinformation that you encounter when explaining graph databases and TigerGraph to potential users?
- The tagline on your website says that TigerGraph is "The Only Scalable Graph Database for the Enterprise". Can you unpack that claim and explain what is necessary for a graph database to be suitable for enterprise use?
- What are some of the application and system architectures that you typically see for end-users of TigerGraph? (e.g. polyglot persistence, etc.)
- What are the cases where TigerGraph should be the system of record as opposed to an optimization option for addressing highly connected data?
- What are the data modeling considerations that end-users should be thinking of when planning their storage structures in TigerGraph?
- What are the most interesting, innovative, or unexpected ways that you have seen TigerGraph used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on TigerGraph?
- When is TigerGraph the wrong choice?
- What do you have planned for the future of TigerGraph?
Contact Info
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
Closing Announcements
- Thank you for listening! Don’t forget to check out our other show, Podcast.__init__ to learn about the Python language, its community, and the innovative ways it is being used.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
- To help other people find the show please leave a review on iTunes and tell your friends and co-workers
Links
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to the Data Engineering Podcast, the show about modern data management. When you're ready to build your next pipeline and want to test out the projects you hear about on the show, you'll need somewhere to deploy it. So check out our friends over at Linode. With their managed Kubernetes platform, it's now even easier to deploy and scale your workflows or try out the latest Helm charts from tools like Pulsar, Pachyderm, and Dagster. With simple pricing, fast networking, object storage, and worldwide data centers, you've got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today. That's L-I-N-O-D-E, and get a $100 credit to try out a Kubernetes cluster of your own. And don't forget to thank them for their continued support of this show.
This episode is brought to you by Acryl Data, the company behind DataHub, the leading developer friendly data catalog for the modern data stack. Open source DataHub is running in production at several companies like Peloton, Optum, Udemy, Zynga, and others. Acryl Data provides DataHub as an easy to consume SaaS product, which has been adopted by several companies. Sign up for the SaaS today at dataengineeringpodcast.com/acryl. That's A-C-R-Y-L. Your host is Tobias Macey. And today, I'm interviewing Jon Herke about TigerGraph, a distributed native graph database and how you can use it for your applications. So, Jon, can you start by introducing yourself?
[00:01:38] Unknown:
Yeah. My name is Jon Herke. I'm a developer evangelist at TigerGraph. I've been here for about two years. To give my full background: I was a computer networking engineer in the military for 8 years. Then I worked at a giant company called Optum, part of UnitedHealth Group. I was a networking engineer there until I got really bored; it was a lot of copy and paste and configuration changes. I explored something new, which was being an entrepreneur in residence at this healthcare company, building startups. And in about August 2016, we started seeing emerging technologies really disrupting other industries, and we didn't want to get left behind. So we broke off of the startup incubator and created a new division solely focused on emerging technologies. If you think about when the Internet came out, people were asking: what's this thing called the Internet? Do we need this thing called a web page? I don't think we need a web page. A web page is a fad. These are the same questions that are getting asked about today's technology: what is blockchain? Do we need blockchain? Is blockchain a fad? What is graph? What is AI, computer vision, natural language processing, machine learning, quantum computing, the Internet of Things? So we worked hand in hand with the business units to apply these emerging technologies in the healthcare space. Then I got pulled into doing a lot of things in the community around graph, since my background was networking engineering.
I went really heavy into graph around 2018 and started to build the world's largest healthcare graph at that time, and started to do different events in the community. Then I got a call from Todd. He says, hey, can you do what you're doing, but do it for us? Interface with developers, help developers out, activate developers, engage developers, grow a community.
[00:03:16] Unknown:
And do you remember how you first got started working in the space of data, and what it is about data that has kept you interested and engaged?
[00:03:24] Unknown:
Yeah. So I guess I took the nontraditional route. You know, I went from networking to building companies to ultimately getting into data. I think the thing that's interesting about data is that it's an area to explore, to understand things that might not be totally obvious when you first look. So to me, it's the creativeness of identifying a problem and then trying to find an answer to that problem. There are some really interesting things from my history, and I'm pausing because my mom got diagnosed with cancer. You know, I was a database guru at Optum. I had a wealth of information available to me.
So the question was: what can I do as an engineer of data? What I could do is look at people that are similar to my mother: maybe the same preconditions, maybe the same age of 39, let's just say, female, on XYZ medication, has had this and that in the past. If I can run something to find similar patients and people like her, then I can identify maybe a better drug treatment path and a route to give her an opportunity at life. And that's just one story, but there are many stories like that. How to improve the healthcare system, how to improve people's lives.
So that's really what data means to me, is making a difference, making an impact, finding things that you wouldn't be able to find at scale. That's sort of been my journey.
[00:04:51] Unknown:
Sorry to hear about that. Thank you for sharing, and I'm glad that you're able to do something to help in that. And so that brings us now to what you're doing at TigerGraph. You mentioned that you ended up there to help grow the community around it, and I'm wondering if you can describe a bit more about what it is that the business is building and some of the types of community effort that you're doing to help grow the ecosystem around it and engage with developers.
[00:05:22] Unknown:
As a developer, when I was working at UnitedHealth Group Optum, one of the things that was very hard for me was that there wasn't enough tooling around the ecosystem for integration. There was no tooling for the more traditional DevOps things as far as building, constructing, automating, and deploying your solutions. There weren't syntax highlighters. There weren't all these different tools that you sort of take for granted, because the technology had just come out. There also wasn't a community. You had the ability to read the docs, which are phenomenal, but it's hard to troubleshoot small things: as a developer, you don't want to bother the TigerGraph engineer that developed the product about how to do a simple SELECT statement. That's something you probably don't want to go and ask the core engineer, but you might want to ask the community.
So when I got hired, the first thing I was trying to do was build that foundation, build a place for the community members to go to, including the forums and a Discord group. After that, it was a lot about building the foundational assets. There were people in our community already working on syntax highlighters, build tools, Gradle, automation, and deployments. So we'd work together in the community and pair up with developers to build some of these core integration components for the product itself.
[00:06:39] Unknown:
I actually had a conversation on this podcast about 4 years ago with Todd Blaschka, who you mentioned is the person who brought you into TigerGraph. We had a fairly brief conversation about what the core offering is, and I'm wondering if you can talk through some of the ways that the project and the product have evolved over those past 4 years.
[00:07:03] Unknown:
Yeah. I would say that since 4 years ago, the community and the ecosystem have grown. The product itself has grown in the sense that it's being challenged by all of our customers at TigerGraph in ways that weren't imagined before. So in some cases, instead of a hundred vertex types in their schema, they would define a couple hundred. Or instead of just having one schema, they'd have a multitude of schema changes over time. These sorts of edge cases, where there are lots of developers integrating with the product, push the product beyond where it was.
Beyond that, there are a lot of security things that were put in place: vertex-level access control, the ability to manage user groups across the different graph solutions, and more recently a real focus on scaling, so deploying and scaling using auto elastic scaling with Kubernetes.
[00:07:57] Unknown:
In that same time period, the overall ecosystem around graphs and their applications, particularly for machine learning use cases, has been growing and scaling. And I'm wondering how you've seen that ecosystem change in terms of the usage, application designs, and systems integrations where graph problems are being applied, and that actually require this core graph engine to be something that is natively available rather than an abstraction layer added on top of something like a relational database.
[00:08:34] Unknown:
Going back to my experience at Optum: let's say you have 40 to 60 billion vertices, and you have 100 billion edges. In the healthcare system, you have patients, you have claims, you have Rx claims; there are a bunch of different types of claims. You have providers' information, you have nurses' notes, you have calls, you have all of these different touch points. So when your data is super complex and you're trying to find insights in real time, this is where TigerGraph really shines. In 50 milliseconds, we could pull all of this data together in real time. So if you have a lot of different joins throughout your solution, you might want to look at graph databases.
So going back to having complex data that you're trying to retrieve in real time, there might be insights around your data that don't exist traditionally. You're looking at the shape of the graph, the relationships between different data elements, to derive new features. I guess the simplest example might be if you have in your data a person named John, another person called Larry, and a relationship called father. If you want to infer the relationship to the father's father, that would be the grandfather. So you can look at the patterns of the relationships to infer different elements in your graph that you can then use for your machine learning models.
You can also run graph data science algorithms to derive new features, alongside the traditional features. You can use the relationships between different data elements, and you can use the derived outputs from these graph-based algorithms, to then train your models.
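To make that concrete, here is a minimal sketch of the grandfather inference as a two-hop GSQL traversal, issued through the community pyTigerGraph client. Everything here (the host, the FamilyGraph graph, the Person vertex type, and the HAS_FATHER edge) is a hypothetical schema invented for illustration, not a real deployment.

```python
import pyTigerGraph as tg

# Hypothetical connection details, for illustration only.
conn = tg.TigerGraphConnection(
    host="https://my-instance.i.tgcloud.io",
    graphname="FamilyGraph",
    username="tigergraph",
    password="password",
)

# A two-hop GSQL query: follow the (assumed) HAS_FATHER edge twice,
# so the second hop lands on the grandfather.
conn.gsql("""
USE GRAPH FamilyGraph
CREATE QUERY infer_grandfather(VERTEX<Person> child) FOR GRAPH FamilyGraph {
  Start = {child};
  // First hop: the child's father.
  Fathers = SELECT f FROM Start:c -(HAS_FATHER:e1)-> Person:f;
  // Second hop: the father's father, i.e. the grandfather.
  Grandfathers = SELECT g FROM Fathers:f -(HAS_FATHER:e2)-> Person:g;
  PRINT Grandfathers;
}
INSTALL QUERY infer_grandfather
""")
```

Once installed, the output of a query like this can itself be emitted as a derived feature for a downstream model, which is the pattern described above.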
[00:10:09] Unknown:
As you're working with end users and potential customers of TigerGraph, what are some of the points of confusion, misconceptions, or misinformation that you encounter, and what kinds of education do you find necessary to help them make the best use of the capabilities that something like TigerGraph offers, and to shift their overall design process to incorporate graph algorithms and graph concepts into the way that they're approaching a problem?
[00:10:43] Unknown:
Yeah. I suppose there are some design differences when you're designing the graph. There might be things that you want to pull out. For example, if we talk about patients again: you could store the attribute of their sex on the patient vertex. However, if you're going to be doing a lot of searches on that particular attribute, you might want to break it out into its own vertex type. Then, when you're accessing the graph, let's say you have 50 million patients, you don't have to read through every single patient in the database. You can start with, let's say, female, traverse the edges that are associated with that attribute called female, and find the patients. So instead of reading through the whole database, you navigate the edges to get to the related elements, which is probably not a concept that is regularly thought about in traditional database design.
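As a rough sketch of that modeling choice, assuming a hypothetical patient graph and the pyTigerGraph client: the sex value becomes its own vertex type connected to patients, so a search seeds at one Sex vertex and fans out rather than scanning every patient.

```python
import pyTigerGraph as tg

conn = tg.TigerGraphConnection(
    host="https://my-instance.i.tgcloud.io",  # hypothetical instance
    username="tigergraph",
    password="password",
)

# Instead of keeping sex only as an attribute on Patient (which forces a scan
# to filter on it), break it out as its own vertex type. A query can then seed
# at the single "female" Sex vertex and traverse HAS_SEX edges to reach only
# the matching patients. All type names here are hypothetical.
conn.gsql("""
CREATE VERTEX Patient (PRIMARY_ID patient_id STRING, name STRING, age INT)
CREATE VERTEX Sex (PRIMARY_ID value STRING)
CREATE UNDIRECTED EDGE HAS_SEX (FROM Patient, TO Sex)
CREATE GRAPH PatientGraph (Patient, Sex, HAS_SEX)
""")
```

With 50 million patients, that turns a full attribute scan into a single-vertex lookup plus an edge fan-out.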
[00:11:47] Unknown:
RudderStack helps you build a customer data platform on your warehouse or data lake. Instead of trapping data in a black box, they enable you to easily collect customer data from the entire stack and build an identity graph on your warehouse, giving you full visibility and control. Their SDKs make event streaming from any app or website easy, and their state of the art reverse ETL pipelines enable you to send enriched data to any cloud tool. Sign up for free or just get the free t-shirt for being a listener of the Data Engineering Podcast at dataengineeringpodcast.com/rudder. On the website, the main tagline says that TigerGraph is the only scalable graph database for the enterprise. And I'm wondering if you can just unpack that a bit and explain what pieces are necessary for a graph database to be suitable for enterprise use, and some of the capabilities that are required as you scale up on usage and organizational complexity.
[00:12:45] Unknown:
The biggest, most important thing for me as a customer was the ability to do deep-link analytics, doing the traversals, to be able to find insights in a relatively short amount of time. Security is also one of the most important aspects from an enterprise perspective. How can you define security down to not only the graph level but the vertex level, and then have different user-group security based on your background and why you're accessing what data? So I'd say from a security perspective, TigerGraph has had a major focus there. But more important is the way that the database itself was built. It was built in C++. It was built to be massively parallel processing. It was built to do computations at the vertex level.
It was built to scale horizontally. So one of the criteria, when we were looking at the existing graph market back when I was at Optum in UHG, was what can scale beyond just vertical scaling, what can scale horizontally. As our data grows, we can also grow our cluster. When you're scaling horizontally, there are a lot of complications that can arise, especially if your solution wasn't designed to scale horizontally. For example, you could have many different databases with the data residing in each separate cluster, and then as an end user, you need to identify how to get that data out, and you have to write a query that goes to a specific machine to pull that data out. I think one thing that TigerGraph did well was designing it with the end user in mind to simplify all that. So instead of having to understand exactly where the data is, how to get access to the data, and then writing queries in different ways,
[00:14:30] Unknown:
you can essentially write a query, and all of that is handled by the TigerGraph platform itself. Yeah. And I know that with graph databases in particular, being able to scale horizontally can be challenging because you need to understand where and how to partition the graph, particularly if you have super nodes. And I'm wondering how much of that is exposed to the end user and how much is able to be pushed down into the core engine, so that the end user can just write their graph data and not have to think about what the partition structures are going to look like as you scale out horizontally, and how much of that you need to do in terms of upfront design and how much you're able to
[00:15:10] Unknown:
just defer to the core engine to handle for you. As an end user, that was something that was very nice. I didn't have to worry about how to partition the data, where the data resides, or on what node across what machine the data is persisting. When I write an algorithm or a query, how to access that data, that was all removed for me as an end user. So as an end user, I could just focus on what I need to understand about the data itself. I don't have to worry about the complexities under the hood. There is the ability to go in there and make changes to some of that logic as well, but as a user, you don't have to understand all the complexities under the hood of TigerGraph.
[00:15:49] Unknown:
In terms of the modeling aspect, with relational databases, engineers are used to being able to start with a particular structure and then create a migration to add or modify tables or change columns, etcetera. And I'm wondering what the equivalent process looks like when you're designing a graph structure, where maybe you start off with a certain core set of objects that you want to model. So in the patient example that we've been using, I want to be able to model a person that has attributes of name, age, gender, and geography, and then maybe I also want to add in another core object of a medical care facility, which is going to have its own attributes of geographic location, staff, and the sorts of facilities that are available to it. So maybe this one has an X-ray department, whereas this one has a radiology department. Just how are you able to mutate and modify and expand the graph structures as you dig deeper into a problem space?
[00:16:52] Unknown:
One of the features of TigerGraph is the schema change job. Making changes in your graph solution means writing a simple block of code that goes in and alters the graph. So as an end user, you don't have to do much other than say, hey, I have this new use case, I have this new data, I have this new way of looking at the data itself, and I need to reimagine the model. You don't have to drop all the data, only the data that's related to the modification or change. It's very easy for the user to go in there and create a modification with very little cost as far as maintaining the solution.
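Here is what that might look like for the host's earlier example of adding a care facility: a hedged sketch of a GSQL schema change job, reusing the hypothetical PatientGraph from the earlier sketch.

```python
import pyTigerGraph as tg

conn = tg.TigerGraphConnection(
    host="https://my-instance.i.tgcloud.io",  # hypothetical instance
    username="tigergraph",
    password="password",
)

# A schema change job adds the new vertex and edge types in place; existing
# Patient data stays put. All names here are hypothetical.
conn.gsql("""
USE GRAPH PatientGraph
CREATE SCHEMA_CHANGE JOB add_facility FOR GRAPH PatientGraph {
  ADD VERTEX Facility (PRIMARY_ID facility_id STRING, name STRING, location STRING);
  ADD UNDIRECTED EDGE TREATED_AT (FROM Patient, TO Facility);
}
RUN SCHEMA_CHANGE JOB add_facility
""")
```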
[00:17:32] Unknown:
In terms of the ecosystem that exists around TigerGraph specifically, but also graph problems in general, I'm curious what you've seen in the way of tooling or educational material that's available. For example, application development frameworks typically ship with an object relational mapper for translating between your program logic and the relational database engine. I'm wondering what you've seen in similar cases for graph engines, for being able to incorporate them into application designs. I know that the machine learning community has been investing a lot in building graph algorithms into their different machine learning and deep learning libraries, and I'm curious what you've seen as far as integrated support for working with something like TigerGraph as the storage and computation engine for powering those machine learning applications.
[00:18:28] Unknown:
Yeah. So we have talked about an ORM in our community quite a bit, and about creating one; we just haven't gotten to the point of building one. I think one of the biggest enablers for people that are building on top of TigerGraph is that every single query or piece of logic that you write inside your graph solution is compiled down into a REST endpoint. So you can instantly call the logic you wrote. If you have input parameters, you can pass those in. In some cases, you might have one query that has a subquery of some function, and you can call the subfunctions and then retrieve the information. We have a GraphQL connector as well.
We have different connectors based on the tech stack that you're using. I would say the most important part is the focus that TigerGraph has had on the REST services in their product offering. There are a lot of different things that you can do: if you want to retrieve the metadata, you can call a REST endpoint. If you want to upsert data, you can call a REST endpoint. If you want to call a query, of course, you can call a REST endpoint. There are a lot of different actions that you can interface with from a REST perspective, which makes it really easy to integrate with.
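As one example of what that REST surface looks like from client code, here is a brief sketch using the community pyTigerGraph library, which wraps those endpoints; the graph, vertex, and query names are the hypothetical ones from the earlier sketches.

```python
import pyTigerGraph as tg

conn = tg.TigerGraphConnection(
    host="https://my-instance.i.tgcloud.io",  # hypothetical instance
    graphname="PatientGraph",
    username="tigergraph",
    password="password",
)
# Depending on the deployment, you may need an auth token first,
# e.g. conn.getToken(<secret>).

# Each call below wraps one of the REST endpoints described above.
print(conn.getSchema())  # retrieve the metadata

# Upsert data: create or update a Patient vertex by primary id.
conn.upsertVertex("Patient", "p001", {"name": "Jane", "age": 39})

# Call an installed query through its generated REST endpoint
# ("similar_patients" is a hypothetical installed query).
results = conn.runInstalledQuery("similar_patients", params={"p": "p001"})
print(results)
```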
[00:19:34] Unknown:
Another question that comes up a lot when you're dealing with specialized storage engines is the question of polyglot persistence versus using a particular engine as the system of record. And I'm wondering what you see as the decision path that people go through when they're figuring out: do I want TigerGraph to be my system of record, where I'm going to interact primarily with my data, versus using TigerGraph as an optimization for a specific subset of problems, where maybe they're using a Postgres or a MySQL as the system of record for the application, and then they're either replicating some of that data or storing a subset of their data into TigerGraph to optimize for those graph algorithms in the cases where that makes sense?
[00:20:21] Unknown:
I think it's a little bit of both. I see both use cases. It depends on what it is that they're trying to do. So, for example, if you have an application that needs to pull together information from many different sources in real time, what you can do is set up streaming from the downstream source systems to TigerGraph and pull that data together in real time. As soon as it's entered into TigerGraph, all the relationships are built, and then you're traversing the graph to extract that information.
[00:20:51] Unknown:
So when end users are deciding that they want to incorporate a graph system as part of their overall architecture and set of capabilities, I'm curious what you have seen as some of the overall system architectures or supporting systems that they will build to work alongside TigerGraph, either for feeding data into it or for querying from it, and some of the types of use cases and applications that are built on top of it.
[00:21:24] Unknown:
Yeah. Architecture-wise, the majority of the use cases use Kafka to stream data in directly. There are some nontraditional data sources that drop files into a certain zone, where you pull and extract that data to import into TigerGraph. But I would say streaming through Kafka is the primary way. As far as the rest of the architecture, in at least some of our use cases, we used Jenkins as the orchestration service. We built our own Gradle plugin to execute scripts in certain ways and to create logic within the code itself to reproduce the graph-based solution that we wrote. So as soon as we created different scripts for the healthcare solution, we would send them to GitHub, and that would trigger a bunch of different jobs that would run through our unit tests and then deploy to a test server.
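For the file-drop ingestion path described above, a declarative GSQL loading job maps file columns onto the schema; here is a hedged sketch against the hypothetical PatientGraph (Kafka loaders are declared with a similar job definition).

```python
import pyTigerGraph as tg

conn = tg.TigerGraphConnection(
    host="https://my-instance.i.tgcloud.io",  # hypothetical instance
    graphname="PatientGraph",
    username="tigergraph",
    password="password",
)

# Declare a loading job that maps CSV columns onto the Patient vertex.
conn.gsql("""
USE GRAPH PatientGraph
CREATE LOADING JOB load_patients FOR GRAPH PatientGraph {
  DEFINE FILENAME patient_file;
  LOAD patient_file
    TO VERTEX Patient VALUES ($"patient_id", $"name", $"age")
    USING header="true", separator=",";
}
""")

# Push a local file through the job via the REST API.
conn.runLoadingJobWithFile("patients.csv", "patient_file", "load_patients")
```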
As far as monitoring, ELK is a stack that TigerGraph supports out of the box; there's a Filebeat configuration file that's generated for your logging. As an end user, you want to understand your queries and optimize them, and understand the memory that's taken up when you're running them. So oftentimes you're going to look at the different logs through either the ELK stack or Datadog, which was another platform that we had used as well.
[00:22:43] Unknown:
In terms of the community aspect, which is where you've been spending a lot of your time and which was sort of your prime directive when you joined the company, I'm curious what you have seen as far as overall feedback, and some of the growth in interest and investment by the community in building up different tooling and design patterns around TigerGraph.
[00:23:08] Unknown:
When developers are able to contribute and put their time into building something open source, it's not something that we take for granted, and so we want to highlight and encourage the community and help them out with building whatever it is, from a syntax highlighter, to maybe an ELK stack configuration, to a full stack example, to how to use Plotly Dash and integrate it into your project as a data scientist. So we see a lot of activity where developers are building different applications, different tooling, different ETL tools for pulling data out of some XYZ system and pushing it in. One example of that is Node-RED, which is a nontraditional orchestration service; it's really designed and built for the Internet of Things. But when you put a node in there, you can pull data from, say, Twitter, send it to AWS's natural language processing solutions, and then put it into TigerGraph.
Those are some of the things that the community is actively building: tooling around TigerGraph. From a use case perspective, knowledge graph systems are really popular. So we might have some source data. For example, during COVID-19, there were a bunch of articles published around COVID itself. One of the challenges is that you have a bunch of text; how do you want to process it? Maybe run it through a machine learning model to do entity extraction, then model those entities inside of TigerGraph, and then traverse the graph to find and derive different papers that are related to certain topics.
Another big thing around that is concept maps. If you have related concepts, even though the entity extracted is a certain word, there might be correlated words with that entity. So we see a lot of things around knowledge graphs. We also see a lot of use cases around supply chain. Say you want to track a part being created and then follow that part all the way to the end product, which is a car. What happens if a boat gets stuck in a canal and disrupts your whole supply chain? What happens if this huge global crisis that's going on affects your supply chain? So we're seeing a lot of different supply chain use cases. Other use cases, because you can do deep-link traversals, are in the crypto space. Cryptos are pretty popular right now, and so we see a lot of people in the community building transaction tracing, where they can look at address to address to address and do deep-link tracking. There are also things around geospatial analysis and calculating distances. If you think about a map, with each road intersection as a vertex and each road as an edge, there's a lot you can do with logistics, maps, and geolocation.
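For the road-network idea, the usual shape is to call a shortest-path algorithm from the GSQL Graph Data Science Library as an installed query. The sketch below assumes the library's single-source shortest path query has been installed; the algorithm name, the parameter names, and the Intersection/ROAD_SEGMENT schema are all assumptions to verify against the installed version.

```python
import pyTigerGraph as tg

conn = tg.TigerGraphConnection(
    host="https://my-instance.i.tgcloud.io",  # hypothetical instance
    graphname="RoadGraph",                    # hypothetical road-network graph
    username="tigergraph",
    password="password",
)

# Intersections as vertices, road segments as weighted edges, then a
# single-source shortest path. Query and parameter names are assumptions
# based on the GSQL Graph Data Science Library; check your install.
results = conn.runInstalledQuery(
    "tg_shortest_ss_pos_wt",
    params={
        "source": ("intersection_42", "Intersection"),  # (vertex id, vertex type)
        "v_type": "Intersection",
        "e_type": "ROAD_SEGMENT",
        "wt_attr": "distance",
    },
)
print(results)
```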
[00:25:58] Unknown:
Are you struggling with broken pipelines, stale dashboards, missing data? If this resonates with you, you're not alone. Data engineers struggling with unreliable data need look no further than Monte Carlo, the leading end to end data observability platform. Trusted by the data teams at Fox, JetBlue, and PagerDuty, Monte Carlo solves the costly problem of broken data pipelines. Monte Carlo monitors and alerts for data issues across your data warehouses, data lakes, dbt models, Airflow jobs, and business intelligence tools, reducing time to detection and resolution from weeks to just minutes.
Monte Carlo also gives you a holistic picture of data health with automatic end to end lineage from ingestion to the BI layer directly out of the box. Start trusting your data with Monte Carlo today. Go to dataengineeringpodcast.com/montecarlo to learn more. Another interesting aspect of any data storage system, but particularly one that is for a specialized data model, is the question of being able to build up data sharing and collaboration communities, which is an area that companies like Snowflake have been investing in. And I'm wondering what kinds of capabilities TigerGraph has for being able to build, share, expose, and collaborate on different data structures to grow community in that way. To be able to say: here's the dataset that I've built based on translating OpenStreetMap data into a set of vertices and edges that consists of all of the road networks in the continental United States, for example, and then being able to hand it off. And okay, somebody else has actually done that for Canada, so now we've got an interconnect, and now we want to be able to ask: what is the optimal path to get from, say, Dallas to Ottawa?
[00:27:54] Unknown:
What are the companies doing together to achieve this? Right now, I think the biggest difference between the graph space and the more traditional relational space is that in the relational world you have SQL, a standardized language in which you can communicate with all your database solutions. One thing that's happening right now is standardization. The same committee that got together to create the SQL standard is now creating a new standard called GQL. Once that comes out, I believe in the next year and a half to two years, there will be more of a standardized language and a standardized way to interact with database solutions. I think once we have that standardized approach of interacting with databases, we'll have more shared resources between different companies.
Now, there are a couple of different committees out there. There's the LDBC, the Linked Data Benchmark Council, which is looking at standardizing how you measure the speed, maybe not the accuracy, but the speed, at which you can derive information based off certain questions that you want to ask your data. Right now they basically have a social media dataset that all of the graph database companies can use as a standardized way to measure the speed at which you can answer the questions it asks of the social media data. They're also looking at adding additional data sources. One of the ones we're contributing as a shared data source is a Synthea-based medical graph.
I come from the healthcare space again, and one of the things is that you don't want to give everybody healthcare data to explore. So we are using a synthetic healthcare dataset, modeling it, and providing ways to look at that data just as you would if you were a healthcare provider. There are other synthetic-based solutions, but I think that's somewhere all the different companies could play nicely together: coming up with not only a standardized language for interfacing with the database, but also some Northwind-type examples of how you can actually look at or query the data.
[00:29:56] Unknown:
In terms of the applications that you've seen users build on top of TigerGraph, what are some of the most interesting or innovative or unexpected ways that you've seen it used?
[00:30:07] Unknown:
The most interesting to me, coming from the healthcare space, maybe not the most interesting to everybody else, was learning about customers doing energy grid management systems. They're recalibrating the energy grid in real time. I thought that was really neat: they're able to compute the power consumption of all the different transformers from this route to that route. To me that was a little bit nontraditional, but because it's an analytical engine, you're able to compute that while you're doing the traversal. So I think that was the most interesting use case that I ran into when I was exploring TigerGraph.
[00:30:41] Unknown:
In your experience of helping to build and grow the community around TigerGraph and working with the graph engine and in that whole ecosystem, what are some of the most interesting or unexpected or challenging lessons that you've learned in the process?
[00:30:55] Unknown:
The most challenging thing is that it's a new language, right? When you have to learn a new language, it's a little bit tough. One thing that is really nice about the GSQL language of TigerGraph is that it follows a similar syntax to SQL. So if you're a SQL developer coming into a brand new language, on day one you can basically read the code itself. But I would say learning a new language is probably one of the toughest parts for any of the graph solutions or graph companies out there right now. I think that's where graph adoption will pick up once there's a standardized language. I also think one of the most interesting things coming into the graph space is learning how to design and how to traverse, and how the optimization of different actions within your query logic affects not just whether you achieve the results but how quickly. Just a small modification of your query could impact the performance a hundredfold. For example, if you have a for-each loop on the outside and your SELECT statement inside the loop, that could hinder performance, even though it does what you're trying to do, versus factoring it out as a function: you write a subquery, and then you're able to call that subquery as a function.
Being able to think about how to write queries has been very interesting, and query performance is something that is sometimes difficult for people to grasp. We do have a lot of great material out there about schema practices and best designs, but it does come up a lot that someone is doing something very inefficient as far as traversal, and there might be a lot of memory being consumed on that traversal, which can cause issues as well.
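To illustrate the loop-versus-set point, here is a hedged GSQL sketch (via pyTigerGraph, against the hypothetical PatientGraph with an assumed HAS_CLAIM edge and Claim vertex type): the first query re-traverses the graph once per patient inside a FOREACH loop, while the second does a single set-based traversal from all patients at once.

```python
import pyTigerGraph as tg

conn = tg.TigerGraphConnection(
    host="https://my-instance.i.tgcloud.io",  # hypothetical instance
    graphname="PatientGraph",
    username="tigergraph",
    password="password",
)

conn.gsql("""
USE GRAPH PatientGraph
// Slow shape: a SELECT inside a FOREACH -- one traversal per patient.
CREATE QUERY count_claims_slow() FOR GRAPH PatientGraph {
  SumAccum<INT> @@total;
  ListAccum<VERTEX<Patient>> @@patients;
  Seed = {Patient.*};
  Seed = SELECT p FROM Seed:p ACCUM @@patients += p;
  FOREACH p IN @@patients DO
    One = {p};
    C = SELECT c FROM One:s -(HAS_CLAIM:e)- Claim:c ACCUM @@total += 1;
  END;
  PRINT @@total;
}
// Fast shape: one set-based SELECT over all patients at once.
CREATE QUERY count_claims_fast() FOR GRAPH PatientGraph {
  SumAccum<INT> @@total;
  Seed = {Patient.*};
  C = SELECT c FROM Seed:s -(HAS_CLAIM:e)- Claim:c ACCUM @@total += 1;
  PRINT @@total;
}
""")
```

Both shapes produce the same count; the difference is how many times the engine has to enter the graph, which is the hundredfold effect described above.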
[00:32:41] Unknown:
On that question of query optimization, is there something analogous to the sort of EXPLAIN or ANALYZE query that will show you what the query plan is going to be based on the specific statement that you entered, so you can understand: okay, because I did this SELECT statement inside of a for-each loop, it's actually going to greatly impact the performance of my query versus if I were to do this as a function or a subquery?
[00:33:09] Unknown:
Yeah. Unfortunately, there isn't a query planner integrated into the product, so I don't have the exciting answer of "yes, we have that currently." There are some other features, though, including the visual query builder. Let's say you're not writing the query itself; you're just an analyst, and you're not very comfortable with the query language. What you can do is use the visual query builder, which is a no-code solution where you basically draw what you want to do. So it's not a query planner that will walk through execution, but what it will do is take what you're trying to derive out of your graph solution and then write an optimized query based off what you're trying to extract.
[00:33:50] Unknown:
And so what are the cases where TigerGraph is the wrong choice, and maybe people are better suited with just an abstraction layer over a relational dataset, or a different graph engine, or actually sticking with tabular datasets and not trying to get involved with graph and incorporate that into their problem?
[00:34:16] Unknown:
I think if you have a traditional system that's just transactional and you're just retrieving things and there are no connections, you should stay with what you're currently working with. But if you have highly interconnected data, you need to retrieve data across different source systems and pull it back in real time, and there's data constantly being streamed to your solution, so the data is dynamic and not just static; if you have those problems where you're running a bunch of joins, sort of step-by-step joins, trying to retrieve this data, I think that's when you want to use TigerGraph.
Of course, you could use it to read just a single vertex, but where it's really powerful is when it's combining a bunch of different data elements together, and you're trying to find insights inside of that data.
[00:35:02] Unknown:
In terms of the future direction of TigerGraph, what are some of the areas of focus for the near to medium term, any particular community engagements that you're excited to be involved with, or some of the things that people can look forward to in the months and years to come?
[00:35:21] Unknown:
Currently on the roadmap is, again, the elasticity of scaling: being able to scale up and scale down on deployment is really important. Right now we're also building out more data connectors and product-supported ecosystem components so you can integrate with core products. Integrating with different cloud service provider tooling is also really important right now. The other exciting thing is that we're doing a $1,000,000 challenge. I think that's a pretty exciting concept, in which we're asking the participants of the challenge to come with their background, their skills, and their domain knowledge, and to solve some of the toughest challenges.
That's actually wrapping up in April, and the winners will be announced in May. The $1,000,000 challenge is really about making the world a better place, and we're using it not only to activate our community, but to engage the community and drive brand awareness. And we plan to do many other activities in the future, including other hackathons and other challenges that will get the community excited, get developers excited, and get developers to understand how to use the technology.
[00:36:30] Unknown:
Are there any other aspects of the work that you're doing at TigerGraph or the overall space of graph applications and graph storage that we didn't discuss yet that you'd like to cover before we close out the show? Yeah. I think something we probably didn't touch upon
[00:36:45] Unknown:
is the language itself. So GSQL is specifically designed to be Turing complete, which means you can put complex logic inside your query solutions. What that also means is that, for our graph data science library, if you want to optimize, create, change, or modify an algorithm, you can do that within the language itself.
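As a small taste of that expressiveness, here is a hedged sketch of a bounded breadth-first search written as a plain GSQL query: per-vertex accumulators, a WHILE loop, and edge types passed in as a parameter, which is the general shape many of the data science library algorithms take. The schema is the same hypothetical one as in the earlier sketches.

```python
import pyTigerGraph as tg

conn = tg.TigerGraphConnection(
    host="https://my-instance.i.tgcloud.io",  # hypothetical instance
    graphname="PatientGraph",
    username="tigergraph",
    password="password",
)

conn.gsql("""
USE GRAPH PatientGraph
CREATE QUERY bfs_levels(VERTEX src, SET<STRING> e_types, INT max_depth)
FOR GRAPH PatientGraph {
  OrAccum @visited;       // per-vertex flag: has this vertex been reached?
  SumAccum<INT> @level;   // per-vertex BFS depth
  INT depth = 0;

  Frontier = {src};
  Frontier = SELECT s FROM Frontier:s ACCUM s.@visited += true;

  // Expand the frontier one hop at a time, up to max_depth levels.
  WHILE Frontier.size() > 0 LIMIT max_depth DO
    depth = depth + 1;
    Frontier = SELECT t FROM Frontier:s -(e_types:e)- :t
               WHERE t.@visited == false
               ACCUM t.@visited += true
               POST-ACCUM t.@level = depth;
  END;
  PRINT depth;
}
""")
```

Because the whole algorithm lives in the language, tweaking it (weighting, early exit, extra per-vertex state) is a matter of editing the query rather than working around a fixed built-in.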
[00:37:10] Unknown:
Well, for anybody who wants to get in touch with you and follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And as a final question, I'd like to get your perspective on what you see as being the biggest gap in the tooling or technology that's available for data management today.
[00:37:24] Unknown:
The biggest gap, from an enterprise perspective, was metadata management. Every source system is different, and the types of data are different. Date-time formats are the most annoying thing I ever ran into; everyone has a different date-time format. I think having more standardization around the data itself, and maybe around the logic of the data, would help the end users who are building solutions on top of these different data sources. That was the most challenging thing I ran into from a data management perspective: the different data types. So I think that's one area where I'd love to see some really interesting, creative solutions. And one day, hopefully, there will be no data cleaning. There'll just be magic, and data will be cleaned.
One day, there'll be consistency. One day, types will just be normalized. One day, everything will be done for you automatically, so you can just focus on the analytical
[00:38:28] Unknown:
insights that you're working on, versus always being in the data. Thank you very much for taking the time today to join me and share the work that you've been doing at TigerGraph. It's definitely a very interesting product and problem space, and it's great to see different companies investing in graph analytics capabilities, because it's absolutely necessary as we scale up the volumes and types of data we're working with, given the natural interconnectedness that exists in the universe. So I appreciate all of the time and energy that you've been putting into helping grow the community and ecosystem around that, and I hope you enjoy the rest of your day. Thank you so much.
Thank you for listening! Don't forget to check out our other show, Podcast.__init__ at pythonpodcast.com, to learn about the Python language, its community, and the innovative ways it is being used. And visit the site at dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show, then tell us about it. Email hosts@dataengineeringpodcast.com with your story. And to help other people find the show, please leave a review on iTunes and tell your friends and coworkers.
Introduction to Jon Herke and His Background
Journey into Data and Personal Motivation
Role at TigerGraph and Community Building
Evolution of TigerGraph and Product Features
Misconceptions and Design Considerations in Graph Databases
Enterprise Scalability and Security in TigerGraph
Modeling and Modifying Graph Structures
Tooling and Integration for Machine Learning
System Architectures and Use Cases
Community Contributions and Use Cases
Standardization and Data Sharing in Graph Databases
Interesting Applications of TigerGraph
Challenges and Lessons Learned
When to Use TigerGraph
Future Directions and Community Engagement
GSQL Language and Graph Data Science
Biggest Gap in Data Management Tools
Closing Remarks and Thank You