Bringing Business Analytics To End Users With GoodData

Hello, and welcome to the Data Engineering Podcast, the show about modern data management.

What are the pieces of advice that you wish you had received early in your career of data engineering? If you were to hand a book to a new data engineer, what wisdom would you add to it?

I'm working with O'Reilly Media on a project to collect the 97 things that every data engineer should know, and I need your help. Go to data engineering podcast.com/97

things to add your voice and share your hard earned expertise.

When you're ready to build your next pipeline or want to test out the projects you hear about on the show, you'll need somewhere to deploy it. So check out our friends over at Linode.

With their managed Kubernetes platform, it's now even easier to deploy and scale your workflows or try out the latest Helm charts from tools like Pulsar and Packaderm.

With simple pricing, fast networking, object storage, and worldwide data centers, you've got everything you need to run a bulletproof data platform.

Go to data engineering podcast.com/linode

today and get a $60 credit to try out a Kubernetes cluster of your own. And don't forget to thank them for their continued support of this show.

Good data is revolutionizing the way in which companies provide analytics to their customers and partners.

Start now with good data free that makes our self-service analytics platform available to you at no cost.

Register today at dataengineeringpodcast.com/good

data. Your host is Tobias Macy. And today, I'm interviewing Sheila Jung and Philip Farr about how Good Data is building a platform that lets you share your analytics outside outside the boundaries of your organization.

So, Sheila, can you start by introducing yourself? Hi, everyone. My name is Sheila Jung, and I started working for GoodData almost 5 years ago as a solutions architect in professional services.

I am currently a senior manager in product enablement, leading a team of customer engineers and developer advocates.

And our team's main focus is to empower the GD developer community and enable our internal sales team. Thanks for having us. And, Philip, how about yourself?

Hello, everyone.

My name is Philip Farr. Having been with GoodData for a handful of years at this point, I've held several different positions, which have touched on various aspects of the data engineering world.

My current role is as a senior manager of technical program management and customer success,

where I help oversee a team which partners with our customers to,

provide

technical expertise

and data product development,

and ensure a positive mutually beneficial experience

throughout our entire customer life cycle journey.

Thanks for having us today, Tobias.

And going back to you, Sheila, do you remember how you first got involved in the area of data management?

Yeah. Definitely.

I got involved in data management right after college when I joined a boutique consulting startup,

as a BI and ETL consultant.

This is where I started getting familiar with various data management, warehousing, and visualization tools. That startup was eventually acquired by Teradata, and I've been in the data space ever since. And, Philip, how about you?

So from a

a a professional first experience,

was was probably when I joined a consulting firm,

which specialized

in cybersecurity

and privacy strategy right out of college.

During

that time, GDPR

was becoming an increasingly hot topic in the security and privacy space.

Many companies were searching for,

difficult to answer

questions, like,

do we already have or can we create some type of data flow diagram to help us identify the right data relating to an individual,

or how do we ensure proper deletion

once,

an individual requested that, once you've identified all of this information,

and where it was located?

So my my firm was really brought in to help,

solve these complex problems, and and that's pretty much where I was,

exposed to, data management.

And so in terms of the work that you're doing at GoodData, can you give a bit of a description about what it is that you're building there and as much of the origin story as you're each familiar with?

Sure. So Good Data's origin story starts with our CEO and founder, Ramon Sonek.

Ramon was previously the founder for,

multiple companies like NetBeans and Sysranet. And after his last company got acquired, he started GoodData in 2007,

and his mission was to disrupt the BI space and monetize big data. At GoodData,

what we're doing right now is we're building a way companies can provide analytics directly to their customers and partners.

Good data is seamlessly integrated into workflows and provides,

authenticated

access to business reporting,

dashboards, and ad hoc analysis,

which

is delivered at a timely, relevant, and customizable

way for the independent users.

And

the business intelligence space in general has been around for decades at this point with a number of different iterations of it and different generations of how the tools are used and the types of data that they're dealing with and questions that are being asked and answered. And I'm wondering what the factors are that might lead somebody to choose good data over any of the other contenders in the space and some of the use cases

that it's uniquely well suited

to work for? Yeah. That that's a great question. And I think, you know,

we are from good data. So, obviously, we're we're we're slightly biased in this regard. But we really provide you with a platform to help boost the adoption of your application by embedding modern analytics.

And the goal is to empower all of your end users

via ad hoc exploration

or templatized analytics

and really remove

what we would consider to be the costly customization of 1 off reports. Right?

Say, an internal organization

spinning out 1 time reporting,

to support whatever questions or answer whatever questions they're seeking. And so we've developed

this cloud hosted solution where changes roll out very quickly, and we we really enable those end users to continually benefit from our platform improvements. Right? And this can be done through, you know, powerful

analytics for any persona,

self-service

dashboards,

interactive visualizations,

what what you could expect from a analytics tool.

On an on another hand, we're really looking at flexible pricing options, which help enable,

companies

to use good data,

which are vary in size. Right? So we recently

launched free and growth pricing tiers for easy entry and quick scaling into

different departments or across various solutions for customers.

And then, you know, rounding out those offerings, we do have a an enterprise offering or a long standing offering, which supports thousands of users and terabytes of data. So we're we're really pushing the boundaries of scale there. From the

flexible

platform perspective, right, like, this is another 1 of the the the key,

differentiators,

is the the platform is built, for flexibility

for developers, and we provide kind of infinite options here.

We have a React based JavaScript library for creating analytical interfaces,

which users and their creators absolutely love. In addition to this, there's

very well documented

APIs, which,

allow, you know, creators to,

interact with all platform capabilities via SDKs that we have developed. Furthermore, I I think another area is really the robust and flexible data integration where we we have the ability to support any, you know, data source or technology stack. We integrate with things like Snowflake, Redshift, BigQuery, so all of the big cloud data warehousing names,

as well as the,

ability to ingest data through hundreds of different connectors.

And this this is not limited to, say, small data volumes. It's it's any data volume, any type of connector. We can do custom connectors as well. And then,

you know, rounding it out is the

enterprise level security,

and governance structure that we have put in place. We support things like agile change management, real time user provisioning,

solution monitoring,

and we have a variety of,

compliance,

and certifications

in place, SOC 2, HIPAA, GDPR. These are all things that we're able to,

help you solve as a customer.

Regarding the use cases that, you're mentioning, Tobias, I would say that there are a wide range of use cases in industries, anything ranging from financial services to retail management. We don't actually have a single industry that we focus on supporting.

In going off of

what Sheila said, I think our primary focus has really been to develop an analytics solution which we call powered by good data, and that's kind of our term for it. And you the best way to explain powered by good data is it really is, an industry agnostic use case or a business model, if you will, where we partner with a customer, and then that customer

is looking to

distribute analytics

to all of their customers and provide it to, you know, tons of end users. And so we a typical

good data use case would be to distribute

analytics, that same version of analytics at scale, in the context

of the end users' data for as few as, say, 5 customers, but as many as 25, 000.

And as Sheila mentioned, right, this could be done pretty much as you can imagine for any industry. And that's 1 of the interesting pieces of the product that you're building where a large component of the business intelligence market is focused on these internal analytics use cases where you connect it up to your different data sources, usually some sort of data warehouse, and you have scheduled reports and dashboards that internal business users can look at to get a temperature check and get some sort of sense of how things are going in their business,

maybe things like inventory or sales figures.

Whereas with good data, what you're saying is that it's primarily focused

on external analytics where maybe a SaaS platform is providing some view to their customers in terms of the usage that they're getting out of the platform or maybe sales figures that they're tracking in something like HubSpot. And so I think that that is 1 of the unique

things about good data where it is much more external facing. And by virtue of that means that you have to take much more of a platform approach versus a point solution that you might get with something like Pentaho or Redash or a, Superset or something like that. And I'm wondering what your thoughts are on just the overall market of business intelligence tools and how that's evolved in recent years and some of the contributing trends in the technology and business use cases that have brought us to where we are today and what where you're going with good data. Yeah, Tobias. That is a really great 1. So I would say that business intelligence tools

evolved in recent years, to help businesses make informed decisions.

So the change is that analytics has been used for decades to help businesses make informed decisions both strategically

and operationally

by deriving insights from the data that they've collected.

In more recent years, that has now shifted towards analytics everywhere. So rather than confining data analytics to a single use case or location like you were mentioning, We are seeing an increased demand and value add for distributed analytics for your business partners, employees, customers, pretty much everybody. And the goal is to enable all the people at various levels and businesses

to make these data different,

decisions with confidence.

And then from

the the trend perspective, Tobias,

I think there are a couple worth noting,

that we do see

customers actively pursuing and and and building upon. So the first 1 that I I'd like to call is really the democratization

of technology.

Right? Right? And and this is really where customers or

users are having increased accessibility to technology, and this is pretty much pervasive throughout every industry. Right? And for us, it's really the change towards analytics everywhere,

and it's attributed to the increased

accessibility of technology and modern days reliance

on this type of accessibility.

So

this spans all aspect of a business from, say, the employees

to clients or customers

to stakeholders or executives, right, who all need to be able to access

and make

decisions in real time, right, make those data driven decisions that are important.

And so, you you know, we're we're seeing this in a variety of industries.

1 example would be, say, the sharing economy.

Right? Now as opposed to maybe a historical approach where decisions are made just

based on, a couple or handful of internal stakeholders. Right? Now you're needing to put data in the hands

of many, many users. Right? The people who are actually sharing or participating in the sharing economy,

the, you know, the providers or the renters,

and and how do they get the access to the data that they need to make these decisions in real time? And that has a,

a really high influence over your business.

And, also, like, when you provide that type of analytics to

to,

those people or those individuals, it helps enable companies to overcome

their competitors. Right? You're giving power back to that user

In a similar

vein but a different,

trend. Right?

IoT, the Internet of Things, is an area where we see many customers playing.

Businesses where

they may not have realized historically

what data they had access to or how powerful that data is to provide it back to their end users or how to effectively share that with people across different levels of data literacy.

And so we're we're talking about more archaic industries like auto parts or

transportation,

which can be slower moving and more

publicly, you know, or governmentally regulated.

And so it's it's really

giving these large

industries

access to analytics,

which they probably never had thought about. They never maybe requested it, or they may not have ever thought that this would ever come to their industry. But we're helping enable

those types of companies to modernize

and provide

insights to to their customers.

And the other piece that's interesting is the fact that, as you mentioned, you're developer oriented where you're focusing on exposing a set of rich APIs for being able to build analyses and visualizations on the underlying data where a lot of the existing suite of business intelligence tools are built as a vertically integrated solution where the dashboarding is just a native capability, but also likely somewhat constraining interesting

and unique interesting and unique ways that that capability is being leveraged by your customers.

So thanks, Tobias, for mentioning the APIs. That's definitely something that our end users are leveraging, especially from the developer aspect of people that are,

integrating the, good data platform into

their own analytics.

And that's something that we,

leverage through embedded analytics. And the ways we're able to embed the GoodData platform

into the client's products or their own application is in 3 different ways. The very first way is just through, like, direct embedding via Iframe, where you're getting the good data reports

directly into the client's app or platform, and

that isn't utilizing,

the good data you APIs.

Another way is just to embed the link that is directly linked to the white label good data portal. And then the third way is the good data, UI or gd.ui,

which is the react based development library,

allowing developers to seamlessly integrate into good data with their product. So combined with something that we developed called the accelerator

toolkit,

this pretty much streamlines the front end development efforts

so that, there's a lot of custom visualizations

and integration

into the customer's app. And going off of that, I think

where this really plays a role in is,

the, say, the customer development life cycle. Right? The this is where we see the most heavily

or heavy reliance on APIs or open APIs and SDKs,

which really allow for that seamless integration

and, say, a CICD

type of system.

You you know, the platform really provides,

say, advanced support for release and rollout procedures, which a customer can leverage

to

cascade

across

different environments and manage all of those life cycles independently. So we're talking about, say, a development life a development environment or a QA environment or a production or a series of different production environments based on,

segmentation.

The platform also provides support for, say, on demand provisioning, right, that can be driven and integrated with SSO. For example, say, on demand provisioning via

SAML

2 assertion. And so, you know, 1 of the the good use cases, I think, Tobias, you had mentioned a use case is we do have a customer who has

is using good data to provide analytics on project management software.

There is a lot of technical complexity to the good data platform.

They control all aspects of their application in which they bed

embed good data at different levels of granularity

and expose

the analytics to their end users.

They had very detailed customer

customization

that

supports,

you know, the their preferred user experience

through, say, our APIs and SDKs.

And they manage all aspects of good data, including

things like data loading,

user management,

including role assignment, provisioning, deprovisioning,

front end development

for

2, 000 plus

of their own customers.

And because of the fact that you are

serving up these analytics capabilities

to developers and end users who are integrating it into their own product, whereas a lot of the business intelligence market was oriented towards

the business analysts and data scientists as their end users. I'm wondering what you have found to be some of the useful guidelines or guardrails for helping your

regards to,

the semantic layer here, and the semantic layer here is really the logical data model that we can speak about.

This is a layer that ensures that everyone understands the data in the same way, including self-service users.

So this,

semantic model can be leveraged for

guided analytics

and provides a shared understanding for those analyzed entities and their relationships.

This means that objects that were created by analysts once can be used by other common users and helps them to interpret the data and perform ad hoc data discovery, and this is possible through our analytical designer. So as you were mentioning, when we're migrating from the concept of,

individual

contributors or, like, individual

reporters looking at data analysis,

this semantic layer allows multiple people to take a look at the same data and understand it in the same way. And I think thematically, Tobias, like, 1 of the things that you're referring to and kind of it it's 1 of the things that we find to be most valuable about good data is this this concept of migrating away from just these internally used business intelligence tools

to these externally used,

say, embedded analytical products. Right? And and when we look at these as a whole,

both have similar types of requirements for things like data security and compliance and

the great user experience

for productivity,

say, the ease of development.

Maybe it's the

the different ways to integrate the data or the ability to build semantics around that data.

Right? But when we consider

the realm of embedded analytics,

right, or it's something that we call analytics everywhere at GoodData,

we're we're really looking to embed directly into that software or that application and and

specifically in the context of

that use case

for that end user. Right? Unlike the internal analytics space,

the embedded analytics really requires

very strong life cycle management capabilities.

And and we're referring to,

you know, provisioning,

versioning,

how do you perform releases,

how do you roll that out to many of your customers or your end users.

Right? If you

have, say, 3 customers, you may be able to build a solution from scratch every single time and maintain those

changes and silos. But if you start considering, say, 100 or 1, 000 or maybe even tens of thousands of customers and up to 1, 000, 000 users, right,

How do you manage that? You really need

a a completely different

architecture and power to operate or handle that change management and develop a solution for,

may say, different customer segments or different ways that you're choosing to monetize that data product

or different

access to different datasets. Right?

And so the the difference is really that the internal analytics

is centered around, you know, personal aid and productivity,

where these, you know, SaaS embedded analytics provide

an aid in collective productivity

across

users

and across multiple organizations.

The the 1 other piece I'd like to add for for that is there's an additional added complexity

when you're considering

many, many users. Right?

And and that's really you know, how do you provision and deprovision all of these users, but also retain,

say, the complete control over

different levels of access?

And that could be access at the the the as granular as, say, the data row level. Right? Or it could be access to, you know, which dashboards or which reports that they're getting or the ability to create their own dashboard. So these are all complexities, I think, that we are solving for in this world of embedded analytics.

And digging deeper into the good data platform itself, can you talk about how it's architected and some of the evolution that it's gone through as you have continued to build out new capabilities

and

stay up to date with the changing landscape of

architecture piece. Right? We we've developed what we consider to be a very modular set of components,

which our customers can kind of slice and dice and and append to 1 another to create this end to end distributed analytical solution.

And so I'll kind of walk you through

maybe data source level to, say, dashboard level.

At the lowest level, you know, when we're talking about this end to end pipeline, we're really

speaking to data ingestion. Right? We have a 150 plus connectors

that are available for regular download of data from all sorts of source systems.

We we also do direct connections to those cloud data warehouses like Snowflake and Redshift and BigQuery.

We could build out custom connectors

in certain instances that sit on top of our customers'

open APIs as well.

So we we've we've we've done all of this in the past.

From there, we

run through our ETL processes and load data into

what we call,

ADS as the acronym, which stands for agile data warehousing service. And this is our internal data warehouse for staging and transforming the data. It has, you know, all sorts of,

different, you know, tables and and and views to support the transformations.

And then ultimately,

you know, once we're ready to load this

to,

different, say, tenants, right, or workspaces

and good data terminology,

we use a mechanism called automatic data distribution or ADD.

That distributes data to workspaces

themselves.

And what happens is, you know, we can do this from ADS. We can also do this directly from cloud data warehouses

if if the schemas match,

whatever is in the logical data model,

in the workspace. And so we do have a lot of flexibility

there to, you know, load data into workspaces

And to

define,

workspaces for for the audience,

a workspace really is this end

storage, which is a datamart

that is

loaded with a specific subset of a customer's data so that, you know, when their customer wants to view analytics in the context of their data, it's just

that particular subset of data that's been loaded. The workspace

you know, within that workspace, we have the logical data model or the semantic model, which Sheila had referred to earlier.

And and on top of that semantic model, we're able to easily build out dashboards,

reports,

metrics,

all in the the the context of the business, as well as enable

a key functionality of the platform, which is analytical designer, which we use as an ad hoc data discovery tool where you can easily drag and drop, slice and dice your data so that you can come to your conclusions and insights more quickly.

We spoke about SDKs.

Right? We have SDKs that interface with our open APIs, and then we also provide the tools for embedding, say, via Iframe,

more granular embedding via our good data dot UI, which leverages the React and Angular framework. And and and throughout all of this, right, we have the life cycle management tooling to control provisioning,

releases, rollouts,

user provisioning as well. And I I think the guiding principle for all of this architecture really is governance and security. Right?

It it provides,

you know, a a driving force behind the the reason why our platform is architected the way it is.

We support, you know, complete end to end SLAs. We have, you know, top security

for, you know, those certifications like GDPR

or HIPAA compliance.

And

we really have good data as

these platform components that we can, you know, fully manage our own infrastructure

and provide that

design, which enables, you know, high performance,

largely scalable,

and,

ultimately distribution of these analytical workspaces.

And with the modularity of your architecture, I imagine that also simplifies the use case of letting your customers have different integration points

into your platform for determining where in the life cycle of their data they want you to take over because there might be some custom ETL logic that they wanna do on their systems before they load it into their workspaces,

or they might just have a data repository of a data lake or a data warehouse somewhere, and they just want you to do everything end to end. And I'm wondering what the options are for people who already have an existing data infrastructure and processing capabilities

to lean on good data for just the pieces that they care

about and some of the

examples of customers who are hooking in at those different stages of their life cycle?

Absolutely. So what you're referring to is, what pieces of the good data architecture would the client want to leverage? So we're talking about,

the data warehousing piece that Phil was talking about with ADS where we could potentially get the aggregation of lots of different data sources on 1 place,

whether or not that's something that the client wants to,

leverage from the good data side or own on their side. There's also the loading mechanism, the ADT piece, where we're talking

about how the client,

would be able to load that data, whether or not they want to keep it on their side or,

actually keep it on the good data side. So the ways we're able to manage that is really the flexibility

of the types of sources

we're able to download from whether or not we are doing the transformations or just loading directly into the platform.

So with all the connectors that we have with these prepackaged

Ruby bricks that are leveraging

the good data APIs as well as the source APIs,

we're able to integrate their data and load into ADS through those connectors.

Or if the client wants to own a lot of the transformations themselves,

match the exact,

metadata output for the, the semantic layer or the models that are on the workspaces, they're able to load that directly in with their, data warehousing source through our automated data distribution or ADD,

especially if they're using things like Snowflake,

Redshift,

or BigQuery.

And in the overall system architecture of what you've built at GoodData,

how much of it have you been able to leverage off the shelf components for, whether it's things like Kafka

or pre built data warehouse systems? And how much of it has had to be custom engineered

because of the complexities that you're working around and having to design around in order to ensure that the entire system remains performant for a multitude of customers in a multi tenant situation?

Yep.

So from from an off the shelf component,

perspective,

right, we're really talking about

2 2 primary areas, the first being front end and the the second being the back end. We are leveraging a pluggable UI framework on the front end. So this gives us the ability,

to use many publicly available

UI components like React and Angular

for the data presentation and visualizations.

And what really

is enabled here is that seamless integration. Right? You can basically build the custom

client application and give it that slick slick look and feel to match your client's vision or the, you know, their customers need or they maybe they have a specific style guide that they have to follow for all of their internal applications that they have built. So it really has infinite possibilities

relying on this particular framework.

Similarly, our back end, it has a pluggable container based

architecture as well. And so that allows us to deploy these custom code,

you know, custom code bits like these modular bricks that we had referred to previously,

which are essentially productized Ruby scripts that interact directly with our open APIs. And and and the idea is that we can deploy custom code that's written to our transformation processes,

and give kind of this more

flexible

architecture for ETL management and data pipeline management.

And and these bricks can be orchestrated into data transformation

workflows

for things like, data ingestion

context of our internal data data warehousing.

And then for somebody who's building their data product on top of good data, what is the overall workflow for being able to go from concept to completion?

Yeah. That's a that's a great question. Right?

So, you know, how how do you build a data product?

And the way that we approach it at GoodData is we really strive to build the data products,

which focus on a specific end user persona.

Right? Because we we believe that this is what

really highlights the value of having analytics,

or embedded analytics,

in the context of of an application

or or a site or whatever that may be. Once we understand the mindset of those end users,

we are then able to build the dashboards, reports, metrics,

KPIs, right, all of these analytics,

which enable those individual

visuals to make those data driven,

decisions in the context of their roles. You know, it's about understanding the problem

and then getting to the answer.

And that's definitely achievable through the good data platform.

The areas

that this can be broken down into when we talk about actually implementing an embedded analytical solution

really is in in 5 main areas. And the first is getting that data. We we need to extract and consolidate

from, say, your application

or your data warehouse or

flat files that you're pulling from disparate areas of your business. Maybe it's a connection to a third party that we need to augment, you know, your internal

data warehouse with,

some other tooling that you're using outside of your infrastructure. And then we use this to create the the data model. On top of that, the next thing that we'll do is build out the analytics.

This is where we create those

dashboards,

reports,

the metrics, right, that are relevant to the questions that are looking to be answered. And this is something that we think every single

1 of the customers should be looking to answer. Right? Like, this is the standard. Once we have that standard, we go through our release and rollout processes.

And this is where we often refer to something as a a template or a master. Right? This is the the the standard reporting that everyone will get out of the box.

And we take these life cycle management tools, and we

help, you know, perform this rollout on either a stand alone or maybe a set of embedded reports or dashboards, and we we we disseminate that to the entire customer base. And and once the customers have access to it, right, and they have access to their data, They can go in and finally customize

and build their own set of custom reporting.

Maybe they are looking at a different aspect of the business, or maybe they're looking at a reorganization

that they need to solve for. And so we're really allowing the flexibility

on the end user to to formulate the insights that they need. I think the final piece of an implementation is how do we operationalize

everything,

right, once it's in place.

We need to manage and drive the entire end to end life cycle for all of those customers.

Right?

And we do this through, you know, things like provisioning and deprovisioning

new customers

and and new users.

Or maybe it's in terms of managing growth and scalability

or monitoring actual usage on the platform.

So I I think these are all of the the key steps that we walk through in building,

say, a brand new data product.

And then the other

component

of building the data product and the perspective of the customers in terms of working with their data is I'm curious

how the overall life cycle of the data flows through the good data product from when the customer

first collects the data through to delivering it to their end users and ensuring that the overall experience is as performant

and robust as possible?

So the design that we have to work around,

to ensure performant access

to the customer's data sources in the case of creating a data product, I think, comes from 2 sides.

First is from the customer's data side,

understanding,

the customer's data instance

in the case that we're actually pulling things from the Jira warehouse or they're sending us files to be ingested into ADS.

Those are the kinds of things that we want to take a look at. What is the actual health of their production instance? Can we have access to separate schemas and views to make sure that we aren't negatively impacting their production environment?

What is the size of the data that we're putting in, that we need to architect around? Do we need to,

provide some sort of incremental ingestion logic with deletions and all that kinda of stuff so that we can handle large volumes of data in an efficient manner.

Another thing to consider is when we're looking at a client's data

from ingestion

to something that will match,

our logical data model.

They're gonna be very different. So from an analytical toolset,

you don't necessarily need things at the most granular or transaction level. So when we're looking at things like that, how are we going to maintain some sort of data retention policy

that matches the client's data, what they're giving us,

versus what's actually going to be on the platform. So those are the things that we might need to think through and architect around.

Another thing is,

as I was mentioning before about,

an aggregation of different data sources within our ADS layer. This is the kind of stuff,

that we like to showcase where we centralize our data pipeline across multiple data sources. So this is something that the customer can leverage as well if they want to look at a various amounts of data sources. So let's look at, like, a sales example. They might have their own transactions,

separately within their MySQL database,

but then they also wanna pull in their Salesforce data. We can aggregate all that information if it's not available for them, in their own data warehouse. And the last piece about just the data side is really looking at that custom connector piece. How are we actually

getting that data over from,

their side over to us? And historically, we've had, experiences in the past where we were able to build up these

custom bricks or connectors,

so that our, clients would be able to have their data migrated from their side over to good data. On the other side of this,

in terms of the data migration, I would also say there is a performant level that we like to acknowledge

from the platform perspective.

So when we're looking at large amounts of data or if we're looking at,

near real time analytics. There are things on the platform that we want to consider as well, like precaching.

Maybe there is a very important meeting that a lot of people like the good data analytics for. We want to ensure that everything

is cached before these kinds of important meetings or stuff like that. So we have scripts that,

make that possible.

Another,

ability that we're able to,

give to our clients is the option for different hardware,

to handle high concurrency or high data volumes.

And there are also, a lot of different ways that we've enhanced

the logical data model to make sure that it is,

very reasonable

to have performant access for for our clients. And 1 way that we are able to do that is through many to many relationships.

Rather than duplicating data and increasing data volumes on that Datamart on the client workspace,

we can leverage the many to many functionality

on our data models to help clients with that kind of access.

And in your experience

of building the platform and working on it yourselves and working with your customers to ensure that they're having successful outcomes, what are some of the most interesting or unexpected or challenging lessons that you've learned in the process?

I would say the most interesting part of working on client implementations,

of good data is the breadth of business use cases we are developing for our clients.

So no 1 use case,

is is the same. They're all very different because all these different businesses have different business models and different data models. So I would say this was

especially rewarding because of the constant innovative data challenges that we had to overcome

with the good data platform.

My my take on this 1 is

based on all the implementations

I I've seen,

it never ceases to amaze me, but there's always

always a customer that's trying to push the boundaries of platform functionality.

Right? And and this is always a a tricky 1, especially in in a customer relationship

standpoint.

You want to help them meet their needs, right, within

all of the platform

limitations. And, of course, they want to do something new and creative, and you want to enable them to do that. But we need to find

a healthy medium without compromising

the end result or eventually setting that client or customer up for eventual failure due to a lack of sustainability.

And so I I it's it's always very, you know, interesting and proposes a unique challenge every single time it comes up because you have to go back to the drawing board and be like, these are all of our confines.

How do we build something,

new?

Right? Or maybe it is. We we reach out to product to help us extend some of the existing functionality.

But but oftentimes,

the solution needs to be more immediate.

And 1 of the interesting

pieces of the overall good data platform as well is the fact that you have introduced a

different

interface for being able to define the logical models using the MAQL

or multidimensional

analytical query language dialect.

And I'm wondering if you can give a bit of a compare and contrast

between that and SQL and some of the benefits and additional functionality that MAQL provides.

Yeah. So MaCl,

stands for multidimensional

analytical query language. And this is Good Data's proprietary query language for creating metrics or aggregations of the underlying data that is defined in that semantic layer. This difference for this differs from SQL in that it's, a streamlined analytical language with less code to write and maintain. So less technical folks,

find MacL easier to use, and we found that anyone familiar with SQL is able to pick up, MacL very quickly. The key advantages that I would state about MacL is, 1 working with a good data platform. It works out of the box. It's something that is innate in our platform.

It's also multidimensional.

So this,

pairs very well with our semantic layer. And going back to what I was saying about less code,

what happens is there are no joins or subjoins in Makl queries that,

are are stated to define a metric

because it works on top of the logical data model. So these queries are already context aware. And talking about the context aware piece with, everything being semantically related through the logical data model, any metric can also be, immediately used for, reporting and can be reused again. So this is something that,

can be utilized for all of our clients. It doesn't need to be rewritten. It can be, precomposed and then used by, 1, 000 or tens of thousands of users across their reports.

There's also the,

composability of these metrics. So you could put in nested metrics,

build foundation metrics.

That way, when you drag and drop this into your analytical designer interface, you could apply different filters and all that kind of stuff. So there is that capability as well.

And I would say the last piece about,

a key advantage of Makl is the resiliency.

So a lot of the times when there is some, like, source to target mapping of, that will require some sort of significant refactoring

of the actual data model,

This sits exactly on the semantic layer. So there there really doesn't have a

a serious impact on the existing metrics or reports,

unless, of course, there was something that, like, a serious LTM change was made on the front end that needed to be released and rolled out. But I would say, in general, the benefit

of Makl is,

less code. It's context aware because of the semantic layer.

Composing metrics and reuse is very easy and the resiliency.

And in terms of good data, what are the cases where it's the wrong choice and someone might be better suited using a vertically integrated

internal platform or building out their own analytics solution for exposing to end users?

Yeah. So if you're seeking a single static

data visualization

for your management team or maybe a public chart for your website, good data is

will be too complex of a platform for you. In this case, using data visualization

tools or libraries would be better suited.

However, if you plan to do anything more than just a simple static data visualization,

you would need to find something that's more reliable,

that does have this life cycle management,

pretty much what good data has.

And what do you have on the road map for the future of good data in terms of new capabilities

or just overall improvements

or new use cases that you're looking to provide?

Big data continues

on the trend of analytics everywhere for everyone, and this includes improvements on data integration options,

bringing better data visualizations on the front end that is available,

through analytical dashboard, bringing better collaboration

between data engineers and analysts, and improving the self-service

analytics

ease of use for these nonanalysts.

And to add to that, I think another another component of our our road map that's really

exciting for us is

we're working on a a newer

Kubernetes based deployment option of GoodData. Right? And this this really will help us enable collocating the analytics with a SaaS application

that may be deployed,

say, in a public or private cloud platform or in a a local on prem data center.

So the goal is to enable the same functionality

that when we get out of the cloud hosted GoodData platform, maybe for companies that need more enhanced

control or stricter guidelines or just want to feel like they have the full ownership, and it's less of a managed service that we are providing to them. Are there any other aspects of the space of embedded analytics or

the product that you're building a good data or anything else in the business intelligence and analytics space that we didn't discuss that you'd like to cover before we close out the show? I I think that we touched on a a lot of the the key aspects and and the reasons why we think that good data has

a a competitive

advantage. So I I think I think we're

from from my perspective, we we touched on a lot of things I I hope the larger audience would find,

useful and and that we kind of present,

like, information from the the good data perspective,

as well. Alright. Well, for anybody who wants to get in touch with either of you and follow along with the work that you're doing, I'll have you each add your preferred information to the show notes. And as a final question, I would just like to get your perspective on what you see as being the biggest gap in the tooling or technology that's available for data management today. Yeah. I I mean, that's

that's always the the big question, right, is is what's next? Where where are we heading?

I think from you know, I see that there could be improvements in terms of,

the data cleansing,

area of,

data management and and data engineering. Right? I I I know, you know, personally,

a lot of people, including myself,

spend a lot of time

debugging or cleaning up datasets or trying to

to ensure

that,

you know, the data is clean enough to run n 10 through ETL. And, you know, part of that is is error handling. Part of that is, you know, maybe

removing records, right, which could lead in some type of inconsistency.

Right? So, you know, the area I would like to see development on would be some, you know, very customizable

data cleansing solution that has a very flexible integration with other analytical tools. You know, a a drag and drop, similar to how we mentioned earlier, you know, a pluggable UI framework. Right? Is there a way that someone could build a solution that would integrate directly into our native technologies?

And we could customize that and and and deliver that as a a packaged option to our customers as well,

and really limit the amount of time that it takes to process

and handle all all of the the data and and troubleshooting

and limit the troubleshooting

and hopefully free up time for

more value add activities.

To add to that, I would say, not necessarily a big gap, but a big change that we would probably see in the future is as more users are getting access to tools where they have access to data. Maybe they didn't have access before as Phil was mentioning in a use case earlier.

We're gonna need improvements to make semantics,

and relationships in general in data a little bit easier to understand. So even though we do have, like, a semantic layer and other tools may have something similar to make it easier for,

for their end users to actually utilize.

I predict in the future that we would need to simplify this even further for a wider audience.

Alright. Well, thank you very much for both taking the time today to join me and discuss the work that you're doing with good data and empowering embedded analytics for end users and making the overall analytics space more accessible. It's definitely a very interesting product, and I had a lot of fun learning about it as I got prepared as as I prepared for the show. So thank you both for all of the time and energy you put into that, and I hope you enjoy the rest of your day. Yep. Thank you, Tobias. Thanks for having us today. Yeah. Feel free to share our contact information. We're happy to communicate offline with anyone who has any open questions or concerns or follow ups that are needed.

Yeah. But we appreciate your time as well. Thanks. Thank you so much, Tobias. Thanks for having us. Bye.

Listening. Don't forget to check out our other show, podcast.init@python

podcast.com

to learn about the Python language, its community, and the innovative ways it is being used.

And visit the site at data engineering podcast.com

to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show, then tell us about it. Email hosts at data engineering podcast.com

with your story. And to help other people find the show, please leave a review on Itunes and tell your friends and coworkers.

Data Engineering Podcast

Summary

Announcements

Interview

Contact Info

Parting Question

Closing Announcements

Links