Bringing Business Analytics To End Users With GoodData - Episode 138

Summary

The majority of analytics platforms are focused on internal use by business stakeholders within an organization. As the availability of data increases and overall literacy in how to interpret it and take action on it improves, there is a growing need to bring business intelligence use cases to a broader audience. GoodData is a platform focused on simplifying the work of bringing data to employees and end users. In this episode Sheila Jung and Philip Farr discuss how the GoodData platform is being used, how it is architected to provide scalable and performant analytics, and how it integrates into customers’ data platforms. This was an interesting conversation about a different approach to business intelligence and the importance of expanded access to data.

GoodData is revolutionizing the way in which companies provide analytics to their customers and partners.

With data-driven applications now becoming the new norm, GoodData allows you to easily provide tailored scalable data access to multiple companies, groups, and users. Ready to see how you can get started? Start now with GoodData Free, our product offering that makes our self-service analytics platform available to you at no cost. When you sign up for GoodData Free, you get five workspaces for an unlimited number of users. You can continue to use GoodData Free for as long as you like, and our support team is available for whatever you need. If at any point you’d like to take your analytics to the next level, our team can guide you through the process of transitioning to our Growth or Enterprise tiers.


Your data platform needs to be scalable, fault tolerant, and performant, which means that you need the same from your cloud provider. Linode has been powering production systems for over 17 years, and now they’ve launched a fully managed Kubernetes platform. With the combined power of the Kubernetes engine for flexible and scalable deployments, and features like dedicated CPU instances, GPU instances, and object storage you’ve got everything you need to build a bulletproof data pipeline. If you go to dataengineeringpodcast.com/linode today you’ll even get a $60 credit to use on building your own cluster, or object storage, or reliable backups, or… And while you’re there don’t forget to thank them for being a long-time supporter of the Data Engineering Podcast!


Announcements

  • Hello and welcome to the Data Engineering Podcast, the show about modern data management
  • What are the pieces of advice that you wish you had received early in your career of data engineering? If you hand a book to a new data engineer, what wisdom would you add to it? I’m working with O’Reilly on a project to collect the 97 things that every data engineer should know, and I need your help. Go to dataengineeringpodcast.com/97things to add your voice and share your hard-earned expertise.
  • When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
  • GoodData is revolutionizing the way in which companies provide analytics to their customers and partners. Start now with GoodData Free that makes our self-service analytics platform available to you at no cost. Register today at dataengineeringpodcast.com/gooddata
  • Your host is Tobias Macey and today I’m interviewing Sheila Jung and Philip Farr about how GoodData is building a platform that lets you share your analytics outside the boundaries of your organization

Interview

  • Introduction
  • How did you get involved in the area of data management?
  • Can you start by describing what you are building at GoodData and some of its origin story?
  • The business intelligence market has been around for decades now and there are dozens of options with different areas of focus. What are the factors that might motivate me to choose GoodData over the other contenders in the space?
  • What are the use cases and industries that you focus on supporting with GoodData?
  • How has the market of business intelligence tools evolved in recent years?
    • What are the contributing trends in technology and business use cases that are driving that change?
  • What are some of the ways that your customers are embedding analytics into their own products?
  • What are the differences in processing and serving capabilities between an internally used business intelligence tool, and one that is used for embedding into externally used systems?
    • What unique challenges are posed by the embedded analytics use case?
    • How do you approach topics such as security, access control, and latency in a multitenant analytics platform?
  • What guidelines have you found to be most useful when addressing the concerns of accuracy and interpretability of the data being presented?
  • How is the GoodData platform architected?
    • What are the complexities that you have had to design around in order to provide performant access to your customers’ data sources in an interactive use case?
    • What are the off-the-shelf components that you have been able to integrate into the platform, and what are the driving factors for solutions that have been built specifically for the GoodData use case?
  • What is the process for your users to integrate GoodData into their existing data platform?
  • What is the workflow for someone building a data product in GoodData?
  • How does GoodData manage the lifecycle of the data that your customers are presenting to their end users?
  • How does GoodData integrate into the customer development lifecycle?
  • What are some of the most interesting, unexpected, or challenging lessons that you have learned while working on and with GoodData?
  • Can you give an overview of the MAQL (Multi-Dimension Analytical Query Language) dialect that you use in GoodData and contrast it with SQL?
    • What are the benefits and additional functionality that MAQL provides?
  • When is GoodData the wrong choice?
  • What is on the roadmap for the future of GoodData?

Contact Info

Parting Question

  • From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

  • Thank you for listening! Don’t forget to check out our other show, Podcast.__init__ to learn about the Python language, its community, and the innovative ways it is being used.
  • Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
  • If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
  • To help other people find the show please leave a review on iTunes and tell your friends and co-workers
  • Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat

Links

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Transcript
Tobias Macey
0:00:12
Hello, and welcome to the Data Engineering Podcast, the show about modern data management. What are the pieces of advice that you wish you had received early in your career of data engineering? If you were to hand a book to a new data engineer, what wisdom would you add to it? I'm working with O'Reilly Media on a project to collect the 97 things that every data engineer should know, and I need your help. Go to dataengineeringpodcast.com/97things to add your voice and share your hard-earned expertise. When you're ready to build your next pipeline, or want to test out the projects you hear about on the show, you'll need somewhere to deploy it, so check out our friends over at Linode. With their managed Kubernetes platform, it's now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you've got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $60 credit to try out a Kubernetes cluster of your own. And don't forget to thank them for their continued support of this show. GoodData is revolutionizing the way in which companies provide analytics to their customers and partners. Start now with GoodData Free, which makes our self-service analytics platform available to you at no cost. Register today at dataengineeringpodcast.com/gooddata. Your host is Tobias Macey, and today I'm interviewing Sheila Jung and Philip Farr about how GoodData is building a platform that lets you share your analytics outside the boundaries of your organization. So Sheila, can you start by introducing yourself?
Sheila Jung
0:01:45
Hi, everyone. My name is Sheila Jung, and I started working for GoodData almost five years ago as a solutions architect in professional services. I am currently a senior manager in product enablement, leading a team of customer engineers and developer advocates. Our team's main focus is to empower the GoodData developer community and enable our internal sales team. Thanks for having us.
Tobias Macey
0:02:07
And Philip, How about yourself?
Philip Farr
0:02:09
Hello, everyone. My name is Philip Farr. Having been with GoodData for a handful of years at this point, I've held several different positions which have touched on various aspects of the data engineering world. My current role is as a senior manager of technical program management and customer success, where I help oversee a team which partners with our customers to provide technical expertise in data product development and ensure a positive, mutually beneficial experience throughout our entire customer lifecycle journey. Thanks for having us.
Tobias Macey
0:02:45
And going back to you Sheila, do you remember how you first got involved in the area of data management?
Sheila Jung
0:02:50
Yeah, definitely. I got involved in data management right after college when I joined a boutique consulting startup as a BI and ETL consultant. This is where I started getting familiar with various data management, warehousing, and visualization tools. That startup was eventually acquired by Teradata, and I've been in the data space ever since.
Tobias Macey
0:03:09
And Philip, how about you?
Philip Farr
0:03:10
So from a professional standpoint, my first experience was probably when I joined a consulting firm which specialized in cybersecurity and privacy strategy right out of college. During that time, GDPR was becoming an increasingly hot topic in the security and privacy space. Many companies were searching for answers to difficult questions like, do we already have, or can we create, some type of data flow diagram to help us identify the right data relating to an individual? Or how do we ensure proper deletion once an individual requests it, once you've identified all of this information and where it was located? So my firm was really brought in to help solve these complex problems, and that's pretty much where I was exposed to data management.
Tobias Macey
0:04:08
And so in terms of the work that you're doing at GoodData, can you give a bit of a description about what it is that you're building there and as much of the origin story as you're familiar with?
Sheila Jung
0:04:18
Sure. So GoodData's origin story starts with our CEO and founder, Roman Stanek. Roman was previously the founder of multiple companies like NetBeans and Systinet, and after his last company got acquired, he started GoodData in 2007. His mission was to disrupt the BI space and monetize big data. At GoodData, what we're doing right now is building a way companies can provide analytics directly to their customers and partners. GoodData is seamlessly integrated into workflows and provides authenticated access to business reporting, dashboards, and ad hoc analysis, which is delivered in a timely, relevant, and customizable way for the individual users.
Tobias Macey
0:05:01
And the business intelligence space in general has been around for decades at this point, with a number of different iterations of it and different generations of how the tools are used, the types of data that they're dealing with, and the questions that are being asked and answered. And I'm wondering what the factors are that might lead somebody to choose GoodData over any of the other contenders in the space, and some of the use cases that it's uniquely well suited to work for?
Philip Farr
0:05:28
Yeah, that's a great question. And I think, you know, we are from GoodData, so obviously we're slightly biased in this regard. But we really provide you with a platform to help boost the adoption of your application by embedding modern analytics. And the goal is to empower all of your end users via ad hoc exploration or templatized analytics, and really remove what we would consider to be the costly customization of one-off reports, right? Say, internal organizations spinning out one-time reporting to support whatever questions they're seeking to answer. And so we've developed this cloud-hosted solution where changes roll out very quickly, and we really enable those end users to continually benefit from our platform improvements. And this can be done through, you know, powerful analytics for any persona, self-service dashboards, interactive visualizations, what you could expect from an analytics tool. On another hand, we're really looking at flexible pricing options, which help enable companies of any size to use GoodData. So we recently launched Free and Growth pricing tiers for easy entry and quick scaling into different departments or across various solutions for customers. And then, you know, rounding out those offerings, we do have an Enterprise offering, our long-standing offering, which supports thousands of users and terabytes of data. So we're really pushing the boundaries of scale there. From the flexible platform perspective, right, this is another one of the key differentiators: the platform is built for flexibility for developers, and we provide kind of infinite options here. We have a React-based JavaScript library for creating analytical interfaces, which users and their creators absolutely love. In addition to this, there are very well documented APIs which allow, you know, creators to interact with all platform capabilities via SDKs that we have developed. Furthermore, I think another area is really the robust and flexible data integration, where we have the ability to support any, you know, data source or technology stack. We integrate with things like Snowflake, Redshift, BigQuery, so all the big cloud data warehouse names, as well as the ability to ingest data through hundreds of different connectors. And this is not limited to, say, small data volumes; it's any data volume, any type of connector, and we can do custom connectors as well. And then, you know, rounding it out is the enterprise-level security and governance structure that we have put in place. We support things like agile change management, real-time user provisioning, solution monitoring, and we have a variety of compliance certifications in place: SOC 2, HIPAA, GDPR. These are all things that we're able to help you solve as a customer.
Sheila Jung
0:08:44
Regarding the use cases that you're mentioning, Tobias, I would say that there is a wide range of use cases and industries, anything ranging from financial services to retail management. We don't actually have a single industry that we focus on supporting.
Philip Farr
0:08:58
Going off of what Sheila said, I think our primary focus has really been to develop an analytics solution which we call Powered by GoodData, and that's kind of our term for it. The best way to explain Powered by GoodData is that it really is an industry-agnostic use case, or a business model if you will, where we partner with a customer, and then that customer is looking to distribute analytics to all of their customers and provide it to, you know, tons of end users. And so a typical GoodData use case would be to distribute analytics, that same version of analytics, at scale and in the context of the end user's data, for as few as, say, five customers, but as many as 25,000. And as Sheila mentioned, right, this can be done, pretty much as you can imagine, for any industry.
Tobias Macey
0:09:54
And that's one of the interesting pieces of the product that you're building, where a large component of the business intelligence market is focused on these internal analytics use cases, where you connect it up to your different data sources, usually some sort of data warehouse, and you have scheduled reports and dashboards that internal business users can look at to get a temperature check and get some sort of sense of how things are going in their business, maybe things like inventory or sales figures. Whereas with GoodData, what you're saying is that it's primarily focused on external analytics, where maybe a SaaS platform is providing some view to their customers in terms of the usage that they're getting out of the platform, or maybe sales figures that they're tracking in something like HubSpot. And so I think that that is one of the unique things about GoodData, where it is much more external facing, and by virtue of that means that you have to take much more of a platform approach versus a point solution that you might get with something like Pentaho or Dash or Superset or something like that. And I'm wondering what your thoughts are on just the overall market of business intelligence tools and how that's evolved in recent years, and some of the contributing trends in the technology and business use cases that have brought us to where we are today and where you're going with GoodData.
Sheila Jung
0:11:15
Yeah, Tobias, that is a really great thought. So I would say that business intelligence tools have evolved in recent years to help businesses make informed decisions. The change is that analytics has been used for decades to help businesses make informed decisions, both strategically and operationally, by deriving insights from the data that they've collected, and in more recent years that has now shifted towards analytics everywhere. So rather than confining data analytics to a single use case or location, like you were mentioning, we are seeing an increased demand and value add for distributed analytics for your business partners, employees, customers, pretty much everybody. And the goal is to enable all the people at various levels in businesses to make these data-driven decisions with confidence.
Philip Farr
0:12:03
And then from the trend perspective, Tobias, I think there are a couple worth noting that we do see customers actively pursuing and building upon. The first one that I'd like to call out is really the democratization of technology, right? And this is really where customers or users are having increased accessibility to technology, and this is pretty much pervasive throughout every industry. And for us, it's really the change towards analytics everywhere, and it's attributed to the increased accessibility of technology and the modern-day reliance on this type of accessibility. So this spans all aspects of a business, from, say, the employees to clients or customers to stakeholders or executives, right, who need to be able to access and make decisions in real time, make those data-driven decisions that are important. And so, you know, we're seeing this in a variety of industries. One example would be, say, the sharing economy right now, as opposed to maybe a historical approach where decisions are made based on just a couple or a handful of internal stakeholders. Now you're needing to put data in the hands of many, many users, right? The people who are actually sharing or participating in the sharing economy, the, you know, the providers, the renters, and how do they get access to the data that they need to make these decisions in real time, which has a really high influence over your business. And also, when you provide that type of analytics to those people or those individuals, it helps enable companies to overcome their competitors, right? You're giving power back to that user. In a similar vein, but a different trend, right, IoT, the Internet of Things, is an area where we see many customers, businesses where they may not have realized historically what data they had access to, or how powerful that data is to provide back to their end users, or how to effectively share that with people across different levels of data literacy. And so we're talking about more archaic industries like auto parts or transportation, which can be slower moving and more publicly or governmentally regulated. And so it's really giving these large industries access to analytics which they probably never had thought about. They maybe never requested it, or they may not have ever thought that this would ever come to their industry, but we're helping enable those types of companies to modernize and provide insights to their customers.
Tobias Macey
0:15:07
And the other piece that's interesting is the fact that, as you mentioned, you're developer oriented, where you're focusing on exposing a set of rich API's for being able to build analyses and visualizations on the underlying data where a lot of the existing suite of business intelligence tools are built as a vertically integrated solution where the dashboarding is just a native capability, but also likely somewhat constraining in terms of the types of representations that you can build. And I'm wondering what you have seen in terms of some of the interesting and unique ways that that capability is being leveraged by your customers.
Sheila Jung
0:15:46
So thanks, Tobias, for mentioning the APIs. That's definitely something that our end users are leveraging, especially from the developer aspect of people that are integrating the GoodData platform into their own analytics. And that's something that we leverage through embedded analytics, and we're able to embed the GoodData platform into the client's products or their own application in three different ways. The very first way is just through direct embedding with the iframe, where you're getting the GoodData reports directly into the client's app or platform, and that is utilizing the GoodData APIs. Another way is just to embed the link that is directly linked to the white-labeled GoodData portal. And then the third way is GoodData.UI, which is the React-based development library allowing developers to seamlessly integrate GoodData into their product. Combined with something that we developed called the accelerator toolkit, this pretty much streamlines the front-end development efforts, so that there's a lot of custom visualizations and integration into the customer's app.
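As a rough illustration of the first option Sheila mentions, here is a minimal React sketch that drops a GoodData dashboard into a host application via an iframe. The dashboard URL shape and identifiers are hypothetical placeholders rather than the exact paths the platform uses; the real white-labeled portal URL and authentication setup come from your own GoodData workspace.

```tsx
import React from "react";

// Minimal sketch of iframe-based embedding: the host application renders a
// GoodData dashboard inside an iframe. The URL below is a hypothetical
// placeholder; the embedded session is typically authenticated via SSO, so
// each end user only sees the data loaded into their own workspace.
type EmbeddedDashboardProps = {
  // e.g. "https://analytics.example.com/dashboards/embedded/#/<workspace>/<dashboard>"
  dashboardUrl: string;
  title: string;
};

export function EmbeddedDashboard({ dashboardUrl, title }: EmbeddedDashboardProps) {
  return (
    <iframe
      src={dashboardUrl}
      title={title}
      style={{ width: "100%", height: "600px", border: "none" }}
    />
  );
}
```

The GoodData.UI route mentioned above trades this simple iframe for React components that talk to the platform APIs directly, which is what enables the more granular, fully custom visualizations.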
Philip Farr
0:17:00
Going off of that, I think where this really plays a role is in, say, the customer development lifecycle, right? This is where we see the heaviest reliance on APIs, or open APIs and SDKs, which really allow for that seamless integration in, say, a CI/CD type of system. You know, the platform really provides, say, advanced support for release and rollout procedures, which a customer can leverage to cascade across different environments and manage all of those lifecycles independently. So we're talking about, say, a development environment, or a QA environment, or a production environment, or a series of different production environments based on segmentation. The platform also provides support for, say, on-demand provisioning, right, that can be driven and integrated with SSO, for example on-demand provisioning via a SAML 2.0 assertion. And so one of the good use cases, I think, Tobias, you'd mentioned use cases: we do have a customer who is using GoodData to provide analytics on project management software. And this is a highly technical customer, and they are virtually completely self-service at this point, even though there is a lot of technical complexity to the GoodData platform. They control all aspects of their application, in which they embed GoodData at different levels of granularity and expose the analytics to their end users. They have very detailed customization that supports their preferred user experience through, say, our APIs and SDKs, and they manage all aspects of GoodData, including things like data loading, user management, including role assignment, provisioning and deprovisioning, and front-end development for 2,000-plus of their own customers.
Tobias Macey
0:19:06
And because of the fact that you are serving up these analytics capabilities to developers and end users who are integrating it into their own product, whereas a lot of the business intelligence market was oriented towards business analysts and data scientists as their end users, I'm wondering what you have found to be some of the useful guidelines or guardrails for helping your customers to build useful and accurate and easily interpretable analytics for their end users.
Sheila Jung
0:19:36
So in regards to the semantic layer here, the semantic layer is really the logical data model that we can speak about. This is a layer that ensures that everyone understands the data in the same way, including self-service users. So this semantic model can be leveraged for guided analytics, and it provides a shared understanding of the analyzed entities and their relationships. This means that objects that were created by analysts once can be used by other common users, and it helps them to interpret the data and perform ad hoc data discovery. And this is possible through our Analytical Designer. So as you were mentioning, when we're migrating from the concept of individual contributors, or individual reporters, looking at data analysis, this semantic layer allows multiple people to take a look at the same data and understand it in the same way.
Philip Farr
0:20:33
And I think, thematically, Tobias, one of the things that you're referring to, and one of the things that we find to be most valuable about GoodData, is this concept of migrating away from just these internally used business intelligence tools to these externally used, say, embedded analytical products. And when we look at these as a whole, both have similar types of requirements for things like data security and compliance and, you know, a great user experience for productivity, say the ease of development, maybe the different ways to integrate the data or the ability to build semantics around that data. But when we consider the realm of embedded analytics, or what we call analytics everywhere at GoodData, we're really looking to embed directly into that software or that application, and specifically in the context of that use case for that end user. Unlike the internal analytics space, embedded analytics really requires very strong lifecycle management capabilities. We're referring to, you know, provisioning, versioning, how do you perform releases, how do you roll that out to many of your customers or end users, right? If you have, say, three customers, you may be able to build a solution from scratch every single time and maintain those changes in silos. But if you start considering, say, hundreds or thousands, or maybe even tens of thousands of customers and up to a million users, how do you manage that? You really need a completely different architecture and power to operate or handle that change management and develop a solution for, say, different customer segments, or different ways that you're choosing to monetize that data product, or different access to different data sets. And so the difference is really that internal analytics is centered around, you know, personal aid and productivity, where these SaaS embedded analytics provide an aid to collective productivity across users and across multiple organizations. The one other piece I'd like to add to that is the additional complexity when you're considering many, many users. And that's really, you know, how do you provision and deprovision all these users, but also retain, say, complete control over different levels of access. And that could be access as granular as, say, the data row level, or it could be access to, you know, which dashboards or which reports they're getting, or the ability to create their own dashboards. So these are all complexities, I think, that we are solving for in this world of embedded analytics.
Tobias Macey
0:23:35
And digging deeper into the GoodData platform itself, can you talk about how it's architected, and some of the evolution that it's gone through as you have continued to build out new capabilities and stay up to date with the changing landscape of data infrastructure and data usage?
Philip Farr
0:23:52
Yeah, I'll start with the architecture piece, right. We've developed what we consider to be a very modular set of components, which our customers can kind of slice and dice and append to one another to create this end-to-end distributed analytical solution. And so I'll kind of walk you through it, maybe from the data source level to, say, the dashboard level. At the lowest level, you know, when we're talking about this end-to-end pipeline, we're really speaking to data ingestion. We have 150-plus connectors that are available for regular download of data from all sorts of source systems. We also do direct connections to those cloud data warehouses like Snowflake and Redshift and BigQuery. We can build out custom connectors in certain instances that sit on top of our customers' open APIs as well, and we've done all of this in the past. From there, we run through our ETL processes and load data into what we call ADS, the acronym which stands for Agile Data Warehousing Service. And this is our internal data warehouse for staging and transforming the data. It has, you know, all sorts of different tables and views to support the transformations. And then ultimately, once we're ready to load this to different, say, tenants, right, or workspaces in GoodData terminology, we use a mechanism called Automatic Data Distribution, or ADD, that distributes data to the workspaces themselves. And what happens is, you know, we can do this from ADS, and we can also do this directly from cloud data warehouses if the schemas match whatever's in the logical data model in the workspace, and so we do have a lot of flexibility there to load data into workspaces. And to define a workspace for the audience: a workspace really is a unit of storage, a data mart that is loaded with a specific subset of a customer's data, so that when their customer wants to view analytics in the context of their data, it's just that particular subset of data that's been loaded in the workspace. Within that workspace, we have the logical data model, or the semantic model, which Sheila had referred to earlier. And on top of that semantic model, we're able to easily build out dashboards, reports, and metrics that are all in the context of the business, as well as enable a key functionality of the platform, which is Analytical Designer, which we use as an ad hoc data discovery tool, where you can easily drag and drop, slice and dice your data so that you can come to your conclusions and insights more quickly. We spoke about SDKs, right, we have SDKs that interface with our open APIs, and then we also provide the tools for embedding, say, the iframe, or more granular embedding via our GoodData.UI, which leverages the React and Angular frameworks. And throughout all of this, right, we have the lifecycle management tooling to control provisioning, releases, rollouts, and user provisioning as well. And I think the guiding principle for all of this architecture really is governance and security. It provides, you know, a driving force behind the reason why our platform is architected the way it is. We support complete end-to-end SLAs, we have top security for those certifications like GDPR or HIPAA compliance, and we really have, at GoodData, these platform components where we fully manage our own infrastructure and provide that design which enables, you know, high performance, large scalability, and ultimately the distribution of these analytical workspaces.
Tobias Macey
0:28:14
And with the modularity of your architecture, I imagine that also simplifies the use case of letting your customers have different integration points into your platform for determining where in the lifecycle of their data they want you to take over, because there might be some custom ETL logic that they want to do on their systems before they load it into their workspaces, or they might just have a data repository, a data lake or a data warehouse somewhere, and they just want you to do everything end to end. And I'm wondering what the options are for people who already have existing data infrastructure and processing capabilities to lean on GoodData for just the pieces that they care about, and some of the examples of customers who are hooking in at those different stages of their lifecycle?
Sheila Jung
0:29:02
Absolutely. So what you're referring to is which pieces of the GoodData architecture the client wants to leverage. So we're talking about the data warehousing piece that Phil was talking about, with ADS, where we could potentially get the aggregation of lots of different data sources all in one place, and whether or not that's something that the client wants to leverage from the GoodData side or own on their side. There's also the loading mechanism, the ADD piece, where we're talking about how the client would be able to load that data, whether they want to keep it on their side or actually keep it on the GoodData side. So the way we're able to manage that is really the flexibility of the types of sources we're able to download from, and whether we are doing the transformations or just loading directly into the platform. So with all the connectors that we have, with these prepackaged Ruby bricks that are leveraging the GoodData APIs as well as the source APIs, we're able to integrate their data and load it into ADS through those connectors. Or, if the client wants to own a lot of the transformations themselves and match the exact metadata output for the semantic layer or the models that are on the workspaces, they're able to load that directly in from their data warehousing source through our Automatic Data Distribution, or ADD, especially if they're using things like Snowflake, Redshift, or BigQuery.
Tobias Macey
0:30:35
And in the overall system architecture of what you've built at GoodData, how much of it have you been able to leverage from off-the-shelf components, whether it's things like Kafka or pre-built data warehouse systems, and how much of it has had to be custom engineered because of the complexities that you're working around, and having to design around, in order to ensure that the entire system remains performant for a multitude of customers in a multi-tenant situation?
Philip Farr
0:31:01
Yep. So from an off-the-shelf component perspective, we're really talking about two primary areas, the first being the front end and the second being the back end. We are leveraging a pluggable UI framework on the front end, so this gives us the ability to use many publicly available UI components, like React and Angular, for the data presentation and visualizations. And what is really enabled here is that seamless integration, right? You can basically build the custom client application and give it that slick look and feel to match your client's vision, or their customers' needs, or maybe they have a specific style guide that they have to follow for all their internal applications that they have built. So it really has infinite possibilities relying on this particular framework. Similarly, our back end has a pluggable, container-based architecture as well. And so that allows us to deploy custom code bits, like these modular bricks that we had referred to previously, which are essentially productized Ruby scripts that interact directly with our open APIs. And the idea is that we can deploy custom code that's written into our transformation processes and give kind of this more flexible architecture for ETL management and data pipeline management. And these bricks can be orchestrated into data transformation workflows for things like data ingestion, or loading data into ADS, or performing SQL transformations within the context of our internal data warehouse.
Tobias Macey
0:32:52
And then for somebody who's building their data product on top of GoodData, what is the overall workflow for being able to go from concept to completion?
Philip Farr
0:33:02
Yeah, that's a great question, right? So, you know, how do you build a data product? The way that we approach it at GoodData is we really strive to build data products which focus on a specific end user persona, because we believe that this is what really highlights the value of having analytics, or embedded analytics, in the context of an application or site or whatever that may be. Once we understand the mindset of those end users, we are then able to build the dashboards, reports, metrics, KPIs, right, all of these analytics which enable those individuals to make data-driven decisions in the context of their roles. You know, it's about understanding the problem and then getting to the answer, and that's definitely achievable through the GoodData platform. When we talk about actually implementing an embedded analytical solution, this can be broken down into five main areas. The first is getting the data: we need to extract and consolidate from, say, your application or your data warehouse or flat files that you're pulling from disparate areas of your business. Maybe it's a connection to a third party that we need to augment your internal data warehouse with, some other tooling that you're using outside of your infrastructure. And then we use this to create the data model. On top of that, the next thing that we'll do is build out the analytics. This is where we create those dashboards, reports, and metrics that are relevant to the questions that are looking to be answered. And this is something that we think every single one of the customers should be looking to answer, right, like this is the standard. Once we have that standard, we go through our release and rollout processes, and this is where we often refer to something as a template or a master. This is the standard reporting that everyone will get out of the box. And we take these lifecycle management tools and we help perform this rollout on either standalone or maybe a set of embedded reports or dashboards, and we disseminate that to the entire customer base. And once the customers have access to it, and they have access to their data, they can go in and finally customize and build their own set of custom reporting. Maybe they are looking at a different aspect of the business, or maybe they're looking at a reorganization that they need to solve for, and so we're really allowing the flexibility for the end user to formulate the insights that they need. I think the final piece of an implementation is how do we operationalize everything once it's in place. We need to manage and drive the entire end-to-end lifecycle for all of those customers, and we do this through, you know, things like provisioning and deprovisioning new customers and new users, or maybe it's in terms of managing growth and scalability, or monitoring actual usage on the platform. So I think these are all of the key steps that we walk through in building, say, a brand new data product.
Tobias Macey
0:36:28
And then the other component of building the data product, from the perspective of the customers in terms of working with their data, is I'm curious how the overall lifecycle of the data flows through the GoodData product, from when the customer first collects the data through to delivering it to their end users, and ensuring that the overall experience is as performant and robust as possible.
Sheila Jung
0:36:56
So the design that we have to work around to ensure performant access to the customer's data sources, in the case of creating a data product, I think comes from two sides. First is from the customer's data side: understanding the customer's data instance, in the case that we're actually pulling things from their data warehouse or they're sending us files to be ingested into ADS. Those are the kinds of things that we want to take a look at: what is the actual health of their production instance? Can we have access to separate schemas and views to make sure that we aren't negatively impacting their production environment? What is the size of the data that we're pulling in that we need to architect around? Do we need to provide some sort of incremental ingestion logic, with deletions and all that kind of stuff, so that we can handle large volumes of data in an efficient manner? Another thing to consider is that when we're looking at a client's data, from ingestion to something that will match our logical data model, they're going to be very different. For an analytical toolset, you don't necessarily need things at the most granular or transaction level. So when we're looking at things like that, how are we going to maintain some sort of data retention policy that matches the client's state of what they're giving us versus what's actually going to be on the platform? Those are the things that we might need to think through and architect around. Another thing is, as I was mentioning before, the aggregation of different data sources within our ADS layer: this is the kind of stuff that we like to showcase, where we centralize our data pipeline across multiple data sources. So this is something that the customer can leverage as well if they want to look at a variety of data sources. Let's look at a sales example: they might have their own transactions separately within their MySQL database, but they also want to pull in their Salesforce data. We can aggregate all that information if it's not available for them in their own data warehouse. And the last piece about the data side is really looking at that custom connector piece: how are we actually getting that data over from their side to us? And historically, we've had experiences where we were able to build up these custom bricks or connectors so that our clients would be able to have their data migrated from their side over to GoodData. On the other side of this, in terms of the data migration, I would also say there is a performance level that we like to acknowledge from the platform perspective. So when we're looking at large amounts of data, or if we're looking at near real-time analytics, there are things on the platform that we want to consider as well, like pre-caching. Maybe there's a very important meeting that a lot of people will load the GoodData analytics for; we want to ensure that everything is cached before these kinds of important meetings, and we have scripts that make that possible. Another ability that we're able to give to our clients is the option for different hardware to handle high concurrency or high data volumes. And there are also a lot of different ways that we've enhanced the logical data model to make sure that it is very reasonable to have performant access for our clients. One way that we were able to do that is through many-to-many relationships: rather than duplicating data and increasing data volumes in that data mart on the client workspace, we can leverage the many-to-many functionality in our data models to help clients with that kind of access.
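As a loose, generic illustration of the incremental ingestion logic with deletions that Sheila alludes to (not GoodData-specific; the table and column names are hypothetical), a staged merge in the style supported by warehouses like Snowflake might look like this:

```sql
-- Hypothetical staging table loaded from the customer's source extract.
-- Rows carry a last_modified timestamp and an is_deleted flag so the
-- analytics copy stays in sync without full reloads.
MERGE INTO analytics.orders AS target
USING staging.orders_delta AS source
  ON target.order_id = source.order_id
WHEN MATCHED AND source.is_deleted THEN
  DELETE
WHEN MATCHED AND source.last_modified > target.last_modified THEN
  UPDATE SET amount = source.amount,
             status = source.status,
             last_modified = source.last_modified
WHEN NOT MATCHED AND NOT source.is_deleted THEN
  INSERT (order_id, amount, status, last_modified)
  VALUES (source.order_id, source.amount, source.status, source.last_modified);
```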
Tobias Macey
0:40:44
And in your experience of building the platform and working on it yourselves, and working with your customers to ensure that they're having successful outcomes, what are some of the most interesting or unexpected or challenging lessons that you've learned in the process?
Sheila Jung
0:40:59
I would say the most interesting part of working on client implementations of GoodData is the breadth of business use cases we are developing for our clients. No one use case is the same; they're all very different, because all these different businesses have different business models and different data models. So I would say this was especially rewarding because of the constant, innovative data challenges that we had to overcome with the GoodData platform.
Philip Farr
0:41:27
My take on this one is, based on all the implementations I've seen, it never ceases to amaze me, but there's always a customer that's trying to push the boundaries of platform functionality. And this is always a tricky one, especially from a customer relationship standpoint. You want to help them meet their needs within all of the platform limitations, and of course they want to do something new and creative, and you want to enable them to do that, but we need to find that happy medium without compromising the end result or setting that client or customer up for eventual failure due to a lack of sustainability. And so it's always very interesting, and it poses a unique challenge every single time it comes up, because you have to go back to the drawing board and be like, these are all of our confines, how do we build something new? Or maybe we reach out to product to help us extend some of the existing functionality, but oftentimes the solution needs to be more immediate.
Tobias Macey
0:42:36
And one of the interesting pieces of the overall GoodData platform as well is the fact that you have introduced a different interface for being able to define the logical models, using the MAQL, or Multi-Dimension Analytical Query Language, dialect. And I'm wondering if you can give a bit of a compare and contrast between that and SQL, and some of the benefits and additional functionality that MAQL provides.
Sheila Jung
0:43:05
Yeah, so MAQL stands for Multi-Dimension Analytical Query Language, and this is GoodData's proprietary query language for creating metrics, or aggregations of the underlying data, defined against that semantic layer. This differs from SQL in that it's a streamlined analytical language with less code to write and maintain, so less technical folks find MAQL easier to use, and we've found that anyone familiar with SQL is able to pick up MAQL very quickly. The key advantages that I would state about MAQL are: one, working with the GoodData platform, it works out of the box; it's something that is innate in our platform. It's also multi-dimensional, so it pairs very well with our semantic layer. And going back to what I was saying about less code, there are no joins or subjoins to be stated in MAQL queries to define a metric, because it works on top of the logical data model, so these queries are already context aware. And talking about the context-aware piece, with everything being semantically related through the logical data model, any metric can also be immediately used for reporting and can be reused again. So this is something that can be utilized for all of our clients; it doesn't need to be rewritten, it can be pre-composed and then used by thousands or tens of thousands of users across their reports. There's also the composability of these metrics: you can put in nested metrics and build foundation metrics, so that when you drag and drop them into your Analytical Designer interface, you can apply different filters and all that kind of stuff. So there is that capability as well. And I would say the last key advantage of MAQL is resiliency. A lot of the time, when there is some source-to-target mapping change that will require significant refactoring of the actual data model, because the metric sits on the semantic layer, it really doesn't have as serious an impact on the existing metrics or reports, unless of course a serious LDM change was made on the front end that needed to be released or rolled out. But I would say, in general, the benefits of MAQL are less code, it's context aware because of the semantic layer, composing and reusing metrics is very easy, and the resiliency.
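To make the contrast concrete, here is a rough, illustrative comparison. The metric and schema are hypothetical, and the MAQL shown follows the general shape of GoodData metric definitions rather than any specific customer's model:

```
-- Illustrative MAQL metric: no joins and no GROUP BY; the logical data model
-- supplies the relationships, and the dashboard context supplies the slicing
-- and filtering when the metric is placed on a report.
SELECT SUM(Amount) WHERE Status = "Closed Won"

-- Roughly equivalent SQL, which must spell out the join and grouping
-- explicitly and is tied to one physical schema:
SELECT c.customer_name, SUM(o.amount)
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id
WHERE o.status = 'Closed Won'
GROUP BY c.customer_name;
```

The same MAQL metric can then be dropped onto any report and sliced by any attribute the logical data model relates it to, which is the reuse Sheila describes.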
Tobias Macey
0:45:35
And in terms of GoodData, what are the cases where it's the wrong choice, and someone might be better suited using a vertically integrated internal platform or building out their own analytics solution for exposing to end users?
Sheila Jung
0:45:49
Yeah, so if you're seeking a single static data visualization for your management team, or maybe a public chart for your website, GoodData will be too complex of a platform for you. In this case, using data visualization tools or libraries would be better suited. However, if you plan to do anything more than just a simple static data visualization, you would need to find something that's more reliable and that does have this lifecycle management, pretty much what GoodData has.
Tobias Macey
0:46:20
And what do you have on the roadmap for the future of GoodData, in terms of new capabilities, or just overall improvements, or new use cases that you're looking to provide?
Sheila Jung
0:46:31
GoodData continues on the trend of analytics everywhere for everyone, and this includes improvements on data integration options, bringing better data visualizations on the front end that are available through the analytical dashboards, bringing better collaboration between data engineers and analysts, and improving the self-service analytics ease of use for those non-analysts.
Philip Farr
0:46:55
And to add to that, I think another component of our roadmap that's really exciting for us is that we're working on a newer Kubernetes-based deployment option of GoodData. And this really will help us enable co-locating the analytics with a SaaS application that may be deployed, say, in a public or private cloud platform, or in a local on-prem data center. So the goal is to enable the same functionality that we get out of the cloud-hosted GoodData platform, maybe for companies that need more enhanced control or stricter guidelines, or just want to feel like they have full ownership, with it being less of a managed service that we are providing to them.
Tobias Macey
Are there any other aspects of the space of embedded analytics, or the product that you're building at GoodData, or anything else in the business intelligence and analytics space that we didn't discuss that you'd like to cover before we close out the show?
Philip Farr
I think that we've touched on a lot of the key aspects and the reasons why we think that GoodData has a competitive advantage. So from my perspective, we touched on a lot of things that I hope the larger audience will find useful, and we presented that information from the GoodData perspective as well.
Tobias Macey
0:48:20
All right. Well, for anybody who wants to get in touch with either of you and follow along with the work that you're doing, I'll have you each add your preferred contact information to the show notes. And as a final question, I would just like to get your perspective on what you see as being the biggest gap in the tooling or technology that's available for data management today.
Philip Farr
0:48:36
Yeah, I mean, that's always the big question, right? What's next, where are we heading? I see that there could be improvements in terms of the data cleansing area of data management and data engineering. I know, personally, a lot of people, including myself, spend a lot of time debugging or cleaning up data sets, or trying to ensure that the data is clean enough to run end to end through ETL. And part of that is error handling; part of that is maybe removing records, which could lead to some type of inconsistency. So the area I would like to see development in would be some very customizable data cleansing solution that has a very flexible integration with other analytical tools, you know, drag and drop, similar to the pluggable UI framework we mentioned earlier. Is there a way that someone could build a solution that would integrate directly into our native technologies, and we could customize that and deliver it as a packaged option to our customers as well, and really limit the amount of time that it takes to process and handle all of the data, limit the troubleshooting, and hopefully free up time for more value-add activities?
Sheila Jung
0:50:11
And to add to that, I would say not necessarily a big gap, but a big change that we would probably see in the future: as more users are getting access to tools where they have access to data, maybe data they didn't have access to before, as Phil was mentioning in a use case earlier, we're going to need improvements to make semantics and relationships in data a little bit easier to understand in general. So even though we do have a semantic layer, and other tools may have something similar to make it easier for their end users to actually utilize, I predict that in the future we will need to simplify this even further for a wider audience.
Tobias Macey
0:50:50
All right. Well, thank you both very much for taking the time today to join me and discuss the work that you're doing with GoodData, empowering embedded analytics for end users and making the overall analytics space more accessible. It's definitely a very interesting product, and I had a lot of fun learning about it as I prepared for the show. So thank you both for all of the time and energy you put into that, and I hope you enjoy the rest of the day.
Philip Farr
0:51:14
Yep. Thank you, Tobias. Thanks for having us today. And feel free to share our contact information; we're happy to communicate offline with anyone who has any open questions or concerns or follow-ups that are needed. We appreciate your time as well. Thanks.
Sheila Jung
0:51:29
Thank you so much, Tobias. Thanks for having us. Bye.
Tobias Macey
0:51:37
Thank you for listening! Don't forget to check out our other show, Podcast.__init__ at pythonpodcast.com, to learn about the Python language, its community, and the innovative ways it is being used. And visit the site at dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show, then tell us about it: email hosts@dataengineeringpodcast.com with your story. And to help other people find the show, please leave a review on iTunes and tell your friends and coworkers.
Liked it? Take a second to support the Data Engineering Podcast on Patreon!