Business Intelligence Beyond The Dashboard With ClicData

Hello, and welcome to the Data Engineering Podcast, the show about modern data management.

Have you ever woken up to a crisis because a number on a dashboard is broken and no 1 knows why? Or sent out frustrating Slack messages trying to find the right dataset? Or tried to understand what a column name means?

Our friends at Outland started out as a data team themselves and faced all this collaboration chaos.

They started building Atlan as an internal tool for themselves.

Atlan is a collaborative workspace for data driven teams, like GitHub for engineering or Figma for design teams.

By acting as a virtual hub for data assets ranging from tables and dashboards to SQL snippets and code, Atlant enables teams to create a single source of truth for all of their data assets

and collaborate across the modern data stack through deep integrations with tools like Snowflake, Slack, Looker, and more.

Go to dataengineeringpodcast.com/outland

today. That's a t l a n, and sign up for a free trial.

If you're a data engineering podcast listener, you get credits worth

$3, 000 on an annual subscription.

When you're ready to build your next pipeline and want to test out the projects you hear about on the show, you'll need somewhere to deploy it. So check out our friends over at Linode.

With our managed Kubernetes platform, it's now even easier to deploy and scale your workflows or try out the latest Helm charts from tools like Pulsar, Pacaderm, and Dagster.

With simple pricing, fast networking, object storage, and worldwide data centers, you've got everything you need to run a bulletproof data platform.

Go to data engineering podcast.com/linode

today. That's l I n o d e, and get a $100 credit to try out a Kubernetes cluster of your own. And don't forget to thank them for their continued support of this show. Your host as usual is Tobias Macy. And today I'm interviewing Telmo Silva about ClickData. So, Telmo, can you start by introducing yourself? My name is Telmo Silva, and I am the CEO and CTO of ClickData.

And ClickData is a business intelligence platform,

data management platform as well, that basically allows any size business, small, medium size business, or large corporations to basically incorporate their data into a single data warehouse,

create visualization, and publish that out to the wider audience.

And do you remember how you first got involved in the area of data?

Absolutely. I remember exactly the day, in fact,

that I said, I gotta stop working for this large corporation that at the time I was working for, a large pharmaceutical

global company.

And after working with them for 5 years and traveling over across 40 countries, you know, hiring an army of consultants and internal people to support

Oracle data warehouses and

buying visualization

tools like Tableau and Spotfire and things like that.

And being totally depressed 6 months after implementing that in 1 of the affiliates or 1 of the business

units, going back and seeing all the customers still using all the business users using Excel.

And that frustrated the crap out of me. And that basically came to that point in time, that realization to say

something is not right. We've been doing business intelligence for years.

And still, you know, after spending 1, 000, 000 of dollars in both services and interfaces,

scripting, database licenses, servers,

and whatnot, and we're still here. We're still using Excel, and people still don't trust

or are not being satisfied by what we call business intelligence as it is today. And that's the day that I decided to jump off and saying, you know what? I'm gonna stop whining about other people's applications and problems, and I'm gonna create my own application

and so that I can whine about it myself. So there you go.

And so you mentioned that Click Data is a business intelligence platform. And from the looks of it, it seems to be sort of an end to end option versus a lot of the

more

narrowly scoped tools that people might call business intelligence, like Tableau or Superset or,

you know, some of the previous generations like Pentaho. And I'm wondering if you can just describe a bit more about what it is that you're building at ClickData

and why it is that you decided that you wanted to spend your time and energy on this particular problem domain. Hey. You're absolutely right when you mentioned, you know, the narrow scope versus this is what marketing does to us. Right? You know, remember the days of web 2.0 and big data and IoT, and still people have no idea what that means. Machine learning is another 1. Artificial intelligence. Right? And then when you get down to it, what is business intelligence? Right? Is it just the visualization piece? Is it the data warehouse piece? When we talk about data lakes, data warehouses, data marts, what are all these pieces? Right? People just throw away these terms left, right, and center. And, ultimately, you know, for a medium sized business or an easy business for that matter,

you know, before starting to predict the future and and use advanced statistics

and predictive analytics to kind of figure out where they're going, all they really want is something that explains the past. Right?

And, you know, many years after, we're still chasing this holy grail of, can we have a stable reporting system that people don't have to log in to every day to find out where they are, but, you know, proactively tells them, you know, this is where you are, this is where your business is at, Gary is where you need to look at. Right? Simply.

But

visualization has never been the issue. Right? We have amazing graphing platforms. I mean, even Excel has evolved over the years to create beautiful charts. And if you wanna go all the way down to d 3gs

and create your own amazing and creative visualizations, you can do that. Right? All those things are possible today. Digitalization is not the issue, or at least not the only issue.

Where we spend most of the time, 60, 70 percent of the time, is really massaging and treating the data and making sure the data talks. Right? We love the cloud, but the cloud has created another barrier for users to get to their data. Right? How hard is it to get your Facebook data out? How hard is it to get your LinkedIn data out? And if you're having a hosted database on Amazon or or Azure,

do you have the keys to go through, you know, all the firewalls and so forth to get to the data? Of course not. That's protection.

But you're removed now. You're must much more removed than you were before.

And the promise of APIs

is another 1 of those things that went the wrong way. Oh, you have a REST API? Okay. Then it's fine.

No. Of course, it's not fine. Not 1 REST API is built the same way as another REST API.

There's different understandings of standards. So all these challenges make it so difficult for a business user and even an IT department within a company to assimilate all this data

and create a data warehouse with all the relevant data that is needed to report on. The crux of it is

visualization

is not independent. Tableau by itself is useless if you don't have a database

for all the terms that you wanna use, data warehouse, data mart, whatever you wanna call it. Data storage

that has clean data

properly updated frequently. Right? And that's been our goal. We built connectors,

native connectors to as many systems as we can

to feed that database and to create a catalog of data

that may or may not be related to each other that can then feed the data visualization and then furthermore

down processes.

And in that, we've connected both ends of the world. Right? The ETL tools such as Informatica and Talend, all the way down to the databases, Oracle, SQL Server, MySQL,

all the way to the database tools like Excel and Tableau and Spotfire,

all the way down to the portals such as SharePoint and other things. Right? So, again, the whole gambit of tools is now all within the same platform

with APIs, where data lineage is is assured. You know where that data came from, where it started, and where it ends. And that's really was our holy grail. That was our vision. Huge vision.

But I felt that was the only way we could solve this problem.

The idea of business intelligence is not just the

terminal point of I have a graph. I can see, you know, what is my trend of sales over the past 6 months. Like, that's not the entirety of business intelligence. That's just what somebody sees when they look at the dashboard. Like, business intelligence, as you said, is the entire operation of getting to the point where you can answer that question.

And I'm wondering if you can just

characterize what you see as being the current state of the market for business intelligence and where it stands now given the fact that it is decades old and has gone through several generational shifts and

maybe talk about the

technical and organizational

capabilities and systems that are necessary to be able to effectively run a properly engineered

business intelligence system that is not just a dashboard at the end of the day. Right. No. I frankly don't believe I mean, there are some new players on the market like ourselves

that bring a product similar to ours.

I do not believe that today much has changed. I mean, Tableau has been recently acquired by Salesforce.

Looker has been acquired by Google. I mean, all these are acquisitions of technology and potentially people, but in essence, nothing much has changed.

If you want to implement these tools for the data visualization, because, ultimately, business users, they don't care about your SQL script interfaces, API rest, and databases.

They care about those charts at the end of the day,

right, and how they can make decisions.

By the way, I don't believe that BI ends with those charts. I think that's just the beginning of BI, in fact. And we can talk about that and those ideas a little bit later, but I think that

there's more to it than just the charts. But, nonetheless, going back to the current players, I don't think much has changed in the fact that if you want to do things proper, you have to have these connectors to go suck your data out from different systems

to put it in different staging areas and the data warehouse and that you need some processing of that data to cleanse it, to make it match against each other. And all these requires tools, interfaces, coding,

and, you know, potentially a lot of internal people or external people to make it work properly. The scheduling, the backups, the security,

you name it. You have to put all that in place if you want to do a solid business intelligence.

Or rather, let's not even call it business intelligence. Let's just call it, you know, let's make this company data driven.

Right? You're building a tool where you are easily able to reach the data, to treat the data, and to analyze the data. That's really all we're trying to do to the business users. Right? Because as soon as you think you have the business figured out, they're gonna throw another monkey wrench at you and say, but now I wanna see it this other way. Right? Because my competitor has changed or because the market conditions has changed. Oh, COVID came into play. Okay. Now we have to change our entire forecast. Can you give me the numbers? You know? All these things are dynamic, and people think

especially us, engineers, sometimes we think we want things cookie cutter. Right? We want stable

business processes so we can code against. But the truth of the matter is that that is a fallacy. Right?

Stable in business is something that never happens. New change in management, new acquisitions, the market, it's constantly evolving. So by creating a data centric kind of approach to it and saying, yeah, data is really the key. And once the visualization has to be flexible enough to handle these things. But that entire pool of tools and technology

has to be very easily accessible. So, yeah, I don't think much has changed with the traditional players today, and that's why Click Data, in fact, is enjoying the success that we are. But there are other players in the market as well which have very interesting items

to bring to the table as well. And some that even go beyond,

you know, just traditional business intelligence such as predictive analytics, automated predictive analytics, which is pretty cool to see as well. Right? To your point of the sort of dynamism,

that also brings into question how you are structuring the data in the storage layer that is actually driving these charts. And then you also made some good points about the visualization

is not the end goal. That's just a step in the path of actually

making the data useful because,

you know, just saying, okay, I see a chart. That doesn't do anybody any good if I know what to do with it. And that's also another

major trend that's been happening in the past few years is people realizing that, you know, just because I'm able to give you a chart of our sales projections for the next 6 months, you have to do something about it to actually realize those projections.

And so, you know, where's the button to say do the thing?

Right. Absolutely. I mean, a lot of people have talked about embedded analytics,

which was the simple fact of placing a chart inside another application.

Right? So it's closer to your transactional application. Right? So if you're talking about a medical system

or about, an invoicing system, as you enter your invoicing, you see right beside a little dashboard or chart of the invoices for these clients so so you can provide better support and so forth. And that's a great notion. Any decent BI system needs to have that. But there's the other side of it as well is why don't we have embedded applications or embedded data management inside dashboards as well? So we can also look at click data, for example, where you can build a dashboard with a lot of charts and visual elements, but right beside the data form that the user can say, well, this is not looking great, so I may wanna change my projections for the next month. And immediately on the dashboard,

enter data that gets sent back to the database behind the scenes that immediately recalculate their projections, right, on the spot. So this is a slightly different approach

than what most have taken. So I believe there's a a lot of room like that for innovation

around all items within BI. I mean, if you think about it,

except for Steven Pugh bringing the bullet chart, I don't know how many years ago, 15 years ago, no additional innovations on data viz has has come along. Right?

Very few have innovated visualization. The same way that very few have innovated

if you look at databases as well. You know, we talked about in memory

processing. I think that was 1 of the big things back a few years ago. We talked about columnar store of databases.

We've talked about ADAP and all these different, you know, data lakes and NoSQL

type of databases.

And we've talked about it vertical and horizontal scaling of databases.

But this has been around for a long time. Right? Everybody knows put the data in memory. It's faster than on disk. Yeah. That's in memory. Like, caching has been around for ages. Right? But has there been any

major strides in terms of bringing data faster

and not by simply throwing more workers and scaling cloud and putting more electricity into all these CPUs. Right? There's gotta be also an intelligent way

of analyzing this data, including

recent technologies. PostgreSQL

has done a phenomenal job in moving

the scale in terms of improving their performance

and trying to innovate within the limits of of databases. But, nonetheless, there's still a lot of room for innovation in these areas in my view. And I think, you know, what it requires

is, you know, players again like ourselves, which try to do the maximum they can on both sides of of the fence to kinda push the limits of the current technology.

Yeah. So that's definitely something I'm looking forward is to see what else can we do. Let's stop looking at this just as you say at the previous chart at the end. What else can we do with these things? Right? How can we get the users more involved? How can we get the technology more involved? And as businesses are

trying to

realize these capabilities or build out a business intelligence system, what are some of the

complexities and challenges that they might face in either building it themselves or if they do decide to go with a vendor solution, understanding

what the various options are actually going to provide for them? Based on our experience with our customers, a lot of their challenges is mostly in the data acquisition stage.

And they're very

stuck, you know, for lack of a better word. You know, I have my data in system x, y, z. It's cloud hosted or, you know, it's provided by a certain vendor,

and they don't give me the keys to the database. Right? It's not something that I can, you know, easily tap into.

Typically, it's an API or

there's way too many security along the way. So how do I tap into this information? This is my data, and yet I can't reach it. That's challenge number 1 for any business. Right? So that's the first step. They will need to potentially look at developers or look at

some sort of a data broker

software of sorts

that has connectors for their systems that they're in use, and that can bring it back in a

format that is digestible for them. Right?

Failing that, most likely, and this is what 80% of them do, they just do exports. Every major application, to a certain extent, has a little export feature somewhere

that can export into CSV or Excel,

some or partially

some data or their entire dataset.

And that's the process they will have to live with, basically, to export every time they want to do analysis to export it and kinda redo all their entire processes to get to the end. So that's challenge number 1. Challenge number 2 is if you have to do that across more than 1 dataset,

and you have to kind of start cleansing and matching the data. Right?

Do businesses have the required skills to know a little bit about SQL, potentially, if they put that on a database?

If they stick with Excels and whatnot,

are they experts enough to, you know, do things like XLOOKUPs and VLOOKUPs on Excel and kind of start

managing things a little bit,

you know, homemade,

in a homemade fashion?

And is that really a repeatable and profitable

thing for them to do? Right? If you're a doctor or if you're in a profession where you can bill a $100 an hour for doing your job, and you're spending 2 hours copying and pasting data with Excel, that's $200 of your billable time that, you know, for what? For you to get a pretty chart at the end. Right? Which,

in a sense, what you need to do to achieve that. And you'll get faster with time, but it's a challenge for them. Right? They don't wanna do it. That's not what they wanna do their business for.

And the third challenge is actually visualization.

Well, we have a lot of clients that come in and saying, you know, we love 3 d charts. We love all kinds of fancy

charting and everything else, and we always try to help them along the way. You know, does a pie chart really fit?

You know, you have 500 categories. You're gonna see 500 little pie slices.

Is that something you really wanna do? Will you be able to make an informed decision based on that when your pies,

you know, are basically an eighth of an inch, you know, wide there at on the upper end? So, again,

visualization is not for everybody and presenting

some people try to put too much into the dashboard. So, again, in my view, it's a true

career or a true academic

effort to know how to properly visualize data in my view. It's no different than any other of the sciences that we have, how you should visualize data, how you need to bring attention to the things that matter.

And then the 4th piece is the automation of all of this. You really don't wanna do this over and over and every day and every week. You have a team of 5 people, 10 people, and you have to continuously email them manually.

Your cool dashboards. It's just not worth it. So how do you automate that, and how can you script it? Right? So that scripting,

that word alone invites you to say programming.

Are these business users programmers, or do they have teams that can program? If they do, then that's fine, and that becomes a cost center for them. Or they externalize that. Right? But in this day and age where you can zap here and do cool things like that, you know, can't we make those things simpler as well? So, again, those 4 challenges to me are probably the biggest ones that they face among others. But those are also the ones we're trying to solve for them. So can you describe a bit about what it is that you have built at Click Data and some of the ways that you've architected the platform to be able to make those different challenges scalable and maintainable for you so that you can provide them as a service to your customers?

1 of the things we decided earlier on is we wanted to go with a global cloud provider because

we quickly understood that and this was already a few years ago before

GDPR, before the California Protection Act, before, you know, protection of data

became a very, very hot topic. It was a hot topic. It's always been a hot topic. But in recent years, it became even more so. 1 of the things we wanted is to make sure that we had distributed it storage across the world. So we needed to go with a big, you know, large scale provider, either Amazon or Azure. So we've selected Azure for a variety of different reasons,

namely because we use dot net technology in many of our

applications.

Other times because, you know, we felt that, you know, we had a very good affinity with SQL Server,

and we've modified a lot of things to work with SQL Server in a more performant way. So we felt that was the right approach for us. So that was the first thing we want with the large cloud provider. The second thing is we wanted our application to be architected in such a way that it was highly scalable

in the areas of how the tasks are processed, where those are data treatment or data loading tasks. So we wanted to even before the word microservice came along, 1 of those great words,

that goes with technology,

we wanted to create this concept of micro tasks

that have very specific

tasks to do, very narrow as much as possible. Now we can duplicate and replicate across the different data centers

that we host click data for our customers.

So, also, the proximity of the data to the application was very important to us. So we wanted to make sure that our application

was delivered in sort of what we call internally bubbles. Some people call them clusters,

but at the same time, talk to each other.

Because we realized that some of our customers are international customers,

and they do require that data to be present in 1 country, but yet still be performant and accessible in other countries. So we wanted to make sure that our dashboards

were accessible, that the data that the application itself was accessible.

And the last piece is we did not want 1 single downloadable,

you know, fat client installed.

It's not because we don't like Windows or Mac or Linux. It's just because we want a browser. And we were lucky enough, unlike, you know, Tableau's and other companies that started many years before us,

they had 1 technology. Right? They had, you know, built for Windows. That's most likely where you need to be. The browsers were completely

nonstandard.

Flash was rampant all over the place. Right? Silverlight was confusing everybody. Right? So

we started a time where

the world was made

simpler, at least the Internet world was made simpler,

you know, by substantially simpler. It's still complex. But

by things such as no more flash, no more silver light, everything is HTML 5, whatever. Again,

that standard meant to some. But there was a huge standardization

based on the Chromium engine, Firefox, and Safari. We had 3 browsers basically to deal with as opposed to the infinity that we had before and the little quirks.

In the grand scheme of things, we came at the right time as well where cloud was accepted.

Platforms such as Salesforce

already broke the mold in 2, 005, and they've been building trust across different industries for many years. So when we started,

we were at the right time.

Building for the browser

is a huge architectural advantage. Right? Because now we can offer this to all the platforms. Right?

We love the Macs. Right? There's organizations, even government institutions, that 40% of their employees and their users are Mac users, and yet they can't run Tableau. Right? They can't run the desktop. It's Windows only. Right? So we were, again,

adamant about everything has to be browser based. The designer dashboards, the ETL tool, the management of their data warehouse, the whole thing, all 1 application,

full, a 100% browser. So all those things put together kind of made up the DNA of what Click Data is.

Right? There's a lot of small things underneath. Right? How do we process data faster? Right? How do we make these data acquisitions a little bit

smarter, more intelligent? Right? Does everybody understand what pagination is on an API?

And do they know how to request a refresh token of OOS or not? Right? We made all that work for them behind the scenes. They don't need to know that stuff. Right? We take care of them. You you wanna connect to this system? Okay. No problem. We know that API. We know how to to refresh tokens, how to page the data, and how to bring the data into your data warehouse

clean. So there's a lot of intelligence

in that area that, you know, is kinda hidden away from our customers.

An interesting parallel to what you're building is the idea of the modern data stack where each of these different concerns are basically a different SaaS provider. So you're using Fivetran with dbt cloud and Looker to be able to do your end to end system.

And I'm wondering what you see as the

advantages that you're able to

realize by virtue of being the sort of end to end system of collecting the data from the source systems through to delivery and analysis of the information and some of the optimizations that you're able to

build into that full architecture

because you have awareness of what all the operations are at each of those stages versus having them be

decoupled with a very loosely defined interface?

I think there's 2 points there. The first 1 is if you look at an intra customer scenario, right, where a customer loads, let's say, Facebook data, you know, page likes,

We know that the customer has made disconnection. We know what the data looks like. We have actually formatted the data.

So when it comes to the visualization site,

we're not gonna propose a visualization that is not pertinent to that type of dataset.

We can be a lot more intelligent in that sense to say, well, we know this is page likes, so you probably want to see page likes over time. Immediately, we can start proposing more intelligent

visualizations.

Right? And this concept of data lineage is very important. Right?

It's not just because of knowing, okay, this data comes from there, but also to be more intelligence along the way. On the data transformation side, if you're trying to merge Facebook with Twitter and to create a social media, you know, join view of your data to get a bit of a social reputation.

Again, there's intelligence there to say, well, I know what this column is and I know where that column is, so I'll propose that join for them already. Right? They don't have to think about those things. So that type of intelligence

in a decoupled situation

is very difficult. It has all to be programmer driven.

Right? Because the programmer at that stage does not know where the data came from. It has been decoupled,

so they have to add that intelligence themselves to these interfaces.

So that's kind of the issue. It's exactly that. The second 1 related to that is updates.

I mean, Facebook changes their API

pretty much on a monthly basis, you know, adds columns, removes columns, adds a new endpoints, you know, terminates a new version, etcetera.

The effort involved in that is quite large

to keep up with them. Right? Now imagine doing that across all the cloud platforms that we support

for a decoupled situation becomes a very, very tough because if the beginning breaks,

everything else breaks. Right? So, again, it's important that your system is built in such a way that it's coupled

not because,

you know, people think, oh, you decouple to minimize

breaks, but in fact, it's the actual the opposite here. You couple, in our case, so that you can continue to work with the new data irrespective of the fact that the first piece has not worked because we can either circumvent that or, you you know, the data warehouse and the dashboarding still continues operating after that fact.

So that's in an intercompany situation.

On an extracompany

situation, the intelligence that we can give by having this data coupled is also quite interesting for us. And, again, I'm gonna give you a bit of a case scenario here, for instance,

where, let's say, you have 2 accountants using a platform

such as ours. And the first 1, you know, gets an account, starts building an awesome dashboard

out of their, you know, QuickBooks data and a few other systems potentially. Right? And they built a great dashboard, and the system is learning at the same time. It's saying, okay. This person has built a dashboard that looks like this and this from QuickBooks.

Can we utilize that? Not obviously the exact data, but can we utilize that for the next accountant that comes in and connect similar looking data

onto our system?

Have you tried building a chart that looks like that? Right? So these kind of auto suggestion, right, that we see everywhere in Google search and so forth,

but we have not done that. Best of practice in terms of visualization.

We're all building the same charts for over 50 years. It's always the same. That p and l, how many times we have to build a p and l, how many times we need to build a chart that sales over time? It's always the same chart.

Can we be a bit smarter and say, oh, I know this data? It's coupled. I know what it is. So most likely, and based on the experience of other users, I believe this is the right visualization for this.

And the third is the actual data identification itself.

I think we can also do a lot smarter things around the identification

where even though we may not know exactly the data, for example, if you load an Excel

with column names in Japanese, the data is all in Japanese.

You as a data analyst, you may not know Japanese. How do you know what that represents?

There's a lot of metadata classifications,

open data.org

type of things.

Wikipedia. There's I mean, there's a lot of data sources, Wikimedia and so forth, that we can start tapping into

to DNA the data and kinda say, oh, even though I don't know where this data came from,

I can kind of figure out that it has these characteristics,

which by the way are the same characteristics as other data, which I do know for. So maybe it is similar data. And these type of auto suggestions and proposals

quickly advance the work of a business analyst, a data scientist, or just a normal business user in terms of achieving what they wanna achieve in the end. Can you talk a bit more about the workflow for somebody who's using click data to go from, I'm a new customer. I wanna get set up to I have, you know, these various sources of data loaded into the system. These are the dashboards I want to build and maybe talk through a bit of some of the feedback mechanisms that you've built into the platform to be able to look at a chart and say, this isn't actually what I wanna do. I wanna input some additional information and have that update these

projections. Right. From the point in time that you start a trial with Click Data, you get an account set up, you pick your region, You know, you have a couple systems to connect. You know, we're talking about

30 minutes to a couple of hours of work. Again, if you have the keys to the data, your data sits in a platform that is credentials, you have the credentials and so forth, you can get your data into Qlik Data in a fairly quick manner. We can talk about volume of data and how much data you have, etcetera, but I'm talking typical terms of a medium sized business where we're talking 2, 5, 000, 000 rows. They can get that up on their 1st day within Click Data. Once the day is there, it's it's a question of understanding what do they want to do. And I think that's the most difficult part for anyone is to say, okay.

I have all this data. What do I wanna do? What am I interested in? And a lot of people start from the fact of going data from data to develop metrics. So I have this data. What metrics can I develop? Right?

And that, to me, is actually the wrong way to go about it because,

you know, you can load a table of data with, you know, 50 columns. There's a lot of metrics there that you could do. There's a lot of different variations.

That is not as important as to understand, okay, what do I want to know

about

this topic, my business situation, my sales, logistics,

whatever the subject matter is. You know, I always recommend turn off the screen,

grab a piece of paper, and tell me, draw it a little bit, sketch it. What would you like to see in front of you every morning when you wake up, you know, when you're getting your coffee? Like, tell me, you know, you run Click Data.

What is the first thing you wanna know? Well, I wanna know how many customers joined yesterday. I wanna know, you know, how much new revenue we generated, how many customers we've lost. So you start defining your key indicators to say, these are the topics that matter to me. If you're obviously in a more operational or departmental position, then your metrics may be a little bit more detailed and more expanded. But in essence, the work is the same. You have to sit down and think, okay. This is what I wanna see. And now you're gonna say, okay. Does my data support that? Right? My data have the elements. So that's the time

that it will take 1 of our users or customers

to think about what they really wanna do with click data because it just opens that door and say, oh, k. So that was pretty easy to bring all my data in. Now what do I do with it? Right? So they always start with our tool thinking that's gonna be a challenge for them. And then when they see the data, they kinda go, okay. Now what do I do with this? Right? And they start creating all kinds of fancy dashboards and all that, and pretty much it becomes a jumble of chart and indicators

as opposed to thinking about it. Okay. Let's think from the top down and see what do I wanna see. And that, again, it depends on type of customer, their experience as well. Listen.

Dashboard BI as a whole and and dashboard, I'm not gonna sit here and say, you know, it's all kind of automated and is easy and, you know, it's point and click. We all know that's not true. There are cases where, you know, you're trying to calculate last 30 days, average sales,

medium, whatever. That's a formula. You need to know your formulas. Right? It's like Excel. You know, it's easy to say cell a plus cell b equals that's the sum. That's great. But it's not as easy to do a regression, a linear regression,

or, you know, kind of a vlookup. So there's different levels of expertise, and there's different levels of needs. So, again, it depends what they're trying to do. Are they trying to do dashboards that are very interactive, you know, that you have, like, 50 filters, and based on those filters, it's gonna calculate some metrics and some forecasting, and it allows them to change some variables.

So it really depends on the needs. There's not 1 specific

use case that I can tell you. Yeah. It could be as easy as a couple days or it could be, you know, as complex to implement

as 2 or 3 weeks.

However, what we can assure is that you definitely don't need, you know, 500 tools to do that, and you don't need, you know, to contact different vendors. And it's all 1 tool, all 1 platform and all supported centrally.

Modern data teams are dealing with a lot of complexity in their data pipelines and analytical code. Monitoring data quality, tracing incidents, and testing changes can be daunting and often takes hours to days or even weeks. By the time errors have made their way into production, it's often too late and the damage is done.

DataFold's proactive approach to data quality helps data teams gain visibility and confidence in the quality of their analytical data through data profiling, column level lineage, and intelligent anomaly detection.

DataFold also helps automate regression testing of ETL code with its data diff feature that instantly shows how a change in ETL or BI code affects the produced data, both on a statistical level and down to individual rows and values.

DataFold integrates with all major data warehouses as well as frameworks such as Airflow and DBT and seamlessly plugs into CI workflows.

Visitdataengineeringpodcast.com/datafold

today to book a demo with DataFold.

In terms of the

sort of data modeling that you're doing, you mentioned being able to, you know, add multiple filters to the dashboard so you can, you know, slice and dice the data in different ways.

And I know that 1 of the challenges that often comes up with business intelligence is

you set out with the intention of answering 1 question, and then you realize from that answer, okay. Here are 15 other questions that I want to ask, and then you go and say, okay. Well, now how do I ask and answer these questions? And the answer is, oh, well, that's gonna require some additional work because we don't have all that data, or the schema isn't structured in a way that makes that, you know, feasible given the computations that need to happen. And I'm wondering how you approach some of those challenges of

allowing people to do this exploration of the data in a ad hoc and iterative fashion while still making sure that it is sort of performant and well structured and maintainable over the long run? Absolutely. That was 1 of our largest challenges. And I gotta be honest, you know, Click Data at its current stage is still, I would say, more of a business

reporting

tool

rather than a data exploration tool. Right? We're not SPSS.

We're not even Tableau in the terms of, oh, here's a dataset.

What kind of stuff and patterns can I see?

We start from the point of view that we're trying to support the business

in having decent metrics in front of them every single day, but they need to know what those metrics are. They need to know what they're looking for. We are building a new module called insights as well that will assist with that.

But, nonetheless,

the idea of us building a dashboard and then, you know, taking our example, you know, showing it to the business users and the business users

challenging and then saying, well, now it'd be better if it was this, this, and this. And you having to go back to the data model

scared us because I've been there. I've done that. And that requires the rebuilding of pretty much all your data dimension model, your hierarchy, and so forth. And we don't wanna do that, so we don't. There's at no point in Qlik Data will you have to say this is a metric. This is a hierarchical dimension. This is a dimension x, y, and z. When you load data into Quick Data, we immediately launch parallel processes

to basically

index every single column you have in a column or store indexing. So however you wanna look at it, that's how you're gonna look at it. It's quite fast to do aggregations from that standpoint, And this is happening behind the scenes. And we're building these mini index and clustered

stores all over the place on every single column,

almost every single column. We don't do columns greater than 4, 000 characters for performance reasons. But every single column you throw in there, we do a columnar store index.

And it seems expensive and it seems

almost nonintelligence,

almost blunt, a brute force kind of way of doing things. But that's because exactly I don't know what the user is gonna ask down the stream. Right? Down the stream, he may want to aggregate on this column.

And I don't have to want to go back all the way to the beginning

to, you know, turn on a little option to create that or to aggregate based on that. So we do that behind the scenes,

and this is where we gain the performance. This is where we kind of make that already table

available. We also fought very hard at the beginning this concept of saying, you know, every business was saying, oh, well, we hate data silos. We hate data silos. And say, you know what? We kinda love data silos in a sense. We actually went, you know, the other way. And sure enough, recent years have proved us in a sense right where data lakes are exactly that. I should call them more like, you know, garbage dumps, really, because you throw everything into a data lake. Right? And then you sort it out after. That's how we saw click data very early on. We said, yeah. You just put tables all over the place, and you don't have to define the relationships

between them offhand. You don't have to define the dimensions.

You'll define them as you go along. And once we learn there's a relationship,

then we'll use it going forward. So those are the type of things that we kinda did to make sure

that we avoid, you know, that cycle of, you know, redefining your data model all the time and going back to all your interfaces for to adopt that. Another interesting element of your platform is that you're providing the storage layer for the business intelligence system. And I'm wondering if you can talk through some of the

security and regulatory challenges that that brings on and some of the ways that you have to work with customers to

sort of build up their confidence and trust in your capacity to be able to be effective stewards of their data? That's obviously foremost priority on our side.

The minute we have some sort of disaster, whether, you know, natural or not, that's the end most likely of our type of business. Right? So 1 of the reasons why we obviously went with either Amazon or Azure

was because we wanted to have that first layer of infrastructure highly secure. So the physical security

that would be there from the get go, we don't work with anybody else.

We then worked a lot on all the data is encrypted. That's also nonnegotiable

in transit,

internally in transit, and externally in transit through HTTPS.

And then we worked on a lot of other security in terms of how our applications themselves

work with the databases and how they access the customer's database.

And 1 of the things we did early on was we launched in 2 polls. We launched in Europe and which is hosted out of Ireland, and we launched in the US, which is hosted out of the East Coast.

And those are our 2 major polls.

But

soon enough, we had customers in countries in France, in Germany, in Canada

that basically said, no. Our data has to be hosted. And this led us to think, okay. Well, let's start another cluster there.

But starting a clustering a new region

is costly for us. And so we start thinking about

what if, you know, the customer is okay with the data in transit encrypted

going through some services outside of the country, but their data is stored in their country. And then we start coming up with this approach of having the customers almost hosting the data themselves

so that they feel secure. They hold the encryption keys to the database,

but they give the application still. It's a cloud application, but they still work with their database.

So we're working with some customers on that if they are concerned on that. Now we host clusters as well in these other countries as well. So those are things that we've been working very hard to make sure that we can ensure that the customers understand that our focus here is to make sure that their data is secured to the maximum possible.

You know, you could say you're HIPAA compliant. You can say that you are GDPR compliant.

But, you know, all these are just words, right, until you actually

put in place policies and processes and almost bake it in to your

code and automation

in such a way that even if somebody wants to or forgets to follow a certain procedure, that, you know, the system will do it. So this is really where we try to kinda work on. We don't try to do 1 offs. We always look at it and saying, k. But next month we'll have to do it again.

So let's make sure that it doesn't happen again.

This continuous

process and evolution, it has to be baked into

mostly in into the people rather than to the technology.

Because all you need is that 1 week link,

and then everything crumbles. Right? 1 of the things we did early on with Click Data started with, you know, basically a freemium version of Click Data, believe it or not. Over a space of 12 months, we have 30, 000 accounts running on Click Data. And we use that, you know, with the caveats

of freemium

to learn a lot about security, about scalability, about infrastructure,

about how our application was behaving,

and what potentially, you know, holes, you know, we needed to gap very quickly. So that gave us a lot of insight as well in terms of the security portion of it as well. In your work of building up the Clickdata business and

being able to serve this business intelligence market, I'm wondering what were some of the ideas or assumptions that you had going into it that have been changed or challenged either as you explored the problem space more fully or

as technologies evolved and requirements evolved and maybe some of the

particularly thorny engineering challenges that you faced along the way? I think 1 of the assumptions was that I thought everybody,

you know, coming from the enterprise world and working with BI

and other enterprise levels. I came into

Click Data thinking that everybody understood BI clearly

and what it is and what it's supposed to do,

and that is not true. And it's not to say they're wrong or right or that they don't understand

this or that. It's just that it was just a wrong assumption. I thought everybody understood that you needed a data warehouse to do proper BI and to keep your historical data.

And,

you know, either through

mass marketing

from other

database companies or whatever,

they felt that that was never really a topic, that

ETL was a bad word. It was too technical, and yet it's something that we need to do to cleanse the data. That's the process. You can call it whatever you want.

And, you know, I I thought it coming in that everybody knew that

medium, small size business. And that's not true. That's obviously not true. Especially some of our customers which are focused on their business as they should be, that's not of their interest. They just want those dashboards and the insights at the end of the day.

There was a lot of assumptions around,

you know, the value that we would bring by bringing an all in 1 platform right off the get go. Right? That people would see that benefit right away. And again, through marketing and potentially

miscommunication,

basically training,

we spend a lot of time,

you know,

teaching what BI is. And it's almost like,

you know,

if you're trying to sell your product to a customer and your product is cloud and this customer is on premise a 100%,

do you really should even spend the effort there? Right? Are they gonna change their mind that cloud is for them? We said right off the get go, no. We have to work with customers that are

advanced

enough in their thinking that they've chosen cloud as a solution for them, and they wanna move to the cloud. And, therefore, our application is appropriate for them. So it's picking your battles as well, those assumptions that I thought, oh, yeah. Of course, everybody wants to go to cloud. Why wouldn't they?

Well, no. They may have very good reasons to stay a little bit behind on premise or even not a cloud, but some kind of a private

cloud. And all those assumptions are, you know, for me kind of to find a new path within Glick Data to either provide options for that or sometimes just to say, well, no. We can't do that. Right? We can't offer that. So In your experience of building the product and working with your customers, what are some of the most interesting or innovative or unexpected ways that you've seen Click Data used? I've seen dashboards, which are mini apps, iPhone apps,

highly interactive iPhone apps they built.

I've seen a Pong

game being built on a dashboard. I've

seen a lot of people on their free trials, young students,

entire classes of students creating

very creative dashboards

about topics which

are

you know, they stand out from normal business topics. You're used to 1 of our support team members comes to me and shows me, oh, look at this dashboard. This is great. The customer

showed me this, and they need help with this. I felt it was a really good dashboard.

And we look at it, and it's typically business data. Right? Trying to help them solve some kind of a calculation error or something like that.

And every now and then, we got these dash boards on,

you know, Pokemon stats and stuff like that and, you know, PUBG stats. You know, this this type of thing that you go, oh, wow. This is even being used for other areas. Right? But we've seen that we've seen our dashboards being built for hospital for tracking inventory at hospitals

on tablets,

which is not typically something that you'd see a dashboard just for.

But because they can do it, you know, they do it. It's a tool like any other. We've seen people that just don't think about our visualization

at all.

But they love our data aggregation capabilities and connectivity and data warehouse.

And they plug, you know, Python and R into it, and they go nuts in terms of, you know, advanced statistical analysis or machine learning, whatever you wanna call it, and do very cool things with it. And some of the projects that were shown to me were impressive, like, beyond impressive.

We had 1 that did sentiment analysis. So

with with, you know, only using our web service connector

and our API, and I found that impressive as well. So all these things, they continue to surprise me

because, you know, it's just outside of what I'm usually seeing.

From a technical point, it's really cool as well because they're using things, you know, it's like finding a little Easter egg somewhere. It's like, oh, they found that, did they?

They found out to use that, did they? Okay. That's cool. So that's pretty cool. As you have been building and growing the company, what are some of the most interesting or unexpected or challenging lessons that you've learned in the process?

That's a loaded question, Tobias. That's a lot of stuff. I mean, we could go into investors all the way to fundraising, to

dealing with engineering and sales. Yeah. It's there's so much. I mean,

I think ultimately what I love is the fact that

we're building something. You know, it's like anything else. Right? And I think this entire startup movement started because

people just love to create. Right? That's human beings. We love to create. Right?

Whatever it is. And programmers, especially, I think a lot of developers get into it

because they can mold computers into doing whatever they wanted to do. Right?

It's almost that power that says, I can do this. Oh, can I do this? It's also the challenge as well.

And that is the driving principle still with click data is can we, as the little guys of click data, you know,

we're being compared on, you know, media with Tableaus and Lookers and

Oracles and, you know, Domo's, companies that are $1, 000, 000, 000 valuations. And I'm

going, what? How is this possible? Right? And I think ultimately it's just because, you know, that passion of just creating something absolutely

incredible and as large as we are trying to build it

is still there. But the challenge have been many, obviously. You know, we're fighting the big guys. So they have a lot more money than we do. We're not known, you know, and this is why we're thankful for opportunities like this podcast that

allow us to kind of

without, you know, spending 1, 000, 000 to share our vision and to talk to interesting people and to discuss whatever topics related to data and BI and technology that we can. We had challenges in terms of

ensuring that

we're a small company providing a huge data service across the world. How do we support such a large scale? We're present in over 50 countries.

US is definitely our biggest country. How do we make sure that we support all these customers

across the world

and all these data centers from Microsoft, and how can we manage that in such an effective way? All those have been challenges. Right? And doing that as well without just

thinking, oh, just throw more money at it. You know, that's always a solution. Right? But I think we can do better than that. I think we can be smarter than that. Right? We're engineers. It's always about cost

effectiveness, not just about throwing money at it. Right? Absolutely.

It's always about the time, cost, and space trade offs.

Absolutely.

And so for people who are looking to be able to

build up a, you know, reliable business intelligence capability, what are the cases where Clickdata is the wrong choice?

I think Clickdata is the wrong choice if, for example, you're in a high volume, high transactional

data scenario. Right? If you're a telecommunications

company, for example, where you have

literally

thousands per millisecond

text,

you're not gonna be using click data to store those transactions. You may want to use a large scale

parallel database, you know, even large scale,

potentially things such as BigQueries and other technologies

for that, and kind of aggregate that data to a level

that, you know, is digestible for click data. Right? Again, we're trying to make insights, not data exploration, and not start detecting,

you know, text patterns and things like that. That's not us.

We're not an SPSS or a statistical advanced tool.

Don't think you're gonna come here and start writing Python. You can, but that will be on your machine, and you can hit your data on click data. But we don't have that facility.

Machine learning

is kind of a step for us.

We do provide certain basic

functionality

for statistical analysis, trending, and segmentation, things like that.

But more advanced than that, you'll have to kind of look elsewhere,

which is not to say that Click Data cannot feed that. Right? You can still collect the data from the different sources and then provide that data to those tools.

So those are basically the 2 main reasons, you know, that, you know, ClickData is probably not the right tool for you as well.

Of course, the third 1, but that goes without saying, if you're looking for an on premise solution, we're not it. That, you'll have to look elsewhere. We're fully cloud,

and we don't provide that on premise.

And as you continue to work on the business and iterate on the technology and the feature set that you're providing, what are some of the things that you have planned for the near to medium term? Well, interesting enough, we have a huge release November 30th and then another 1 planned for February 15th. And in both those releases,

we're launching

3 new modules of click data.

1 of them is called reports. Do you remember reports?

Reports are those, multi page, you know, financial

statements, you know, that somebody would just dump on your desk. And you think this day and age with data analysis and spreadsheets, and people would just get away from that. But we've had so many requests.

Dashboards are great, but they're single page focused. Right? You have to navigate through them.

I want to deliver a list of clients, you know, that's 3, 4, 5, 7 pages long

with their addresses and their value or something like that. That type of report building, which we used to have tools like Crystal Reports and Well Business Objects had a portion like that, and we had a few other tools like that. That's what we're building. We're launching that module. It's an awesome designer.

I think there's some cool innovations from the traditional report designers

that we've always had. You can build that and publish it the same way we do dashboards,

produce PDFs that you can download or stream to your browser.

That's pretty exciting. Nothing new, but this is a surprise for me as well.

We stopped designing click data for our based on our imagination and what we wanted to do with it, and we started listening to our customers

a few years ago

and asking them, okay. Where do you want us to take? And this report module was 1 of those things.

The next 1 is the data stream. Data stream was released a month ago, but we have some improvements to do there. The same way we can automate the ingesting of data into click data, we can now automate

the push of data to outside systems.

So you can use Click Data to aggregate all your data from the different systems.

And now, you know, every 5 minutes, pop the file on an FTP server or add a few rows of data to, you know, some database on the cloud

or to pop an Excel into a Dropbox or something like that. Click Data then becomes a full or is now a full kind of data treatment, data management

tool, irrespective whether it has visualization or not. Right? Some people don't want visualization. They just want yeah. Get all this data, clean it it up, add these calculations, and now put a file from my vendor on this folder every day. You know? And that's all they want. So we're gonna have that piece done as well.

I believe I touched briefly on insights, and insights is our exploration

data tool. And now the difference with insights

is that from a dashboard or some of the other tools we have is that you start from a set of data,

and you can start dragging columns in and immediately see scatter plots or pivot tables or line charts depending on the type of columns you drag in into.

And it immediately adapts to the data that you dropped into as opposed to saying, okay. I wanna build a scatter plot, and then it's gonna just put the dots in it. Our first version is gonna be a classic exploration tool. We're hopefully going to improve it to make it a lot more

smart in that sense where it starts changing the visualization

based on the type of columns that you throw at it. So those are the 3 modules that we're working on and, delivery over the next 2 releases as well.

And are there any other aspects of the work that you're doing at ClickData or the market for business intelligence that we didn't discuss yet that you'd like to cover before we close out the show? 1 actually interesting question. I'd love to know how people

view machine learning specifically,

what I usually call advanced statistical analysis, which is basically

extrapolation of data or segmentation of data or treating the data and creating models and then applying those models to the bulk of the

data. Is that something that should be part of BI in a sense?

At which point do we say

BI is just historical data and reporting on that, analytics on that? Now which point do we say, you know,

let's do some kind of forward looking activities here that can extrapolate that data. I'm always concerned that, you know, it's being treated as 2 separate topics. And sometimes, I see it as a single topic, which is data is data is data. You're just working with data 1 way or the other. You know? So

we're trying to kind of

create our own, I guess, opinion

of where machine learning sits in this entire

process. Right? We have some ideas. You know, I'd love to hear more as well, you know, whether it's there's gonna be a comment section here

on what people think,

you know, at which point does machine learning and data science

differ from business intelligence.

And maybe is it time right now to stop putting these labels on these things and just call this data

something? I don't know what that something is, but data something. You know, we know it's about data vis. It's about data aggregation, data connectivity, data management, statistics, data, you know, calculations, whatever it is. Right? It's still a very interesting topic for me because from a product perspective, I'm trying to see where it fits in the grand scheme of things. I'd love to build

a Python plugin.

And, you know, it's it's a challenge to see where, you know, is it really something that people will think that belongs

into

our platform,

or whether they will think, no, this doesn't belong here. You know, I'd rather use it elsewhere. So that's something that is going forward is does this really fit into Click Data in the future?

Well, for anybody who wants to get in touch with you and follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And as a final question, I'd like to get your perspective on what you see as being the biggest gap in the tooling or technology that's available for data management today. I don't know if it's a gap or not or whether it's a limitation of our own technology based on

on what we can do, but

I think it's at the point in time

where you'll be hard pressed finding datasets

that are low in volume. You know?

It's becoming quite large, these number of datasets and the number of rows in each days or

documents or items you have in your dataset.

And yet we're still limited by technology and by hardware specifically at times

on things like 1, 000, 000 rows and memory and things like that. And you figure by this time, we would have figured how to improve that. Right?

We're constantly

challenged using your browser

to display to our customers, you know, a scatterplot

with more than a 1000000 points. Right?

Why is that? You know, we understand not everybody has, you know, an RTX 3080, TI, you know, graphics card,

but the browser itself will not even support it. So that is to me, that is the largest limitation

right now. We're still not capable

unless you have a super duper, you know,

infrastructure

and all closed off, you know,

top level network stuff and disks and whatnot.

But we solved the problem of storage by compression and other methods, and right now we can get SSD drives to incredible sizes.

But we still can't digest more than a few 1000000 rows at a time, or at least present them

in a way that you know, in a visual way. I think there's a challenge there. I know this is not exactly data management as you asked, but to us, it kind of belongs to that piece as well. Because a lot of the data management we try to do is in memory.

And it's challenging to try to, you know, look at just a few millions of rows when we know there's more rows outside that we still need to take into consideration.

So, yeah, that's 1 of the biggest items we have. Yeah. That's I don't know if we have any specific gaps. I think we have a lot of challenges.

Alright. Well, thank you very much for taking the time today to join me and share the work that you're doing at Click Data and your perspective on the business intelligence market. It's definitely a very

interesting and storied problem domain. So I appreciate the time and energy you're putting into it, and I hope you enjoy the rest of your day. Thank you very much for having me.

Listening. Don't forget to check out our other show, podcast.init@pythonpodcast.com

to learn about the Python language, its community, and the innovative ways it is being used.

And visit the site at dataengineeringpodcast.com

to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show, then tell us about it. Email hosts at data engineering podcast.com

with your story. And to help other people find the show, please leave a review on Itunes and tell your friends and coworkers.

Data Engineering Podcast

Summary

Announcements

Interview

Contact Info

Parting Question

Closing Announcements

Links