Summary
Business intelligence is a necessity for any organization that wants to be able to make informed decisions based on the data that they collect. Unfortunately, it is common for different portions of the business to build their reports with different assumptions, leading to conflicting views and poor choices. Looker is a modern tool for building and sharing reports that makes it easy to get everyone on the same page. In this episode Daniel Mintz explains how the product is architected, the features that make it easy for any business user to access and explore their reports, and how you can use it for your organization today.
Preamble
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- When you’re ready to build your next pipeline you’ll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 40Gbit network, all controlled by a brand new API you’ve got everything you need to run a bullet-proof data platform. Go to dataengineeringpodcast.com/linode to get a $20 credit and launch a new server in under a minute.
- Go to dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, read the show notes, and get in touch.
- Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat
- Your host is Tobias Macey and today I’m interviewing Daniel Mintz about Looker, a modern data platform that can serve the data needs of an entire company
Interview
- Introduction
- How did you get involved in the area of data management?
- Can you start by describing what Looker is and the problem that it is aiming to solve?
- How do you define business intelligence?
- How is Looker unique from other approaches to business intelligence in the enterprise?
- How does it compare to open source platforms for BI?
- Can you describe the technical infrastructure that supports Looker?
- Given that you are connecting to the customer’s data store, how do you ensure sufficient security?
- For someone who is using Looker, what does their workflow look like?
- How does that change for different user roles (e.g. data engineer vs sales management)
- What are the scaling factors for Looker, both in terms of volume of data for reporting from, and for user concurrency?
- What are the most challenging aspects of building a business intelligence tool and company in the modern data ecosystem?
- What are the portions of the Looker architecture that you would do differently if you were to start over today?
- What are some of the most interesting or unusual uses of Looker that you have seen?
- What is in store for the future of Looker?
Contact Info
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
Links
- Looker
- Upworthy
- MoveOn.org
- LookML
- SQL
- Business Intelligence
- Data Warehouse
- Linux
- Hadoop
- BigQuery
- Snowflake
- Redshift
- DB2
- Postgres
- ETL (Extract, Transform, Load)
- ELT (Extract, Load, Transform)
- Airflow
- Luigi
- NiFi
- Data Curation Episode
- Presto
- Hive
- Athena
- DRY (Don’t Repeat Yourself)
- Looker Action Hub
- Salesforce
- Marketo
- Twilio
- Netscape Navigator
- Dynamic Pricing
- Survival Analysis
- DevOps
- BigQuery ML
- Snowflake Data Sharehouse
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to the Data Engineering Podcast, the show about modern data management. When you're ready to build your next pipeline, you'll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 40 gigabit network, all controlled by a brand new API, you've got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And go to dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, read the show notes, and get in touch, and join the discussion at dataengineeringpodcast.com/chat.
Your host is Tobias Macey. And today, I'm interviewing Daniel Mintz about Looker, a modern data platform that can serve the data needs of an entire company. So, Daniel, could you start by introducing yourself?
[00:00:57] Unknown:
Sure. So I'm Daniel Mintz. I'm Looker's chief data evangelist, which is not the most common title, I would guess. And I have been with Looker for a little more than two and a half years. And before that, I was actually a Looker customer for three and a half years, when I ran data and analytics at the media startup Upworthy.
[00:01:17] Unknown:
And do you remember how you first got involved in the area of data management?
[00:01:21] Unknown:
I mean, I guess. You know, I worked in politics for a long time. I worked for the political organization MoveOn.org, and MoveOn.org is a membership organization of about 8,000,000 people. But, fundamentally, it's, like, just a giant database. It's a giant database of email addresses. And so in order to do anything, to communicate with those people, to find out what those people are excited about, what they care about, you have to sort of ask the data. And so MoveOn, from the very beginning when it started in 1998, was a very sort of technologically sophisticated political organization, in the sense that if you can remember back to the nineties, most political petitions were of the, like, add-your-name-to-the-bottom-and-forward-to-your-friends variety, where there was no centralized repository of all the signers.
MoveOn said, well, maybe we could, you know, have a petition and have people come to a website and sign it, and so they did that. And then all of a sudden, they had, like, 100,000 people, and then they had more than 100,000 people, and they were like, well, shoot. Now we should probably figure out something we can do with all of these people. They're looking to us to figure out what to do next. And so that's kind of where MoveOn came out of, and I sort of fell into MoveOn's world as a volunteer initially and then as a staff member. And so a big part of that job was just being comfortable with working in very large quantities of data in a MySQL database. And so that's kinda how I got started.
[00:02:56] Unknown:
And so now you're working with Looker. So can you give a bit of an overview about what Looker is and the problems that it's aiming to solve? Sure.
[00:03:06] Unknown:
So Looker is a data analytics platform. It is a data analytics tool that sits on top of your database. It does not try to sort of extract all of your data from your database or your data warehouse into its own proprietary store. Instead, it leaves it where it lives, and it uses SQL to write queries and ask questions of the data. And so the fundamental thing that Looker does that's really important and really powerful is it allows anybody inside the organization to ask questions of the data without having to know SQL. It does that because it can get the knowledge about what the data means and how it's structured and how it joins out of the analysts' heads, where it usually lives, and into an intermediary layer, which we call LookML. And LookML is sort of an abstraction layer on top of SQL, so that it allows Looker to write the SQL for the user. And one of the
[00:04:02] Unknown:
categories that Looker is often grouped into is business intelligence. So can you give your personal definition of how you think of business intelligence?
[00:04:13] Unknown:
I mean, I don't know. It's just a thing that people say. I mean, I think business intelligence is the historical, long-term business term for data analytics. I always called it data analytics when I was in the sort of nonprofit political space. And then I came to the more for-profit business space, and they were like, oh, you mean business intelligence? And I was like, sure. Yeah. Whatever you wanna call it. I mean, fundamentally, it's just one way of looking at data, primarily sort of historical data. So it's, you know, it's more looking backwards, rather than sort of present-day operational data, and certainly somewhat different than future data, which is more sort of predictive modeling and data science. But, you know, obviously, if your data about what happened is wrong, then your predictions about what's going to happen are very unlikely to be correct. And business intelligence
[00:05:12] Unknown:
as a particular suite of technologies has been around for a number of years and there have been a lot of iterations on the best way to approach that. So I'm wondering how the approach that Looker is taking is unique from other tools that are currently in the market or other more traditional approaches to the ways that business intelligence has been addressed in years past?
[00:05:36] Unknown:
Sure. So maybe I'll take your listeners on a little tour, a historical tour of business intelligence, all the way back to the sort of early eighties when it got its start. So if you go all the way back there, I don't know how many of your listeners were doing BI and data warehousing back then. That was actually before data warehousing was even a term, and there weren't really columnar stores. But if you go all the way back then, you know, you bought a database and a business intelligence tool all from the same vendor, because that was the only way to do it. And it was a giant machine that got trucked into your data center, and it could give you some dashboards, like a few.
And that was kinda all it did. But, you know, that was a lot better than anything else, because your choices were zero dashboards or a few dashboards. And zero dashboards was worse. A few was better. So people took it. Right? So you had these extremely expensive data appliances, as they were called, that were powering these BI tools. And, you know, it was great, because all of a sudden you could get answers where you couldn't have before, but you were operating on pretty small volumes of data. Your whole workflow was about curating and cleaning and structuring and, you know, turning into a star schema the data before it got into the warehouse or into the database, because if you didn't, it would break the database.
So your whole workflow was about protecting that database and making everything just perfect for it so that it could do its thing. And if you put too much data in it, you would have to spend another $1,000,000 or $2,000,000 on another data appliance, and you didn't wanna do that. So you were very protective of it and tried to ask as little of it as possible. And, you know, as I said, that was better than what had come before, but it obviously had some drawbacks. Now, it also had some advantages, which is that because everything was so locked down, it was owned by IT, and, you know, there were database administrators who were in charge of all the data flows, and nobody else could see it and see what was happening, let alone touch it. The data that you got out of those systems was right. You know, the sort of dashboards, the reports you got, were dead on, because they had been curated, and they took six months to build that dashboard, and no one was allowed to touch it, and everything was just so.
But you couldn't ask any new questions. Or you could, but you needed, like, a C-level sponsor, and then, you know, six months to wait and a lot of money to pay the consultants to build it. So, you know, there was no agility to ask new questions about things that were happening in your business. And so as sort of database technology and computer technology in general got a little bit faster and a little bit cheaper, all of a sudden people started having these departmental servers or even, you know, desktop computers that could hold a reasonable amount of data that, you know, lived somewhere outside the data warehouse. They started doing new things with data. They couldn't operate on all of the data that was in the data warehouse, because only the warehouse had enough power to do that. But, like, if you were in finance, you could probably slice off enough of the data to be meaningful, store it on your departmental server, and then do a little bit of analysis, slice and dice it a little bit, and make some visualizations and figure out what was going on. And that was better, because now you had the ability to sort of ask new questions of the data. And if you were in finance, that was lovely, because you didn't only have the sort of company-wide dashboards.
And everything seemed great until you from finance got into a room with your colleagues from sales and your colleagues from marketing. And you said, guys, what's going on? You know, sales are down this quarter. And sales said, no, they're not. They're up. Marketing said, no, they're not. They're flat. And now all of a sudden you realized, well, you were all working on different extracts of the data that lived in different places and had been extracted from the source system at different times. And then you had all imposed different logic on that data in your little workbooks, and nobody could agree on what the data meant. You'd lost that sort of core agreement on, you know, what the understanding of the data was.
And so, you know, it was understandable why that happened, because it was the only way to do that kind of quick analysis. But it was a real problem, and it continues to be a real problem that leads to real data chaos inside organizations, where no one can agree on what the data means. And so you spend all your time arguing about what the data means rather than using the data to make business strategy. And the reason for that was clear. It was because the databases and data warehouses continued to be very slow and very expensive. But, you know, something crazy started to happen in the sort of early to mid 2000s, and it really accelerated in this decade, where databases went from being very slow and very expensive and, you know, living in data centers on premise, to being extraordinarily fast, extraordinarily cheap, and often living in the cloud.
And so what Looker does is it says, well, if we were reconceiving this idea of business intelligence, this idea of data analytics, from scratch, and we lacked that core constraint that had constrained all the previous generations, of having very slow, expensive databases, what would you build? And that's really what Looker set out to build. It said, well, if we had really fast, really cheap data storage and data querying, we'd let anybody query the data and ask questions of it, but we'd do it in a way that maintained that core data model, that core understanding of what the business's data means, so that everyone actually stays on the same page. And that's really what Looker does that's different than, you know, what older tools have done. And in terms of the technical architecture
[00:11:36] Unknown:
and the technical infrastructure that supports the Looker product, can you give an overview of how that is all put together and the overall workflow of getting it installed and set up and configured for a user's environment?
[00:11:51] Unknown:
Sure. So, you know, as I said earlier, Looker doesn't try to extract all of your data out of the data warehouse or the database or the data engine or whatever you wanna call the thing that holds your data. And so as a result, it doesn't need to be a ridiculously powerful, very expensive box. Looker runs on a pretty vanilla Linux server. For most customers, we host Looker in the cloud for them. And so the process of starting up Looker is literally just spinning up a free trial: you contact our sales department, they spin up a free trial, and then they give you the login, and you're in. And then you connect it to your database.
And so Looker, I think, speaks, like, 46 different dialects of SQL, everything from, you know, SQL on Hadoop to Google BigQuery and Snowflake and AWS Redshift and Athena to, you know, DB2 to MySQL and Postgres. So, really, if it has a JDBC driver and speaks SQL, we probably connect to it. So you go to the administration panel inside the server and you connect to the database, and Looker then scans the schema, the physical schema of the tables, you know, in the schema that you've given it access to, and it starts to sort of intuit what some of the joins between those data tables would be.
And it also sort of looks at all the columns and says, oh, well, this is a number and this is a date and this is a string. And it builds out a really basic sort of data model, and you can query that immediately, because, again, all Looker is doing is writing SQL for you, sending that little bit of data, which is the SQL query, off across the wire to the database, letting the database do the work, and then getting the results back, displaying them, and letting you visualize them or put them on a dashboard. But, you know, that base data model, which is written in LookML, which is sort of Looker's markup language that sits on top of SQL, is really a basic model, and the real power comes when you start elaborating on that model as, you know, an analyst or a data engineer. And you say, well, you know, the precise definition for our business of lifetime customer value is, you know, a plus b divided by c, unless it's somebody who joined before this other date, in which case it's a plus b, you know, times c, and you put that into Looker once.
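As an illustrative aside: here is a minimal LookML sketch of the kind of definition being described, along with the users-to-orders join Daniel describes next. All table names, field names, and the value formula are hypothetical stand-ins, not Looker's or any customer's actual model.

```lookml
# Hypothetical model: encode the metric definition and join logic once,
# and Looker writes the SQL from then on.
view: users {
  sql_table_name: analytics.users ;;

  dimension: id {
    primary_key: yes
    sql: ${TABLE}.id ;;
  }

  # "a plus b divided by c, unless they joined before some cutoff date"
  dimension: lifetime_customer_value {
    type: number
    sql: CASE
           WHEN ${TABLE}.created_at < '2015-01-01'
             THEN (${TABLE}.a + ${TABLE}.b) * ${TABLE}.c
           ELSE (${TABLE}.a + ${TABLE}.b) / NULLIF(${TABLE}.c, 0)
         END ;;
  }
}

view: orders {
  sql_table_name: analytics.orders ;;

  dimension: id {
    primary_key: yes
    sql: ${TABLE}.id ;;
  }

  dimension: user_id {
    sql: ${TABLE}.user_id ;;
  }
}

explore: orders {
  # "When you join users to orders, use user ID; one user, many orders."
  join: users {
    sql_on: ${orders.user_id} = ${users.id} ;;
    relationship: many_to_one  # many orders per one user
  }
}
```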
And now there's a field on the front end that's presented to business users, who don't have to know what that definition behind the scenes is. It just says lifetime customer value, and it takes care of all the logic, because you've told it what that logic is. And so, you know, the sort of technical data people love Looker because they spend a lot less time writing and rewriting, you know, slight variations of the same SQL query over and over, and instead are able to do that once and say, hey, when you join users to orders, you know, use user ID. That's the join logic, and, you know, the relationship is one user to many orders. And it's that simple. And then Looker does that,
[00:15:05] Unknown:
forevermore, and they don't have to worry about it. Yeah. Being able to encapsulate that business logic and have it be a shared resource, without having to embed it in either the ETL pipeline that populates the database that you're connecting to, or creating some snippet of SQL that you then have to try and figure out how to inject into another larger SQL query, I imagine, is one of the big selling points for Looker in particular. Yeah. I mean, particularly when talking to technical resources who know that pain
[00:15:39] Unknown:
viscerally and have dealt with it. I mean, you know, as somebody who loves SQL, I think SQL's wonderful, and it makes sense to me why it has been around for 40 years. It thinks the way that I do, or maybe I think the way it does at this point. But I also know the pain of, like, looking at a 200-line SQL query that I wrote two weeks ago and going, what was I thinking? Why did I... you know what? I'll just write this one from scratch, because there's just no way that I'm gonna take the time to understand it. And that's with SQL that I wrote myself, let alone SQL that somebody else wrote, which I'm not even gonna attempt to, you know, untangle and understand. And so LookML, because it's an abstraction layer on top of SQL, it really breaks SQL down into a bunch of bite-sized chunks that it then handles recombining in the right ways. So each of those chunks in and of itself is very easy to read and understand and debug.
It's also version controlled. Right? Because LookML is real code. I mean, again, I love SQL, but SQL is not real code. Right? It's not meant to be an executable file. It's just a bunch of sort of recipes. But LookML is very much real code. And so, you know, Git has been part of Looker since the very beginning. And so everything that you do in Looker and in LookML is Git version controlled. And so, you know, that allows collaboration at scale. It allows me to roll back: if I make a mistake, I can see what the previous version was and roll back to it. It allows us to have a sandbox environment for each developer, so that I can try new stuff without affecting production while letting other people continue to do their work. So, you know, it being real code has enormous benefits.
[00:17:33] Unknown:
I'm wondering if the common pattern is that there's one canonical database that gets connected to, in terms of a data warehouse for instance, and there are a lot of ETL jobs or ELT jobs that will load data into that store. Or if it's more common that users will connect to multiple different data sources and then do the merging and transformation at the point of report building, adding those virtualization layers with the LookML for being able to create those shared pieces of business logic for building up the reports?
[00:18:08] Unknown:
Yeah. I mean, it's a great question. So I would say more and more we're seeing people with multiple data stores. It's just sort of the way that the world is going, and I think it's fine and great. One of the nice parts about my job is I get to play with a lot of different databases and see how they really work, which is really fun, and kind of amazing to see the speed at which they're developing and getting better. But in terms of how that sort of ETL process works and where it's happening, I actually wrote an article a little while ago about how we've moved away from ETL. Right? We're seeing less and less of pure ETL, but we're not quite in a world of real ELT yet. And, you know, if we had infinitely fast databases, you'd do everything at runtime. Right? And so you would really be in a world of pure ELT. There'd be no reason to pre-transform or pre-aggregate, because you could just do it at runtime instantly. But, you know, databases are very, very fast, but they're not infinitely fast yet. And so what we see a lot of instead is, I think I called it, like, E, little t, L, big T, another little t, and then, like, tiny t.
You know, it where it's just like this much more fluid process. You know? So you, like, you extract the data from the source systems. You do, like, some really basic transformation just, like, clean it up. If it's, like, JSON and you want it to be flattened, you do that. You know, you you remove the the rows that you don't really care about. You remove the columns you don't really care about. You know? So that very basic cleanup. You're not doing the real transformation and the and the cleanup that early in the process unless you have to or there's a good reason to. But in a lot of cases, we're seeing that get sort of pushed down further down the pipeline. And then you load it into 1 of these very powerful data warehouses or databases. And then you do, you know, the bulk of your transformation in that that data warehouse. And, you know, that's probably on some kind of, you know, that you're you've got some DAG or or a CronJob or something that's managing that. Right? I've used Airflow. I've used, Luigi.
I think at Looker, we use NiFi internally. So, you know, there are lots of great tools coming up to manage that. But those are the transformations that you don't wanna be doing at runtime, or that you can't do at runtime. You know? So something like sessionization is a great example, where you actually can't really sessionize data properly until the day is over, because you need that date change to, like, finally close out any sessions that were happening at the end of the day. So you can do it, you know, sort of incrementally, but at the end of the day, you have to re-sessionize and make sure everything's clean. And so that's the kind of thing that you're gonna do once every 24 hours. So you do that once every 24 hours, but then, you know, you wanna update a session facts table. Well, you need to do that after the sessions have been created.
But, you know, maybe you can do that at runtime, or maybe you can do it, you know, a little bit before runtime. So it's just a much more fluid process. And I think the advantage of that is clear: the closer to runtime it is, the more agile it is. Right? And I don't mean that in a sort of all-uppercase Agile way. I just mean, like, the more agility you retain. So, you know, if you needed to change a piece of logic in the old pure-ETL days, that meant reprocessing all of your data. And if it was a piece of logic that got imposed very early in the pipeline, very early in the ETL, it could be a huge task. Right? It could be days or weeks or even months to reprocess all your data with that new piece of logic.
And, you know, if you were in a pure ELT world, again, you could change the logic, and it would just take effect the next time you ran the query, which would be lovely. But by moving as much of the transformation forward in the pipeline as you can, it cuts down on the amount of reprocessing. And so, you know, in a Looker world, I think people are doing a lot of that transformation right at runtime, as much as they can, and that's the LookML that's doing that transformation. Right? If you can define your business metrics in LookML, then you can do some pretty heavyweight transformation with LookML, heavier weight than you would wanna do in SQL, just because it's an abstraction, so you can get away with much more complex things that you'd never wanna write out by hand in SQL.
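A sketch of what that runtime transformation looks like, borrowing the days-to-process example Daniel walks through a little later; field and table names are hypothetical, and the DATEDIFF call assumes a Redshift-style dialect. Note how each definition references the one before it rather than the physical table.

```lookml
view: orders {
  sql_table_name: analytics.orders ;;

  # These two dimensions reference physical columns in the table.
  dimension: order_date {
    type: date
    sql: ${TABLE}.order_date ;;
  }

  dimension: ship_date {
    type: date
    sql: ${TABLE}.ship_date ;;
  }

  # This references the dimensions above, not the table; a null ship
  # date means the order is still processing, so days_to_process is null.
  dimension: days_to_process {
    type: number
    sql: CASE
           WHEN ${ship_date} IS NULL THEN NULL
           ELSE DATEDIFF(day, ${order_date}, ${ship_date})
         END ;;
  }

  # And this references days_to_process, two levels removed from the table.
  measure: average_days_to_process {
    type: average
    sql: ${days_to_process} ;;
  }
}
```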
But you probably still have an ETL, and you're still doing some of those things earlier in the pipeline using, you know, one of those ETL tools. And that's fine. And I think
[00:22:49] Unknown:
that, you know, much more sort of organic, fluid pipeline is the one that we're seeing the most. And as you mentioned, some of the data sources that you are able to connect to with Looker are things such as Hive or Athena, where you can query across records in a data lake infrastructure before it has been processed, where you're more likely to have these raw records that have just gone through the E and the L stages without necessarily any transformation. And I was having a conversation a few weeks ago about the overall data curation process for larger enterprises, where you might want to land something in the data lake, do some initial discovery and report building to see what's actually useful before you determine what transformations to make, what aggregations to make.
And then you will sort of cycle that through eventually until it lands in the data warehouse, where it's highly structured. And I'm wondering what support Looker might have in terms of being able to take advantage of the LookML abstraction to have a business metric defined based on some of the early exploration in the data lake. And then, as those pieces of data are cycled through into the data warehouse, being able to essentially keep that same, you know, maybe function definition isn't the right term, but the sort of concept of having this one entry point for this business metric, while being able to redefine how it processes the information as it goes from the data lake through to the data warehouse, so that the analyst interface is the same, but the actual underlying data has changed.
[00:24:25] Unknown:
Yeah. I mean, that's definitely something we see. And I do think that we're seeing sort of ever more convergence between the idea of a data lake and a data warehouse, right, where a lake is the sort of purely just dump-everything-in-there unstructured world, and the warehouse traditionally has been this highly structured world. I think we're seeing both that the data lakes are gaining some of the functionality of warehouses, where they can do pretty fast querying across just enormous datasets that are not particularly well structured, you know, Presto, Hive, Athena, things like that. And then we're also seeing that data warehouses are able to cope with much less structured data. Right?
And so things like, you know, BigQuery, where actually the charge for storing stuff in BigQuery is the same as the charge for storing stuff in Google Cloud Storage. You know, Snowflake, where their storage medium is actually S3. So, you know, you're not paying any extra to store it in Snowflake. So I do think we're seeing a convergence there. And, yeah, I mean, I think the advantage of LookML being an abstraction on top of SQL is that it is dialect agnostic. And so you can write a piece of business logic, you can write a dimension or a measure in LookML, and, you know, you can leverage all of the power of your particular SQL dialect. So you can write sort of any SQL that your SQL engine speaks into the definition of that measure. But also, LookML is highly reusable. It's meant to be very DRY, very don't-repeat-yourself friendly.
So once you define a measure once or a dimension once, referencing the physical location in the table, you don't then re-reference that table. You just reference that measure. So if I wanna do days to process an order, right, I'm probably gonna reference the order date and the ship date in the table, which are physical fields in the table, and I'm gonna put in some logic that, you know, if the ship date is null, then it's still processing, that kinda stuff. So my days to process is a calculation based on those two physical fields, but it's not gonna reference the table directly. It's gonna reference those fields that are referencing the table directly. And then if I wanna get average days to process, that's gonna reference the days to process, which references the fields that are referencing the table. Right? And so that reusability and that ability to use references means that you really are abstracted away from the underlying structure of the data in the table. And so we often see companies not just sort of move data from one place to another, where they're saying, oh, well, we have our lake over here, and so we can start by modeling the data there and then carry that model over as the data moves its way into the more structured warehouse. But also, we often see companies swap out their whole data infrastructure from underneath Looker. And I don't wanna pretend that it's like you snap your fingers and it just happens and it's magic.
You know, but it takes hours or a couple of days rather than months, because you can just point Looker at the new location for the data and say, hey, rather than writing MySQL, now you need to write Redshift queries. And Looker says, okay, I can use the same LookML. I know how to do that. I know how to speak both dialects, so it kind of translates natively. And so that smooths the path for moving the physical location without having to rewrite the whole model. And I think that's one of the core advantages of Looker, this idea of separating, you know, the physical location of the data from the meaning of the data, because, you know, more often than not in sort of the history of BI and data, those two things are very much intertwined.
And so that means anytime the meaning of the data changes, the location of the data and the structure of the data have to change as well, and vice versa. But if you separate those two things, as Looker encourages you to do, it gives you a lot more flexibility to change one without having to make big changes
[00:28:44] Unknown:
to the other. And since the actual data storage is owned and controlled by the customer, and Looker is connecting to that for being able to build and serve these reports, there's the question of security and access control and governance. So I'm wondering what capabilities Looker has for ensuring the appropriate levels of security and control, so that users can feel comfortable exposing their data sources to the Looker product? Yeah. Absolutely. So we take security
[00:29:21] Unknown:
really, really seriously. Our customers wouldn't have it any other way, and we wouldn't have it any other way. And so we give you a lot of tools for setting up an environment that meets your security needs. So, you know, on the Looker side, you have a lot of control over who can see what, who can access what. You know, that can be everything from just making certain fields or certain columns or certain tables totally unavailable to certain people, to, you know, if you're using Looker in an embedded analytics context, where rather than sort of using it to query internal data, you're turning Looker around and showing it to your customers so that they can see the data that you have of theirs. Right? So if you're a supplier and you want your retailers to be able to see how things are going, you can do that, and you can use Looker to sort of partition that data so that only the right retailer can see the right data, and no one can see each other's data. So you can do it on the Looker end, you know, very easily, and we also have lots of SSO options, you know, from Google OAuth to LDAP to SAML, so that you don't have to redefine all of that enterprise permissioning in Looker but rather can just leverage existing ones. But then you also maintain the ability to do that on the database side as well, and to carry those permissions through to Looker. So, you know, we recommend, in general, when you connect Looker to your data store, that you give Looker read-only access except to a scratch schema where Looker sort of stores intermediate roll-ups and prebuilt cache stuff. So Looker can't ever write to your main database. And you can even parameterize the connections that Looker uses, so that if you, Tobias, have sort of full access to the database, when you connect to Looker, Looker can then connect to the database as you, so that you have the full connection and the full access. But if I have restricted access, when I connect to Looker, Looker then connects to the database as me, who has restricted access at the database level. So you can carry those permissions right through the database to Looker.
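A sketch of the retailer partitioning case in LookML, with hypothetical names: an access_filter ties a per-user attribute to a field, so every query a given user runs against the explore is forced to include that filter.

```lookml
# Each embedded user is assigned a retailer_id user attribute when they
# are provisioned; Looker then appends this filter to every query they
# run against the explore, so no retailer can see another's rows.
explore: order_items {
  access_filter: {
    field: orders.retailer_id
    user_attribute: retailer_id
  }
}
```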
[00:31:32] Unknown:
And for customers who are using Looker, who either might be just getting started or who have been using it for a while, what does the typical workflow look like, and what are some of the most widely used features or some that are less well known or less actively used?
[00:31:52] Unknown:
Sure. So, oh boy, okay. That's a big question. So on workflow, you know, I mean, I think we see that Looker allows the folks who do speak SQL, whether their job is, you know, sort of being an enterprise-wide data modeler or data engineer, or being an analyst for a particular department, to just be much more efficient. Right? Looker is all about leveraging their skills so that they're not needed every time somebody who doesn't have those skills wants to access data. I'm sure most people listening have been in that world where, like, oh, I need that report, and so you go to the analyst or the data engineer or maybe just the regular engineer and say, hey, can I get, you know, sales by state for the last six months? And the person says, yeah, I'll have it for you tomorrow, and they pull the report and they give it to them. And then the next day, that person comes back and says, hey, I'm really sorry, but I actually need it by region for the last nine months, and I don't wanna touch the SQL query because I'll break it. Can you rewrite it for me? So having Looker means that that person can change those filters all by themselves and not worry about breaking things. And so that's really nice for the business user, because they don't have to wait in line, and it's really nice for the analyst, because they don't have to rewrite the same SQL query over and over. But, you know, we generally see that there's a lot of focus at the outset on getting the model right, on getting all that business logic encoded in a model so that everybody really can self-serve.
That sometimes leads to really interesting discussions, sometimes difficult discussions, where for the first time, people are forced to confront the fact that different departments or different people are defining key business metrics differently. And as long as they live in their little, like, Excel workbooks, they don't realize that. And then the data team comes by and says, hey, we're ready to, like, get you onto Looker. Let's get your key metrics. And then they go, hey, just so you know, you guys are defining customer differently than sales. Should we, like, all get on the same page about that?
And so Looker and the process of building out that data model sometimes force those hard conversations, which are really valuable. But, you know, once the model is set up, I think it tends to sort of take care of itself, except as the business changes. So, you know, you're not constantly tweaking the model, except that the business is changing. And so as it is changing, you're making those changes. But it's not like you're, you know, giving it care and feeding day by day. You're just sort of fixing it as the business changes, to make sure that it stays in line with the actual business. In terms of the features, you know, I mean, I think what I've seen as a Looker customer is that the people for whom Looker is, like, really life-changing are the people who are Excel wizards.
They're really sort of data driven, but they're not quite technical enough to be able to access the data themselves. Or maybe they are, but they don't have permission. And all of a sudden, Looker lets them change their workflow from a workflow where they go and put in the data request, get the data, put it in their workbook, do their slicing and dicing, realize that they don't have all the data they need to ask the next question, go back, stand in line, get the data, put that in, join it up, you know, all these very manual, slow processes, to a world where they just put in a request to Looker and get the data right back. It leads to the next question, and so then they run the next query, which leads to the next question, which leads to the next question, which leads to the next question. And that is just life-changing for them, because all of a sudden, they can ask questions to their heart's content, and that's often really a huge change for the business, because those data-driven people who really know the business are the ones who often come up with the key insights that drive the business forward. So, you know, I think for those folks, the ability to explore freely is the feature that makes Looker amazing.
In terms of other features, one thing that we rolled out about a year and a half ago, two years ago, is called the Action Hub. And what that allows us to do is to connect Looker to other services, so that Looker can write or send data to outside services. It cuts down on the amount of time that you spend flipping from tab to tab using your different tools, so that you can send data from Looker to Salesforce or to Marketo, or to Twilio to trigger text messages, or, you know, whatever service that you need to send it to. You can do that with the click of a button right from within Looker. That's really popular and pretty neat. Some other things: the ability to sort of merge results from different data stores is really powerful, and something that we, again, rolled out about a year and a half ago, and it has been, you know, a big deal for companies, particularly as the number of data stores that they're maintaining multiplies. You know, often you might have sales data in one place and, you know, web analytics data in another place, and you don't need to join that data at the row level. In fact, you probably couldn't. There wouldn't be a good join key. But you might wanna see them side by side and see how they vary against each other day by day.
And so Looker makes that really easy: run the query against the sales database, and then run the query against the web database, and then just join them at the day level. Looker says, oh, well, these are both rolled up by day, so I'm just gonna join them by day, and then you can graph those side by side and see how they correlate. So I think that's really powerful. I don't know what people use less of. I mean, one of the really powerful things that we added, that I haven't looked, so I don't know how much people are using it, but however much they're using it, they should be using it more, is a really intelligent cache management system that allows you to hook Looker right into your data pipeline and your ETL tools, so you can set very smart rules about when to expire caches in Looker and when to rebuild what we call persistent derived tables, which are sort of the roll-ups and transformations that Looker handles for you.
So you can trigger those by running a sort of cheap SQL query to say, like, oh, when new rows appear in this table, you know, expire that cache. Or you can trigger it just based on a time value. But you can also trigger it with an API call, so that, you know, if you're running a DAG as part of your ETL, you can make the last step in that DAG an API call which notifies Looker that there is new data and says, hey, the ETL just finished. Go expire the cache, rebuild the tables, do all the things so that the freshest data is in Looker. But until it runs again, Looker has that data cached, and so it really cuts down on unnecessary load on the database, because it leaves the database alone and lets Looker's cache do the work until there's new data.
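In LookML these rules are expressed as datagroups. A rough sketch, with hypothetical table names; the same datagroup can also be triggered from outside via the Looker API as the last step of an ETL DAG, as Daniel describes.

```lookml
# Caches expire and persistent derived tables rebuild only when the cheap
# trigger query returns a new value, i.e. when the ETL has actually run.
datagroup: nightly_etl {
  sql_trigger: SELECT MAX(completed_at) FROM analytics.etl_log ;;
  max_cache_age: "24 hours"  # safety valve if the trigger never fires
}

explore: sessions {
  persist_with: nightly_etl
}

# A persistent derived table (roll-up) pinned to the same datagroup.
view: session_facts {
  derived_table: {
    sql: SELECT user_id, COUNT(*) AS session_count
         FROM analytics.sessions
         GROUP BY 1 ;;
    datagroup_trigger: nightly_etl
  }
}
```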
And so I love that feature, datagroups, which really, you know, lets you manage that cache really smartly. Yeah. That definitely sounds like
[00:38:52] Unknown:
a highly useful feature that could easily be overlooked, in terms of the amount of value that it provides for end users and for the people managing the data infrastructure.
[00:39:02] Unknown:
Yeah. I mean, I think because Looker is leaving the data in your database, you know, we have to be very cognizant of how exactly we interface with your database, and how we reduce load on your database, and all of those things. And so, you know, making sure that those are all very tunable is a big focus of our team. And given that there
[00:39:23] Unknown:
are massive differences in terms of the scale of data that different organizations might use or that might be used at different points within the organization and different scales in terms of the number of people who may be trying to interpret or analyze the data. I'm wondering what the scaling factors are for Looker along both of those axes of the volume of data that it's available to build reports on top of and the scaling factors for supporting multiple concurrent users and what the bottlenecks are? Yeah. So,
[00:39:58] Unknown:
I have the great pleasure of filling out industry analyst RFIs, and I'm being a little bit facetious there. But one of the questions that is often asked is, like, at your biggest customer, what is the scale of the data? And the honest answer is, I have no idea, because we don't care. You know, the hard work of processing terabytes or petabytes of data is being done by the data warehouse, and so we don't see that. We don't know how much data they're processing. And so as your volumes scale up, Looker doesn't have to make any changes. You know, obviously, if you're trying to process billions of rows on Postgres, you're probably gonna run into some problems, but those are problems that are on Postgres's side, not on Looker's side. Right?
So, you know, I know just from sort of talking to customers that we have customers that are using Looker on top of petabyte-scale data, and it works just fine. Some of the biggest users of BigQuery and Hadoop installations are using Looker on top of them. And so I know that it can scale really, really big, but the honest answer is, it doesn't actually matter from Looker's perspective how much data you're querying. On the user side, you know, the amount of work that Looker is doing is much smaller than a server that is trying to actually query the data, you know, keep that data in memory or query it locally. So, certainly, as you scale Looker up to more and more users, you can run Looker in a clustered environment, and lots of our customers do that. But it's nothing like, you know, your server costs are not gonna be tens of thousands of dollars a month the way that they would be with some other tools.
[00:41:46] Unknown:
And in terms of the business aspects, I'm wondering what you have seen as some of the most challenging aspects of being able to build and grow and scale the business, in terms of it being a business intelligence tool in the modern data ecosystem, and just helping to explain and share the amount of value that it can provide as compared to the vast suites of other tools that are available for often overlapping purposes.
[00:42:18] Unknown:
Yeah. I mean, so the technology is by far the easier side of that. There's just no question that the technology is easier to get our heads around, and easier to sort of figure out and fix when it's not working the way we want, and all that. The people, on the other hand, are hard. You know, as you go into bigger and bigger organizations, you work with people who have been working with tools that are older and have been around longer, or who maybe just lived in a world where the idea of having access to data was not an option, and so they have learned to do their jobs very well without data. Teaching them how to use data to do their jobs in a way that is useful to them is really hard. Right? Change management is really hard.
Changing people's ways of doing things is really hard. Teaching people new skills is really hard. And the reality is that if I go to you and I say, here's a new tool, sure, it'll take you a little while to learn, but it's great. And you say, well, what can I do with it? And I say, well, it gives you access to data. It's great. You are not going to use that tool. Like, that's just not interesting to you if you're a sort of standard line-of-business person. If, on the other hand, you're a marketer and I come to you and I say, you know, here's this new tool. It's really great. You'll learn it in a week, and it's a huge help. And you say, okay, great, but what is it gonna do for me? And I say, well, have you ever wondered which ad campaigns are delivering the most valuable long-term customers? And you say, well, yeah. That's a good question. You know, I can see conversions, but I can't really see what kinds of customers those people become.
And I say, well, let me sit down with you and show you how we can see that here. And then rather than slicing it by campaign, let's slice it by day of the week. And then let's slice it by day of the week and campaign, and then let's layer in, you know, state, or maybe let's pull that out and pivot by gender. And all of a sudden, you're like, whoa, this is valuable information. Right? Data is not gonna be exciting to people, except data people. But information, knowledge, you know, those are really powerful levers to get people to change their workflow. So, you know, I think focusing on the people is really the key piece, and it's the most rewarding, but it's also the hardest.
[00:44:50] Unknown:
And in terms of the technical architecture of the Looker product, are there any sharp edges or pitfalls that you have come across that if you were to start the entire project over again today that you would do differently?
[00:45:06] Unknown:
I mean, yeah, of course. You know, one thing that is really... well, so I should say, when I started using Looker, we were customer number 22. So when I started using Looker, the vision for what Looker should be was there. Our founder, Lloyd Tabb, is an industry veteran. He was a database and languages architect at Borland. He was the chief architect on Netscape Navigator Gold. He's been around. And so he had a very clear vision of what Looker should be from his past experiences. The product, on the other hand, was pretty new, and, you know, they had just added charts.
They had all three kinds: line, bar, and pie. So, you know, the product has come a really long way since then, but I still want it to be more usable. Looker is an incredibly powerful tool, and it's an incredibly general tool. It's a tool that can be made to do a lot of things. And so that's wonderful, but there are also downsides in terms of usability, because when you can do anything, you know, those things may be harder than with a tool that can only do one thing, where you can really build an interface and an experience that is custom designed for that. And we're starting to address that with some, you know, new things built on top of the platform that really are meant for single types of users doing single types of use cases. We're calling them applications. But, you know, I sometimes say Looker was designed by engineers, and that is wonderful, because it gives you a lot of power.
But it sometimes turns out that designers do have their charms and can make really beautiful, usable things. And so we've got an incredible stable of designers now who are
[00:47:04] Unknown:
across an organization, and you can use it for arbitrary reports, and it has the capability of embedding charts and graphs into other applications. I'm curious what you have found to be some of the most interesting or unusual or unexpected uses of Looker that you've come across.
[00:47:20] Unknown:
Yeah. I mean, people have built some pretty amazing things. One customer's application is a chatbot, and it's different for every company that they're working with, every customer of theirs. And so each of those has a slightly different sort of structure, and they've built an automated workflow that reads their code and turns it into a LookML model, so that they're not having to do updates in two places or three places, but rather they just update their code for each customer, and then it automatically rebuilds the LookML model. And I think that's one of the really powerful things about LookML being code: you can do things like that. I've actually worked on a number of projects where I write Python, which writes LookML, which is really fun and neat, to regenerate a, you know, 50,000-line LookML file with the click of a button. And then you wait 10 seconds, and there's my new model, updated with all the latest parameters. It's pretty neat. But, you know, people have built dynamic pricing engines on top of Looker. People have done survival analyses on top of Looker. And people have built whole sort of monitoring systems for their data pipeline, to make sure that their data pipeline is working the way that they intended and that there are no breakages. They build that in Looker.
So, you know, I see a number of DevOps use cases where people are building whole DevOps systems, monitoring systems, in Looker. So, yeah, as I said, it's a very general solution, so really the possibilities are quite endless, and it's quite fascinating to watch what people come up with. And one of the areas that we haven't touched on yet at all is the aspect of
[00:49:11] Unknown:
visualization and the report building capabilities of Looker. And visualization can be very powerful, but it can also be very easy to get it wrong or cause it to be misleading. So I'm wondering if there are any sorts of guardrails or user experience design patterns that try to help lead people into making the right choices for the types of visualizations
[00:49:36] Unknown:
and reports that they can build and share with other people? Funny you should ask. We're actually doing a big project right now to sort of redesign the heuristics that we use to make decisions about the default visualizations that we show you, both the sort of type and the ways that it's formatted. But I think, more broadly, the main guardrail that Looker brings is that it is entirely in your browser, and it doesn't cut the cord between the source of the data and the end product. And what that means is that you can always audit what you see. And for other BI tools, too often the ability to trace that audit trail all the way back is very limited or very difficult, or maybe even impossible, because there were manual changes that were made and are not replicable, because the transformation happened somewhere else, because you're operating on an extract from the database and nobody remembers exactly when it was pulled or how it was pulled. And so the fact that, from a Looker visualization, which somebody sends me as a URL, which I then open in my browser, I can then see what was the SQL that is being sent right now to the database to generate it, what are the definitions of the LookML dimensions and measures that are generating that SQL?
What are the, you know, what are the Git changes? What are the Git commits that led to this latest definition? Who made those changes? When did they make those changes? Why did they make those changes? Did they write a decent commit message? You know, little things like that. I mean, the fact that you can audit it all the way back is a game changer because it really does allow you to make sure that the data that you're looking at is right. And given the choice between having bad data and having no data, I'll take no data every time.
Because with bad data, you look at it and you think you know what's going on. You make decisions based on that. You rush ahead with all the confidence in the world, and then you get stuck, because you've based your decision on the wrong answer. With no data, at least you're cautious, because you go, well, I don't really know what's going on, but I'll use my gut. I'd much rather have that than people rushing ahead with bad data. So the fact that Looker gives you the ability to audit it all the way back really cuts down on the amount of decision making that's done with bad data.
[00:52:00] Unknown:
And so what are some of the features or improvements
[00:52:05] Unknown:
or new projects that are in store for the future of Looker that you're most excited by? Yeah. I mean, as I said, applications, I think, are a really powerful new thing that we're starting to roll out publicly, which are custom-built interfaces, custom-built applications, really, that sit on top of the Looker platform, leverage all of the tools that Looker gives you, but are for a specific type of user in a specific use case. So of the first two, one is a digital marketing analytics application: combining your Google Ads and Facebook Ads and LinkedIn Ads and Bing Ads data all in one place, letting you do that sort of multichannel analysis without having to log in to four different tools, and then connecting it to your other data sources about what's happening downstream, down funnel.
So that's one of those applications. The other is a web analytics one. So, same idea, but using sort of Google Analytics data. And so I think that idea of really treating Looker as a platform, and leveraging all of the goodness of having sort of a single unified surface with which you interact with the data, but having custom, specific user interfaces that are much more intuitive to business users, I think that's a really powerful path that we're starting to head down. In terms of other features, I don't know. There's a bunch of fascinating stuff. There are a lot of changes and plans on the sort of exploration side: on improving visualizations, improving use of color in the application, improving the way that tables and the data are presented, improving the way that filters work. So there's just a lot of stuff on the way at that sort of UX level of, like, how people interact with the product. I think that's where things get really interesting. And then, you know, the data modeling team, which is the team that owns LookML, is always coming out with these, like, unexpected, fascinating new ideas. So one recent one is basically taking unit testing, which is, you know, a really core concept in software engineering but somehow has not made it into analytics, and bringing it into LookML, so that you can say, hey, I know the value for revenue last year.
I wanna build a unit test that, every time I make a change to the code, checks to make sure that that value is the same, and if it changes, alerts me that something's broken. So allowing that kind of unit test. Or saying, you know, oh, this is the date, and it has to be a date this year. And if I don't get a date, or if I don't get a date that's in this year, alert me that something's broken. So those kinds of things, bringing really good software engineering practices to the practice of writing sort of analytic code, are always fascinating and exciting to me.
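LookML did eventually gain data tests along these lines in later releases. A sketch of roughly what such a test looks like, with hypothetical names and a made-up expected value:

```lookml
# If a code change alters last year's known revenue total, this test
# fails and the change is flagged as broken before it ships.
test: revenue_last_year_is_stable {
  explore_source: orders {
    column: total_revenue { field: orders.total_revenue }
    filters: [orders.created_date: "2017"]
  }
  assert: revenue_matches_known_value {
    # 1234567.89 is a placeholder for the independently verified figure.
    expression: ${orders.total_revenue} = 1234567.89 ;;
  }
}
```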
[00:54:53] Unknown:
And are there any other aspects of Looker or business intelligence or the work that you're doing that we didn't cover yet that you think we should discuss before we close out the show? I mean, one of the really nice things about not building our own data engine,
[00:55:06] Unknown:
and instead getting to just work with all of the tools that are out there, and there are a lot of them, and they are doing fascinating things and competing with each other, is that we get to leverage all the stuff that they're doing, which is great. It's a great position to be in. So, you know, there's just constantly new innovation happening there, and we get to see what those folks are coming up with, and we have really, you know, tight partnerships with all of those vendors. And so, a couple of examples: like BQML, which is BigQuery machine learning, which they just rolled out, where you can actually run machine learning routines triggered by SQL right in BigQuery, without having to move the data and without having to spin up something that can, you know, run a regression on a billion rows. BigQuery can just do that. So being able to leverage that from within Looker. Or, you know, Amazon with Spectrum, which is basically the combination of AWS Athena and Redshift, where you can query data that lives on S3 right from within Redshift, so you really are combining the best of your data lake and your data warehouse. Or Snowflake, you know, with the Data Sharehouse, where you can have data that you have access to and share it with a customer, or, you know, sell access to that data, and not have to move it to them, but rather just grant them access the same way you would with, like, a Google Doc. So there's just tons of innovation happening, and to be able to benefit from all of that, and give our customers the benefit of all of that, is really just a fun, wonderful place to sit. And so, for anybody who wants to follow the work that you're up to or get in touch, I'll have you add your preferred contact information to the show notes.
[00:56:48] Unknown:
And as a final question, I'd like to get your perspective on what you see as being the biggest gap in the tooling or technology for data management today.
[00:56:58] Unknown:
That's a good one. You know, I mean, like I said, I think that the technology, to some extent, has outpaced the people aspect of it. Right? Change requires technology and people. And so I do think that as data becomes a bigger and bigger part of our civic lives, of our work lives, figuring out ways to make that data intelligible and usable to people, rather than just available, that's actually, I think, the most interesting place. And it's, you know, something that we're very focused on at Looker, but I don't by any means think we or anybody else has it solved yet. Alright. Well, thank you very much for taking the time today to discuss the work that you're doing at Looker. It's definitely a very interesting project and product,
[00:57:44] Unknown:
and one that looks to provide a lot of value. So it's definitely one I'll be taking a closer look at, and I
Introduction to Daniel Mintz and Looker
Daniel Mintz's Journey into Data Management
Overview of Looker and Its Unique Approach
Historical Context and Evolution of Business Intelligence
Looker's Technical Architecture and Workflow
ETL vs. ELT and Data Transformation Strategies
Data Lakes and Data Warehouses: Convergence and Use Cases
Security, Access Control, and Governance in Looker
User Workflow and Popular Features of Looker
Scaling Looker: Data Volume and Concurrent Users
Challenges in Building and Scaling Looker
Technical Architecture: Lessons and Improvements
Interesting and Unusual Uses of Looker
Visualization and Report Building in Looker
Future Features and Improvements in Looker
Looker's Position in the Data Ecosystem
Biggest Gaps in Data Management Tooling