Summary
Building internal expertise around big data in a large organization is a major competitive advantage. However, it can be a difficult process due to compliance needs and the need to scale globally on day one. In this episode Jesper Søgaard and Keld Antonsen share the story of starting and growing the big data group at LEGO. They discuss the challenges of being at global scale from the start, hiring and training talented engineers, prototyping and deploying new systems in the cloud, and what they have learned in the process. This is a useful conversation for engineers, managers, and leadership who are interested in building enterprise big data systems.
Preamble
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out Linode. With 200Gbit private networking, scalable shared block storage, and a 40Gbit public network, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. If you need global distribution, they’ve got that covered too with world-wide datacenters including new ones in Toronto and Mumbai. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute.
- Go to dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, read the show notes, and get in touch.
- To help other people find the show please leave a review on iTunes, or Google Play Music, tell your friends and co-workers, and share it on social media.
- Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat
- Your host is Tobias Macey and today I’m interviewing Keld Antonsen and Jesper Søgaard about the data infrastructure and analytics that powers LEGO
Interview
- Introduction
- How did you get involved in the area of data management?
- My understanding is that the big data group at LEGO is a fairly recent development. Can you share the story of how it got started?
- What kinds of data practices were in place prior to starting a dedicated group for managing the organization’s data?
- What was the transition process like, migrating data silos into a uniformly managed platform?
- What are the biggest data challenges that you face at LEGO?
- What are some of the most critical sources and types of data that you are managing?
- What are the main components of the data infrastructure that you have built to support the organization’s analytical needs?
- What are some of the technologies that you have found to be most useful?
- Which have been the most problematic?
- What does the team structure look like for the data services at LEGO?
- Does that reflect in the types/numbers of systems that you support?
- What types of testing, monitoring, and metrics do you use to ensure the health of the systems you support?
- What have been some of the most interesting, challenging, or useful lessons that you have learned while building and maintaining the data platforms at LEGO?
- How have the data systems at LEGO evolved over recent years as new technologies and techniques have been developed?
- How does the global nature of the LEGO business influence the design strategies and technology choices for your platform?
- What are you most excited for in the coming year?
Contact Info
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
Links
- LEGO Group
- ERP (Enterprise Resource Planning)
- Predictive Analytics
- Prescriptive Analytics
- Hadoop
- Center Of Excellence
- Continuous Integration
- Spark
- Apache NiFi
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to the Data Engineering Podcast, the show about modern data management. When you're ready to build your next pipeline or want to test out the projects you hear about on the show, you'll need somewhere to deploy them, so check out Linode. With 200 gigabit private networking, scalable shared block storage, and a 40 gigabit public network, you've got everything you need to run a fast, reliable, and bulletproof data platform. If you need global distribution, they've got that covered too with worldwide data centers, including new ones in Toronto and Mumbai. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And go to dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, read the show notes, and get in touch. And don't forget to go to dataengineeringpodcast.com/chat to join the community and keep the conversation going. Your host is Tobias Macey, and today I'm interviewing Keld Antonsen and Jesper Søgaard about the data infrastructure and analytics that powers LEGO.
So, Keld, could you start by introducing yourself?
[00:01:10] Unknown:
Yeah. Hello. My name is Keld Antonsen. I'm a senior data engineer at LEGO. I've been with the company for close to 20 years. The last 15 years have been in the space of business intelligence and information management, and the last 3 and a half years have been together with Jesper, who is sitting here next to me, in the
[00:01:33] Unknown:
big data and data engineering area. And, Jesper, how about yourself? I'm Jesper Søgaard. I head up the data engineering and platform teams here at LEGO. I've been at LEGO for 5 and a half years now; the last 3 and a half or so, as Keld mentioned, have been in this space around data engineering and data platforms. Before that, I was with various large global companies, working in different areas, primarily in IT.
[00:02:02] Unknown:
And going back to you, Keld, do you remember how you first got involved in the area of data management?
[00:02:08] Unknown:
Yeah, I was approached by Jesper, actually. That was about 3 and a half or 4 years ago; he was setting up this team at the time, and we talked together about me joining that team. So that was actually the first time I got involved with the data engineering area. The business intelligence area was well known to me at that point, but data engineering was not, and big data was all new to me at that point in time. And, Jesper, do you remember how you got involved in the area of data management?
[00:02:43] Unknown:
Yeah. So we probably have to step back 3 or 4 years. I had been involved in other spaces before, but here at LEGO it started almost 5 years ago now: there was a product that we wanted to build that would need to capture quite a lot of data that we didn't capture already at LEGO. So we were thinking about how to do that. I should mention I was an enterprise architect at that point, so we were trying to architect a solution for it. And then a function here at LEGO was started up called Business Innovation Solutions, and I was part of that. And then they asked me to step in and take the role as head of engineering.
So I got involved at LEGO in that space out of that one. And the idea was to not do everything the way we used to. This was a very new space for LEGO: not the classic BI space, but the data space done this way, working with open source and with products other than what LEGO would normally use. I should say that LEGO is a very SAP-heavy, or ERP-heavy, company. So this was something completely different, much more focused around data development and engineering practices.
So I was asked to step into that and build a team around it. In the beginning, it was just a small team. The first part was just figuring out how on earth we were going to do this at LEGO and in the context of LEGO. And then, as Keld mentioned, I asked him to step in, along with another guy who is also on the team. The three of us, more or less, started up trying to figure out how on earth do we capture all the data, process it, and make it available for analytics, but also for data-specific products.
[00:04:27] Unknown:
So that's how I got involved in this. And you said that at the beginning it was 3 people. Has it grown to have more people in the organization now, with more responsibilities?
[00:04:39] Unknown:
Definitely. Maybe we can take the organizational aspect of it now. We were sitting in what was called Business Innovation Solutions, and we were more or less growing, not necessarily that rapidly, but we kind of took it in steps. We were 3 people in the beginning and quite quickly grew to around 7. We had an external partner helping us out as well, like an engineering partner. And then, over the years, we grew. A couple of years ago, we decided to split it out into 2 different teams: a platform team, specifically around the data platforms, and then a data products team. At that time, we were probably around 10 or 12 people. And if we look at how we've just reorganized going into 2019, we're going to have 3 different teams, focusing on and supporting different areas of the business. We're roughly around 40 now, and we're going to be around 50 people if you include all the external engineers we have. The first of the 3 teams is the data platform team, which we now call data platform and operations because we have quite a lot of operational tasks.
One of the things that also happened here at LEGO is that we kind of grew from a project-based to an operational type of environment for the data platforms and the products, so we have had to cater for that. So that's the platform and operations piece. Then we have a data products team, which does the more classical data products work: how do we make data ready for insights, how do we store specific types of data, and how do we build data exposure products. And the last one is an AI and machine learning team, which is heavier on the engineering side. Those are the 3 areas that report to me. And then we have a sister team called advanced analytics, which is what would classically be named the data science team in most other companies.
So we grew quite heavily from just 3 or 4 people to, if we add everything up, probably around 50 when the year ends. For LEGO, that's an extreme expansion over the last 3 or 4 years, and especially the last year has been really heavy. I would say 2018 has probably been the year where we expanded the most. And also just in the area of business we're now covering: we started out having one specific product for one specific area of the business, which was in an app, to now supporting analytics use cases more or less across the board at LEGO. And when you first started the big data group,
[00:07:17] Unknown:
what kinds of data practices or products were in place prior to beginning it? And what was the transition process like of migrating those different data products or data use cases out of their silos and into a more uniform platform that could be used across the business? For the first question: there was nothing. We started out on a blank sheet of paper. As I mentioned before, we wanted to go and look at open source products. We wanted to utilize
[00:07:47] Unknown:
best practices from data engineering around the world. So that was very different compared to what LEGO has normally been doing; normally, we buy off-the-shelf products. So there was nothing in place. First, we had to figure out what products we needed to build. And, luckily, the first couple of products we were part of were also new products being built by LEGO at the same time, so there was no need for an existing infrastructure to support them. Now, as we've moved into other areas, we have started to talk about, and have been doing, migrations onto more and more of these data platforms and distributions, and moving from talking about big data and business intelligence separately to just talking about data, which is also a massive step for LEGO. So we just see ourselves as having one data organization.
Even though there are some elements of that, and there are different use cases that are run differently, we're just talking about having one data organization that supports the data use cases in the company, and then it's, of course, up to us to figure out how on earth we make that work at the higher end and at the top of the stack. Does that make sense?
[00:09:05] Unknown:
Yeah. That makes good sense. And in terms of the data organization, what are some of the biggest challenges that you're facing in terms of managing the data and collecting it and identifying potential sources of data that would be useful to your consumers?
[00:09:23] Unknown:
I think, as Jesper said, we started off pretty easy from a data ingestion point of view, because we were basically developing the platform together with another product here at LEGO. So all the data that we needed for that was sort of being developed along with the platform that we set up. As we have grown in size, we have also added more data from a variety of different data sources, external and also internal, and especially when it comes to ingesting the internal data, that becomes a large challenge for us. First of all, because it's a lot of data, but also because it's traditional: it's sitting in an ERP system, and fetching and getting all that data out of it is a fairly large challenge for us.
But it's something that is required in order for us to be this data organization for the whole LEGO company. So it's a challenge for us for 2019.
[00:10:33] Unknown:
And as far as the types of data products that you're building, is it largely just business intelligence style solutions so that the different product groups can get feedback on the levels of engagement and success that they're seeing, or are there also more predictive or prescriptive analytics that are driving things like product road maps or anything along those lines?
[00:10:57] Unknown:
Yes, in the area of machine learning and AI. The traditional operational reporting at LEGO is still sitting together with the BI organization. But, of course, as we expand more of the data and as we become more data driven, we have had to figure out exactly how we organize that split between the operational BI part, the analytics part of it, and, of course, the data products themselves that are heavily dependent on machine learning and AI. Now, I was just going to say, some of the products we grew out of
[00:11:32] Unknown:
were primarily also about figuring out how to do this. When we started out, we talked a lot about how do we actually build products that include machine learning. At the point when the first product was built, that was not something that was massively being looked at in the industry, especially not here in Europe. So we also had to figure out how do we actually build machine learning products, and how do we maintain them. It's almost like we have 2 parallel tracks in the engineering function. We have the machine learning and AI products that we build, and we have a strong capability in that. And then we have probably the more classic, if you can say that in this space, products around insights and the generation of insights. So a lot of the work the data engineering function does is ingest, store, and process the data and make it available for the data science teams and other teams around LEGO to build dashboards, to build tracking solutions and things like that on top; we store the data and make it available.
And recently, we also started looking at more real-time tracking, real-time sales tracking, and exposing that to the wider business and working with that. So, in essence, you can almost say it's 2 parallel tracks, and we kind of had to figure out both as we went along with this team.
[00:12:54] Unknown:
And can you give an overview of the platform and the components that you've built out?
[00:13:02] Unknown:
The platform itself is built upon a Hadoop distribution, so that's the centerpiece of the platform. That was a decision we took from the very beginning of this setup. The first iteration of the platform was heavily dependent on open source components. We tried to use as much of the best practice from the open source community as possible in building the platform, also to learn what it is to have a platform like this and what it can bring us. The second iteration of the platform is still centered around a Hadoop distribution, but we are starting to add in some commercial products as well in the different areas where we see those products being a benefit for the platform.
So today, it's a mixture of a Hadoop distribution and then some open source and commercial products around it. But it's a platform that covers the whole stack of data analytics. So we have tools for ingesting, for the R type of analytics and the more dashboard style of data visualization, and then,
[00:14:21] Unknown:
of course, the whole processing part as well. Could I also say that our second generation, or second iteration, of the platform is also where we started to work a lot more with DevOps and the process of scripting all our infrastructure; we do infrastructure as code as much as possible. Yep. And I should mention both iterations of the platform have been 100% cloud based, which is also something new for LEGO, to build up this type of infrastructure in the cloud. So that has also been an interesting journey for us, to learn more about the cloud. And I think right now, compared to where we were 4 years ago, we know what it takes to run in the cloud, and we know how to actually build a cloud-based setup for big data and for data engineering and data analytics functions.
And you could say we have also been blessed by LEGO being very investment ready in this space. LEGO has big ambitions for using data to become more data driven. We are data driven, but we want to become even more data driven and work with data and utilize it, not just in the classical spaces, but also in other areas of the LEGO Group. So I think we've been blessed in that, and it has also meant that we've had the opportunity to reiterate our platform again. And we've had the opportunity to actually
[00:15:42] Unknown:
drive quite a lot of business value through this platform. Yeah. And, especially, having the luxury of being able to mature the platform over a number of years and then iterate on it has been very, very helpful for us, especially coming into a completely new space. So I think that's definitely something we can say has been a benefit for the team.
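The infrastructure-as-code approach mentioned above can be sketched in miniature: the desired platform is a declarative, version-controlled spec, and a plan is computed by diffing it against what is currently deployed. This is a simplified illustration of the idea, not LEGO's actual tooling, and the resource names are hypothetical.

```python
# Toy illustration of infrastructure as code: a declarative spec of desired
# resources is compared with the current deployed state, and the diff becomes
# the plan of actions to apply. Resource names here are made up.
def plan(desired, current):
    """Return (action, resource) pairs that move `current` to `desired`."""
    actions = []
    for name, spec in desired.items():
        if name not in current:
            actions.append(("create", name))
        elif current[name] != spec:
            actions.append(("update", name))
    for name in current:
        if name not in desired:
            actions.append(("delete", name))
    return actions

desired = {"ingest-cluster": {"nodes": 6}, "raw-bucket": {"versioning": True}}
current = {"ingest-cluster": {"nodes": 4}, "old-temp-bucket": {}}
actions = plan(desired, current)
```

Real tools such as Terraform or CloudFormation do exactly this diffing, at far greater depth; the point is that the spec lives in version control alongside the pipeline code, so the platform can be rebuilt or iterated on repeatably.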
[00:16:08] Unknown:
And given that you're growing both of these new capabilities of the data platform and analytical stack as well as the cloud environment and some of the issues and edge cases around deployment and management of that infrastructure, what have been the most useful strategies for being able to grow and maintain that internal competency
[00:16:34] Unknown:
and be able to get up to speed and stay up to date with the various technological and platform innovations that are going on in the space? I think we've used, of course, quite a lot of training, and people on the team are very dedicated to staying up to date. There's a huge willingness; sometimes we even have to take a step back and say, well, we're not going with that beta version. There's a huge innovation willingness in the engineering team, and that's also the culture we've been working on having in the big data engineering team at LEGO since the beginning. Some of the first meetings Keld and I and the others had were around what type of culture we wanted to drive. And then we actively sought out partners that could help us in different ways as we grew. There was a different partner in the beginning that could help us with the basics: what are the basics of using open source? What do you need to know about using the cloud? How do you work with the cloud? So we would bring in specialists. And then, as we changed modus at LEGO around big data engineering, we also needed a different partner that could support that on a wider scale and be the partner that pushes us into using new technology. But we are very open, and we have a lot of talks with different partners and different consultancies all the time. We try as much as possible to have people on training courses and at conferences, and we also give talks, just to keep us on our toes with all this. Because, as you mentioned, it is hard to keep up with all the new stuff. We also had products in our stack that we discontinued because the community wasn't there anymore, and so we had to look elsewhere. But that's the beauty of open source.
The other side of the coin, you could say, is that sometimes you focus on a product and then it more or less gets discontinued by the community. But I think a lot of it comes down to the culture we fostered in the engineering team: that it's okay to actually try out new stuff, and it's okay to reiterate what you've already done and redo it on a different stack or a different setup. We've done that quite a lot, especially in the last year, on some of the more core products we've had.
So, in that sense, that's what we try to do. We try to focus on the culture and make room for this innovation while, at the same time, keeping focus on how we drive business value for LEGO. But a lot of the business value has also proven to come out of stuff we looked at and said, okay, this actually looks very interesting; can we do more with this technology or this process or this way of doing it? Then we showed it to business partners around LEGO, and they said, well, that makes total sense for our business, and then we would try and utilize it. And then we would reiterate on the architecture and the implementation of those quite a lot. We've been doing that and we'll continue to do that. We're doing it quite heavily here in 2019 as well, especially in the first quarters, to actually rebuild some of our core infrastructure for some of the data products specifically.
[00:19:40] Unknown:
And you mentioned that the data group was created alongside some of these applications and products that required being able to consume and analyze the information they were generating. So I'm wondering how the growth of the data team has been shaped by some of the business requirements and products that you're launching, and how you're working to raise and maintain awareness across the broader LEGO Group as far as your capabilities and how they have engaged with you as consumers of the platform that you're building?
[00:20:20] Unknown:
So I think, up to now, I'll almost say up to 2019, we've been totally driven by business projects. We would have a business partner or somebody in the business asking us to do specific stuff, and then we would support that, whether it was ingestion, preparation, and publishing of data to a dashboard or a solution, or whether it was actually building a specific product, or reiterating a product we've had. So we've been very, very driven by business cases, you could say, in growing the data team. And it's also up to us to say, every time we build a product, that maintenance comes along with it, and then reiteration. We work in an agile manner, so we say, okay, the first product we give you is probably the minimal viable product, and then we would work with you to iterate on it, which has also proven a good way to get the right product out there.
So we grew alongside that. And as it worked better and better, we would deliver more and more business value out of our products. The natural next step was to grow even more, because there would be more business value coming out of the platform or the specific data products or the tools we're setting up. For 2019, we will still be very business case driven, of course, but we also took a shift at LEGO to say we need to be at the forefront of some of the stuff we do here; we can't just keep waiting for it. Sometimes we've been saying, okay, we're going to need this in a year's time, but we've been struggling with actually getting that implemented because we were so driven by the business cases. And then, once we get to it, we'd almost have to start from scratch in that space instead of being at the forefront of it. So we made that switch now. We've taken out hours and time, and quite a lot of it, to actually be at the forefront. That's why we reiterated the platform, as one example, because we knew a lot of the requirements coming in now, especially around analytics, would require that our platform have different capabilities.
And so that's the shift we will see in 2019: we will also come up with products where we can see that the business is going to ask for them, or where they're asking implicitly but not directly asking us. That's probably also part of why we grew so much. At the same time, we will be delivering even more AI and machine learning products to the business, and that drives a lot of awareness of this space at LEGO.
[00:22:50] Unknown:
So there are, of course, a lot of people in the business that are looking to us to help them out in some of these areas of this technology.
[00:23:02] Unknown:
We've formed centers of excellence in advanced analytics as well, around specific product areas and market groups. That also means there's a lot of focus, not just from the business but also from the analytics team, on setting requirements towards the engineering team, and the interdependencies between those 2 teams also mean we have to shift gears and do things a bit differently than we used to, which is good, because it brings more focus on what the core business value is. You could say engineering, or data engineering and data platforms, here are a little bit different from the data science team, because we're not totally interrelated. We have a lot of other customers besides advanced analytics. There's no doubt that advanced analytics is our biggest customer and the one we work most directly with and build the most products with. But we also have analytics customers in the wider business, or the wider LEGO Group, that also require data engineering.
And that's also partly why we changed modus on this. There's been a clear link and clear direction, which has been super cool, from the executive leadership team on where we want to go with data and analytics at LEGO. So that's been really good, and it's a strong driver for the investment in the data team in general.
[00:24:19] Unknown:
And as far as the sources and types of data that you're dealing with, and some of the types of products that they're feeding back into, is it primarily focused on the digital properties that LEGO has, like the video games and mobile applications
[00:24:37] Unknown:
and websites, or is it also feeding into some of the product lines of the physical bricks and things along those lines? I think it's hard to split those 2 at LEGO. We are kind of split, but more and more of the products we're seeing have a digital element to them. It's more the themes in general that have both a physical set and a digital set. So in that sense, it's both. Right? And about the data we have been picking up, I should say we are very focused on making sure that, if we pick up data, it is not linked to anything. It has been a real journey for us to understand how to implement this high level of compliance.
So at LEGO, when we're talking about data that we are picking up, whether it's from a website or from an app or anything else, we always take the highest level of compliance and apply it across all our data products. We actually had to build anonymization products; we have those running in the background as well, so we won't have data on the platform that we're not allowed to have or to make available. We are very conscious about what data we ingest, how we deal with the data, and that we are following compliance. At times, we probably talk more with our legal department than with anybody else, because we want to be 100% sure that we're not doing something that could damage anything, make people's data available, or use data we shouldn't. So we won't do that, and we won't ingest it. There's a high level of governance around the team as well. The sources could come from anywhere, but we're very, very focused on not ingesting or making available data that we have not either got consent from the user to store or that we don't legally
[00:26:40] Unknown:
have a right to have on the platform. This, of course, means that our ingestion pipelines can be fairly complicated, because we want an anonymization process right at the very beginning of the pipelines.
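An anonymization step at the very start of an ingestion pipeline might look roughly like the following. This is a hedged sketch, not LEGO's actual implementation: the field names, the keyed-hash approach, and the key handling are all assumptions for illustration.

```python
import hashlib
import hmac

# Hypothetical PII field names; the real pipeline's schema is not described
# in the episode.
PII_FIELDS = {"user_id", "email", "ip_address"}

def pseudonymize(record, secret_key):
    """Replace PII fields with a keyed hash before anything is stored, so
    records from the same user can still be joined without keeping the raw
    identifier. Discarding or rotating the key later makes re-identification
    from the stored data impractical."""
    out = dict(record)
    for field in PII_FIELDS & record.keys():
        digest = hmac.new(secret_key, str(record[field]).encode(), hashlib.sha256)
        out[field] = digest.hexdigest()
    return out

event = {"email": "fan@example.com", "set_id": 75192, "action": "view"}
safe = pseudonymize(event, b"rotating-secret")
```

Non-PII fields pass through unchanged, while `email` becomes an opaque digest. Whether keyed hashing alone counts as sufficient anonymization under a given regulation is a legal question, which is presumably part of why the team talks to its legal department so often.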
[00:26:55] Unknown:
Given the need to be compliant with these different regulations and to ensure a high level of reliability for the end users of your platform, what are some of the approaches that you've built in as far as testing of new technical implementations, monitoring of the existing platforms, and the various metrics that you're tracking to ensure the overall health and availability of the systems that you're supporting?
[00:27:24] Unknown:
Well, for the development process, we run a full CI setup. We do the full setup: unit testing and integration testing, to the extent that's possible with the data that we have available. And in that process we also try to cater for any issues there might be in terms of being compliant on the data side. On the monitoring side, the operational part, it's the same: trying to keep a full picture of what is going on in our platform and in our systems, and especially making sure that the data we ingest is actually, in form and structure, the same as what we tested at the very beginning of the implementation of that ingestion pipeline. So we try to cater for data sources changing along the way, so that we don't ingest something we shouldn't ingest because the data source changed.
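A minimal version of the schema check described here, where incoming batches are validated against the structure that was tested when the pipeline was built, could look like this. The schema and field names are hypothetical, not LEGO's.

```python
# Hypothetical schema captured when the ingestion pipeline was first built
# and tested; every later batch is validated against it, so a source that
# silently changed shape is quarantined instead of ingested.
EXPECTED_SCHEMA = {"event_id": str, "timestamp": str, "payload": dict}

def validate_batch(records, schema=EXPECTED_SCHEMA):
    """Split a batch into records matching the tested schema and records
    whose shape has drifted (extra, missing, or retyped fields)."""
    good, quarantined = [], []
    for rec in records:
        ok = set(rec) == set(schema) and all(
            isinstance(rec[k], t) for k, t in schema.items()
        )
        (good if ok else quarantined).append(rec)
    return good, quarantined

batch = [
    {"event_id": "a1", "timestamp": "2019-01-07T09:00:00Z", "payload": {}},
    {"event_id": "a2", "timestamp": "2019-01-07T09:00:01Z", "payload": {},
     "debug": True},  # an extra field the source added without warning
]
good, quarantined = validate_batch(batch)
```

Quarantining drifted records, rather than dropping or blindly ingesting them, also supports the compliance angle: a new field from an upstream source might contain data the platform is not allowed to store.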
[00:28:35] Unknown:
So we try to keep things very structured around that. But it is interesting, because we are talking about the whole spectrum: everything from millisecond response times on APIs all the way to data that doesn't need to be warm. So it's always an interesting discussion when we have it in the engineering team, because there are lots of different metrics we measure on. Say we're measuring a data pipeline, as an example: are we just measuring that data is coming through? All lights could be green, but there might not actually be any data in there; it's just showing green because one row came in where we were looking for 800,000 rows over the last 24 hours. So there are always different aspects, and I think that's also one of the products we have iterated on quite a lot, because as we matured in this space, we also had higher and higher requirements on the data that is coming in and on our products being available at the right time and not down.
So we almost had to learn how to monitor our data platform and then also how to monitor our data products, which is almost a completely different thing compared to monitoring data. We had to learn everything as we went along, as the requirements
[00:29:57] Unknown:
and rules evolved. Yeah. That is a challenge: going beyond just monitoring that things are working, to actually monitoring that what they're doing is what we think they should be doing and what we developed them for. That's a constant challenge for us.
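The "all lights green but no data" failure mode described above is usually caught by pairing liveness checks with a volume check. A minimal sketch, where the tolerance threshold is an illustrative placeholder rather than an actual LEGO value:

```python
def pipeline_healthy(rows_last_24h: int, expected_rows: int,
                     tolerance: float = 0.5) -> bool:
    """A pipeline that is technically 'green' can still be starved of data.
    Flag it unhealthy when observed volume over the window falls below a
    fraction of the expected volume for that window."""
    return rows_last_24h >= expected_rows * tolerance
```

With the example from the conversation, `pipeline_healthy(1, 800_000)` is false even though the pipeline itself never errored, which is exactly the case a pure liveness check misses.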
[00:30:16] Unknown:
Since we go all the way from AI products, looking at images and image tagging, to recommendation engines, and recommendation engines in probably the widest possible sense of the term, it's not just one recommendation, it's many, with different requirements on the APIs. That has been an interesting task. And then at the same time: ingesting data, monitoring data, monitoring consistency in the data,
[00:30:43] Unknown:
monitoring that the data process in itself is... Compliant. Is compliant.
[00:30:49] Unknown:
And yeah, there are so many levels of monitoring that it almost feels like you could write a book about monitoring at LEGO. That's a book I would read. I'll let you know when it's out. Yeah.
[00:31:03] Unknown:
It may take a couple of weeks.
[00:31:07] Unknown:
And as far as the technical platform and its various components, what are some of the technologies, either specific tools or broader categories of technology, that you have found to be most useful, and which have been the most problematic?
[00:31:24] Unknown:
I think if we look at the data processing part of it, that's pretty easy: we have been working with the Spark framework for quite a while and found it very useful, so a lot of our big data processing on the platform is built around Spark. On the ingestion side, we tried out various tools and also tried to build something from scratch, but ended up on a NiFi platform. The reason for that is that it gives us the flexibility, but also the standardization, that we want from our data ingestion. On the analytics side, it's a mixture of open source and off-the-shelf products, and there I think we have settled on two products that we find pretty useful and will probably stay with for quite a while: as I said before, one more on the analytics end, and another more for dashboards and data visualization.
But that's more or less the technology stack we're using today, in terms of the most useful ones. We have tried out different technologies, especially on the ingestion part. It's been a journey for us, and along the way there have been a few products that proved very problematic for us and didn't deliver what we expected.
[00:32:49] Unknown:
Adding to that, we also realized that we needed to standardize our way of ingesting. We were doing it in many different ways in the beginning, and we needed that framework to help us, basically. And I think standardizing also means that you have to try out quite a lot of different stuff
[00:33:12] Unknown:
to find out what actually fits your organization and your company, and also the variety of data sources that you have, because it spans everything from unstructured to structured data. Finding one tool that covers that variety of data sources is a problem. So that has been something we have been looking into, while at the same time trying to get that standardization on the data that we take in.
[00:33:42] Unknown:
And as you have grown to the current level of maturity and necessary reliability, I'm wondering what you have found to be useful approaches for prototyping some of these new technical implementations while being able to exercise them at an appropriate volume in a production-type environment, to ensure that the prototype implementation is actually going to scale effectively in terms of data volumes and reliability, and that it's going to meet the requirements you set out for it? I think from a prototyping point of view, it's proven pretty
[00:34:23] Unknown:
useful, the cloud environment that we currently have. The ease with which we can set things up and get something running has been very impressive, I would say. There is still an element of bringing something from a prototype into something that we could call production ready. We have done it a couple of times now, of course, but it's been different every single time we have done it, and we haven't really found the one really good way of doing it. It's something we need to look into in 2019, because it's an area where we'll see more of this coming.
[00:35:10] Unknown:
So it's definitely a focus area for us, to be better at that part. I think that links to us as a company being super ambitious: a lot of people want to do a lot of stuff at the same time, and getting good engineering practices in when somebody has a good idea takes some time and some work, also because a lot of the time we have to go out and explain and educate on what the platform can do. So that's part of it as well. As Keld said, we love prototyping, there's no doubt about that. And a lot of the time, the prototype ends up being a minimum viable product, because it's really just a prototype. So we are also learning how to take these prototypes or minimum viable products to market in an easier and better way, with engineering best practices behind them. It's not to say you need to build everything to the optimal degree when you're trying something out. It's more about how we ensure that if a prototype needs to go into a production environment, it is production grade, because you'll be making business-critical decisions off it. How do we ensure that that product can live in our environment and can be scaled?
But as Keld said, it's been super helpful for us to have the cloud and our cloud environments for that, because it means we've been able to scale faster.
[00:36:39] Unknown:
And how does the global nature of the LEGO Group influence the overall design strategies and technology choices that you've made for building out the platform you're creating, and some of the requirements around availability and capacity, and being able to span global regions to reduce latency
[00:37:04] Unknown:
for the end user interacting with your products? So global is almost there by nature from the beginning: everything we build is built to be global. I think that's how it all started out. If we're delivering specific APIs for specific products, as an example, I don't think we're delivering any APIs in production that are not at a global, or at least very close to global, level. That, of course, impacts how we do everything around the APIs, and we try to ensure that we can scale on all the different levels we need to, up and down. I think global is one issue; the much bigger issue for us is seasonality. LEGO is a seasonal product, especially around Christmas, especially if you talk the Americas and Europe; we scale up quite a lot. So we also had to learn that, and as we build, we always said: okay, we need to be able to scale for Christmas, or scale for Easter, or scale for Black Friday, which is one of the big things at LEGO as well. So how do we...
[00:38:16] Unknown:
And that's one of the things that always comes up when we go from this prototyping, the POCs, into the minimum viable products, and then of course into something that has to run at production scale: how do we scale those tools or products up to handle the demand that
[00:38:35] Unknown:
Jesper just talked about? We're not saying it's easy. We're just saying it needs to be in the design from the beginning. Yeah. Otherwise, it will be hard for us to scale at the right time. Yeah.
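Designing seasonal scaling in from the start, as discussed above, can be as simple as a capacity schedule keyed to known peaks, which an autoscaler then uses as its baseline multiplier. The windows and factors below are made-up placeholders, not LEGO's real traffic figures:

```python
import datetime

# Illustrative seasonal windows as ((start_month, start_day),
# (end_month, end_day), multiplier); real values would come from
# observed year-over-year traffic, not these placeholders.
SEASONAL_FACTORS = [
    ((11, 20), (11, 30), 4.0),  # Black Friday window
    ((12, 1), (12, 26), 5.0),   # Christmas run-up
]

def capacity_factor(day: datetime.date) -> float:
    """Multiplier applied to baseline capacity for a given date;
    1.0 outside any known seasonal peak."""
    for (sm, sd), (em, ed), factor in SEASONAL_FACTORS:
        if (sm, sd) <= (day.month, day.day) <= (em, ed):
            return factor
    return 1.0
```

Having the schedule in code, rather than reacting to load after the fact, is one way of putting "scale for Christmas" into the design from day one.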
[00:38:46] Unknown:
Well, given the ramp-up, and the fact that you started in a large organization with these global requirements, it doesn't sound like anything you've done the entire time has been easy, but it sounds like you've had a pretty impressive technical implementation.
[00:39:02] Unknown:
We like to think so, yes. We are proud of what we do, and we are proud of what we build. And I think, actually, we're probably even more proud of the fact that we can change so fast. We are very change ready, and we can iterate fast over problems. And we can build products: we built the last recommendation engine setup with a team that was not big at all, and it scaled throughout; it's a global implementation, built alongside the business partner. We did that with a fairly small team. So I think we're pretty proud of what we do, and I think we've managed to come a long way. But having the mindset of learning as we go along has probably also been key to making it possible to implement both the first version and now the second version, where we're talking infrastructure as code as well, and actually pushing ourselves even more to use these new techniques and technologies that are available out there.
[00:40:07] Unknown:
But it does require the change mindset. Yes, that's key to this, because it's constantly changing. Just the technology space alone is changing so fast, and if we want to keep up, then we simply have to
[00:40:25] Unknown:
accept that what we do today might change in three months' or six months' time, and we have to follow that. So also when we hire interns, and actually also when we interview external candidates, we're very focused on the culture. It's the people, the mindset, and the culture they're bringing in, not necessarily always whether they have a specific technical competence. It's more the mindset of learning new things and adapting to new things all the time. That's really key to the people we have in the team and to the way we recruit, our whole recruitment strategy around this team.
[00:41:01] Unknown:
And over the course of building out this internal capacity and this internal organization at LEGO, what have you each found to be some of the most interesting or challenging or useful lessons that you've learned?
[00:41:14] Unknown:
I think the most interesting thing I've learned when recruiting the team and building up the capability is just how important this whole cultural aspect is. How do we work together as a team? Also, in some cases, how do we work with the fact that we have teams within the team, because we will have people that work with a specific product for a long time; how do we adapt that to a team structure as well? I think that's been one of the lessons learned. One of the major lessons, if I look over the last 4 years, has also been how to adapt to changing business requirements all the time. It's not just about adopting an agile framework and assuming that will fix all the problems in the world; for me, that's just a delivery model. It comes back to how you set up the team to actually adjust to changing business requirements, and how you work with your business partners to ensure that when priorities change, we don't break sprints very often, but we do change priorities between sprints. How do we work with that? How do we create a culture where that's okay, and where people don't get discouraged by the fact that they might have to leave a product behind for a period of time, or that a product might have to go away altogether? I think that's been one of the more challenging things, so that's probably more the people aspect of it. And then finding people: that in itself is not a problem.
It's not a problem finding people that want to work with this; the challenge is finding people that both fit in that culture and also have this constant urge to try out new stuff, wanting to try different technologies, different services, different environments, and to work with different business stakeholders that have huge ideas about what to do, and how we handle all of that. That's probably been the most interesting part of the journey: how do we build up a team that can cater for all of it? And we are spread wide across what we do. We're not a classic data engineering team that just sits and does publication of data, and that's it. We do everything from high-performing APIs all the way to hardcore infrastructure. We've built it almost like a small company within the company. That's been interesting. And just in general, I mean,
[00:43:41] Unknown:
working in a team like this, you just have to love the technology that you're sitting on. And I think that if you look at the way people are working together and reacting to new technologies, it's just interesting to see how you can grow yourself on these technologies.
[00:44:00] Unknown:
I think also, yeah, you have to love technology and you have to love data. Right? Yes. And I think that also encapsulates what data engineering is. You're not a software engineer, and you're not a data scientist; there's something in between that actually bridges both. So that has also been interesting: finding talent that wants to go in that direction, that can live in that space, and that is willing to work on themselves, because they can come from either side. If you look at the backgrounds in the team, they're all over the place. But the one thing they have in common, I think, is the love of technology and data, and of how to bring data to life in an organization.
[00:44:41] Unknown:
And what are you most excited for as you look forward to your roadmap for the coming year? I think
[00:44:47] Unknown:
LEGO is a physical and visual product, right? And it is super exciting just to see how far we can get in using data for both the physical and the visual products, and for LEGO as a company. LEGO is very focused on children's play and creativity, and that's super cool and super awesome to be part of. How can we, with data, help that become even better? Not an even stronger focus, because it already is the key at LEGO, but how can we support that agenda throughout? There are so many opportunities in these technologies and in the data we have, and we hope that we can influence the rest of the LEGO organization to use these technologies.
And if we succeed in that, it's going to be awesome, and we'll have a super exciting 2019 and 2020.
[00:45:41] Unknown:
Continue. Yeah. Yeah. So for anybody who wants to follow the work that you're up to or get in touch, I'll have you each add your preferred contact information to the show notes. And as a final question, I'd like to get each of your perspectives on what you see as the biggest gap in the tooling or technology that's available for data management
[00:46:01] Unknown:
today? I have one, which is a thing I've been thinking about quite a lot, and it's probably more on the tooling and technology side than the data management side. It's around cross-cloud possibilities. As more and more public clouds become available, they almost feel like they specialize in different areas. It would be cool if someone could orchestrate a solution that works across clouds, so you could basically pick the best services from the different clouds that are available, and then orchestrate the solution on top of that, using services from different cloud vendors, or even on-prem for that matter. That would be super awesome. I know it's probably going to be almost impossible, because you would need so many relationships with so many different platforms. But if somebody, and there's probably somebody over in Silicon Valley that's far ahead of what we do, could come up with a product that orchestrates across clouds, so you could have a true multi-cloud environment in your company today, that would be a game changer, I think. That would be awesome. Yeah, definitely, because then you could basically pick the solution that works best. Yep. Of course, there's a lot in how you orchestrate the data across clouds and how you make data available for different solutions at different times, and there would probably also be latency issues and such. But if we could get to that level, that would be super helpful.
[00:47:31] Unknown:
Well, I want to thank you both for taking the time today to share the work that you've been up to. It's definitely very impressive what you've been able to achieve in such a short amount of time, given the number of different capabilities that you've needed to ramp up on. It's been interesting, and I look forward to seeing what you come up with in the coming years. So thank you for that, and I hope you enjoy the rest of your day. Thank you very much. Thank you for having
[00:47:58] Unknown:
us.
Introduction and Guest Introduction
Keld Antonsen's Background
Jesper Søgaard's Background
Getting Involved in Data Management
Building the Data Team at LEGO
Challenges in Data Ingestion and Management
Types of Data Products and Analytics
Overview of the Data Platform
Strategies for Maintaining Competency
Business Requirements and Team Growth
Sources and Types of Data
Ensuring Data Reliability and Compliance
Useful and Problematic Technologies
Prototyping and Scaling
Global Design Strategies
Lessons Learned and Team Culture
Excitement for the Future
Biggest Gaps in Data Management Technology
Closing Remarks