Big Data

Building An Enterprise Data Fabric At CluedIn - Episode 74

Data integration is one of the most challenging aspects of any data platform, especially as the variety of data sources and formats grow. Enterprise organizations feel this acutely due to the silos that occur naturally across business units. The CluedIn team experienced this issue first-hand in their previous roles, leading them to build a business aimed at building a managed data fabric for the enterprise. In this episode Tim Ward, CEO of CluedIn, joins me to explain how their platform is architected, how they manage the task of integrating with third-party platforms, automating entity extraction and master data management, and the work of providing multiple views of the same data for different use cases. I highly recommend listening closely to his explanation of how they manage consistency of the data that they process across different storage backends.

Read More

A DataOps vs DevOps Cookoff In The Data Kitchen - Episode 73

Delivering a data analytics project on time and with accurate information is critical to the success of any business. DataOps is a set of practices to increase the probability of success by creating value early and often, and using feedback loops to keep your project on course. In this episode Chris Bergh, head chef of Data Kitchen, explains how DataOps differs from DevOps, how the industry has begun adopting DataOps, and how to adopt an agile approach to building your data platform.

Read More

Deep Learning For Data Engineers - Episode 71

Deep learning is the latest class of technology that is gaining widespread interest. As data engineers we are responsible for building and managing the platforms that power these models. To help us understand what is involved, we are joined this week by Thomas Henson. In this episode he shares his experiences experimenting with deep learning, what data engineers need to know about the infrastructure and data requirements to power the models that your team is building, and how it can be used to supercharge our ETL pipelines.

Read More

The Alluxio Distributed Storage System - Episode 70

Distributed storage systems are the foundational layer of any big data stack. There are a variety of implementations which support different specialized use cases and come with associated tradeoffs. Alluxio is a distributed virtual filesystem which integrates with multiple persistent storage systems to provide a scalable, in-memory storage layer for scaling computational workloads independent of the size of your data. In this episode Bin Fan explains how he got involved with the project, how it is implemented, and the use cases that it is particularly well suited for. If your storage and compute layers are too tightly coupled and you want to scale them independently then Alluxio is the tool for the job.

Read More

Building Machine Learning Projects In The Enterprise - Episode 69

Machine learning is a class of technologies that promise to revolutionize business. Unfortunately, it can be difficult to identify and execute on ways that it can be used in large companies. Kevin Dewalt founded Prolego to help Fortune 500 companies build, launch, and maintain their first machine learning projects so that they can remain competitive in our landscape of constant change. In this episode he discusses why machine learning projects require a new set of capabilities, how to build a team from internal and external candidates, and how an example project progressed through each phase of maturity. This was a great conversation for anyone who wants to understand the benefits and tradeoffs of machine learning for their own projects and how to put it into practice.

Read More

Building Enterprise Big Data Systems At LEGO - Episode 66

Building internal expertise around big data in a large organization is a major competitive advantage. However, it can be a difficult process due to compliance needs and the need to scale globally on day one. In this episode Jesper S√łgaard and Keld Antonsen share the story of starting and growing the big data group at LEGO. They discuss the challenges of being at global scale from the start, hiring and training talented engineers, prototyping and deploying new systems in the cloud, and what they have learned in the process. This is a useful conversation for engineers, managers, and leadership who are interested in building enterprise big data systems.

Read More