With the proliferation of data sources to give a more comprehensive view of the information critical to your business it is even more important to have a canonical view of the entities that you care about. Is customer number 342 in your ERP the same as Bob Smith on Twitter? Building a master data set helps you answer these questions reliably and simplify the process of building your business intelligence reports. In this episode the head of product at Tamr, Mark Marinelli, discusses the challenges of building a master data set, why you should have one, and some of the techniques that modern platforms and systems provide for maintaining it.
Do you wish that you could track the changes in your data the same way that you track the changes in your code? Pachyderm is a platform for building a data lake with a versioned file system. It also lets you use whatever languages you want to run your analysis with its container based task graph. This week Daniel Whitenack shares the story of how the project got started, how it works under the covers, and how you can get started using it today!