Monte Carlo

Completing The Feedback Loop Of Data Through Operational Analytics With Census - Episode 231

The focus of the past few years has been to consolidate all of the organization’s data into a cloud data warehouse. As a result there have been a number of trends in data that take advantage of the warehouse as a single focal point. Among those trends is the advent of operational analytics, which completes the cycle of data from collection, through analysis, to driving further action. In this episode Boris Jabes, CEO of Census, explains how the work of synchronizing cleaned and consolidated data about your customers back into the systems that you use to interact with those customers allows for a powerful feedback loop that has been missing in data systems until now. He also discusses how Census makes that synchronization easy to manage, how it fits with the growth of data quality tooling, and how you can start using it today.

Read More

How And Why To Become Data Driven As A Business - Episode 229

Organizations of all sizes are striving to become data driven, starting in earnest with the rise of big data a decade ago. With the never-ending growth in data sources and methods for aggregating and analyzing them, the use of data to direct the business has become a requirement. Randy Bean has been helping enterprise organizations define and execute their data strategies since before the age of big data. In this episode he discusses his experiences and how he approached the work of distilling them for his book “Fail Fast, Learn Faster”. This is an entertaining and enlightening exploration of the business side of data with an industry veteran.

Read More

Adding Support For Distributed Transactions To The Redpanda Streaming Engine - Episode 227

Transactions are a necessary feature for ensuring that a set of actions are all performed as a single unit of work. In streaming systems this is necessary to ensure that a set of messages or transformations are all executed together across different queues. In this episode Denis Rystsov explains how he added support for transactions to the Redpanda streaming engine. He discusses the use cases for transactions, the different strategies, semantics, and guarantees that they might need to support, and how his implementation ended up improving the performance of bulk write operations. This is an interesting deep dive into the internals of a high performance streaming engine and the details that are involved in building distributed systems.

Read More

Delivering Your Personal Data Cloud With Prifina - Episode 225

The promise of online services is that they will make your life easier in exchange for collecting data about you. The reality is that they use more information than you realize for purposes that are not what you intended. There have been many attempts to harness all of the data that you generate for gaining useful insights about yourself, but they are generally difficult to set up and manage or require software development experience. The team at Prifina have built a platform that allows users to create their own personal data cloud and install applications built by developers that power useful experiences while keeping you in full control. In this episode Markus Lampinen shares the goals and vision of the company, the technical aspects of making it a reality, and the future vision for how services can be designed to respect user’s privacy while still providing compelling experiences.

Read More

Massively Parallel Data Processing In Python Without The Effort Using Bodo - Episode 223

Python has beome the de facto language for working with data. That has brought with it a number of challenges having to do with the speed and scalability of working with large volumes of information.There have been many projects and strategies for overcoming these challenges, each with their own set of tradeoffs. In this episode Ehsan Totoni explains how he built the Bodo project to bring the speed and processing power of HPC techniques to the Python data ecosystem without requiring any re-work.

Read More

An Exploration Of The Data Engineering Requirements For Bioinformatics - Episode 221

Biology has been gaining a lot of attention in recent years, even before the pandemic. As an outgrowth of that popularity, a new field has grown up that pairs statistics and compuational analysis with scientific research, namely bioinformatics. This brings with it a unique set of challenges for data collection, data management, and analytical capabilities. In this episode Jillian Rowe shares her experience of working in the field and supporting teams of scientists and analysts with the data infrastructure that they need to get their work done. This is a fascinating exploration of the collaboration between data professionals and scientists.

Read More

Enabling Version Controlled Data Collaboration With TerminusDB - Episode 167

As data professionals we have a number of tools available for storing, processing, and analyzing data. We also have tools for collaborating on software and analysis, but collaborating on data is still an underserved capability. Gavin Mendel-Gleason encountered this problem first hand while working on the Sesshat databank, leading him to create TerminusDB and TerminusHub. In this episode he explains how the TerminusDB system is architected to provide a versioned graph storage engine that allows for branching and merging of data sets, how that opens up new possibilities for individuals and teams to work together on building new data repositories. This is a fascinating conversation on the technical challenges involved, the opportunities that such as system provides, and the complexities inherent to building a successful business on open source.

Read More

Bringing Feature Stores and MLOps to the Enterprise At Tecton - Episode 166

As more organizations are gaining experience with data management and incorporating analytics into their decision making, their next move is to adopt machine learning. In order to make those efforts sustainable, the core capability they need is for data scientists and analysts to be able to build and deploy features in a self service manner. As a result the feature store is becoming a required piece of the data platform. To fill that need Kevin Stumpf and the team at Tecton are building an enterprise feature store as a service. In this episode he explains how his experience building the Michelanagelo platform at Uber has informed the design and architecture of Tecton, how it integrates with your existing data systems, and the elements that are required for well engineered feature store.

Read More

Low Friction Data Governance With Immuta - Episode 164

Data governance is a term that encompasses a wide range of responsibilities, both technical and process oriented. One of the more complex aspects is that of access control to the data assets that an organization is responsible for managing. The team at Immuta has built a platform that aims to tackle that problem in a flexible and maintainable fashion so that data teams can easily integrate authorization, data masking, and privacy enhancing technologies into their data infrastructure. In this episode Steve Touw and Stephen Bailey share what they have built at Immuta, how it is implemented, and how it streamlines the workflow for everyone involved in working with sensitive data. If you are starting down the path of implementing a data governance strategy then this episode will provide a great overview of what is involved.

Read More