CluedIn

Data Labeling That You Can Feel Good About With CloudFactory - Episode 89

Successful machine learning and artificial intelligence projects require large volumes of data that is properly labelled. The challenge is that most data is not clean and well annotated, requiring a scalable data labeling process. Ideally this process can be done using the tools and systems that already power your analytics, rather than sending data into a black box. In this episode Mark Sears, CEO of CloudFactory, explains how he and his team built a platform that provides valuable service to businesses and meaningful work to developing nations. He shares the lessons learned in the early years of growing the business, the strategies that have allowed them to scale and train their workforce, and the benefits of working within their customer’s existing platforms. He also shares some valuable insights into the current state of the art for machine learning in the real world.

Read More

Scale Your Analytics On The Clickhouse Data Warehouse - Episode 88

The market for data warehouse platforms is large and varied, with options for every use case. ClickHouse is an open source, column-oriented database engine built for interactive analytics with linear scalability. In this episode Robert Hodges and Alexander Zaitsev explain how it is architected to provide these features, the various unique capabilities that it provides, and how to run it in production. It was interesting to learn about some of the custom data types and performance optimizations that are included.

Read More

Stress Testing Kafka And Cassandra For Real-Time Anomaly Detection - Episode 87

Anomaly detection is a capability that is useful in a variety of problem domains, including finance, internet of things, and systems monitoring. Scaling the volume of events that can be processed in real-time can be challenging, so Paul Brebner from Instaclustr set out to see how far he could push Kafka and Cassandra for this use case. In this interview he explains the system design that he tested, his findings for how these tools were able to work together, and how they behaved at different orders of scale. It was an interesting conversation about how he stress tested the Instaclustr managed service for benchmarking an application that has real-world utility.

Read More