ThreatStack: Data Driven Cloud Security with Pete Cheslock and Patrick Cable

This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.

01 April 2018

Episode 25

Summary

Cloud computing and ubiquitous virtualization have changed the way our applications are built and deployed. This new environment requires a new way of tracking and addressing the security of our systems. ThreatStack is a platform that collects all of the data that your servers generate, monitors it for anomalies in behavior that would indicate a breach, and notifies you in near real time. In this episode, ThreatStack’s director of operations, Pete Cheslock, and senior infrastructure security engineer, Patrick Cable, discuss the data infrastructure that supports their platform, how they capture and process the data from client systems, and how that information can be used to keep your systems safe from attackers.

Preamble

Hello and welcome to the Data Engineering Podcast, the show about modern data management.
When you’re ready to build your next pipeline, you’ll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 40Gbit network, all controlled by a brand new API, you’ve got everything you need to run a bullet-proof data platform. Go to dataengineeringpodcast.com/linode to get a $20 credit and launch a new server in under a minute.
For complete visibility into the health of your pipeline, including deployment tracking and powerful alerting driven by machine learning, DataDog has got you covered. With their monitoring, metrics, and log collection agent, including extensive integrations and distributed tracing, you’ll have everything you need to find and fix performance bottlenecks in no time. Go to dataengineeringpodcast.com/datadog today to start your free 14-day trial and get a sweet new T-shirt.
Go to dataengineeringpodcast.com to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch.
Your host is Tobias Macey, and today I’m interviewing Pete Cheslock and Pat Cable about the data infrastructure and security controls at ThreatStack.

Interview

Introduction
How did you get involved in the area of data management?
Why don’t you start by explaining what ThreatStack does?
- What was lacking in the existing options (services and self-hosted/open source) that ThreatStack solves for?

Can you describe the type(s) of data that you collect and how it is structured?

What is the high level data infrastructure that you use for ingesting, storing, and analyzing your customer data?
- How do you ensure a consistent format of the information that you receive?
- How do you ensure that the various pieces of your platform are deployed using the proper configurations and operating as intended?
- How much configuration do you provide to the end user in terms of the captured data, such as sampling rate or additional context?

I understand that your original architecture used RabbitMQ as your ingest mechanism, which you then migrated to Kafka. What was your initial motivation for that change?
- How much of a benefit has that been in terms of overall complexity and cost (both time and infrastructure)?