Making Data Collection In Your Code Easy With Rookout

Your Host

About this Episode


The software applications that we build for our businesses are a rich source of data, but accessing and extracting that data is often a slow and error-prone process. Rookout has built a platform to separate the data collection process from the lifecycle of your code. In this episode, CTO Liran Haimovitch discusses the benefits of shortening the iteration cycle and bringing non-engineers into the process of identifying useful data. This was a great conversation about the importance of democratizing the work of data collection.


  • Hello and welcome to the Data Engineering Podcast, the show about modern data management.
  • When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, a 40Gbit public network, fast object storage, and a brand new managed Kubernetes platform, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. And for your machine learning workloads, they’ve got dedicated CPU and GPU instances. Go to today to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!
  • Your host is Tobias Macey and today I’m interviewing Liran Haimovitch, CTO of Rookout, about the business value of operations metrics and other dark data in your organization.


  • Introduction
  • How did you get involved in the area of data management?
  • Can you start by describing the types of data that we typically collect in a systems operations context?
    • What are some of the business questions that can be answered from these data sources?
  • What are some of the considerations that developers and operations engineers need to be aware of when they are defining the collection points for system metrics and log messages?
    • What are some effective strategies that you have found for including business stakeholders in the process of defining these collection points?
  • One of the difficulties in building useful analyses from any source of data is maintaining the appropriate context. What metadata should be maintained along with operational metrics?
    • What are some of the shortcomings in the systems we design and use for operational data stores in terms of making the collected data useful for other purposes?
  • How does the existing tooling need to be changed or augmented to simplify the collaboration between engineers and stakeholders for defining and collecting the needed information?
  • The types of systems that we use for collecting and analyzing operations metrics are often designed and optimized for different access patterns and data formats than those used for analytical and exploratory purposes. What are your thoughts on how to incorporate the collected metrics with behavioral data?
  • What are some of the other sources of dark data that we should keep an eye out for in our organizations?

Contact Info

Parting Question

  • From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

  • Thank you for listening! Don’t forget to check out our other show, Podcast.__init__, to learn about the Python language, its community, and the innovative ways it is being used.
  • Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
  • If you’ve learned something or tried out a project from the show then tell us about it! Email with your story.
  • To help other people find the show, please leave a review on iTunes and tell your friends and co-workers.
  • Join the community in the new Zulip chat workspace at


The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
