Data Engineering Podcast


This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.

Support the show!

21 October 2021

Completing The Feedback Loop Of Data Through Operational Analytics With Census - E231

Rewind 10 seconds
1X
Skip 30 seconds ahead
0:00/0:00

Share on social media:


Summary

The focus of the past few years has been to consolidate all of the organization’s data into a cloud data warehouse. As a result there have been a number of trends in data that take advantage of the warehouse as a single focal point. Among those trends is the advent of operational analytics, which completes the cycle of data from collection, through analysis, to driving further action. In this episode Boris Jabes, CEO of Census, explains how the work of synchronizing cleaned and consolidated data about your customers back into the systems that you use to interact with those customers allows for a powerful feedback loop that has been missing in data systems until now. He also discusses how Census makes that synchronization easy to manage, how it fits with the growth of data quality tooling, and how you can start using it today.

Announcements

  • Hello and welcome to the Data Engineering Podcast, the show about modern data management
  • When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
  • Struggling with broken pipelines? Stale dashboards? Missing data? If this resonates with you, you’re not alone. Data engineers struggling with unreliable data need look no further than Monte Carlo, the world’s first end-to-end, fully automated Data Observability Platform! In the same way that application performance monitoring ensures reliable software and keeps application downtime at bay, Monte Carlo solves the costly problem of broken data pipelines. Monte Carlo monitors and alerts for data issues across your data warehouses, data lakes, ETL, and business intelligence, reducing time to detection and resolution from weeks or days to just minutes. Start trusting your data with Monte Carlo today! Visit dataengineeringpodcast.com/impact today to save your spot at IMPACT: The Data Observability Summit a half-day virtual event featuring the first U.S. Chief Data Scientist, founder of the Data Mesh, Creator of Apache Airflow, and more data pioneers spearheading some of the biggest movements in data. The first 50 to RSVP with this link will be entered to win an Oculus Quest 2 — Advanced All-In-One Virtual Reality Headset. RSVP today – you don’t want to miss it!
  • Atlan is a collaborative workspace for data-driven teams, like Github for engineering or Figma for design teams. By acting as a virtual hub for data assets ranging from tables and dashboards to SQL snippets & code, Atlan enables teams to create a single source of truth for all their data assets, and collaborate across the modern data stack through deep integrations with tools like Snowflake, Slack, Looker and more. Go to dataengineeringpodcast.com/atlan today and sign up for a free trial. If you’re a data engineering podcast listener, you get credits worth $3000 on an annual subscription
  • Your host is Tobias Macey and today I’m interviewing Boris Jabes about Census and the growing category of operational analytics

Interview

  • Introduction
  • How did you get involved in the area of data management?
  • Can you describe what Census is and the story behind it?
  • The terms "reverse ETL" and "operational analytics" have started being used for similar, and often interchangeable, purposes. What are your thoughts on the semantic and concrete differences between these phrases?
  • What are the motivating factors for adding operational analytics or "data activation" to an organization’s data platform?
    • This is a nascent but quickly growing market with a number of products and projects operating in the space. How would you characterize the current state of the segment and Census’ position in it?
  • Can you describe how the Census platform is implemented?
    • What are some of the early design choices that have had to be refactored or augmented as you have evolved the product and worked with customers?
    • What are some of the assumptions that you had about the needs and uses for the platform which have been challenged or changed as you dug deeper into the problem?
  • Can you describe the workflow for a customer adopting Census?
    • What are some of the data modeling practices that make it easier to "activate" the organization’s data?
  • Another recent trend in the data industry is the growth of data quality and data lineage tools. What is involved in using the measured quality or lineage information as a signal in the operational systems, or to prevent a synchronization?
  • How can users test and validate their workflows in Census?
    • What are the options for propagating Census’ runtime information back into lineage and data quality tracking?
  • Census supports incremental syncs from the warehouse. What are the opportunities for bringing streaming architectures to the space of operational analytics?
    • What are the challenges/complexities in the current set of technologies that act as a barrier?
  • What are the most interesting, innovative, or unexpected ways that you have seen Census used?
  • What are the most interesting, unexpected, or challenging lessons that you have learned while working on Census?
  • When is Census the wrong choice?
  • What do you have planned for the future of Census?

Contact Info

Parting Question

  • From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Support Data Engineering Podcast


Share on social media:


Listen in your favorite app:



More options

Here are shows you might like

See show recommendations
AI Engineering Podcast
Tobias Macey
The Python Podcast.__init__
Tobias Macey