The key to making data valuable to business users is the ability to calculate meaningful metrics and explore them along useful dimensions. Business intelligence tools have provided this capability for years, but they don’t offer a means of exposing those metrics to other systems. Metriql is an open source project that provides a headless BI system where you can define your metrics and share them with all of your other processes. In this episode Burak Kabakcı shares the story behind the project, how you can use it to create your metrics definitions, and the benefits of treating the semantic layer as a dedicated component of your platform.
Datafold is a data observability platform that helps companies prevent data catastrophes. It has a unique ability to identify, prioritize and investigate data quality issues proactively before they affect production. Datafold gives you visibility and confidence in the quality of your analytical data with fast dataset diffing, profiling, column-level lineage, and intelligent anomaly detection. Datafold integrates with all major data warehouses as well as frameworks such as Airflow & dbt and seamlessly plugs into CI, so in a few minutes you can get from 0 to automated testing of your analytical code.
Your data platform needs to be scalable, fault tolerant, and performant, which means that you need the same from your cloud provider. Linode has been powering production systems for over 17 years, and now they’ve launched a fully managed Kubernetes platform. With the combined power of the Kubernetes engine for flexible and scalable deployments, and features like dedicated CPU instances, GPU instances, and object storage you’ve got everything you need to build a bulletproof data pipeline. If you go to dataengineeringpodcast.com/linode today you’ll even get a $100 credit to use on building your own cluster, or object storage, or reliable backups, or… And while you’re there don’t forget to thank them for being a long-time supporter of the Data Engineering Podcast!
Have you ever woken up to a crisis because a number on a dashboard is broken and no one knows why? Or sent out frustrating Slack messages trying to find the right data set? Or tried to understand what a column name means?
Our friends at Atlan started out as a data team, faced all of this collaboration chaos firsthand, and began building Atlan as an internal tool. Atlan is a collaborative workspace for data-driven teams, like GitHub for engineering or Figma for design teams. By acting as a virtual hub for data assets ranging from tables and dashboards to SQL snippets & code, Atlan enables teams to create a single source of truth for all their data assets, and collaborate across the modern data stack through deep integrations with tools like Snowflake, Slack, Looker and more.
Go to dataengineeringpodcast.com/atlan and sign up for a free trial. If you’re a Data Engineering Podcast listener, you get credits worth $3000 on an annual subscription.
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
- Atlan is a collaborative workspace for data-driven teams, like GitHub for engineering or Figma for design teams. By acting as a virtual hub for data assets ranging from tables and dashboards to SQL snippets & code, Atlan enables teams to create a single source of truth for all their data assets, and collaborate across the modern data stack through deep integrations with tools like Snowflake, Slack, Looker and more. Go to dataengineeringpodcast.com/atlan today and sign up for a free trial. If you’re a Data Engineering Podcast listener, you get credits worth $3000 on an annual subscription.
- Modern data teams are dealing with a lot of complexity in their data pipelines and analytical code. Monitoring data quality, tracing incidents, and testing changes can be daunting and often take hours to days. Datafold helps data teams gain visibility and confidence in the quality of their analytical data through data profiling, column-level lineage and intelligent anomaly detection. Datafold also helps automate regression testing of ETL code with its Data Diff feature that instantly shows how a change in ETL or BI code affects the produced data, both on a statistical level and down to individual rows and values. Datafold integrates with all major data warehouses as well as frameworks such as Airflow & dbt and seamlessly plugs into CI workflows. Go to dataengineeringpodcast.com/datafold today to start a 30-day trial of Datafold. Once you sign up and create an alert in Datafold for your company data, they will send you a cool water flask.
- Your host is Tobias Macey and today I’m interviewing Burak Emre Kabakcı about Metriql, a headless BI and metrics layer for your data stack
- How did you get involved in the area of data management?
- Can you describe what Metriql is and the story behind it?
- What are the characteristics and benefits of a "headless BI" system?
- What was your motivation to create and open-source Metriql as an independent project outside of your business?
- How are you approaching governance and sustainability of the project?
- How does Metriql compare to projects such as Airbnb’s Minerva or Transform’s platform?
- How does the industry/vertical of a business impact their ability to benefit from a metrics layer/headless BI?
- What are the limitations to the logical complexity that can be applied to the calculation of a given metric/set of metrics?
- Can you describe how Metriql is implemented?
- How have the design and goals of the project changed or evolved since you began working on it?
- What are the most complex/difficult engineering elements of building a metrics layer?
- Can you describe the workflow of defining metrics?
- What have been your guiding principles in defining the user experience for working with Metriql?
- What are the opportunities for including business users in the definition of metrics? (e.g. pushing down/generating definitions from a BI layer)
- What are the biggest challenges and limitations of creating metrics definitions purely in SQL?
- What are the options for exposing metrics back to the warehouse and other operational systems such as reverse ETL vendors?
- What are the missing elements in the data ecosystem for taking full advantage of a headless BI/metrics layer?
- What are the most interesting, innovative, or unexpected ways that you have seen Metriql used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on Metriql?
- When is Metriql the wrong choice?
- What do you have planned for the future of Metriql?
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
- Headless BI
- Google Data Studio
- The Missing Piece Of The Modern Data Stack article by Benn Stancil