Data Engineering Podcast


This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.

Support the show!

21 December 2025

From Context to Semantics: How Metadata Powers Agentic AI - E493

Rewind 10 seconds
1X
Skip 30 seconds ahead
0:00/0:00

Share on social media:


Summary 
In this episode Suresh Srinivas and Sriharsha Chintalapani explore how metadata platforms are evolving from human-centric catalogs into the foundational context layer for AI and agentic systems. They discuss the origins and growth of OpenMetadata and Collate, why “context” is necessary but “semantics” is critical for precise AI outcomes, and how a schema-first, API-first, unified platform enables discovery, observability, and governance in one workflow. They share how AI agents can now automate documentation, classification, data quality testing, and enforcement of policies, and why aligning governance with user identity and intent is essential as agentic access scales. They also dig into scalability strategies, MCP-based agent workflows, AI governance (including model/agent tracking), and the emerging convergence of big data with ontologies to deliver machine-understandable meaning. 

Announcements 
  • Hello and welcome to the Data Engineering Podcast, the show about modern data management
  • Data teams everywhere face the same problem: they're forcing ML models, streaming data, and real-time processing through orchestration tools built for simple ETL. The result? Inflexible infrastructure that can't adapt to different workloads. That's why Cash App and Cisco rely on Prefect. Cash App's fraud detection team got what they needed - flexible compute options, isolated environments for custom packages, and seamless data exchange between workflows. Each model runs on the right infrastructure, whether that's high-memory machines or distributed compute. Orchestration is the foundation that determines whether your data team ships or struggles. ETL, ML model training, AI Engineering, Streaming - Prefect runs it all from ingestion to activation in one platform. Whoop and 1Password also trust Prefect for their data operations. If these industry leaders use Prefect for critical workflows, see what it can do for you at dataengineeringpodcast.com/prefect.
  • Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.
  • Composable data infrastructure is great, until you spend all of your time gluing it together. Bruin is an open source framework, driven from the command line, that makes integration a breeze. Write Python and SQL to handle the business logic, and let Bruin handle the heavy lifting of data movement, lineage tracking, data quality monitoring, and governance enforcement. Bruin allows you to build end-to-end data workflows using AI, has connectors for hundreds of platforms, and helps data teams deliver faster. Teams that use Bruin need less engineering effort to process data and benefit from a fully integrated data platform. Go to dataengineeringpodcast.com/bruin today to get started. And for dbt Cloud customers, they'll give you $1,000 credit to migrate to Bruin Cloud.
  • You’re a developer who wants to innovate—instead, you’re stuck fixing bottlenecks and fighting legacy code. MongoDB can help. It’s a flexible, unified platform that’s built for developers, by developers. MongoDB is ACID compliant, Enterprise-ready, with the capabilities you need to ship AI apps—fast. That’s why so many of the Fortune 500 trust MongoDB with their most critical workloads. Ready to think outside rows and columns? Start building at MongoDB.com/Build
  • Your host is Tobias Macey and today I'm interviewing Suresh Srinivas and Sriharsha Chintalapani about how metadata catalogs provide the context clues necessary to give meaning to your data for AI systems
Interview
  • Introduction
  • How did you get involved in the area of data management?
  • Can you start by giving an overview of the roles that metadata catalogs are playing in the current state of the ecosystem?
  • How has the OpenMetadata platform evolved over the past 4 years?
    • How has the focus on LLMs/generative AI changed the trajectory of services like OpenMetadata?
  • The initial set of use cases for data catalogs was to facilitate discovery and documentation of data assets for human consumption. What are the structural elements of that effort that have paid dividends for an AI audience?
    • How does the AI audience change the requirements around the cataloging and presentation of metadata?
  • One of the constant challenges in data infrastructure now is the tension of making data accessible to AI systems (agentic or otherwise) and incorporating AI into the inner loop of the service. What are the opportunities for bringing AI inside the boundaries of a system like OpenMetadata vs. as a client or consumer of the platform?
  • The key phrase of the past ~2 years is "context engineering". What role does the metadata catalog play in that undertaking?
    • What are the capabilities that the catalog needs to be able to effectively populate and curate that context?
    • How much awareness does the LLM or agent need to have to be able to use the catalog effectively?
  • What does a typical workflow/agent loop look like when it is using something like OpenMetadata in pursuit of knowledge that it needs to achieve an objective?
  • How do agentic use cases strain the existing set of governance frameworks?
    • What new considerations (procedural or technical) need to be factored into governance practices to balance velocity with security?
  • What are the most interesting, innovative, or unexpected ways that you have seen OpenMetadata/Collate used in AI/agentic contexts?
  • What are the most interesting, unexpected, or challenging lessons that you have learned while working on OpenMetadata/Collate?
  • When is OpenMetadata/Collate the wrong choice?
  • What do you have planned for the future of OpenMetadata?
Contact Info

Parting Question
  • From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
 

Share on social media:


Listen in your favorite app:



More options

Here are shows you might like

See show recommendations
AI Engineering Podcast
Tobias Macey
The Python Podcast.__init__
Tobias Macey

© 2025 Boundless Notions, LLC.
EPISODE SPONSORS MongoDB
MongoDB

You’re a developer who wants to innovate—instead, you’re stuck fixing bottlenecks and fighting legacy code. MongoDB can help. It’s a flexible, unified platform that’s built for developers, by developers. MongoDB is ACID compliant, Enterprise-ready, with the capabilities you need to ship AI apps—fast. That’s why so many of the Fortune 500 trust MongoDB with their most critical workloads. Ready to think outside rows and columns? Start building at MongoDB.com/Build

https://MongoDB.com/Build
Prefect
Prefect

Your data stack has evolved but your orchestration hasn't. While you've added ML models, streaming pipelines, and real-time analytics, you're still forcing them through tools designed for yesterday's simple ETL jobs. Today, companies like Cash App, Whoop, and 1Password trust Prefect to orchestrate their entire data lifecycle. Your ETL, ML training, streaming, everything from ingestion to activation in one platform. Stop building orchestration logic and start shipping AI-ready data products. Visit https://prefec.tv/4nGV4yv to get started for free.

https://prefec.tv/4nGV4yv
Bruin
Bruin

Composable data infrastructure is great, until you spend all of your time gluing it together. Bruin is an open source framework, driven from the command line, that makes integration a breeze. Write Python and SQL to handle the business logic, and let Bruin handle the heavy lifting of data movement, lineage tracking, data quality monitoring, and governance enforcement. Bruin allows you to build end-to-end data workflows using AI, has connectors for hundreds of platforms, and helps data teams deliver faster. Teams that use Bruin need less engineering effort to process data and benefit from a fully integrated data platform. And for dbt Cloud customers, they'll give you $1,000 credit to migrate to Bruin Cloud.

https://getbruin.com/?utm_source=dataengineeringpodcast
Datafold
Datafold

Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.

https://get.datafold.com/monitors-blog-de-podcast