Summary
In this episode of the Data Engineering Podcast, Aman Agarwal, creator of OpenLit, discusses the operational groundwork required to run LLM-powered applications reliably and cost-effectively. He highlights common blind spots that teams face, including opaque model behavior, runaway token costs, and brittle prompt management, and explains how OpenTelemetry-native observability can turn these black-box interactions into stepwise, debuggable traces across models, tools, and data stores. Aman showcases OpenLit's approach to open standards, vendor-neutral integrations, and practical features such as fleet-managed OTEL collectors, zero-code Kubernetes instrumentation, prompt and secret management, and evaluation workflows. They also explore experimentation patterns, routing across models, and closing the loop from evals to prompt/dataset improvements, demonstrating how better visibility reshapes design choices from prototype to production. Aman shares lessons learned building in the open, where OpenLit fits and doesn't, and what's next in context management, security, and ecosystem integrations, providing resources and examples of multi-database observability deployments for listeners.
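As a rough illustration of the stepwise, debuggable traces described above, here is a minimal, self-contained sketch of recording token usage and cost per step of an LLM pipeline. This is plain Python, not OpenLit's actual SDK; the span names, attributes, and per-token prices are hypothetical.

```python
import time
from contextlib import contextmanager

class Tracer:
    """Toy tracer: records one attribute dict per span, OpenTelemetry-style."""
    def __init__(self):
        self.spans = []

    @contextmanager
    def span(self, name, **attributes):
        record = {"name": name, **attributes}
        start = time.perf_counter()
        try:
            yield record  # steps attach attributes as they run
        finally:
            record["duration_s"] = time.perf_counter() - start
            self.spans.append(record)

# Hypothetical per-1K-token pricing (USD); real prices vary by model.
PRICE_PER_1K = {"prompt": 0.003, "completion": 0.006}

tracer = Tracer()
with tracer.span("retrieve", store="vector-db"):
    pass  # pretend we fetched context documents here
with tracer.span("llm_call", model="example-model") as s:
    s["prompt_tokens"], s["completion_tokens"] = 1200, 300
    s["cost_usd"] = (s["prompt_tokens"] / 1000 * PRICE_PER_1K["prompt"]
                     + s["completion_tokens"] / 1000 * PRICE_PER_1K["completion"])

total_cost = sum(s.get("cost_usd", 0.0) for s in tracer.spans)
print(f"{len(tracer.spans)} spans, total cost ${total_cost:.4f}")
```

Once every model call, tool invocation, and data-store lookup emits a span like this, runaway token costs stop being a surprise on the monthly bill and become an attribute you can query per request.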
Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- If you lead a data team, you know this pain: every department needs dashboards, reports, and custom views, and they all come to you. So you're either the bottleneck slowing everyone down, or you're spending all your time building one-off tools instead of doing actual data work. Retool gives you a way to break that cycle. Their platform lets people build custom apps on your company data while keeping it all secure. Type a prompt like 'Build me a self-service reporting tool that lets teams query customer metrics from Databricks' and they get a production-ready app with the permissions and governance built in. They can self-serve, and you get your time back. It's data democratization without the chaos. Check out Retool at dataengineeringpodcast.com/retool today and see how other data teams are scaling self-service. Because let's be honest: we all need to Retool how we handle data requests.
- Your host is Tobias Macey and today I'm interviewing Aman Agarwal about the operational investments that are necessary to ensure you get the most out of your AI models
Interview
- Introduction
- How did you get involved in the area of AI/data management?
- Can you start by giving your assessment of the main blind spots that are common in the existing AI application patterns?
- As teams adopt agentic architectures, how common is it to fall prey to those same blind spots?
- There are numerous tools/services available now focused on various elements of "LLMOps". What are the major components necessary for a minimum viable operational platform for LLMs?
- There are several areas of overlap, as well as disjoint features, in the ecosystem of tools (both open source and commercial). How do you advise teams to navigate the selection process? (point solutions vs. integrated tools, and handling frameworks with only partial overlap)
- Can you describe what OpenLit is and the story behind it?
- How would you characterize the feature set and focus of OpenLit compared to what you view as the "major players"?
- Once you have invested in a platform like OpenLit, how does that change the overall development workflow for the lifecycle of AI/agentic applications?
- What are the most complex/challenging elements of change management for LLM-powered systems? (e.g. prompt tuning, model changes, data changes, etc.)
- How can the information collected in OpenLit be used to develop a self-improvement flywheel for agentic systems?
- Can you describe the architecture and implementation of OpenLit?
- How have the scope and goals of the project changed since you started working on it?
- Given the foundational aspects of the project that you have built, what are some of the adjacent capabilities that OpenLit is situated to expand into?
- What are the sharp edges and blind spots that are still challenging even when you have OpenLit or similar integrated?
- What are the most interesting, innovative, or unexpected ways that you have seen OpenLit used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on OpenLit?
- When is OpenLit the wrong choice?
- What do you have planned for the future of OpenLit?
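One experimentation pattern mentioned in the episode, routing requests across models, can be sketched in a few lines. The model names, prices, and quality scores below are invented for illustration and are not drawn from OpenLit.

```python
# Hypothetical model catalog: cost per 1K tokens (USD) and a rough 0-1 quality score.
MODELS = {
    "small":  {"cost_per_1k": 0.0005, "quality": 0.60},
    "medium": {"cost_per_1k": 0.0030, "quality": 0.80},
    "large":  {"cost_per_1k": 0.0150, "quality": 0.95},
}

def route(task_difficulty: float, budget_per_1k: float) -> str:
    """Pick the cheapest model whose quality meets the task's bar,
    falling back to the best model the budget allows."""
    affordable = {m: v for m, v in MODELS.items()
                  if v["cost_per_1k"] <= budget_per_1k}
    good_enough = [m for m, v in affordable.items()
                   if v["quality"] >= task_difficulty]
    if good_enough:
        return min(good_enough, key=lambda m: MODELS[m]["cost_per_1k"])
    # Nothing meets the quality bar: take the best affordable model.
    return max(affordable, key=lambda m: MODELS[m]["quality"])

print(route(0.7, 0.02))   # cheapest model with quality >= 0.7
print(route(0.9, 0.001))  # budget only covers the small model
```

The point of pairing a router like this with trace data is that the difficulty and quality inputs stop being guesses: per-model cost and eval scores collected from production traffic can feed the same decision.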
Contact Info
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data/AI management today?
Closing Announcements
- Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
Links
- OpenLit
- Fleet Hub
- OpenTelemetry
- Langfuse
- LangSmith
- TensorZero
- AI Engineering Podcast Episode
- Traceloop
- Helicone
- ClickHouse
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA