From Data Engineering to AI Engineering: Where the Lines Blur

This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.

Support the show!

14 December 2025

From Data Engineering to AI Engineering: Where the Lines Blur - E492

0:00/0:00

Share on social media:

Description
Transcript
Chapters

Summary
In this solo episode of the Data Engineering Podcast, host Tobias Macey reflects on how AI has transformed the practice and pace of data engineering over time. Starting from its origins in the Hadoop and cloud warehouse era, he explores the discipline's evolution through ML engineering and MLOps to today's blended boundaries between data, ML, and AI engineering. The conversation covers how unstructured data is becoming more prominent, vectors and knowledge graphs are emerging as key components, and reliability expectations are changing due to interactive user-facing AI. The host also delves into process changes, including tighter collaboration, faster dataset onboarding, new governance and access controls, and the importance of treating experimentation and evaluation as fundamental testing practices.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management
Data teams everywhere face the same problem: they're forcing ML models, streaming data, and real-time processing through orchestration tools built for simple ETL. The result? Inflexible infrastructure that can't adapt to different workloads. That's why Cash App and Cisco rely on Prefect. Cash App's fraud detection team got what they needed - flexible compute options, isolated environments for custom packages, and seamless data exchange between workflows. Each model runs on the right infrastructure, whether that's high-memory machines or distributed compute. Orchestration is the foundation that determines whether your data team ships or struggles. ETL, ML model training, AI Engineering, Streaming - Prefect runs it all from ingestion to activation in one platform. Whoop and 1Password also trust Prefect for their data operations. If these industry leaders use Prefect for critical workflows, see what it can do for you at dataengineeringpodcast.com/prefect.
Composable data infrastructure is great, until you spend all of your time gluing it together. Bruin is an open source framework, driven from the command line, that makes integration a breeze. Write Python and SQL to handle the business logic, and let Bruin handle the heavy lifting of data movement, lineage tracking, data quality monitoring, and governance enforcement. Bruin allows you to build end-to-end data workflows using AI, has connectors for hundreds of platforms, and helps data teams deliver faster. Teams that use Bruin need less engineering effort to process data and benefit from a fully integrated data platform. Go to dataengineeringpodcast.com/bruin today to get started. And for dbt Cloud customers, they'll give you $1,000 credit to migrate to Bruin Cloud.
Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.
You’re a developer who wants to innovate—instead, you’re stuck fixing bottlenecks and fighting legacy code. MongoDB can help. It’s a flexible, unified platform that’s built for developers, by developers. MongoDB is ACID compliant, Enterprise-ready, with the capabilities you need to ship AI apps—fast. That’s why so many of the Fortune 500 trust MongoDB with their most critical workloads. Ready to think outside rows and columns? Start building at MongoDB.com/Build
Your host is Tobias Macey and today I'm interviewing reflecting about the increasingly blurry boundaries between data engineering and AI engineering

Interview

Introduction
I started this podcast in 2017, right when the term "Data Engineer" was becoming widely used for a specific job title with a reasonably well-understood set of responsibilities. This was in response to the massive hype around "data science" and consequent hiring sprees that characterized the mid-2000s to mid-2010s. The introduction of generative AI and AI Engineering to the technical ecosystem is changing the scope of responsibilities for data engineers and other data practitioners. Of note is the fact that:
AI models can be used to process unstructured data sources into structured data assets
AI applications require new types of data assets
The SLAs for data assets related to AI serving are different from BI/warehouse use cases
The technology stacks for AI applications aren't necessarily the same as for analytical data pipelines
Because everything is so new there is not a lot of prior art, and the prior art that does exist isn't necessarily easy to find because of differences in terminology
Experimentation has moved from being just an MLOps capability into being a core need for organizations

Contact Info

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

AI Engineering Podcast

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Introduction and purpose: AI is reshaping data work