Summary
In this episode Robert Nishihara, co-founder of Anyscale and co-creator of Ray, talks about maximizing hardware utilization for AI and data-intensive workloads. He explores Ray’s evolution alongside Kubernetes and PyTorch, and why consolidation at these layers has enabled a new generation of complex, heterogeneous workloads. Robert explains how data preparation has shifted to GPU- and inference-heavy, multimodal pipelines; where Ray fits compared to Spark and workflow orchestrators; and why Ray excels at composing heterogeneous pools of compute, handling failures, and scaling complex systems like multi-node LLM inference and reinforcement learning. He digs into practical strategies for boosting GPU utilization across training and inference, elasticity and prioritization of workloads, topology-aware scheduling, and the importance of fast failure recovery as hardware scales from nodes to racks. If you’re wrestling with expensive GPUs, multimodal data curation, or cross-node LLM inference, this conversation offers concrete mental models and architectural guidance.
Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- Your host is Tobias Macey and today I'm interviewing Robert Nishihara about the challenges of maximizing the utility of your available hardware for AI applications
Interview
- Introduction
- How did you get involved in the area of data management?
- Can you start by giving an overview of the major contributors to wasted or idle compute?
- Why does it matter if the available compute isn't being maximized?
- What are some of the typical ad-hoc methods that teams might use to try to get the most out of their available hardware (especially GPUs)?
- What are the most interesting, innovative, or unexpected ways that you have seen Ray used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on Ray and distributed compute for data and AI?
- When is Ray the wrong choice?
- What do you have planned for the future of Ray?
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
Closing Announcements
- Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
Links
- Anyscale
- Ray
- Deep Learning
- Computer Vision
- Kubernetes
- Cursor
- Claude Code
- KubeRay
- PyTorch
- TensorFlow
- Theano
- Caffe
- vLLM
- SGLang
- Ray Tune
- Neural Network
- Learning Rates
- Reinforcement Learning
- AlphaGo
- Cursor Composer 2
- ImageNet
- Transformer Architecture
- Stochastic Gradient Descent
- Airflow
- Dagster
- Flyte
- Mixture of Experts
- Prefill
- Temporal
- Actor Framework
- RDMA == Remote Direct Memory Access
- Neoclouds
- AI Engineering Podcast Episode