Your LLM issues are really data issues
In this episode of The Stack Overflow Podcast, host Ryan Donovan talks with Harsha Chintalapani, co-founder and CTO of Collate and co-creator of the open-source metadata platform OpenMetadata. Their central claim is that the hard problems with AI and large language models (LLMs) are not limitations of the models themselves but deep-rooted data issues, especially around structured, real-time production data.
Harsha draws on his experience at Yahoo, Hortonworks, and Uber to show how even very large organizations struggle with data semantics, ownership, lineage, and discoverability. Cloud platforms have solved data storage and processing, he explains, but not the human and organizational problem of understanding what the data actually means. The episode argues that AI fails to deliver on promises such as 'understanding unstructured data' because organizations lack explicit, shared business semantics: the definition of a 'customer' or of 'customer health' is often tacit and inconsistent across teams.
The solution, Harsha emphasizes, is to build robust metadata infrastructure early, using platforms like OpenMetadata to create a centralized, searchable, and semantically rich knowledge graph of data assets. That improves data discovery and governance, and it gives AI the context it needs to work effectively. The episode closes with practical advice: start investing in metadata and semantic governance as soon as you form a data team, not after you have scaled into chaos.
AI failures with structured data are not due to model limitations, but due to poor data semantics, ownership, and discoverability.
The core problem is organizational: business concepts like 'customer' or 'ARR' are defined differently across teams, creating ambiguity.
Metadata is the foundation of trustworthy AI—without it, LLMs can't understand context or find the right data.
Start building metadata and semantic governance as soon as you form a data team, not after scaling into data chaos.
Open-source platforms like OpenMetadata can automate metadata collection, lineage tracking, and semantic cataloging to enable AI-ready data.
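To make the ambiguity point in the takeaways above concrete, here is a minimal sketch of how a team might surface business terms that different teams define differently. It is plain Python, not the OpenMetadata API; the `TermDefinition` class, the `find_conflicts` helper, the team names, and the sample definitions are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TermDefinition:
    """Hypothetical record of how one team defines a business term."""
    term: str        # business concept, e.g. "customer"
    team: str        # team that uses this definition
    definition: str  # that team's wording of the concept


def find_conflicts(definitions):
    """Return {term: {team: definition}} for terms whose wording differs across teams."""
    by_term = defaultdict(dict)
    for d in definitions:
        # Group case-insensitively so "ARR" and "arr" count as the same term.
        by_term[d.term.lower()][d.team] = d.definition
    return {
        term: teams
        for term, teams in by_term.items()
        if len(set(teams.values())) > 1
    }


# Illustrative data only; these definitions are not from the episode.
definitions = [
    TermDefinition("customer", "sales", "any account with a signed contract"),
    TermDefinition("customer", "support", "any user who has opened a ticket"),
    TermDefinition("ARR", "finance", "annualized recurring contract value"),
    TermDefinition("ARR", "sales", "annualized recurring contract value"),
]

conflicts = find_conflicts(definitions)
for term, teams in conflicts.items():
    print(term, "is defined differently by:", sorted(teams))
```

Exact string comparison is a deliberately crude test of disagreement. A real glossary review would need a person, or at least a fuzzier comparison, to decide whether two wordings mean the same thing.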
The AI Data Reality Check
“AI's failure isn't because the models aren't smart enough—it's because the data they're fed lacks context, ownership, and shared meaning.”
Uber's Data Chaos: A Case Study
“When a data pipeline fails and no one notices, it’s not a technical failure—it’s a cultural failure to treat data like code.”
Building the Data Knowledge Graph
“AI should not make data ready for AI—AI should make data ready for itself, using a shared, documented understanding of business concepts.”
Actionable Steps for Teams
Harsha offers practical advice: start with metadata automation, build glossaries and metrics catalogs, and use open-source tools like OpenMetadata to create a unified, governed data ecosystem before scaling.
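As a rough illustration of the kind of catalog this advice points toward, the sketch below models data assets with an owner, glossary terms, and upstream lineage, then renders a short context string that could be handed to an LLM alongside a question. It is an in-memory toy under stated assumptions, not OpenMetadata's data model or API; `DataAsset`, `Catalog`, and the table names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DataAsset:
    """Hypothetical catalog entry for a table or dataset."""
    name: str
    owner: str
    description: str
    glossary_terms: list = field(default_factory=list)
    upstream: list = field(default_factory=list)  # names of direct source assets


class Catalog:
    """Toy in-memory metadata catalog."""

    def __init__(self):
        self._assets = {}

    def register(self, asset):
        self._assets[asset.name] = asset

    def search(self, term):
        """Find assets tagged with a glossary term (case-insensitive)."""
        wanted = term.lower()
        return [
            a for a in self._assets.values()
            if wanted in (g.lower() for g in a.glossary_terms)
        ]

    def lineage(self, name):
        """Return all upstream asset names, transitively, nearest first."""
        seen = []
        queue = list(self._assets[name].upstream)
        while queue:
            current = queue.pop(0)
            if current in seen:
                continue
            seen.append(current)
            if current in self._assets:
                queue.extend(self._assets[current].upstream)
        return seen

    def describe(self, name):
        """Render owner, meaning, and lineage as one line of context."""
        a = self._assets[name]
        upstream = ", ".join(self.lineage(name)) or "none"
        return f"{a.name} (owner: {a.owner}): {a.description}. Upstream: {upstream}"


# Illustrative assets only.
catalog = Catalog()
catalog.register(DataAsset("raw_orders", "ingest-team", "Raw order events"))
catalog.register(DataAsset("orders_clean", "data-eng", "Deduplicated orders",
                           upstream=["raw_orders"]))
catalog.register(DataAsset("customer_health", "analytics",
                           "Weekly health score per customer",
                           glossary_terms=["customer", "customer health"],
                           upstream=["orders_clean"]))

for asset in catalog.search("Customer Health"):
    print(catalog.describe(asset.name))
```

Keeping ownership, meaning, and lineage on the same record is the design point: a person or a model looking up 'customer health' gets the responsible team and the source tables in one answer.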
Host: Ryan Donovan
Guest: Harsha Chintalapani
Harsha Chintalapani (person)
Uber (organization)
OpenMetadata (organization)
Ryan Donovan (person)
LLM (other)
Collate (organization)
Yahoo (organization)
Snowflake (other)
Hortonworks (organization)
Hadoop (other)
