Your LLM issues are really data issues

The Stack Overflow Podcast · 31m · April 28, 2026


AI-Generated Summary

In this episode of The Stack Overflow Podcast, host Ryan Donovan talks with Harsha Chintalapani, co-founder and CTO of Collate and co-creator of the open-source metadata platform OpenMetadata. The conversation centers on an often overlooked point: the main obstacles to getting value from AI and large language models (LLMs) are not limitations of the models themselves but deep-rooted data issues, especially around structured, real-time production data.

Harsha draws on his experience at Yahoo, Hortonworks, and Uber to show how even very large organizations struggle with data semantics, ownership, lineage, and discoverability. Cloud platforms have solved data storage and processing, he explains, but not the human and organizational problem of understanding what the data actually means. The episode argues that AI falls short on promises like 'understanding unstructured data' because explicit, shared business semantics are missing: definitions of terms such as 'customer' or 'customer health' are often tacit and inconsistent across teams.

The solution, Harsha emphasizes, is to build robust metadata infrastructure early, using platforms like OpenMetadata to create a centralized, searchable, semantically rich knowledge graph of data assets. That improves data discovery and governance and gives AI the context it needs to work effectively. The episode closes with practical advice: start investing in metadata and semantic governance as soon as a data team is formed, not after the organization has scaled into chaos.

Key Takeaways
1. AI failures with structured data are not due to model limitations, but due to poor data semantics, ownership, and discoverability.

2. The core problem is organizational: business concepts like 'customer' or 'ARR' are defined differently across teams, creating ambiguity.

3. Metadata is the foundation of trustworthy AI: without it, LLMs can't understand context or find the right data.

4. Start building metadata and semantic governance as soon as you form a data team, not after scaling into data chaos.

5. Open-source platforms like OpenMetadata can automate metadata collection, lineage tracking, and semantic cataloging to enable AI-ready data.

Chapters
0:00
10 min

The AI Data Reality Check

AI's failure isn't because the models aren't smart enough—it's because the data they're fed lacks context, ownership, and shared meaning.

10:00
10 min

Uber's Data Chaos: A Case Study

When a data pipeline fails and no one notices, it’s not a technical failure—it’s a cultural failure to treat data like code.

20:00
10 min

Building the Data Knowledge Graph

AI should not make data ready for AI—AI should make data ready for itself, using a shared, documented understanding of business concepts.

30:00
2 min

Actionable Steps for Teams

Harsha offers practical advice: start with metadata automation, build glossaries and metrics catalogs, and use open-source tools like OpenMetadata to create a unified, governed data ecosystem before scaling.
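As a rough illustration of the glossary idea above, the sketch below records each team's definition of a business term and flags terms whose definitions disagree. Everything in it is hypothetical: the `GlossaryEntry` type, the `find_conflicts` helper, and the sample definitions are invented for this example and are not taken from the episode or from the OpenMetadata API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class GlossaryEntry:
    """One team's definition of a business term (hypothetical structure)."""
    term: str        # business concept, e.g. "customer"
    team: str        # team that owns this definition
    definition: str  # plain-language meaning the team uses


def find_conflicts(entries):
    """Return {term: {team: definition}} for terms whose teams disagree."""
    by_term = defaultdict(dict)
    for entry in entries:
        # Normalize case so "Customer" and "customer" count as one term.
        by_term[entry.term.lower()][entry.team] = entry.definition
    return {
        term: teams
        for term, teams in by_term.items()
        if len(set(teams.values())) > 1
    }


# Invented sample data: two teams disagree on "customer", agree on "ARR".
entries = [
    GlossaryEntry("customer", "sales", "any account with a signed contract"),
    GlossaryEntry("customer", "support", "any account that has opened a ticket"),
    GlossaryEntry("ARR", "finance", "annualized recurring subscription revenue"),
    GlossaryEntry("ARR", "sales", "annualized recurring subscription revenue"),
]

conflicts = find_conflicts(entries)
for term, teams in conflicts.items():
    print(f"'{term}' has {len(teams)} competing definitions:")
    for team, definition in sorted(teams.items()):
        print(f"  {team}: {definition}")
```

A real metadata platform would attach such terms to tables, columns, and metrics and track ownership and lineage alongside them. The point of the sketch is only that conflicting definitions become detectable once they are written down in one place.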

High-Impact Quotes
AI's failure isn't because the models aren't smart enough—it's because the data they're fed lacks context, ownership, and shared meaning.
Harsha Chintalapani, 9:40
Viral: 92.0
AI should not make data ready for AI—AI should make data ready for itself, using a shared, documented understanding of business concepts.
Harsha Chintalapani, 44:10
Viral: 90.0
When a data pipeline fails and no one notices, it’s not a technical failure—it’s a cultural failure to treat data like code.
Harsha Chintalapani, 35:50
Viral: 88.0
Speakers

Host

Ryan Donovan

Guest

Harsha Chintalapani
Topics Discussed
AI and Data Semantics: 95%
Metadata Management: 92%
Data Governance: 90%
Data Lineage: 88%
Business Metrics and Definitions: 87%
Organizational Culture and Data: 85%
Data Discovery and Access: 83%
Open Source Data Tools: 80%
People & Brands

Harsha Chintalapani (person): 18x, Positive
Uber (organization): 15x, Neutral
OpenMetadata (organization): 14x, Positive
Ryan Donovan (person): 12x, Neutral
LLM (other): 10x, Neutral
Collate (organization): 6x, Positive
Yahoo (organization): 6x, Neutral
Snowflake (other): 5x, Neutral
Hortonworks (organization): 4x, Neutral
Hadoop (other): 4x, Positive
