SE Radio 719: Birol Yildiz on Building an Agentic AI SRE
In this episode of Software Engineering Radio, host Kanchan interviews Birol Yildiz, CEO and co-founder of iLert, a SaaS company building an AI-powered incident response system called AI SRE. The conversation covers the architecture, evolution, and philosophy behind an agentic AI system that autonomously performs root cause analysis for production incidents.
Yildiz explains how the team moved from a rigid, prescriptive approach built on complex workflows and vector databases to a minimalist, agentic model that relies on reasoning loops and 'agentic search': simple command-line tools like grep, jq, and bash that query data without overloading the context. The AI SRE is designed to complete root cause analysis in under four minutes, compared with manual processes that can take 10 to 60 minutes.
The discussion covers the key components: orchestration, a knowledge layer (plain-text long-term memory instead of vector databases), evaluation via semantic tests and LLM judges, and sub-agents and forks to manage context. A real-world example shows how the AI SRE diagnosed a self-inflicted incident caused by an overly broad network policy during a penetration test, the kind of ambiguous, novel problem that standard runbooks do not cover.
The episode also explores guardrails, autonomy, GDPR compliance, and the future of AI agents. Yildiz cautions against over-engineering and advocates simplicity, full context control, and letting the model decide the 'how' while humans define the 'what'.
Agentic AI systems should focus on the 'what' (goal) and let the model decide the 'how' (execution), avoiding over-prescriptive scaffolding.
Use 'agentic search' (command-line tools like grep, jq, and bash) to analyze large datasets without polluting context; Yildiz reports that this outperforms vector databases in many cases.
Prioritize full control over context: avoid frameworks like LangChain and MCP servers unless forked and customized to your use case.
Evaluate AI agents using real-world semantic tests and LLM judges, not just synthetic benchmarks, to ensure robustness.
Guardrails are critical: start with human-in-the-loop, use pre-approved actions, and implement hard rules to prevent destructive commands.
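The guardrail takeaway above (pre-approved actions, hard rules against destructive commands, and a human in the loop for everything else) can be sketched roughly as follows. This is an illustrative sketch only, not iLert's implementation: the command lists and the function name `classify` are assumptions made up for the example.

```python
import shlex

# Hypothetical policy data for illustration; a real deployment would define its own.
HARD_DENY_PROGRAMS = {"rm", "dd", "mkfs", "shutdown", "reboot"}
HARD_DENY_SUBCOMMANDS = {("kubectl", "delete")}
PRE_APPROVED = {("kubectl", "get"), ("kubectl", "describe"), ("kubectl", "logs")}
SHELL_METACHARS = ";|&`$<>"


def classify(command: str) -> str:
    """Return 'deny', 'allow', or 'ask_human' for a command the agent proposes."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        return "deny"  # unparseable input is rejected outright
    if not tokens:
        return "deny"
    # Hard rules first. Matching any token is deliberately conservative:
    # it also blocks harmless commands that merely mention a denied word.
    if any(tok in HARD_DENY_PROGRAMS for tok in tokens):
        return "deny"
    if tuple(tokens[:2]) in HARD_DENY_SUBCOMMANDS:
        return "deny"
    # Anything that chains or redirects is never auto-approved.
    if any(ch in command for ch in SHELL_METACHARS):
        return "ask_human"
    if tuple(tokens[:2]) in PRE_APPROVED:
        return "allow"
    return "ask_human"  # default: keep a human in the loop
```

With these example lists, `classify("kubectl get pods -n prod")` is allowed, `classify("rm -rf /var/lib")` is denied, and an unlisted action such as `kubectl scale ...` falls through to human approval.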
Introduction to AI SRE and Birol Yildiz
Host Kanchan introduces the episode and guest Birol Yildiz, CEO of iLert, a Cologne-based SaaS company building an AI SRE for incident response. The focus is on how AI agents are transforming production incident resolution.
Defining Agentic AI: Beyond Workflows
Yildiz distinguishes true AI agents from automated workflows, emphasizing that agentic AI uses reasoning loops to make independent decisions, not just follow pre-defined scripts.
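To make the distinction concrete, here is a minimal sketch of a reasoning loop, assuming the model is exposed as a function that reads the history so far and returns either a tool call or a final answer. The names `run_agent` and `scripted_model`, and the dictionary shape of an action, are hypothetical; `scripted_model` is a stand-in for a real LLM call.

```python
def run_agent(goal, decide, tools, max_steps=10):
    """Loop until the model returns a final answer or the step budget runs out.

    Unlike a fixed workflow, the order of tool calls is not written down here:
    `decide` picks the next step from whatever has been observed so far.
    """
    history = [("goal", goal)]
    for _ in range(max_steps):
        action = decide(history)
        if "final" in action:
            return action["final"], history
        observation = tools[action["tool"]](action["arg"])
        history.append((action["tool"], observation))
    return None, history  # budget exhausted without a conclusion


def scripted_model(history):
    """Stand-in for an LLM: check deploys once, then conclude from the result."""
    seen = [name for name, _ in history]
    if "check_deploys" not in seen:
        return {"tool": "check_deploys", "arg": "last 1h"}
    return {"final": "suspect: " + history[-1][1]}
```

The human supplies the goal and the available tools (the 'what'); the model decides which tool to call next and when to stop (the 'how').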
Evolution of the AI SRE: From Simulation to Reasoning
The team initially tried simulating human behavior with browser automation but pivoted to reasoning models and the Model Context Protocol (MCP) in 2024, leading to a more flexible, agentic system.
Agentic Search: The Power of CLI Tools
“Agentic search is a fancy way of just using old school terminal commands, grep, z, jq, yeah.”
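As a rough illustration of the idea, an agent tool can run a terminal command and hand back only a bounded slice of the output, so a large log never lands in the context wholesale. This is a sketch under assumptions: the function name `run_tool`, the line limit, and the wording of the truncation note are all made up for the example.

```python
import subprocess


def run_tool(argv, max_lines=20, timeout=10):
    """Run a command and return at most `max_lines` lines of its stdout.

    If output is cut, a final note tells the model how much was omitted so it
    can issue a narrower query instead of reading everything.
    """
    result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    lines = result.stdout.splitlines()
    shown = lines[:max_lines]
    if len(lines) > max_lines:
        shown.append(f"[{len(lines) - max_lines} more lines omitted; narrow the query]")
    return "\n".join(shown)
```

A call such as `run_tool(["grep", "-n", "ERROR", "app.log"])` (where `app.log` is a hypothetical file) would return the first 20 matching lines plus a note if more matched.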
Architecture: Orchestrator, Knowledge, and Sub-agents
The AI SRE uses an orchestrator service, a lightweight knowledge layer (plain text memory), and dynamic sub-agents/forks to manage context and scale analysis during incidents.
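A plain-text knowledge layer of the kind described above could look something like the sketch below: one note per line in a text file, retrieved by keyword match rather than by embedding lookup. The function names `remember` and `recall` and the one-note-per-line layout are assumptions for illustration; the episode summary does not specify the actual format.

```python
from pathlib import Path


def remember(memory_file: Path, note: str) -> None:
    """Append a single-line note to the long-term memory file."""
    with memory_file.open("a", encoding="utf-8") as f:
        f.write(note.replace("\n", " ").strip() + "\n")


def recall(memory_file: Path, keyword: str, limit: int = 5) -> list[str]:
    """Return up to `limit` most recent notes containing `keyword` (case-insensitive)."""
    if not memory_file.exists():
        return []
    with memory_file.open(encoding="utf-8") as f:
        matches = [line.rstrip("\n") for line in f if keyword.lower() in line.lower()]
    return matches[-limit:]
```

Because the store is an ordinary file, the same grep-style tools the agent uses for logs also work on its own memory.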
“The more we hand this task over to agents, there will be incidents that are novel in the sense that whatever contributed to that incident was maybe due to the fact that there is a large amount of code being generated by AI.”
“This would never made it into a runbook. No runbook would tell you that when you have a penetration test and this happens, here's the solution.”
“Benchmark it against Claude Code, right? Just try to create a similar environment for Claude Code... If you perform a lot better than Claude Code, then you know that's probably something is, there is a reason for being right.”
