Eric Jang – Building AlphaGo from scratch

Dwarkesh Podcast · 2h 37m · May 15, 2026

AI-Generated Summary

In this comprehensive three-part episode of the Dwarkesh Podcast, host Dwarkesh Patel talks with Eric Jang, former VP of AI at 1X Technologies and senior research scientist at Google DeepMind Robotics, about his sabbatical project to rebuild AlphaGo from scratch. Jang unpacks the computational challenge of Go, whose game tree is astronomically large, and explains how AlphaGo overcame it by fusing Monte Carlo Tree Search (MCTS) with deep neural networks: a policy network to predict promising moves and a value network to estimate win probabilities. These networks are trained via self-play and supervised learning, then used to guide MCTS, which iteratively refines the policy through simulated game paths. The breakthrough lies not just in the algorithm but in the insight that a single forward pass of a 10-layer neural network can amortize and closely approximate a nearly intractable search.

Jang contrasts this with modern LLM training, where sparse, end-of-trajectory rewards lead to high-variance gradients and inefficient learning, calling it 'supervision through a straw.' He emphasizes the power of MCTS as a low-variance, high-quality supervision engine that enables stable, scalable improvement. The discussion extends to neural fictitious self-play as a scalable alternative to MCTS in complex environments and to the potential of LLMs to automate AI research, though with limitations in strategic planning and in recognizing dead ends.

Jang also reflects on the broader implications: the transferability of skills from game-based AI research to LLM development, the importance of outer verification loops like win rates, and the conceptual link between MCTS and reasoning in LLMs, suggesting that studying Go may offer deep insights into general intelligence with minimal compute. The episode closes with a call to explore Jang’s open-source project and blog posts for deeper understanding.

The conversation shows a deep respect for the foundational principles behind AlphaGo’s success, presenting it as a paradigm for efficient, structured learning in complex domains. Jang’s reflections underscore that while scaling compute (the 'bitter lesson') is powerful, the real breakthroughs come from architectural insights that compress complexity into manageable forms. The episode consistently emphasizes high-quality supervision, iterative refinement, and robust evaluation mechanisms in AI self-improvement. Despite acknowledging uncertainty about the long-term transferability of game AI skills, the overall sentiment remains highly positive, celebrating the elegance, efficiency, and philosophical depth of AlphaGo’s design. The discussion bridges technical detail with big-picture thinking, positioning AlphaGo not just as a milestone in game AI, but as a blueprint for future advances in artificial general intelligence.

Key Takeaways
1. A single forward pass of a 10-layer neural network can amortize and closely approximate a nearly intractable search problem like Go.
2. Monte Carlo Tree Search (MCTS) combined with policy and value networks generates high-quality, low-variance supervision, making reinforcement learning far more stable and efficient than naive policy gradients.
3. AlphaGo’s success demonstrates that many complex problems are tractable in practice due to hidden structure, suggesting our understanding of computational complexity may be incomplete.
4. Modern LLM training suffers from 'supervision through a straw': sparse, end-of-trajectory rewards that cause high gradient variance, unlike AlphaGo’s continuous, high-fidelity feedback.
5. Outer verification loops (e.g., win rates) are crucial for guiding AI self-improvement, but designing effective ones for broader utility remains a major challenge.
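
To make the 'supervision through a straw' takeaway concrete, here is a minimal toy calculation, not taken from the episode. It assumes a hypothetical task of `horizon` binary decisions made by a one-parameter policy that picks the right action with probability `p`, a reward of 1 only when every decision is right, and a plain REINFORCE estimator with no baseline. It compares that estimator with a per-step target that names the right action at every step, which is a crude stand-in for the per-move targets MCTS provides.

```python
import math


def sparse_reward_gradient_stats(p, horizon):
    """Mean and std of a REINFORCE gradient estimate under a terminal 0/1 reward.

    Toy assumptions: one shared logit, pi(right action) = p at every step,
    reward 1 only if all `horizon` actions are right, no baseline.
    """
    success = p ** horizon            # chance a sampled trajectory earns reward
    magnitude = horizon * (1.0 - p)   # sum over steps of d/dlogit log pi(right)
    mean = magnitude * success        # the estimate is `magnitude` or 0
    std = magnitude * math.sqrt(success * (1.0 - success))
    return mean, std


def dense_target_gradient(p, horizon):
    """Log-likelihood gradient when every step has a known target action.

    The value is the same for every trajectory, so its variance is zero.
    """
    return horizon * (1.0 - p)


mean, std = sparse_reward_gradient_stats(p=0.5, horizon=20)
print(f"sparse reward: mean={mean:.2e}, std={std:.2e}, std/mean={std / mean:.0f}")
print(f"dense targets: gradient={dense_target_gradient(0.5, 20):.1f}, std=0")
```

In this toy setting the sparse-reward estimate is zero for almost every sampled trajectory, so its standard deviation is roughly a thousand times its mean, while the per-step target gives the same usable gradient every time.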

Chapters
0:13
2 min

Why AlphaGo Matters: The Birth of a Vision

It was just profound to see how smart AI systems could become and the kind of computational complexity class that they could tackle with deep learning.

Highlight
2:30
5 min

The Rules and Complexity of Go: A Game of Deep Strategy

Jang walks through the rules of Go, emphasizing how simple rules give rise to deep strategic complexity. He contrasts Chinese, Japanese, and Tromp-Taylor rules, highlighting how Tromp-Taylor’s unambiguous scoring lets a finished game be scored algorithmically. He illustrates key concepts like capturing stones, territory control, and the endgame, showing how the game’s structure, where losing a battle can win the war, creates rich micro-macro dynamics that challenge both humans and AIs.
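
As a rough illustration of why Tromp-Taylor scoring is easy to automate, the sketch below implements area scoring with a flood fill: each player gets one point per stone on the board, plus every empty point whose connected empty region touches only that player's color. The board encoding (a list of strings using `.`, `X`, `O`) and the function name are assumptions made for this example, not something from the episode, and komi is left out.

```python
from collections import deque

EMPTY, BLACK, WHITE = ".", "X", "O"  # hypothetical board encoding


def area_score(board):
    """Return (black, white) area scores for a board given as equal-length strings."""
    rows, cols = len(board), len(board[0])
    score = {BLACK: 0, WHITE: 0}
    seen = set()
    for r in range(rows):
        for c in range(cols):
            cell = board[r][c]
            if cell in score:
                score[cell] += 1          # every stone on the board counts
            elif (r, c) not in seen:
                # Flood-fill this empty region and record which colors it touches.
                size, borders = 0, set()
                queue = deque([(r, c)])
                seen.add((r, c))
                while queue:
                    y, x = queue.popleft()
                    size += 1
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < rows and 0 <= nx < cols:
                            neighbor = board[ny][nx]
                            if neighbor == EMPTY:
                                if (ny, nx) not in seen:
                                    seen.add((ny, nx))
                                    queue.append((ny, nx))
                            else:
                                borders.add(neighbor)
                # The region scores only if it reaches exactly one color.
                if len(borders) == 1:
                    score[borders.pop()] += size
    return score[BLACK], score[WHITE]


board = [
    ".X.O.",
    ".X.O.",
    ".X.O.",
    ".X.O.",
    ".X.O.",
]
print(area_score(board))
```

No judgment about dead stones is needed: whatever is on the board at the end is counted as it stands, which is what makes the rule set convenient for self-play.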

7:30
5 min

The Search Problem: Why Go Was Thought Intractable

Jang explains why Go was considered computationally intractable: the game tree has an astronomical number of possible paths, far exceeding the number of atoms in the universe. He introduces the concept of tree search and the explosive branching factor, showing why exhaustive search is impossible. He traces the evolution of search algorithms, from early bandit methods like UCB1 to the PUCT criterion used in AlphaGo, which balances exploration and exploitation.
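
The two selection rules named above can be sketched in a few lines. This is an illustrative sketch rather than code from Jang's project: `ucb1` is the classic bandit score, and `puct` follows the form published for AlphaGo Zero, where a policy-network prior scales the exploration bonus. The dictionary layout for child statistics and the default `c_puct` value are assumptions for the example.

```python
import math


def ucb1(mean_value, visits, parent_visits, c=math.sqrt(2)):
    """Classic UCB1: average value plus a bonus that shrinks as a move is visited."""
    if visits == 0:
        return math.inf  # unvisited moves are always tried first
    return mean_value + c * math.sqrt(math.log(parent_visits) / visits)


def puct(q_value, prior, visits, parent_visits, c_puct=1.5):
    """PUCT: exploration bonus weighted by the policy network's prior for the move."""
    return q_value + c_puct * prior * math.sqrt(parent_visits) / (1 + visits)


def select_child(children, c_puct=1.5):
    """Pick the move with the highest PUCT score.

    `children` maps a move to {"q": mean value, "prior": policy prob, "visits": count}
    (a hypothetical layout chosen for this sketch).
    """
    parent_visits = sum(stats["visits"] for stats in children.values())
    return max(
        children,
        key=lambda move: puct(
            children[move]["q"],
            children[move]["prior"],
            children[move]["visits"],
            parent_visits,
            c_puct,
        ),
    )


children = {
    "a": {"q": 0.5, "prior": 0.1, "visits": 10},
    "b": {"q": 0.0, "prior": 0.8, "visits": 0},
    "c": {"q": 0.2, "prior": 0.1, "visits": 2},
}
print(select_child(children))
```

Here the unvisited move `b` is selected because its high prior gives it a large exploration bonus, even though move `a` has the best average value so far. That is the exploration and exploitation trade-off in miniature: the prior steers search toward moves the network likes, and the visit count in the denominator gradually hands control back to measured value.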

12:30
84 min

AlphaGo’s Core: Neural Networks as Search Accelerators

A 10-layer neural network pass... is able to amortize and approximate to a very, very high fidelity a nearly intractable search problem.

Highlight
1:27:43
2 min

The Problem with Naive Policy Gradient in Go

It's interesting that this thing you're saying which would be intractable and prevents you from actually getting beyond a certain level in Go is just by default how LLMs are trained?

Highlight
High-Impact Quotes
"A 10-layer neural network pass... is able to amortize and approximate to a very, very high fidelity a nearly intractable search problem."
Eric Jang, 77:40 (Viral: 92.0)

"The major reason is that you never have to initialize at a 0% success rate and solve the exploration problem of how to get a non-zero success rate."
Eric Jang, 91:30 (Viral: 90.0)

"It was just profound to see how smart AI systems could become and the kind of computational complexity class that they could tackle with deep learning."
Eric Jang, 0:46 (Viral: 85.0)

Speakers

Host: Dwarkesh Patel
Guest: Eric Jang

Topics Discussed
Thinking as a Primitive in AI: 95%
AlphaGo Architecture: 95%
Monte Carlo Tree Search: 95%
Reinforcement Learning Variance: 92%
Outer Verification Loops: 90%
Credit Assignment in RL: 88%
Neural Network Training in Games: 85%
Transfer Learning in AI Research: 80%

People & Brands

AlphaGo (other): 42x, Positive
Monte Carlo Tree Search (other): 35x, Positive
Eric Jang (person): 31x, Positive
Go (media): 22x, Neutral
LLMs (other): 16x, Neutral
Dwarkesh Patel (person): 8x, Positive
Neural Fictitious Self Play (other): 6x, Positive
KataGo (other): 6x, Positive
Google DeepMind (organization): 5x, Positive
Q-learning (other): 5x, Positive
