Notes - Dwarkesh Patel - Ilya Sutskever

November 23, 2025

I. The Current Disconnect: Performance Versus Real-World Impact

Rapid AI advancement feels like science fiction, yet the "slow takeoff" so far feels surprisingly normal and abstract. There is a confusing disconnect between the excellent performance of large models on evaluations (evals) and their economic impact, which lags dramatically and is not yet strongly felt.

Models sometimes exhibit strange behavior, such as repeating mistakes or, when asked to fix code, alternating between two bugs: fixing the first introduces the second, and fixing the second brings back the first. This suggests that "something strange is going on".

II. The Evolution of AI Development: Scaling to Research

The history of modern AI development is defined by two eras:

  1. Age of Research (2012–2020): Characterized by tinkering to achieve interesting results.
  2. Age of Scaling (2020–2025): Driven by the realization that combining compute, data, and neural net size ("the recipe") leads to predictable improvements. This scaling insight offered companies a very low-risk way of investing resources.

Now that the scale is immense, merely multiplying it by 100x may not transform the field. Consequently, the industry is transitioning back to the age of research, but armed with significantly larger computers. The bottleneck is shifting back towards ideas and the ability to bring them to life, rather than compute alone. Research does not always require the absolute maximum scale of compute: past breakthroughs like AlexNet (two GPUs) and the Transformer (8 to 64 GPUs of 2017 vintage) were developed with compute that is modest by today's standards.

III. The Core Problem: Generalization and RL Training Inadequacies

The most fundamental issue is that current models generalize dramatically worse than people. Humans learn with greater sample efficiency and robustness. Human capability in domains like language, math, and coding suggests that people rely on fundamentally better learning principles, rather than on complex evolutionary priors for these skills.

Two possible explanations for poor generalization:

  1. Narrow RL Training: Reinforcement Learning (RL) might make models "a little too single-minded and narrowly focused, a little bit too unaware".
  2. Eval-Inspired Environments: While pre-training uses essentially all available data, RL training requires researchers to produce new environments. Researchers may inadvertently design these environments by taking inspiration from evals, aiming for strong performance metrics at release. In effect, the human researchers are the ones "reward hacking" the evals.

The result is that models resemble a student who practices for 10,000 hours and memorizes every technique, yet still cannot generalize. An ideal learner, by contrast, reaches high performance with only 100 hours of practice because they have the fundamental "it" factor.

IV. Value Functions, Emotions, and Robustness

Value functions are needed to make RL more efficient. In naive RL, learning is slow because the reward signal arrives only at the end of a long trajectory. A value function lets the agent short-circuit that wait by assigning intermediate values, indicating whether an action is good or bad before the final outcome. In chess, for example, losing a piece signals a bad move long before the game ends.
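
The idea of an intermediate signal can be sketched with a toy example. The snippet below is a minimal illustration, not anything described in the conversation: it assumes a hypothetical chain of states in which the only reward arrives on the final step, and it uses tabular TD(0), one standard way to learn a value function. The function names, the chain environment, and the parameters (gamma, alpha, episode count) are all assumptions chosen for illustration.

```python
def td0_chain_values(n_states=5, gamma=0.9, alpha=0.1, episodes=2000):
    """Learn state values on a toy chain 0 -> 1 -> ... -> terminal.

    The only reward (1.0) is given on the step that enters the terminal
    state, mimicking a reward that arrives at the end of a trajectory.
    """
    # Index n_states is the terminal state; its value stays 0.
    values = [0.0] * (n_states + 1)
    for _ in range(episodes):
        for s in range(n_states):
            s_next = s + 1
            reward = 1.0 if s_next == n_states else 0.0
            # TD(0): move V(s) toward the one-step bootstrapped target.
            td_target = reward + gamma * values[s_next]
            values[s] += alpha * (td_target - values[s])
    return values[:n_states]


def td_error(values, s, s_next, reward, gamma=0.9):
    """One-step TD error: an immediate signal of how good a move was."""
    return reward + gamma * values[s_next] - values[s]


if __name__ == "__main__":
    learned = td0_chain_values()
    print("learned values:", [round(v, 3) for v in learned])
    # A hypothetical blunder: slipping from state 3 back to state 1.
    # The TD error is negative right away, before any final reward is seen.
    print("TD error for slipping back:", round(td_error(learned, 3, 1, 0.0), 3))
```

Once the values are learned, states closer to the goal carry higher values, so a move into a lower-valued state yields a negative TD error immediately. That plays the same role as noticing a lost chess piece without waiting for the end of the game.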

The closest ML analogue of emotions is probably some kind of value function. A case study of a person whose brain damage removed emotional processing highlights their utility: the person remained articulate but became extremely bad at making decisions (e.g., taking hours to decide which socks to wear). This suggests that built-in emotions are crucial for making a human a "viable agent". Emotions are relatively simple but offer enormous utility and robustness across a broad range of situations.

V. Defining Superintelligence and Alignment Strategy

The terms "AGI" and "pre-training" have "overshot the target". A human is not an AGI in the sense of a finished mind, but instead relies on continual learning. The future superintelligence will not be a "finished mind which knows how to do every single job," but rather a mind that can learn to do every single job.

The deployment of these continually learning agents, where instances learn different jobs and amalgamate their learnings, could lead to very rapid economic growth.

The speaker now places more importance on deploying AI incrementally. This shift rests on the idea that complex systems (like Linux or airplanes) become robust and safe by being deployed to the world, having their failures noticed, and having corrections made. Deployment is also essential because it is currently very difficult to imagine the power and future impact of AI without seeing it in use.

The aspiration for future AI companies should be to build an AI that is robustly aligned to care about sentient life specifically. A system capable of human-like learning that subsequently becomes superhuman is forecast to be 5 to 20 years away.

One possible long-term equilibrium, though not one the speaker likes, is for people to become part-AI via technologies like Neuralink++. Understanding could then be transmitted wholesale, which would keep humans participating.

VI. Competitive Dynamics and Research Taste

In a competitive market, advances made by one company are quickly matched by others, and the resulting competition pushes prices down. It also drives specialization into different economic niches (e.g., one company is good at complicated economic activity, another at litigation). The current lack of diversity among LLMs stems from similar pre-training data; differentiation is more likely to emerge during RL/post-training through competition.

Research taste is a personal, guiding aesthetic built on beauty, simplicity, and elegance, and on drawing the correct inspiration from the brain (for example, artificial neurons or distributed representations). This aesthetic produces a top-down belief that helps researchers persist when experimental data seems to contradict them, and it guides the decision to keep debugging or to abandon a direction.