Welcome to the Learning Mechanics DeCal!

Learning Mechanics is the emerging discipline that treats deep learning the way physics treats the natural world: seeking compact mathematical principles, tight connections between theory and experiment, and simple, intuitive explanations for complex phenomena. Pieces of a scientific theory for deep learning are beginning to fit together, and in this course, we will examine what has been assembled so far, what remains contested, and where the field is heading.

Deep learning is among the most powerful technologies humans have ever built, and understanding it promises to be one of the defining intellectual challenges of the early 21st century. As of 2026, the engineering success of deep learning has dramatically outpaced our scientific understanding of it. Closing that gap may amount to founding a genuinely new field of science—one whose implications for our understanding of intelligence, data, and learning extend well beyond the neural networks that motivated it.

Readings draw heavily from the whitepaper There Will Be a Scientific Theory of Deep Learning (Simon et al., 2026) and the primary literature it synthesizes. We will work through the theoretical tools, empirical regularities, and open questions that are laying the groundwork for a physics-like understanding of deep learning.

Course Calendar

Schedule is subject to change.

Wk 1
First Week — No Class
Wk 2

Lecture 1 Introduction I: Learning Mechanics

What’s the evidence for an emerging scientific theory of deep learning?

Reading: Simon et al. (2026)
Wk 3

Lecture 2 Introduction II: Neural Networks

What exactly are neural networks? Why are they hard to study? How will we study them anyway?

Reading: Nielsen (2019), Lecture Notes
Homework: optional math review
Wk 4

Lecture 3 Analytically Solvable Settings I: Deep Linear Networks

What can we learn about deep learning from deep linear networks, a highly mathematically tractable toy model?
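A rough preview of the model: a depth-L deep linear network is just a product of weight matrices,

$$ f(x) = W_L W_{L-1} \cdots W_1 x, $$

so its input-output map is linear, yet the loss is non-convex in the weights and the gradient descent dynamics are nonlinear. That combination is what makes it a solvable stand-in for deep learning.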

Wk 5

Lecture 4 Analytically Solvable Settings II + Insightful Limits I: The Neural Tangent Kernel and Kernel Regression

How do neural networks simplify in the infinite-width limit?
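A one-equation preview (for squared loss, under the NTK parameterization, trained to convergence): in the infinite-width limit, gradient descent training reduces to kernel regression with a fixed kernel K, the neural tangent kernel, so the trained network predicts

$$ f(x) = K(x, X)\, K(X, X)^{-1} y, $$

where X and y are the training inputs and targets.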

Wk 6

Lecture 5 Analytically Solvable Settings III: Eigenlearning and the HEA

How can we develop a mathematical framework to study kernel regression? Can we predict how kernel regression will perform on real data?

Wk 7

Lecture 6 Disentangling Hyperparameters I + Insightful Limits II: The Lazy (NTK) and Rich (μP) Regimes

In the lazy (NTK) regime, neural networks don’t learn any structure. Is there a regime where they do?
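One way to make “don’t learn any structure” precise: in the lazy regime the network stays close to its linearization around initialization,

$$ f(x; \theta) \approx f(x; \theta_0) + \nabla_\theta f(x; \theta_0)^\top (\theta - \theta_0), $$

so training only fits coefficients on features that are frozen at their initial values. The rich (μP) regime is the one in which those features themselves move.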

Wk 8

Lecture 7 Analytically Solvable Settings IV: Balancedness and Feature Learning

Are there toy models where we can exactly characterize a lazy/rich phase transition?
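One fact to keep in mind as a preview: under gradient flow on a deep linear network, the layer-wise quantities

$$ W_{l+1}^\top W_{l+1} - W_l W_l^\top $$

are conserved throughout training, and their values at initialization are one knob that separates lazy from rich dynamics.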

Wk 9

Lecture 8 Universality I: The Platonic Representation Hypothesis

Do deep learning models learn similar representations of data across diverse architectures?

Wk 10
Thanksgiving Break — No Class
Wk 11

Lecture 10 Universality II: Fourier Features in Learned Representations

What kind of features are learned by language models? How might we characterize where such features come from and how they’re learned?

Wk 12

Lecture 11 Empirical Laws I: The Edge of Stability

Why do neural networks routinely train successfully while hovering on the very brink of numerical divergence?
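For context: gradient descent with step size η diverges on a quadratic once the curvature exceeds 2/η. Empirically, the sharpness (the top Hessian eigenvalue) of real networks rises during training until it hovers right at that threshold,

$$ \lambda_{\max}\!\left(\nabla^2 L\right) \approx \frac{2}{\eta}, $$

which is the edge of stability phenomenon.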

Wk 13
Buffer Week
Wk 14

Lecture 13 Final Project Hypothesis Presentations

Wk 15

Lecture 14 Final Project Office Hours

Wk 16
RRR Week — No Class

Q&A

What is a DeCal?

A DeCal (Democratic Education at Cal) is a student-facilitated course at UC Berkeley, run by students and sponsored by a faculty member, typically taken for one or two pass/no-pass units.