A collection of notes from college courses, self-study and research. Domains span mathematics, physics, computer science and occasionally philosophy. I hope to rigorously work through ideas, establish connections across disciplines and build a deep understanding of how the world works.
🛠️ A Learning Mechanic’s Toolkit
A learning mechanic studies learning mechanics—a dynamical and mechanistic perspective on traditional deep learning theory. This toolkit collects instruments for characterizing important properties and statistics of the training process, hidden representations, and final weights of neural networks.
🔧 Deep Dives
Step-by-step derivations, refined expositions
- Deep Linear Networks: a deep dive into Saxe et al. and the role of depth in learning
- exact solutions · training dynamics · deep linear networks
- Deep linear networks are mathematically tractable yet retain some of the mysterious phenomena of deep learning. We derive the exact training dynamics of these toy models and prove that long plateaus and rapid transitions are inherent to depth.
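A minimal sketch of the central exact solution, assuming the standard Saxe et al. setup (a depth-2 linear network, whitened inputs, gradient flow); the note's conventions may differ:

```latex
% Each input-output mode strength u evolves independently toward the corresponding
% singular value s of the input-output correlation matrix (time constant \tau):
u(t) \;=\; \frac{s\, e^{2st/\tau}}{e^{2st/\tau} - 1 + s/u_0},
\qquad u(0) = u_0, \qquad u(t) \xrightarrow[t \to \infty]{} s.
% A small initialization u_0 produces a long plateau near zero followed by a rapid
% sigmoidal transition, with an escape time scaling like (\tau / 2s)\,\ln(s / u_0).
```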
🔨 Notes
Summaries of important phenomena and models, plus some useful math
- The lazy (NTK) and rich (μP) regimes
- infinite limits · lazy/rich
- By requiring that training remains stable in a simple three-layer linear network, we determine all initialization hyperparameters up to a single remaining degree of freedom: the richness parameter.
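For orientation, a sketch of one standard way to frame the question (an abc-style parameterization; the note's exact setup, layer count, and conventions may differ):

```latex
% Assumed abc-style parameterization for a hidden layer of width n:
%   weights  W_l = n^{-a_l} w_l,  entries  (w_l)_{ij} ~ N(0, n^{-2 b_l}),
%   learning rate  \eta = \eta_0 n^{-c}.
W_l = n^{-a_l}\, w_l, \qquad (w_l)_{ij} \sim \mathcal{N}\!\bigl(0,\; n^{-2 b_l}\bigr),
\qquad \eta = \eta_0\, n^{-c}.
% Demanding that activations and their updates stay O(1) as n \to \infty constrains
% the exponents (a_l, b_l, c) up to a single remaining degree of freedom, which
% interpolates between the lazy (NTK) and rich (muP) regimes.
```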
- When (wide) neural networks become linear
- infinite limits · neural tangent kernel
- As the widths of a neural network's layers become large, the network is increasingly well described by its first-order Taylor expansion in the parameters, a model that is linear in its weights.
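A sketch of the statement in standard notation (symbols here are generic, not necessarily the note's):

```latex
% First-order Taylor expansion of the network output around its initialization \theta_0:
f(x; \theta) \;\approx\; f(x; \theta_0) \;+\; \nabla_\theta f(x; \theta_0)^{\top} (\theta - \theta_0).
% Training the linearized model is kernel regression with the neural tangent kernel
K(x, x') \;=\; \nabla_\theta f(x; \theta_0)^{\top}\, \nabla_\theta f(x'; \theta_0),
% which concentrates and stays approximately fixed during training as the widths grow.
```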
- Quadratic word embedding model (QWEM)
- exact solutions · feature learning · word embeddings
- A second-order approximation of the Word2Vec loss is equivalent to a supervised matrix-factorization loss, so a minimal language model can be studied through the highly tractable mathematics of matrix factorization.
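Schematically, and hedging on the exact weights and targets (which depend on corpus statistics and on the note's conventions): a second-order expansion of the skip-gram loss in the logits $u_i^\top v_j$ gives a weighted least-squares matrix-factorization objective.

```latex
% Schematic form of the quadratic approximation; \alpha_{ij} and T_{ij} are corpus
% statistics whose exact expressions are assumed, not taken from the note:
\mathcal{L}(U, V) \;\approx\; \sum_{i, j} \alpha_{ij} \bigl( u_i^{\top} v_j - T_{ij} \bigr)^{2} \;+\; \text{const},
% so the optimal embeddings form a low-rank factorization of the target matrix T.
```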
- Maximal stable learning rate derivation
- optimization phenomena · edge of stability
- For a simple, well-behaved loss with a constant Hessian, we analytically derive the largest learning rate at which gradient descent remains stable.
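The core calculation, sketched for a quadratic loss with constant Hessian $H$ (notation is generic):

```latex
% Gradient descent on L(\theta) = \tfrac{1}{2}\, \theta^{\top} H \theta:
\theta_{t+1} \;=\; \theta_t - \eta H \theta_t \;=\; (I - \eta H)\, \theta_t.
% Along each eigendirection of H with eigenvalue \lambda_i the iterate is multiplied
% by (1 - \eta \lambda_i), so convergence requires |1 - \eta \lambda_i| < 1 for all i:
\eta \;<\; \frac{2}{\lambda_{\max}(H)}.
```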
- Singular values under perturbation
🧮 Math Proofs
Proving cool math theorems
- Weierstrass Approximation Theorem
- Every continuous function on a closed interval can be uniformly approximated by polynomials.
- Riesz Representation Theorem
- A Hilbert space is in bijection with its continuous dual: every bounded linear functional is given by the inner product with a unique vector.
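For reference, the standard statements being proved (assuming the Hilbert-space form of the Riesz theorem, which the blurb above suggests):

```latex
% Weierstrass approximation theorem:
\forall\, f \in C([a, b]),\ \forall\, \varepsilon > 0,\ \exists\ \text{polynomial } p:\quad
\sup_{x \in [a, b]} |f(x) - p(x)| < \varepsilon.
% Riesz representation theorem (Hilbert-space form): every bounded linear functional
% \varphi on a Hilbert space H is the inner product with a unique vector y \in H:
\varphi(x) = \langle x,\, y \rangle \ \ \forall x \in H, \qquad \|\varphi\| = \|y\|.
```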
🌱 Exploratory Notes
Raw notes, incomplete thoughts and ongoing learning
Mathematics
Computer Science
Margins
A small subset of my many thoughts
Readings that influence how I think
- The Unreasonable Effectiveness of Mathematics in the Natural Sciences
- AI, Values, and Alignment
- Why Greatness Cannot Be Planned
- Slow Productivity
- Man’s Search for Meaning
- More is Different
- Sometimes there is nothing wrong with letting a child drown