A deeper pass through the MLP training dynamics from karpathy-03-mlp: initial loss calibration, saturated tanh activations, gradient flow, principled weight initialization, and batch normalization. The goal is to inspect what the network is doing internally, not just whether the final loss goes down.
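For concreteness, here is a minimal sketch of the two quickest internal diagnostics the notebook revolves around: comparing the initial loss against the uniform-distribution baseline `-ln(1/vocab_size)`, and measuring what fraction of tanh activations are saturated. The layer sizes, weight scales, and the 0.97 saturation threshold below are illustrative assumptions, not values taken from the notebook.

```python
import torch
import torch.nn.functional as F

vocab_size, n_embd, n_hidden, block_size = 27, 10, 200, 3

g = torch.Generator().manual_seed(42)
C  = torch.randn((vocab_size, n_embd),            generator=g)
W1 = torch.randn((n_embd * block_size, n_hidden), generator=g) * 0.2   # illustrative scale
W2 = torch.randn((n_hidden, vocab_size),          generator=g) * 0.01  # small => near-uniform logits at init
b2 = torch.zeros(vocab_size)

# A fake batch of character indices, just to run the diagnostics.
Xb = torch.randint(0, vocab_size, (32, block_size), generator=g)
Yb = torch.randint(0, vocab_size, (32,), generator=g)

emb = C[Xb].view(32, -1)     # (32, n_embd * block_size)
h = torch.tanh(emb @ W1)     # hidden activations
logits = h @ W2 + b2
loss = F.cross_entropy(logits, Yb)

# Diagnostic 1: at init, the loss should sit near the uniform baseline -ln(1/vocab_size).
baseline = -torch.log(torch.tensor(1.0 / vocab_size)).item()
print(f"initial loss {loss.item():.3f} vs uniform baseline {baseline:.3f}")

# Diagnostic 2: fraction of saturated tanh units (|h| > 0.97 means a near-zero local gradient).
print(f"saturated tanh fraction: {(h.abs() > 0.97).float().mean().item():.3f}")
```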
1 notebook: activations, gradients, and batch normalization
Notebook summary
01 BatchNorm: Rebuild the MLP, diagnose bad initialization and tanh saturation, then stabilize training with BatchNorm.
Sources: Andrej Karpathy - Zero to Hero · makemore
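As a companion to the summary above, a minimal sketch of the stabilization step: a batch-normalization layer inserted before the tanh, standardizing each pre-activation over the batch and then rescaling with learnable parameters. The formula is the standard BatchNorm one; the batch size, hidden width, and the deliberately bad input scale are illustrative assumptions.

```python
import torch

n_hidden = 200
bngain = torch.ones((1, n_hidden))   # learnable scale (gamma)
bnbias = torch.zeros((1, n_hidden))  # learnable shift (beta)

# Badly scaled pre-activations, standing in for the output of a poorly initialized layer.
hpreact = torch.randn(32, n_hidden) * 3.0

# Core of BatchNorm: standardize over the batch dimension, then rescale and shift.
mean = hpreact.mean(0, keepdim=True)
var = hpreact.var(0, keepdim=True)
hpreact_bn = bngain * (hpreact - mean) / torch.sqrt(var + 1e-5) + bnbias

# With unit-scale pre-activations, far fewer tanh units land in the saturated tails.
h = torch.tanh(hpreact_bn)
print(f"saturated tanh fraction after BN: {(h.abs() > 0.97).float().mean().item():.3f}")
```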