A deeper pass through the MLP training dynamics from karpathy-03-mlp: initial loss calibration, saturated tanh activations, gradient flow, principled weight initialization, and batch normalization. The goal is to inspect what the network is doing internally, not just whether the final loss goes down.
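Two of those diagnostics are cheap to reproduce outside the notebook. Below is a minimal sketch, assuming the character-level makemore setup (27 classes: 26 letters plus a terminator) with illustrative layer sizes and variable names of my own choosing, not taken from the notebook: it compares the initial cross-entropy against the uniform-prediction baseline -ln(1/27) ≈ 3.29, then measures what fraction of tanh activations are saturated at init.

```python
import torch
import torch.nn.functional as F

# Illustrative sizes for a character-level model: 27 output classes,
# 200 hidden units, 30-dim input (e.g. a context of 3 ten-dim embeddings).
vocab_size, n_hidden, fan_in = 27, 200, 30

g = torch.Generator().manual_seed(42)
W1 = torch.randn((fan_in, n_hidden), generator=g)   # naive unit-std init
b1 = torch.randn(n_hidden, generator=g)
W2 = torch.randn((n_hidden, vocab_size), generator=g)
b2 = torch.randn(vocab_size, generator=g)

# 1) Initial loss calibration: with uniform predictions over 27 classes,
# expected cross-entropy is -ln(1/27) ≈ 3.29. A much larger initial loss
# means the network starts out confidently wrong.
print(f"expected initial loss: {torch.log(torch.tensor(27.0)).item():.4f}")

x = torch.randn((32, fan_in), generator=g)          # stand-in input batch
h = torch.tanh(x @ W1 + b1)
logits = h @ W2 + b2
y = torch.randint(0, vocab_size, (32,), generator=g)
print(f"actual initial loss:   {F.cross_entropy(logits, y).item():.4f}")

# 2) Tanh saturation: fraction of activations pushed toward +/-1, where
# the local gradient (1 - h^2) vanishes and learning stalls.
print(f"saturated tanh fraction: {(h.abs() > 0.99).float().mean().item():.2%}")
```

With unit-std weights the pre-activations have std ≈ sqrt(fan_in), so both numbers come out badly: the initial loss far exceeds 3.29 and most tanh units are saturated, which is exactly the failure mode the notebook diagnoses.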

1 notebook: activations, gradients, and batch normalization

Notebook        Summary
01 BatchNorm    Rebuild the MLP, diagnose bad initialization and tanh saturation, then stabilize training with BatchNorm
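
As a companion to the notebook's fix, here is a minimal sketch of the stabilized stack (Linear -> BatchNorm1d -> Tanh) using PyTorch's nn modules; the layer sizes are illustrative, not taken from the notebook, and the gain-5/3 init comes from the standard recommendation for tanh rather than from this specific code.

```python
import torch
import torch.nn as nn

fan_in, n_hidden, vocab_size = 30, 200, 27  # illustrative sizes

model = nn.Sequential(
    # bias is redundant before BatchNorm: its learnable beta subsumes it
    nn.Linear(fan_in, n_hidden, bias=False),
    nn.BatchNorm1d(n_hidden),  # normalizes pre-activations per batch
    nn.Tanh(),
    nn.Linear(n_hidden, vocab_size),
)

# Kaiming-style init for the tanh layer: std = (5/3) / sqrt(fan_in).
with torch.no_grad():
    nn.init.kaiming_normal_(model[0].weight, nonlinearity='tanh')

# Quick health check: hidden activations after BatchNorm + tanh should be
# roughly zero-mean with few saturated units, regardless of weight scale.
x = torch.randn(32, fan_in)
h = model[:3](x)  # output of the tanh layer
print(f"hidden mean {h.mean().item():+.3f}, std {h.std().item():.3f}, "
      f"saturated {(h.abs() > 0.99).float().mean().item():.2%}")
```

Because BatchNorm forces the pre-activations to roughly unit variance, the tanh stays in its linear regime at init and the saturation fraction stays near zero, which is the stabilization the notebook demonstrates.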

Sources: Andrej Karpathy - Zero to Hero · makemore