A deeper pass through the MLP training dynamics from karpathy-03-mlp: initial loss calibration, saturated tanh activations, gradient flow, principled weight initialization, and batch normalization. The goal is to inspect what the network is doing internally, not just whether the final loss goes down.
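For concreteness, here is a minimal sketch of the two quickest internal diagnostics the notebook revolves around: comparing the initial loss against the uniform-distribution baseline `-ln(1/vocab_size)`, and measuring what fraction of tanh activations are saturated. The layer sizes, weight scales, and the 0.97 saturation threshold below are illustrative assumptions, not values taken from the notebook.

```python
import torch
import torch.nn.functional as F

vocab_size, n_embd, n_hidden, block_size = 27, 10, 200, 3

g = torch.Generator().manual_seed(42)
C  = torch.randn((vocab_size, n_embd),            generator=g)
W1 = torch.randn((n_embd * block_size, n_hidden), generator=g) * 0.2   # illustrative scale
W2 = torch.randn((n_hidden, vocab_size),          generator=g) * 0.01  # small => near-uniform logits at init
b2 = torch.zeros(vocab_size)

# A fake batch of character indices, just to run the diagnostics.
Xb = torch.randint(0, vocab_size, (32, block_size), generator=g)
Yb = torch.randint(0, vocab_size, (32,), generator=g)

emb = C[Xb].view(32, -1)     # (32, n_embd * block_size)
h = torch.tanh(emb @ W1)     # hidden activations
logits = h @ W2 + b2
loss = F.cross_entropy(logits, Yb)

# Diagnostic 1: at init, the loss should sit near the uniform baseline -ln(1/vocab_size).
baseline = -torch.log(torch.tensor(1.0 / vocab_size)).item()
print(f"initial loss {loss.item():.3f} vs uniform baseline {baseline:.3f}")

# Diagnostic 2: fraction of saturated tanh units (|h| > 0.97 means a near-zero local gradient).
print(f"saturated tanh fraction: {(h.abs() > 0.97).float().mean().item():.3f}")
```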
1 notebook: activations, gradients, and batch normalization
Notebook summary
01 BatchNorm: Rebuild the MLP, diagnose bad initialization and tanh saturation, then stabilize training with BatchNorm.
Sources: Andrej Karpathy - Zero to Hero · makemore
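As a companion to the summary above, a minimal sketch of the stabilization step: a batch-normalization layer inserted before the tanh, standardizing each pre-activation over the batch and then rescaling with learnable parameters. The formula is the standard BatchNorm one; the batch size, hidden width, and the deliberately bad input scale are illustrative assumptions.

```python
import torch

n_hidden = 200
bngain = torch.ones((1, n_hidden))   # learnable scale (gamma)
bnbias = torch.zeros((1, n_hidden))  # learnable shift (beta)

# Badly scaled pre-activations, standing in for the output of a poorly initialized layer.
hpreact = torch.randn(32, n_hidden) * 3.0

# Core of BatchNorm: standardize over the batch dimension, then rescale and shift.
mean = hpreact.mean(0, keepdim=True)
var = hpreact.var(0, keepdim=True)
hpreact_bn = bngain * (hpreact - mean) / torch.sqrt(var + 1e-5) + bnbias

# With unit-scale pre-activations, far fewer tanh units land in the saturated tails.
h = torch.tanh(hpreact_bn)
print(f"saturated tanh fraction after BN: {(h.abs() > 0.97).float().mean().item():.3f}")
```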