Bigram Language Model & Neural Net Equivalent

A bigram model predicts the next character from the current one alone, making it the simplest possible language model. It is built first as a counting model (character pair frequencies → probabilities), then rebuilt as an equivalent single-layer neural network trained with gradient descent, showing that both approaches converge to the same solution.
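
As a taste of the counting approach (notebook 01), here is a minimal sketch, assuming a `names.txt` file with one lowercase name per line as in Karpathy's lecture:

```python
import torch

# Corpus: one lowercase name per line (the filename is an assumption).
words = open("names.txt").read().splitlines()

# Vocabulary: the 26 letters plus '.' as a combined start/end-of-name token.
chars = sorted(set("".join(words)))
stoi = {ch: i + 1 for i, ch in enumerate(chars)}
stoi["."] = 0
itos = {i: ch for ch, i in stoi.items()}

# Count every adjacent character pair (bigram) in the corpus.
N = torch.zeros((27, 27), dtype=torch.int32)
for w in words:
    chs = ["."] + list(w) + ["."]
    for ch1, ch2 in zip(chs, chs[1:]):
        N[stoi[ch1], stoi[ch2]] += 1

# Normalise each row into a probability distribution over the next character.
P = N.float() / N.sum(1, keepdim=True)
```

The `keepdim=True` in the last line is exactly the broadcasting pitfall the helper notebook works through: without it, the `(27,)` row sums would broadcast as a row vector and silently normalise the columns instead of the rows.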

Five notebooks plus two helper notebooks, covering the counting bigram model and its neural-net equivalent.

| Notebook | Summary |
| --- | --- |
| 01 Define Bigram Model | Character pair counting; building the probability table |
| 02 Sampling | Generating names by sampling from the distribution (sketch below) |
| helper: Broadcasting Tensors | PyTorch broadcasting rules with worked examples |
| 03 Loss & Smoothing | Negative log-likelihood loss; Laplace smoothing (sketch below) |
| 04 Bigrams → Neural Net | One-hot input → linear layer → softmax; mathematically equivalent to counting |
| helper: One-Hot Encoding | How one-hot vectors encode categorical inputs |
| 05 Optimisation | Training loop; gradient descent convergence (training sketch below) |
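
Continuing the sketch above, generation (notebook 02) starts at the `.` token and repeatedly samples the next character from the current character's row of `P` until `.` comes back; the seed is only for reproducibility:

```python
g = torch.Generator().manual_seed(2147483647)

# Generate a few names by walking the bigram chain.
for _ in range(5):
    out = []
    ix = 0  # start at the '.' token
    while True:
        ix = torch.multinomial(P[ix], num_samples=1, generator=g).item()
        if ix == 0:  # sampling '.' again ends the name
            break
        out.append(itos[ix])
    print("".join(out))
```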
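
For notebook 03's quality measure, the model is scored by the average negative log-likelihood it assigns to the corpus; adding 1 to every count (Laplace smoothing) keeps a bigram never seen in training from making that loss infinite. A sketch, again continuing from the counts above:

```python
# Laplace smoothing: pretend every bigram occurred once more than it did,
# so no probability is exactly zero.
P = (N + 1).float() / (N + 1).sum(1, keepdim=True)

# Average negative log-likelihood over all bigrams: lower is better.
log_likelihood = 0.0
n = 0
for w in words:
    chs = ["."] + list(w) + ["."]
    for ch1, ch2 in zip(chs, chs[1:]):
        log_likelihood += torch.log(P[stoi[ch1], stoi[ch2]])
        n += 1
print(f"nll = {(-log_likelihood / n).item():.4f}")
```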
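
Notebooks 04-05 rebuild the same table as a single linear layer: one-hot encode the current character, multiply by a 27×27 weight matrix to get logits, and softmax into next-character probabilities; gradient descent on the same NLL then drives `W` toward the log of the count table. A compressed sketch, reusing `words` and `stoi` from the first sketch (the step count and learning rate are illustrative, not tuned):

```python
import torch.nn.functional as F

# Training set: every (current char, next char) index pair in the corpus.
xs, ys = [], []
for w in words:
    chs = ["."] + list(w) + ["."]
    for ch1, ch2 in zip(chs, chs[1:]):
        xs.append(stoi[ch1])
        ys.append(stoi[ch2])
xs, ys = torch.tensor(xs), torch.tensor(ys)

# One-hot inputs select a row of W, so W[i, j] ends up playing the role
# of log N[i, j] from the counting model.
xenc = F.one_hot(xs, num_classes=27).float()
W = torch.randn((27, 27), requires_grad=True)

for step in range(100):
    # Forward pass: linear layer, then softmax = next-character probabilities.
    logits = xenc @ W
    counts = logits.exp()
    probs = counts / counts.sum(1, keepdim=True)
    loss = -probs[torch.arange(len(ys)), ys].log().mean()

    # Backward pass and a plain gradient-descent step.
    W.grad = None
    loss.backward()
    W.data += -50 * W.grad

print(f"final nll = {loss.item():.4f}")
```

The lecture also adds a small `(W**2).mean()` penalty to the loss, which plays the same smoothing role for the network that the +1 count does for the table.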

Source: Andrej Karpathy, Neural Networks: Zero to Hero