Bigram Language Model & Neural Net Equivalent

A bigram model predicts the next character from the current one alone, making it the simplest possible language model. It is built first as a counting model (character pair frequencies → probabilities), then rebuilt as an equivalent single-layer neural network trained with gradient descent, showing that both approaches converge to the same solution.
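
As a taste of the counting approach (notebook 01), here is a minimal sketch, assuming a `names.txt` file with one lowercase name per line as in Karpathy's lecture:

```python
import torch

# Corpus: one lowercase name per line (the filename is an assumption).
words = open("names.txt").read().splitlines()

# Vocabulary: the 26 letters plus '.' as a combined start/end-of-name token.
chars = sorted(set("".join(words)))
stoi = {ch: i + 1 for i, ch in enumerate(chars)}
stoi["."] = 0
itos = {i: ch for ch, i in stoi.items()}

# Count every adjacent character pair (bigram) in the corpus.
N = torch.zeros((27, 27), dtype=torch.int32)
for w in words:
    chs = ["."] + list(w) + ["."]
    for ch1, ch2 in zip(chs, chs[1:]):
        N[stoi[ch1], stoi[ch2]] += 1

# Normalise each row into a probability distribution over the next character.
P = N.float() / N.sum(1, keepdim=True)
```

The `keepdim=True` in the last line is exactly the broadcasting pitfall the helper notebook works through: without it, the `(27,)` row sums would broadcast as a row vector and silently normalise the columns instead of the rows.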

Five notebooks plus two helper notebooks, covering the counting bigram model and its neural-net equivalent.

| Notebook | Summary |
| --- | --- |
| 01 Define Bigram Model | Character pair counting; building the probability table |
| 02 Sampling | Generating names by sampling from the distribution (sketch below) |
| helper: Broadcasting Tensors | PyTorch broadcasting rules with worked examples |
| 03 Loss & Smoothing | Negative log-likelihood loss; Laplace smoothing (sketch below) |
| 04 Bigrams → Neural Net | One-hot input → linear layer → softmax; mathematically equivalent to counting |
| helper: One-Hot Encoding | How one-hot vectors encode categorical inputs |
| 05 Optimisation | Training loop; gradient descent convergence (training sketch below) |
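
Continuing the sketch above, generation (notebook 02) starts at the `.` token and repeatedly samples the next character from the current character's row of `P` until `.` comes back; the seed is only for reproducibility:

```python
g = torch.Generator().manual_seed(2147483647)

# Generate a few names by walking the bigram chain.
for _ in range(5):
    out = []
    ix = 0  # start at the '.' token
    while True:
        ix = torch.multinomial(P[ix], num_samples=1, generator=g).item()
        if ix == 0:  # sampling '.' again ends the name
            break
        out.append(itos[ix])
    print("".join(out))
```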
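
For notebook 03's quality measure, the model is scored by the average negative log-likelihood it assigns to the corpus; adding 1 to every count (Laplace smoothing) keeps a bigram never seen in training from making that loss infinite. A sketch, again continuing from the counts above:

```python
# Laplace smoothing: pretend every bigram occurred once more than it did,
# so no probability is exactly zero.
P = (N + 1).float() / (N + 1).sum(1, keepdim=True)

# Average negative log-likelihood over all bigrams: lower is better.
log_likelihood = 0.0
n = 0
for w in words:
    chs = ["."] + list(w) + ["."]
    for ch1, ch2 in zip(chs, chs[1:]):
        log_likelihood += torch.log(P[stoi[ch1], stoi[ch2]])
        n += 1
print(f"nll = {(-log_likelihood / n).item():.4f}")
```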
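
Notebooks 04-05 rebuild the same table as a single linear layer: one-hot encode the current character, multiply by a 27×27 weight matrix to get logits, and softmax into next-character probabilities; gradient descent on the same NLL then drives `W` toward the log of the count table. A compressed sketch, reusing `words` and `stoi` from the first sketch (the step count and learning rate are illustrative, not tuned):

```python
import torch.nn.functional as F

# Training set: every (current char, next char) index pair in the corpus.
xs, ys = [], []
for w in words:
    chs = ["."] + list(w) + ["."]
    for ch1, ch2 in zip(chs, chs[1:]):
        xs.append(stoi[ch1])
        ys.append(stoi[ch2])
xs, ys = torch.tensor(xs), torch.tensor(ys)

# One-hot inputs select a row of W, so W[i, j] ends up playing the role
# of log N[i, j] from the counting model.
xenc = F.one_hot(xs, num_classes=27).float()
W = torch.randn((27, 27), requires_grad=True)

for step in range(100):
    # Forward pass: linear layer, then softmax = next-character probabilities.
    logits = xenc @ W
    counts = logits.exp()
    probs = counts / counts.sum(1, keepdim=True)
    loss = -probs[torch.arange(len(ys)), ys].log().mean()

    # Backward pass and a plain gradient-descent step.
    W.grad = None
    loss.backward()
    W.data += -50 * W.grad

print(f"final nll = {loss.item():.4f}")
```

The lecture also adds a small `(W**2).mean()` penalty to the loss, which plays the same smoothing role for the network that the +1 count does for the table.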

Source: Andrej Karpathy, Neural Networks: Zero to Hero