Bigram Language Model & Neural Net Equivalent
A bigram model predicts the next character from the current one alone, making it the simplest possible language model. It is built first as a counting model (character-pair frequencies normalised into probabilities), then rebuilt as an equivalent single-layer neural network trained with gradient descent, showing that both approaches converge to the same solution (both stages are sketched below the table).
5 notebooks + 2 helpers: bigram model and NN equivalent
| Notebook | Summary |
| --- | --- |
| 01 Define Bigram Model | Character-pair counting; building the probability table |
| 02 Sampling | Generating names by sampling from the distribution |
| helper: Broadcasting Tensors | PyTorch broadcasting rules with worked examples |
| 03 Loss & Smoothing | Negative log-likelihood loss; Laplace smoothing |
| 04 Bigrams → Neural Net | One-hot input → linear layer → softmax; mathematically equivalent to counting |
| helper: One-Hot Encoding | How one-hot vectors encode categorical inputs |
| 05 Optimisation | Training loop; gradient descent convergence |

Sources: Andrej Karpathy - Zero to Hero
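A minimal sketch of the counting model (notebooks 01 and 03), assuming a tiny inline name list and a `.` character as a combined start/end token — the real notebooks load a full names dataset, so the data here is a placeholder:

```python
import torch

# Placeholder dataset; the notebooks load a full list of names.
names = ["emma", "olivia", "ava"]

# Vocabulary: every character seen, plus '.' as a combined start/end token.
chars = sorted(set("".join(names)))
stoi = {ch: i + 1 for i, ch in enumerate(chars)}
stoi["."] = 0
itos = {i: ch for ch, i in stoi.items()}
V = len(stoi)

# Count every adjacent character pair (bigram).
N = torch.zeros((V, V), dtype=torch.int32)
for name in names:
    seq = ["."] + list(name) + ["."]
    for c1, c2 in zip(seq, seq[1:]):
        N[stoi[c1], stoi[c2]] += 1

# Laplace (add-one) smoothing avoids zero probabilities; each row is then
# normalised so that row i holds P(next char | current char i).
P = (N + 1).float()
P = P / P.sum(dim=1, keepdim=True)
```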
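Sampling (notebook 02) then just walks the probability table row by row. This continues from the counting sketch above, reusing `P`, `itos`, and the `.` token convention; the seed is arbitrary:

```python
# Generate one name: start at '.', repeatedly sample the next character
# from the current character's row, and stop when '.' is drawn again.
g = torch.Generator().manual_seed(42)
ix = 0
out = []
while True:
    ix = torch.multinomial(P[ix], num_samples=1, generator=g).item()
    if ix == 0:  # drew the end token
        break
    out.append(itos[ix])
print("".join(out))
```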
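And a sketch of the neural-net equivalent (notebooks 04 and 05): one-hot inputs into a single V×V linear layer, softmax via exponentiate-and-normalise, trained by gradient descent on the negative log-likelihood. The dataset, seed, learning rate, and step count are illustrative placeholders, not values fixed by the notebooks:

```python
import torch
import torch.nn.functional as F

# Same placeholder dataset and vocabulary as the counting sketch.
names = ["emma", "olivia", "ava"]
chars = sorted(set("".join(names)))
stoi = {ch: i + 1 for i, ch in enumerate(chars)}
stoi["."] = 0
V = len(stoi)

# Training pairs: each bigram gives (current char index, next char index).
xs, ys = [], []
for name in names:
    seq = ["."] + list(name) + ["."]
    for c1, c2 in zip(seq, seq[1:]):
        xs.append(stoi[c1])
        ys.append(stoi[c2])
xs, ys = torch.tensor(xs), torch.tensor(ys)

# Single linear layer: one-hot @ W just selects row stoi[c1] of W,
# so W plays the role of the (log of the) count table.
g = torch.Generator().manual_seed(42)
W = torch.randn((V, V), generator=g, requires_grad=True)

for step in range(200):
    xenc = F.one_hot(xs, num_classes=V).float()       # one-hot inputs
    logits = xenc @ W                                 # interpreted as log-counts
    counts = logits.exp()                             # analogue of the count matrix N
    probs = counts / counts.sum(dim=1, keepdim=True)  # softmax
    # Mean negative log-likelihood of the true next characters, plus a
    # small weight penalty that plays the same role as add-one smoothing.
    loss = -probs[torch.arange(len(ys)), ys].log().mean() + 0.01 * (W ** 2).mean()
    W.grad = None
    loss.backward()
    with torch.no_grad():
        W -= 50.0 * W.grad  # plain gradient descent

print(f"final loss: {loss.item():.3f}")
```

Because the one-hot matrix multiply reduces to row selection, the optimum this loop converges to matches the normalised count table, which is the equivalence notebook 04 demonstrates.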