A character-level language model built by scaling the earlier bigram approach up to an MLP: token embeddings, a hidden layer, logits, cross-entropy loss, mini-batch gradient descent, train/validation/test splits, and a few manual architecture experiments. The implementation follows Karpathy's makemore series and uses the names.txt dataset to predict the next character from a fixed-length context window.
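The pipeline described above can be sketched as a single forward pass. This is a minimal, assumed configuration (27 characters, a 3-character context, 2-dimensional embeddings, 100 hidden units, random toy data in place of names.txt), not the exact notebook code:

```python
import torch
import torch.nn.functional as F

g = torch.Generator().manual_seed(42)

# Assumed sizes: 27 characters, context of 3, 2-dim embeddings, 100 hidden units.
vocab_size, block_size, emb_dim, n_hidden = 27, 3, 2, 100

C  = torch.randn((vocab_size, emb_dim), generator=g)             # embedding table
W1 = torch.randn((block_size * emb_dim, n_hidden), generator=g)  # hidden layer
b1 = torch.randn(n_hidden, generator=g)
W2 = torch.randn((n_hidden, vocab_size), generator=g)            # output layer
b2 = torch.randn(vocab_size, generator=g)

# Toy batch standing in for (context, next-character) pairs from names.txt.
X = torch.randint(0, vocab_size, (32, block_size), generator=g)
Y = torch.randint(0, vocab_size, (32,), generator=g)

emb = C[X]                                                    # (32, 3, 2) lookup
h = torch.tanh(emb.view(-1, block_size * emb_dim) @ W1 + b1)  # (32, 100)
logits = h @ W2 + b2                                          # (32, 27)
loss = F.cross_entropy(logits, Y)
```

The `view` call flattens the three embedded context characters into one input vector per example, which is the reshape step that Experiment 2 revisits when the embedding dimension grows.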

5 notebooks: MLP language model

| Notebook | Summary |
| --- | --- |
| 01 Build MLP | Move from bigram counts to learned character embeddings and a one-hidden-layer MLP |
| 02 Train MLP | Overfit a small batch, train with mini-batches, choose a learning rate, and decay it after plateaus |
| 03 Test Split | Diagnose overfitting with held-out validation data and keep the test set clean |
| 04 Experiment 1: Larger Hidden Layer | Increase hidden units, inspect noisier optimization, and visualize 2D character embeddings |
| 05 Experiment 2: Larger Embeddings | Increase embedding dimensions, reshape inputs correctly, and sample from the improved model |
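The training recipe the notebooks cover (mini-batch sampling, manual SGD, and a step learning-rate decay after a plateau) can be sketched as follows. Sizes and the decay schedule are assumptions for illustration, and random toy data stands in for the names.txt contexts:

```python
import torch
import torch.nn.functional as F

g = torch.Generator().manual_seed(2147483647)
# Assumed sizes; Experiment 2 enlarges emb_dim relative to the base model.
vocab_size, block_size, emb_dim, n_hidden = 27, 3, 10, 200

# Toy dataset standing in for (context, next-character) pairs.
X = torch.randint(0, vocab_size, (1000, block_size), generator=g)
Y = torch.randint(0, vocab_size, (1000,), generator=g)

C  = torch.randn((vocab_size, emb_dim), generator=g)
W1 = torch.randn((block_size * emb_dim, n_hidden), generator=g)
b1 = torch.randn(n_hidden, generator=g)
W2 = torch.randn((n_hidden, vocab_size), generator=g)
b2 = torch.randn(vocab_size, generator=g)
params = [C, W1, b1, W2, b2]
for p in params:
    p.requires_grad_(True)

for step in range(2000):
    ix = torch.randint(0, X.shape[0], (32,), generator=g)  # sample a mini-batch
    emb = C[X[ix]]                                         # (32, block_size, emb_dim)
    h = torch.tanh(emb.view(ix.shape[0], -1) @ W1 + b1)    # flatten context, hidden layer
    logits = h @ W2 + b2
    loss = F.cross_entropy(logits, Y[ix])

    for p in params:
        p.grad = None
    loss.backward()

    lr = 0.1 if step < 1000 else 0.01  # assumed step decay after a plateau
    for p in params:
        p.data += -lr * p.grad
```

A clean split would evaluate this loss on held-out validation data to diagnose overfitting, touching the test set only once at the end.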

Sources: Andrej Karpathy, Neural Networks: Zero to Hero · makemore · Bengio et al. 2003 · exercises