A character-level language model built by scaling the earlier bigram approach into an MLP: learned character embeddings, a hidden layer, logits, cross-entropy loss, mini-batch gradient descent, train/validation/test splits, and a few manual architecture experiments. The implementation follows Karpathy’s makemore path and uses the names.txt dataset to predict the next character from a fixed context window.
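As a rough orientation, here is a minimal sketch of the forward pass the notebooks build up. The sizes (27 characters, a 3-character context window, 2-D embeddings, 100 hidden units) are illustrative assumptions, not necessarily the notebooks' exact values.

```python
import torch
import torch.nn.functional as F

# Illustrative sizes: 27 characters (a-z plus '.'), a 3-character context
# window, 2-D embeddings, 100 hidden units.
vocab_size, block_size, emb_dim, n_hidden = 27, 3, 2, 100

g = torch.Generator().manual_seed(42)
C  = torch.randn((vocab_size, emb_dim), generator=g)             # character embeddings
W1 = torch.randn((block_size * emb_dim, n_hidden), generator=g)  # hidden layer weights
b1 = torch.randn(n_hidden, generator=g)
W2 = torch.randn((n_hidden, vocab_size), generator=g)            # output layer weights
b2 = torch.randn(vocab_size, generator=g)
parameters = [C, W1, b1, W2, b2]

def forward(X, Y):
    emb = C[X]                                          # (batch, block_size, emb_dim) lookup
    h = torch.tanh(emb.view(X.shape[0], -1) @ W1 + b1)  # hidden layer
    logits = h @ W2 + b2                                # next-character scores
    return F.cross_entropy(logits, Y)                   # cross-entropy loss

# Example: a dummy batch of 4 contexts and targets
X = torch.randint(0, vocab_size, (4, block_size))
Y = torch.randint(0, vocab_size, (4,))
loss = forward(X, Y)
```

The `.view` reshape that flattens the stacked embeddings before the hidden layer is the step revisited in notebook 05 when the embedding dimension changes.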
MLP language model (5 notebooks)
| Notebook | Summary |
| --- | --- |
| 01 Build MLP | Move from bigram counts to learned character embeddings and a one-hidden-layer MLP |
| 02 Train MLP | Overfit a small batch, train with mini-batches, choose a learning rate, and decay it after plateaus |
| 03 Test Split | Diagnose overfitting with held-out validation data and keep the test set clean |
| 04 Experiment 1: Larger Hidden Layer | Increase hidden units, inspect noisier optimization, and visualize 2D character embeddings |
| 05 Experiment 2: Larger Embeddings | Increase embedding dimensions, reshape inputs correctly, and sample from the improved model |

Sources: Andrej Karpathy - Zero to Hero · makemore · Bengio et al. 2003 · exercises
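A similarly hedged sketch of the mini-batch training loop, learning-rate decay, and data splits covered in notebooks 02 and 03, reusing `parameters` and `forward` from the sketch above. The random stand-in dataset, step counts, and learning rates are assumptions for illustration; in the notebooks the examples come from names.txt via a sliding context window.

```python
import torch

# Stand-in dataset of (context, target) pairs; the notebooks build these
# from names.txt instead of random integers.
N = 10000
X = torch.randint(0, vocab_size, (N, block_size))
Y = torch.randint(0, vocab_size, (N,))

# 80/10/10 train/validation/test split; the test slice stays untouched
# until the very end.
n1, n2 = int(0.8 * N), int(0.9 * N)
Xtr, Ytr   = X[:n1],   Y[:n1]
Xdev, Ydev = X[n1:n2], Y[n1:n2]
Xte, Yte   = X[n2:],   Y[n2:]

for p in parameters:
    p.requires_grad_(True)

lr = 0.1
for step in range(20000):
    ix = torch.randint(0, Xtr.shape[0], (32,))   # mini-batch of 32 examples
    loss = forward(Xtr[ix], Ytr[ix])

    for p in parameters:
        p.grad = None
    loss.backward()

    if step == 10000:                            # decay the learning rate after the loss plateaus
        lr = 0.01
    for p in parameters:
        p.data += -lr * p.grad

# Track generalization on the validation set only.
with torch.no_grad():
    print("val loss:", forward(Xdev, Ydev).item())
```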