A character-level language model built by scaling the earlier bigram approach up to an MLP: token embeddings, a hidden layer, logits, cross-entropy loss, mini-batch gradient descent, train/validation/test splits, and a few manual architecture experiments. The implementation follows Karpathy's makemore series and uses the names.txt dataset to predict the next character from a fixed-length context window.
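The pipeline described above can be sketched as a single forward pass. This is a minimal, assumed configuration (27 characters, a 3-character context, 2-dimensional embeddings, 100 hidden units, random toy data in place of names.txt), not the exact notebook code:

```python
import torch
import torch.nn.functional as F

g = torch.Generator().manual_seed(42)

# Assumed sizes: 27 characters, context of 3, 2-dim embeddings, 100 hidden units.
vocab_size, block_size, emb_dim, n_hidden = 27, 3, 2, 100

C  = torch.randn((vocab_size, emb_dim), generator=g)             # embedding table
W1 = torch.randn((block_size * emb_dim, n_hidden), generator=g)  # hidden layer
b1 = torch.randn(n_hidden, generator=g)
W2 = torch.randn((n_hidden, vocab_size), generator=g)            # output layer
b2 = torch.randn(vocab_size, generator=g)

# Toy batch standing in for (context, next-character) pairs from names.txt.
X = torch.randint(0, vocab_size, (32, block_size), generator=g)
Y = torch.randint(0, vocab_size, (32,), generator=g)

emb = C[X]                                                    # (32, 3, 2) lookup
h = torch.tanh(emb.view(-1, block_size * emb_dim) @ W1 + b1)  # (32, 100)
logits = h @ W2 + b2                                          # (32, 27)
loss = F.cross_entropy(logits, Y)
```

The `view` call flattens the three embedded context characters into one input vector per example, which is the reshape step that Experiment 2 revisits when the embedding dimension grows.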

5 notebooks: MLP language model

| Notebook | Summary |
| --- | --- |
| 01 Build MLP | Move from bigram counts to learned character embeddings and a one-hidden-layer MLP |
| 02 Train MLP | Overfit a small batch, train with mini-batches, choose a learning rate, and decay it after plateaus |
| 03 Test Split | Diagnose overfitting with held-out validation data and keep the test set clean |
| 04 Experiment 1: Larger Hidden Layer | Increase hidden units, inspect noisier optimization, and visualize 2D character embeddings |
| 05 Experiment 2: Larger Embeddings | Increase embedding dimensions, reshape inputs correctly, and sample from the improved model |
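The training recipe the notebooks cover (mini-batch sampling, manual SGD, and a step learning-rate decay after a plateau) can be sketched as follows. Sizes and the decay schedule are assumptions for illustration, and random toy data stands in for the names.txt contexts:

```python
import torch
import torch.nn.functional as F

g = torch.Generator().manual_seed(2147483647)
# Assumed sizes; Experiment 2 enlarges emb_dim relative to the base model.
vocab_size, block_size, emb_dim, n_hidden = 27, 3, 10, 200

# Toy dataset standing in for (context, next-character) pairs.
X = torch.randint(0, vocab_size, (1000, block_size), generator=g)
Y = torch.randint(0, vocab_size, (1000,), generator=g)

C  = torch.randn((vocab_size, emb_dim), generator=g)
W1 = torch.randn((block_size * emb_dim, n_hidden), generator=g)
b1 = torch.randn(n_hidden, generator=g)
W2 = torch.randn((n_hidden, vocab_size), generator=g)
b2 = torch.randn(vocab_size, generator=g)
params = [C, W1, b1, W2, b2]
for p in params:
    p.requires_grad_(True)

for step in range(2000):
    ix = torch.randint(0, X.shape[0], (32,), generator=g)  # sample a mini-batch
    emb = C[X[ix]]                                         # (32, block_size, emb_dim)
    h = torch.tanh(emb.view(ix.shape[0], -1) @ W1 + b1)    # flatten context, hidden layer
    logits = h @ W2 + b2
    loss = F.cross_entropy(logits, Y[ix])

    for p in params:
        p.grad = None
    loss.backward()

    lr = 0.1 if step < 1000 else 0.01  # assumed step decay after a plateau
    for p in params:
        p.data += -lr * p.grad
```

A clean split would evaluate this loss on held-out validation data to diagnose overfitting, touching the test set only once at the end.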

Sources: Andrej Karpathy, Neural Networks: Zero to Hero · makemore · Bengio et al. 2003 · exercises