Template
nanochat: train your own ChatGPT
A template for training a ChatGPT-style LLM end to end with Andrej Karpathy's nanochat: tokenizer, pretraining, SFT, and eval in a single script.
OpenResearch lets you fork ML templates and launch experiments effortlessly. A project by alphaXiv.
Pretraining loss
The base model's loss over the pretraining run falls as it learns to predict the next token on ClimbMix. This template defaults to a depth-8 model for rapid testing, so you can validate the full pipeline in minutes. Extend it to depth-24 to train a ChatGPT-style model you can actually chat with.
Pretraining loss, 15-step moving average. From an actual run of this template.
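To make the smoothing concrete, here is a minimal sketch of a 15-step moving average over per-step loss values. It assumes a trailing window (each point averages the current step and up to 14 steps before it); the page does not say whether the plot uses a trailing or a centered window, and the function name and sample losses are hypothetical.

```python
def moving_average(values, window=15):
    """Trailing moving average.

    Early points average over however many values exist so far, so the
    output has the same length as the input.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    smoothed = []
    running_sum = 0.0
    for i, value in enumerate(values):
        running_sum += value
        if i >= window:
            # Drop the value that just fell out of the window.
            running_sum -= values[i - window]
        count = min(i + 1, window)
        smoothed.append(running_sum / count)
    return smoothed


# Hypothetical noisy losses, only to show the call shape.
example_losses = [10.8, 9.9, 9.1, 8.7, 8.9, 8.2, 7.9, 8.0, 7.6, 7.4]
smoothed_losses = moving_average(example_losses, window=15)
```

A trailing window only looks backward, so the smoothed curve can be drawn while training is still running, at the cost of lagging the raw loss slightly.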
The pipeline
One script, runs/speedrun.sh, runs every stage of building an LLM from scratch. It auto-detects your GPU count and compute capability to size the batch and toggle fp8, so the same run works on 1 GPU or 8.
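As a rough illustration of what sizing a run from the hardware can look like, the sketch below picks gradient accumulation and an fp8 flag from a GPU count and compute capability. It is not the logic in runs/speedrun.sh: the function name, the batch sizes, and the 8.9 threshold (the compute capability at which NVIDIA introduced FP8 tensor cores) are assumptions chosen for the example. In a real PyTorch process the two inputs could be read with `torch.cuda.device_count()` and `torch.cuda.get_device_capability()`.

```python
from dataclasses import dataclass


@dataclass
class RunPlan:
    num_gpus: int
    device_batch_size: int
    grad_accum_steps: int
    use_fp8: bool


def plan_run(num_gpus, compute_capability,
             total_batch_size=256, device_batch_size=32):
    """Hypothetical sizing rule: keep the total batch fixed and make up
    for fewer GPUs with more gradient accumulation steps."""
    if num_gpus < 1:
        raise ValueError("need at least one GPU")
    per_step = num_gpus * device_batch_size
    if total_batch_size % per_step != 0:
        raise ValueError("total batch must be a multiple of num_gpus * device_batch_size")
    # Assumption: enable fp8 only on compute capability 8.9 or newer.
    use_fp8 = tuple(compute_capability) >= (8, 9)
    return RunPlan(
        num_gpus=num_gpus,
        device_batch_size=device_batch_size,
        grad_accum_steps=total_batch_size // per_step,
        use_fp8=use_fp8,
    )


eight_h100 = plan_run(8, (9, 0))   # one optimizer step per forward/backward pass
single_a100 = plan_run(1, (8, 0))  # same total batch, 8 accumulation steps, no fp8
```

Holding the total batch fixed is what lets one set of hyperparameters serve both 1 GPU and 8: the smaller machine simply takes more micro-steps per optimizer update.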
Under the hood
| Entrypoint | bash runs/speedrun.sh |
| Architecture | Vanilla PyTorch GPT, Muon + AdamW optimizer |
| Tokenizer | Custom BPE (rustbpe), vocab 32,768 |
| Data | ClimbMix shards, streamed and cached on the sandbox |
| Stages | Tokenizer, pretrain, base eval, SFT, chat eval, report |
| Hardware | 1–8 GPUs (8×H100 to match upstream timings) |
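For readers new to the tokenizer row in the table above, here is a toy byte-level BPE trainer showing the core idea: repeatedly replace the most frequent adjacent pair of tokens with a new token. It is a teaching sketch only, not rustbpe's API or algorithmic details; the function names are hypothetical, and real tokenizers add pre-splitting of text, special tokens, and far more efficient data structures.

```python
from collections import Counter


def merge_pair(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` with `new_id`."""
    out = []
    i = 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text, num_merges):
    """Learn up to `num_merges` merges over the UTF-8 bytes of `text`."""
    ids = list(text.encode("utf-8"))  # start from raw bytes, ids 0..255
    merges = {}                       # (left_id, right_id) -> new token id
    next_id = 256
    for _ in range(num_merges):
        pair_counts = Counter(zip(ids, ids[1:]))
        if not pair_counts:
            break
        pair, count = pair_counts.most_common(1)[0]
        if count < 2:
            break  # nothing repeats, so further merges compress nothing
        merges[pair] = next_id
        ids = merge_pair(ids, pair, next_id)
        next_id += 1
    return ids, merges


token_ids, learned_merges = train_bpe("aaabdaaabac", num_merges=3)
vocab_size = 256 + len(learned_merges)
```

In a byte-level scheme like this sketch, the vocabulary size is 256 plus the number of learned merges, before any special tokens are added.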
The default depth-8 model is a fast smoke test that exercises every stage. From there, you can raise --depth to 20 or 26 and run on 8×H100 to train a GPT-2-grade model, swap in your own SFT data to give it a personality, or add a task under tasks/ to teach a targeted skill.