nanochat: train your own ChatGPT

A template for training a ChatGPT-style LLM end to end with Andrej Karpathy's nanochat: tokenizer, pretraining, SFT, and eval in a single script.


OpenResearch lets you fork ML templates and launch experiments effortlessly. A project by alphaXiv.

Pretraining loss

The base model's loss over the pretraining run, dropping as it learns to predict the next token on ClimbMix. This template defaults to a depth-8 model for rapid testing, so you can validate the full pipeline in minutes. Extend it to depth-24 to train your own ChatGPT-style model you can actually chat with.

Pretraining loss, 15-step moving average. From an actual run of this template.
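The smoothing named in the caption can be sketched as a trailing moving average. This is a minimal illustration, not the template's plotting code: the function name `moving_average` is hypothetical, and whether the original plot uses a trailing or centered window is an assumption.

```python
def moving_average(values, window=15):
    """Trailing moving average over `window` steps.

    Assumption: the first few points average over however many
    values are available so far, rather than being dropped.
    """
    smoothed = []
    running_sum = 0.0
    for i, v in enumerate(values):
        running_sum += v
        if i >= window:
            # Drop the value that just fell out of the window.
            running_sum -= values[i - window]
        count = min(i + 1, window)
        smoothed.append(running_sum / count)
    return smoothed


if __name__ == "__main__":
    # Made-up loss values, for illustration only.
    raw_losses = [4.0, 3.0, 3.5, 2.5, 2.0]
    print(moving_average(raw_losses, window=2))
```

A larger window gives a smoother curve at the cost of lagging behind sudden changes in the raw loss.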

The pipeline

One script, runs/speedrun.sh, runs every stage of building an LLM from scratch. It auto-detects your GPU count and compute capability to size the batch and toggle fp8, so the same run works on 1 GPU or 8.
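One way such hardware-dependent sizing could work is sketched below. It is not the logic in runs/speedrun.sh: `plan_run`, `RunPlan`, and every default value are hypothetical, and the sketch assumes the total batch is held constant by adjusting gradient accumulation and that fp8 is enabled from compute capability 8.9 upward. In a real run the two inputs could come from `torch.cuda.device_count()` and `torch.cuda.get_device_capability()`.

```python
from dataclasses import dataclass


@dataclass
class RunPlan:
    num_gpus: int
    grad_accum_steps: int
    use_fp8: bool


def plan_run(num_gpus, compute_capability,
             total_batch_tokens=524288,   # hypothetical target tokens per optimizer step
             device_batch_size=32,        # hypothetical sequences per GPU per micro-step
             seq_len=2048,                # hypothetical context length
             fp8_min_capability=(8, 9)):  # assumed threshold for fp8 support
    """Keep the tokens per optimizer step fixed regardless of GPU count."""
    tokens_per_micro_step = device_batch_size * seq_len * num_gpus
    if total_batch_tokens % tokens_per_micro_step != 0:
        raise ValueError("total batch must be divisible by per-step tokens")
    return RunPlan(
        num_gpus=num_gpus,
        # Fewer GPUs means more accumulation steps for the same total batch.
        grad_accum_steps=total_batch_tokens // tokens_per_micro_step,
        use_fp8=tuple(compute_capability) >= fp8_min_capability,
    )


if __name__ == "__main__":
    print(plan_run(8, (9, 0)))  # e.g. an 8-GPU node with capability 9.0
    print(plan_run(1, (8, 0)))  # e.g. a single GPU with capability 8.0
```

Under these assumptions a single GPU takes eight times as many accumulation steps as an 8-GPU node, so the optimizer sees the same batch either way and only wall-clock time changes.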

1. Tokenizer	Train a BPE tokenizer on ~2B characters
2. Pretrain	Train the base GPT on ClimbMix
3. Base eval	CORE score, bits-per-byte, samples
4. SFT	Fine-tune on SmolTalk + identity
5. Chat + report	Chat eval, then a markdown report
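Of the base-eval metrics, bits-per-byte is the easiest to illustrate, because it normalizes the loss by raw text length rather than token count and so stays comparable across tokenizers. The sketch below shows the standard conversion from per-token cross-entropy in nats; the function name and inputs are hypothetical, and it is not the template's evaluation code.

```python
import math


def bits_per_byte(token_losses_nats, token_byte_lengths):
    """Convert summed per-token loss (nats) into bits per UTF-8 byte.

    token_losses_nats:  cross-entropy loss for each predicted token
    token_byte_lengths: number of bytes each of those tokens decodes to
    """
    total_nats = sum(token_losses_nats)
    total_bytes = sum(token_byte_lengths)
    if total_bytes == 0:
        raise ValueError("no bytes to normalize by")
    # Dividing by ln(2) converts nats to bits.
    return total_nats / (math.log(2) * total_bytes)


if __name__ == "__main__":
    # Made-up values: three tokens covering 4, 3, and 5 bytes of text.
    print(bits_per_byte([2.1, 1.7, 2.4], [4, 3, 5]))
```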

Under the hood

Entrypoint	bash runs/speedrun.sh
Architecture	Vanilla PyTorch GPT, Muon + AdamW optimizer
Tokenizer	Custom BPE (rustbpe), vocab 32,768
Data	ClimbMix shards, streamed and cached on the sandbox
Stages	Tokenizer, pretrain, base eval, SFT, chat eval, report
Hardware	1–8 GPUs (8×H100 to match upstream timings)
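For readers new to the tokenizer row above, the core of byte-pair encoding is a loop that repeatedly merges the most frequent adjacent pair of token ids. The toy sketch below is not rustbpe and makes no claim about its API: `train_bpe` and `merge` are hypothetical names, the sketch starts from raw UTF-8 bytes, and it omits the pre-splitting and special tokens that production tokenizers typically add.

```python
from collections import Counter


def merge(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` with `new_id`."""
    out = []
    i = 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text, num_merges):
    """Learn up to `num_merges` merges, starting from the 256 byte values."""
    ids = list(text.encode("utf-8"))
    merges = {}
    next_id = 256  # ids 0..255 are reserved for single bytes
    for _ in range(num_merges):
        pairs = Counter(zip(ids, ids[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break  # nothing repeats, so further merges would not compress
        merges[pair] = next_id
        ids = merge(ids, pair, next_id)
        next_id += 1
    return merges, ids


if __name__ == "__main__":
    merges, ids = train_bpe("low lower lowest", num_merges=5)
    print(len(merges), "merges learned;", len(ids), "tokens after encoding")
```

In this byte-level setup the vocabulary size equals 256 plus the number of learned merges, so a larger vocabulary means more training merges and shorter token sequences for the same text.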

The default depth-8 model is a fast smoke test that exercises every stage. Bump --depth to 20 or 26 and run on 8×H100 to reproduce a real GPT-2-grade model, swap in your own SFT data to give it a personality, or add a task under tasks/ to teach a targeted skill.
