nanochat: train your own ChatGPT

A template for training a ChatGPT-style LLM end to end with Andrej Karpathy's nanochat: tokenizer, pretraining, SFT, and eval in a single script.

OpenResearch lets you fork ML templates and launch experiments effortlessly. A project by alphaXiv.

Pretraining loss

The base model's loss over the pretraining run, dropping as it learns to predict the next token on ClimbMix. This template defaults to a depth-8 model for rapid testing, so you can validate the full pipeline in minutes. Extend it to depth-24 to train your own ChatGPT-style model you can actually chat with.

[Plot: loss (y-axis, 2.50 to 3.75) against training step (x-axis, 0 to 309).]

Pretraining loss, 15-step moving average. From an actual run of this template.
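
For readers who want to smooth their own loss logs the same way, here is a minimal sketch of a 15-step moving average. The page does not say whether the plotted window is trailing or centered; this sketch assumes a trailing window, and the function name `moving_average` is illustrative rather than taken from the template.

```python
def moving_average(values, window=15):
    """Trailing moving average over `values`.

    Assumption: the window looks backwards only. The first few points
    average over however many values exist so far, so the output has
    the same length as the input.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    running_sum = 0.0
    for i, value in enumerate(values):
        running_sum += value
        if i >= window:
            # Drop the value that just fell out of the window.
            running_sum -= values[i - window]
        count = min(i + 1, window)
        smoothed.append(running_sum / count)
    return smoothed


if __name__ == "__main__":
    raw_losses = [3.75, 3.60, 3.41, 3.30, 3.18]  # made-up values for illustration
    print(moving_average(raw_losses, window=3))
```

A trailing window never uses future steps, so the smoothed curve can be updated live as training runs, at the cost of lagging the raw loss slightly.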

The pipeline

One script, runs/speedrun.sh, runs every stage of building an LLM from scratch. It auto-detects your GPU count and compute capability to size the batch and toggle fp8, so the same run works on 1 GPU or 8.
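
The auto-detection described above can be sketched roughly as follows. This is not the logic in runs/speedrun.sh, which is not reproduced on this page. The names `RunPlan` and `plan_run`, the batch sizes, and the use of gradient accumulation to hold the global batch constant are all assumptions made for illustration. The one hardware fact relied on is that FP8 tensor cores first appear at CUDA compute capability 8.9 (Ada) and 9.0 (Hopper).

```python
from dataclasses import dataclass


@dataclass
class RunPlan:
    gpu_count: int
    device_batch_size: int
    grad_accum_steps: int
    use_fp8: bool


def plan_run(gpu_count, compute_capability, global_batch=256, device_batch=32):
    """Derive a run plan from the detected hardware.

    Hypothetical policy: keep the global batch fixed and make up for
    missing GPUs with gradient accumulation, so 1 GPU and 8 GPUs take
    the same optimizer steps. The default sizes are placeholders.
    """
    if gpu_count < 1:
        raise ValueError("at least one GPU is required")
    per_step = device_batch * gpu_count
    if global_batch % per_step != 0:
        raise ValueError("global_batch must be divisible by device_batch * gpu_count")
    # FP8 matmuls need compute capability 8.9 or newer.
    use_fp8 = tuple(compute_capability) >= (8, 9)
    return RunPlan(
        gpu_count=gpu_count,
        device_batch_size=device_batch,
        grad_accum_steps=global_batch // per_step,
        use_fp8=use_fp8,
    )


if __name__ == "__main__":
    print(plan_run(8, (9, 0)))  # e.g. 8 Hopper-class GPUs
    print(plan_run(1, (8, 0)))  # e.g. a single Ampere-class GPU
```

In a PyTorch program the two inputs would typically come from `torch.cuda.device_count()` and `torch.cuda.get_device_capability()`. The sketch takes them as plain arguments so it runs without a GPU.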

01 Tokenizer: Train a BPE tokenizer on ~2B chars
02 Pretrain: Train the base GPT on ClimbMix
03 Base eval: CORE score, bits-per-byte, samples
04 SFT: Fine-tune on SmolTalk + identity
05 Chat + report: Chat eval, then a markdown report
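
Of the base-eval metrics in stage 03, bits-per-byte is the easiest to compute by hand: sum the per-token negative log-likelihoods, convert from nats to bits, and divide by the number of UTF-8 bytes those tokens cover. The sketch below uses that standard definition. The function name is illustrative, and the template's own eval code may differ in details such as how special tokens are counted.

```python
import math


def bits_per_byte(token_nll_nats, token_byte_lengths):
    """Bits-per-byte from per-token losses.

    token_nll_nats: negative log-likelihood of each target token, in nats
                    (what a cross-entropy loss with natural log reports).
    token_byte_lengths: number of UTF-8 bytes each target token decodes to.
    """
    if len(token_nll_nats) != len(token_byte_lengths):
        raise ValueError("need one byte length per token")
    total_bytes = sum(token_byte_lengths)
    if total_bytes == 0:
        raise ValueError("no bytes to normalize by")
    total_bits = sum(token_nll_nats) / math.log(2)  # nats -> bits
    return total_bits / total_bytes


if __name__ == "__main__":
    # Two tokens, each costing ln(4) nats (2 bits) and spanning 4 bytes.
    print(bits_per_byte([math.log(4), math.log(4)], [4, 4]))
```

Because it normalizes by bytes rather than tokens, the metric stays comparable across tokenizers with different vocabularies.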

Under the hood

Entrypoint: bash runs/speedrun.sh
Architecture: Vanilla PyTorch GPT, Muon + AdamW optimizer
Tokenizer: Custom BPE (rustbpe), vocab 32,768
Data: ClimbMix shards, streamed and cached on the sandbox
Stages: Tokenizer, pretrain, base eval, SFT, chat eval, report
Hardware: 1–8 GPUs (8×H100 to match upstream timings)
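
To make the tokenizer entry concrete, here is a toy byte-level BPE trainer. It is a teaching sketch, not rustbpe: it skips the regex pre-splitting and special tokens that production tokenizers use, and the function names are illustrative. The idea is the same, though. Start from the 256 byte values and repeatedly merge the most frequent adjacent pair until the vocabulary reaches the target size (32,768, i.e. 2^15, in this template).

```python
from collections import Counter


def merge_pair(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` in `ids` with `new_id`."""
    out = []
    i = 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text, vocab_size):
    """Learn BPE merges on `text`; returns (merges, encoded_ids).

    Token ids 0..255 are raw bytes; each learned merge gets the next id.
    Training stops early once no pair occurs at least twice.
    """
    ids = list(text.encode("utf-8"))
    merges = {}
    for new_id in range(256, vocab_size):
        pair_counts = Counter(zip(ids, ids[1:]))
        if not pair_counts:
            break
        pair, count = pair_counts.most_common(1)[0]
        if count < 2:
            break
        merges[pair] = new_id
        ids = merge_pair(ids, pair, new_id)
    return merges, ids


if __name__ == "__main__":
    merges, ids = train_bpe("aaabdaaabac", vocab_size=259)
    print(merges)
    print(ids)
```

This version recounts every pair after each merge, which is fine for a demo but far too slow for ~2B characters. Handling a corpus of that size needs a faster, more incremental implementation.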

The default depth-8 model is a fast smoke test that exercises every stage. Bump --depth to 20 or 26 and run on 8×H100 to reproduce a real GPT-2-grade model, swap in your own SFT data to give it a personality, or add a task under tasks/ to teach a targeted skill.