Bebop MTP TV-Loss

Public

Reproduce the core claim of 2606.12370 (Bebop): training an EAGLE3/MTP draft head with the paper's TV / e2e-TV loss yields a higher rejection-sampling acceptance rate and a flatter entropy-acceptance slope than the cross-entropy (CE) baseline. Minimal single-GPU proof of concept (PoC) on open Qwen3.
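
As background for the claim above: in standard speculative decoding with rejection sampling, the expected per-token acceptance probability of a draft distribution q against a target distribution p is sum_x min(p(x), q(x)), which equals 1 - TV(p, q). The sketch below checks that identity numerically on toy logits and prints cross-entropy for comparison. It assumes "TV" here means total variation distance; the function names and logits are illustrative and are not taken from the paper or this repository, and the paper's exact TV / e2e-TV formulation may differ.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def tv_distance(p, q):
    """Total variation distance: 0.5 * sum |p - q|."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

def acceptance_rate(p, q):
    """Expected probability that a token sampled from draft q is accepted
    against target p under standard rejection sampling: sum min(p, q)."""
    return sum(min(pi, qi) for pi, qi in zip(p, q))

def cross_entropy(p, q, eps=1e-12):
    """CE of draft q against target p (the usual distillation objective)."""
    return -sum(pi * math.log(qi + eps) for pi, qi in zip(p, q))

# Toy logits (hypothetical values, for illustration only).
target = softmax([2.0, 1.0, 0.1, -1.0])
draft = softmax([1.5, 1.2, 0.3, -0.5])

tv = tv_distance(target, draft)
acc = acceptance_rate(target, draft)
ce = cross_entropy(target, draft)

print(f"TV = {tv:.4f}, acceptance = {acc:.4f}, 1 - TV = {1 - tv:.4f}, CE = {ce:.4f}")
```

Because acceptance is exactly 1 - TV, minimizing TV targets the acceptance rate directly, whereas CE is only a proxy for it; this is the intuition behind comparing the two losses.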

alphaXiv/specforge-e6f78362

No experiments yet.

GitHub: specforge-e6f78362