MaxRL Opus
GRPO vs. MaxRL on GSM8K (Qwen3-1.7B + LoRA): comparing advantage normalization by group standard deviation (GRPO) vs. group mean (MaxRL).
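To make the comparison concrete, here is a minimal sketch of the two normalizations for one group of sampled completions. It is an illustration under stated assumptions, not this project's code: GRPO-style advantages divide the mean-centered reward by the group standard deviation, and "normalization by group mean" is assumed here to mean dividing the same centered reward by the group mean. The function names and the `EPS` stabilizer are hypothetical.

```python
from statistics import mean, pstdev

EPS = 1e-6  # hypothetical stabilizer; avoids division by zero for degenerate groups


def std_normalized_advantages(rewards, eps=EPS):
    """GRPO-style: (r_i - group mean) / (group std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


def mean_normalized_advantages(rewards, eps=EPS):
    """Assumed MaxRL-style: (r_i - group mean) / (group mean + eps)."""
    mu = mean(rewards)
    return [(r - mu) / (mu + eps) for r in rewards]


# One group of four completions with binary correctness rewards,
# e.g. a GSM8K prompt where only one sample reached the right answer.
group = [1.0, 0.0, 0.0, 0.0]
print(std_normalized_advantages(group))
print(mean_normalized_advantages(group))
```

For this group, the single correct sample gets an advantage of about 1.73 under standard-deviation normalization and about 3.0 under mean normalization, so under these assumptions the mean-based variant weights rare successes more heavily.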
rehaanahmad2013/qwen-maxrl-b7d23c6b
No experiments yet.