MaxRL Opus

Public

GRPO vs MaxRL on GSM8K (Qwen3-1.7B + LoRA): comparing advantage normalization by group std (GRPO) vs group mean (MaxRL).
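
A minimal sketch of the two normalizations being compared, assuming binary correctness rewards for one group of sampled completions per prompt. The function names, the `EPS` guard, and the exact mean-normalized form are assumptions made for illustration; they are not taken from this project's code.

```python
from statistics import mean, pstdev

EPS = 1e-6  # hypothetical guard against division by zero


def std_normalized_advantages(rewards):
    """GRPO-style: center on the group mean, divide by the group std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    return [(r - mu) / (sigma + EPS) for r in rewards]


def mean_normalized_advantages(rewards):
    """Assumed MaxRL-style: center on the group mean, divide by the group mean."""
    mu = mean(rewards)
    return [(r - mu) / (mu + EPS) for r in rewards]


# One group of 4 completions for a single GSM8K prompt, 1 correct.
rewards = [1.0, 0.0, 0.0, 0.0]
grpo_adv = std_normalized_advantages(rewards)
maxrl_adv = mean_normalized_advantages(rewards)
```

With a low pass rate, dividing by the group mean gives the single correct completion a larger advantage than dividing by the group std, which is the difference the comparison is meant to probe.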

rehaanahmad2013/qwen-maxrl-b7d23c6b
No experiments yet.