knoveleng/open-rs
OpenRS-Star extends the OpenRS project and shows that reinforcement learning can further improve reasoning in small LLMs under tight compute constraints.
This model fine-tunes Qwen3-1.7B using a two-stage length training approach and DAPO-style optimizations on a 7,000-sample mathematical reasoning dataset.
Training was completed using 2× A100s and 2× H200s, for a total cost of under $100.
Training used GRPO with DAPO-style tricks to improve stability and the quality of the learning signal.
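To give a rough sense of what "GRPO with DAPO-style tricks" can mean in practice, here is a minimal NumPy sketch of two commonly cited ingredients: group-relative advantages (GRPO) and an asymmetric clipping range with token-level loss averaging (both associated with DAPO). This card does not state which tricks were used, so the function names, the clipping values `eps_low` and `eps_high`, and the array shapes are all illustrative assumptions, not the project's actual training code.

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: normalize each reward against its own group.

    `rewards` holds one scalar reward per sampled completion of the same prompt.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    return (rewards - rewards.mean()) / (rewards.std() + eps)

def clipped_token_loss(logp_new, logp_old, advantages, mask,
                       eps_low=0.2, eps_high=0.28):
    """PPO-style clipped surrogate with an asymmetric clip range.

    logp_new, logp_old: (G, T) per-token log-probs under the new and old policy.
    advantages:         (G,) one advantage per completion, broadcast over tokens.
    mask:               (G, T) 1 for real tokens, 0 for padding.
    eps_low, eps_high:  illustrative clip bounds (assumed, not from this card).
    """
    ratio = np.exp(logp_new - logp_old)            # importance ratio per token
    adv = np.asarray(advantages)[:, None]          # shape (G, 1) for broadcasting
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - eps_low, 1.0 + eps_high) * adv
    per_token = -np.minimum(unclipped, clipped)    # pessimistic surrogate
    # Token-level averaging: every real token in the batch gets equal weight,
    # rather than averaging per sequence first.
    return (per_token * mask).sum() / mask.sum()

# Hypothetical group of 4 completions with binary correctness rewards.
adv = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
logp = np.zeros((4, 3))
loss = clipped_token_loss(logp, logp, adv, np.ones((4, 3)))
```

With identical old and new log-probabilities the ratio is 1 everywhere, so the loss reduces to the negative mean advantage, which is zero for a normalized group. The wider upper bound lets low-probability tokens with positive advantage grow more before clipping takes effect.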
For full details, see the GitHub repository.