WALAR - a lyf07 Collection

WALAR

Updated Mar 17

Mending the Holes: Mitigating Reward Hacking in Reinforcement Learning for Multilingual Translation

lyf07/LLaMAX3-8B-Alpaca-WALAR

Translation • 8B • Updated Mar 21 • 7

lyf07/Qwen3-8B-WALAR

Translation • 8B • Updated Mar 21 • 6

lyf07/Translategemma-4B-it-WALAR

Translation • 769k • Updated Mar 21 • 7

Mending the Holes: Mitigating Reward Hacking in Reinforcement Learning for Multilingual Translation

Paper • 2603.13045 • Published Mar 13 • 2