EvoTrainer: Co-Evolving LLM Policies and Training Harnesses for Autonomous Agentic Reinforcement Learning
Abstract
EvoTrainer autonomously co-evolves language model policies and their training harnesses through empirical feedback, matching or exceeding human-engineered RL references on mathematical reasoning, competitive-programming code generation, and repository-level software engineering.
Autonomous LLM training is often framed as recipe search, which leaves the training harness largely static. This limitation sharpens in agentic RL, where shifting bottlenecks and scalar rewards mask diverse failure modes. We introduce EvoTrainer, an autonomous training framework that co-evolves LLM policies and training-side harnesses through empirical feedback: it diagnoses rollout-level evidence, revises diagnostics, backtests interventions, and accumulates reusable skills. Evaluated on mathematical reasoning, competitive-programming code generation, and repository-level software engineering, EvoTrainer matches or exceeds the human-engineered RL references under the same data, codebase, and evaluation protocol, with the largest gain on long-horizon agentic SWE. Trajectory analyses show that retained strategies diverge across domains, evolving diagnostics prevent invalid high-scoring branches from being promoted, and reusable skills shape later search. Autonomous LLM RL should move beyond recipe search toward joint evolution of policies and the training harnesses that interpret them.
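The abstract's claim that evolving diagnostics keep invalid high-scoring branches from being promoted can be illustrated with a small sketch. This is not the authors' implementation: every name here (`Branch`, `Harness`, `low_truncation`, `no_empty_answers`, the rollout record fields, and the skill string) is a hypothetical stand-in, and the scores are made-up toy values. It only shows the general idea of gating promotion on rollout-level checks rather than on the scalar score alone, then revising the checks and recording what was learned.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical names throughout; the abstract does not specify an API.

@dataclass
class Branch:
    name: str
    score: float            # scalar benchmark score (toy value)
    rollouts: List[Dict]    # rollout-level evidence, one record per episode

Diagnostic = Callable[[Branch], bool]  # True means the branch looks valid

@dataclass
class Harness:
    diagnostics: List[Diagnostic] = field(default_factory=list)
    skills: List[str] = field(default_factory=list)

    def is_valid(self, branch: Branch) -> bool:
        # A branch must pass every current diagnostic to be promotable.
        return all(check(branch) for check in self.diagnostics)

    def select(self, incumbent: Branch, candidates: List[Branch]) -> Branch:
        # Promote the best-scoring candidate that passes all diagnostics.
        best = incumbent
        for cand in candidates:
            if self.is_valid(cand) and cand.score > best.score:
                best = cand
        return best

def low_truncation(branch: Branch, max_rate: float = 0.2) -> bool:
    # Reject branches whose rollouts are mostly cut off.
    if not branch.rollouts:
        return False
    truncated = sum(1 for r in branch.rollouts if r["truncated"])
    return truncated / len(branch.rollouts) <= max_rate

def no_empty_answers(branch: Branch) -> bool:
    # Reject branches that score well while emitting blank answers.
    return all(r["answer"].strip() for r in branch.rollouts)

good = [{"answer": "42", "truncated": False}] * 4
blank = [{"answer": "", "truncated": False}] * 4

incumbent = Branch("base", 0.50, good)
hacked = Branch("reward_hack", 0.90, blank)   # high score, invalid behavior
honest = Branch("longer_ctx", 0.60, good)

harness = Harness(diagnostics=[low_truncation])
first = harness.select(incumbent, [hacked, honest])

# Revise the diagnostics after inspecting rollouts, then re-run selection.
harness.diagnostics.append(no_empty_answers)
harness.skills.append("check for blank answers before promoting")
second = harness.select(incumbent, [hacked, honest])
```

With only the truncation check, the toy harness promotes the invalid branch because its scalar score is highest; after the diagnostic set is revised, the valid branch is selected instead.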
Community
Autonomous training for large language models (LLMs) is entering a new era. Rather than relying on a static recipe, EvoTrainer enables LLM policies and their training harnesses to evolve jointly over time. This is more than conventional AI development; it is AI evolution in action.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- CODESKILL: Learning Self-Evolving Skills for Coding Agents (2026)
- SkillRevise: Improving LLM-Authored Agent Skills via Trace-Conditioned Skill Revision (2026)
- Socratic-SWE: Self-Evolving Coding Agents via Trace-Derived Agent Skills (2026)
- ReSkill: Reconciling Skill Creation with Policy Optimization in Agentic RL (2026)
- SkillAdaptor: Self-Adapting Skills for LLM Agents from Trajectories (2026)
- SkillOpt: Executive Strategy for Self-Evolving Agent Skills (2026)
- Skill-R1: Agent Skill Evolution via Reinforcement Learning (2026)