ProtoCycle-7B
RL checkpoint for ProtoCycle — an agentic protein design model that performs multi-step, tool-augmented sequence design.
This is the GRPO-TCR (Group Relative Policy Optimization with Tool-Call
Reward) stage, initialised from the SFT checkpoint
Huggggooo/ProtoCycle-7B-SFT.
- Base model: Huggggooo/ProtoCycle-7B-SFT (itself fine-tuned from Qwen/Qwen2.5-7B-Instruct)
- Training framework: VeRL / Open-AgentRL
- Stage: agentic RL with GRPO-TCR
- Rollouts per prompt: 8, max turns: 16
- Max prompt / response: 8k / 20k tokens
- Reward manager: protein (see ProtoCycle/verl/workers/reward_manager/protein.py)
See recipe/protein/reward.py for the exact reward formulation.
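As a rough illustration of the "group relative" part of GRPO with 8 rollouts per prompt, the sketch below normalises each rollout's scalar reward against the other rollouts for the same prompt. It shows only the standard GRPO advantage step; it does not reproduce the tool-call reward (TCR) or anything in recipe/protein/reward.py. The function name, the `eps` constant, the use of population standard deviation, and the toy reward values are all assumptions made for the example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Standard GRPO-style advantages for one prompt's group of rollouts.

    Each reward is centred on the group mean and scaled by the group
    standard deviation. `eps` (an assumed value) avoids division by zero
    when every rollout in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Toy rewards for 8 rollouts of a single prompt (made-up numbers).
toy_rewards = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0]
toy_advantages = group_relative_advantages(toy_rewards)
```

Rollouts that score above the group mean get a positive advantage and the rest a negative one; a group with identical rewards yields all-zero advantages and so contributes no learning signal.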
Training Data
10,000 RL prompts for GRPO-TCR training, available at
Huggggooo/ProtoCycle-Data (rl/ subset).
Agent Protocol
<think> ... reasoning ... </think>
<plan> ... stage plan ... </plan>
<tool_call>{"name": "...", "arguments": {...}}</tool_call>
...
<answer>MAEGEITPLKTF...</answer>
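The tagged format above can be split into its blocks with a small parser. The following is a minimal sketch, not the parser used by the ProtoCycle codebase: it assumes tags are not nested, that each `<tool_call>` body is a single JSON object, and that the last `<answer>` block is the final sequence. The helper names, the tool name `score_sequence`, and the short example sequence are hypothetical.

```python
import json
import re

# Matches any of the four protocol tags and captures the tag name and its body.
TAG_RE = re.compile(r"<(think|plan|tool_call|answer)>(.*?)</\1>", re.DOTALL)


def parse_turn(text):
    """Return (tag, payload) pairs in order of appearance.

    tool_call payloads are decoded from JSON (None if malformed);
    all other payloads are returned as stripped strings.
    """
    blocks = []
    for tag, body in TAG_RE.findall(text):
        body = body.strip()
        if tag == "tool_call":
            try:
                payload = json.loads(body)
            except json.JSONDecodeError:
                payload = None
            blocks.append((tag, payload))
        else:
            blocks.append((tag, body))
    return blocks


def final_answer(text):
    """Return the last <answer> payload, or None if there is none."""
    answers = [p for t, p in parse_turn(text) if t == "answer"]
    return answers[-1] if answers else None


# Hypothetical model output; the tool name and sequence are illustrative only.
example = (
    "<think>Need a soluble helical scaffold.</think>"
    "<plan>1. draft 2. score 3. refine</plan>"
    '<tool_call>{"name": "score_sequence", '
    '"arguments": {"sequence": "MAEGEITPLKTF"}}</tool_call>'
    "<answer>MAEGEITPLKTF</answer>"
)
```

Calling `parse_turn(example)` yields the four blocks in order, and `final_answer(example)` returns the designed sequence string. A malformed tool call is kept as `None` rather than raising, so a caller can decide how to handle it.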
How to Use
See the ProtoCycle repository for usage instructions.
License
Apache-2.0.
Citation
If you find this work useful, please cite ProtoCycle (forthcoming) and the upstream frameworks: VeRL, Open-AgentRL, ProTrek, ESM.