# Methodology: Composer 2.5 Replication Framework Research

This document records *how* the research synthesis in this repo was produced, so that the methodology is reproducible and the cross-family verification claim is auditable.

## Research dispatch

On 2026-05-25, five research subagents were dispatched in parallel via the [`delegate_task`](https://hermes-agent.nousresearch.com/) parallel-research pattern, one per topic. Each was given:

- A specific research scope (one of: Composer 2.5 internals; DiLoCo family; Monarch / TorchForge / OpenEnv; VeRL / TRL; trace-replay distillation novelty assessment).
- An explicit instruction to write findings to a known path (`~/wiki/research/post-training-framework/0X-.md`).
- A target depth of ~2,000–2,500 words.
- A web-research toolset (Tavily, Exa, AWS docs, MCP doc readers).

Each subagent ran independently, with no cross-agent communication and no shared intermediate state. All five received the same brief structure, but each was **routed to a different LLM family** for cross-family signal:

| File | Author model | Rationale |
|---|---|---|
| `research/01-composer-2.5.md` | `google/gemini-3.1-pro-preview` | Long-context grounded research is Gemini's strong suit |
| `research/02-diloco-family.md` | `deepseek/deepseek-v4-pro` | Strong on distributed-systems and pretraining literature |
| `research/03-monarch-torchforge-openenv.md` | `openai/gpt-5` | Best at reading framework / SDK source code |
| `research/04-verl-trl.md` | `anthropic/claude-sonnet-4.6` | Best at algorithmic precision (loss math, importance sampling) |
| `research/05-trace-replay-distillation.md` | `moonshotai/kimi-k2-thinking` | Strong at novelty assessment and prior-art discovery |

All routes were **verified post-hoc** against the `model` field returned in each delegated agent's session metadata. This confirms that each report was written by its intended model family, so the synthesis does not rest on a single model's biases.
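The dispatch-and-verify loop can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the `delegate_task` implementation: `run_subagent` is a hypothetical callable standing in for whatever launches one subagent against a requested model and reports back the served `model` from its session metadata, and `SubagentResult`, `dispatch`, and `route_problems` are illustrative names.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List

# Intended route per report file, copied from the routing table.
ROUTES: Dict[str, str] = {
    "research/01-composer-2.5.md": "google/gemini-3.1-pro-preview",
    "research/02-diloco-family.md": "deepseek/deepseek-v4-pro",
    "research/03-monarch-torchforge-openenv.md": "openai/gpt-5",
    "research/04-verl-trl.md": "anthropic/claude-sonnet-4.6",
    "research/05-trace-replay-distillation.md": "moonshotai/kimi-k2-thinking",
}


@dataclass
class SubagentResult:
    path: str          # where the subagent wrote its report
    served_model: str  # `model` field reported in the session metadata


# Hypothetical runner: (output_path, requested_model) -> SubagentResult.
# Stands in for delegate_task or any equivalent launcher.
Runner = Callable[[str, str], SubagentResult]


def dispatch(routes: Dict[str, str], run_subagent: Runner) -> List[SubagentResult]:
    """Launch one independent subagent per topic, all in parallel."""
    with ThreadPoolExecutor(max_workers=max(1, len(routes))) as pool:
        futures = [pool.submit(run_subagent, path, model) for path, model in routes.items()]
        # Results come back in route order, regardless of completion order.
        return [f.result() for f in futures]


def route_problems(routes: Dict[str, str], results: List[SubagentResult]) -> List[str]:
    """Return report paths that are missing or were served by an unintended model."""
    served = {r.path: r.served_model for r in results}
    return sorted(path for path, model in routes.items() if served.get(path) != model)
```

`route_problems` treats a missing report the same as a mis-routed one, since either breaks the one-family-per-report assumption behind the cross-family signal.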
## Synthesis

The master synthesis (`framework/composer-replication-framework.md`) was produced by reading all five reports in full and reconciling their claims:

- **Convergent claims** (≥2 independent reports agree) → promoted to framework-level decisions in the TL;DR table.
- **Divergent claims** (reports recommend different stacks for the same layer) → noted explicitly with a "use X today, switch to Y when Z" rationale rather than picking one arbitrarily.
- **Single-source claims** (only one report makes the claim) → kept, but flagged as "single-source — may be model bias" where consequential.

Convergent findings (verified across reports):

- **GRPO+DAPO is the consensus algorithm.** Reports 04 (TRL/VeRL deep-dive), 02 (PRIME-RL section), and 03 (Forge algorithm catalog) all converge on GRPO with DAPO patches as the production default for long-horizon agentic RL.
- **PRIME-RL is the most production-ready decentralized substrate.** Reports 02 and 04 independently cite INTELLECT-2 (32B, built on QwQ and trained with globally distributed RL) as the only production-scale decentralized RL run to date.
- **OpenEnv is the env-format winner.** Reports 03 (Meta's stack), 04 (TRL's Oct 2025 OpenEnv integration), and 05 (env-substrate analysis) all converge on OpenEnv + verifiers as the emerging standard.
- **Trace-replay multi-teacher distillation is genuinely under-explored.** This is Report 05's primary finding, corroborated by the fact that none of the other four reports (which surveyed the algorithm and framework literature widely) mention per-step multi-teacher distillation as an existing technique.

## Sources

The synthesis cites primary sources inline. Major primary sources include:

- **Cursor blog**: (the Composer 2.5 release post that motivated the whole project).
- **Moonshot K2 paper**: (Kimi K2 base model, the predecessor to K2.5).
- **DeepMind DiLoCo paper**: ; **Streaming DiLoCo**: .
- **Prime Intellect INTELLECT-2 announcement**: .
- **VeRL paper**: .
- **Hugging Face TRL**: .
- **Microsoft rStar / rStar-Math**: .
- **Meta OpenEnv**: .
- **Meta Monarch**: .

The five research notes link to many more secondary sources (blog posts, Twitter threads, individual repo READMEs). Those are auxiliary context, not primary evidence.

## Limitations

- **No primary-source access to Cursor's training pipeline.** Composer 2.5's exact recipe is reconstructed from public statements; details such as the text-hint generator architecture remain unverifiable. The biggest known gap is flagged in `framework/composer-replication-framework.md` § "Open questions."
- **Pre-spike speculation.** The TL;DR table's stack picks are backed by the literature but have not yet been validated empirically on this codebase. The v0.0 spike will produce the first empirical result.
- **Single-snapshot research.** All five reports were produced on 2026-05-25. The field moves fast: TorchForge may un-pause, OpenEnv may fork, PRIME-RL may consolidate. Re-run the dispatch every 6 months.

## Reproducibility

To reproduce this research dispatch (or extend it with new topics), follow this pattern:

1. Use the `delegate_task` parallel-research pattern (or any equivalent: one subagent per topic, all running in parallel, all writing to known paths).
2. **Route different topics to different model families** explicitly. This is the source of the cross-family signal, and it requires a multi-model gateway such as OpenRouter or a local equivalent.
3. Give each subagent a web-research toolset (Tavily, Exa, AWS docs, etc.) and a ~10-minute wall-clock budget.
4. After all reports return, verify that each one's served `model` matches the intended route (per the route-fidelity discipline).
5. Read all reports in full (do not skim) and reconcile them in a master synthesis doc that explicitly flags convergent, divergent, and single-source claims.

This pattern generalizes beyond this project: it is the same approach used for any meaty literature-review task where a single model's perspective is suspect.
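The bucketing rule from § Synthesis can be sketched as a small classifier. This is an assumption-laden illustration: it presumes each report's recommendations have already been hand-extracted into hypothetical `Claim(report, layer, recommendation)` records (the part that actually requires reading every report in full), and the layer labels are made up for the sketch. The code only applies the convergent / divergent / single-source rule.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Claim:
    report: str          # report number, e.g. "04"
    layer: str           # stack layer the claim is about (hypothetical labels)
    recommendation: str  # what the report recommends for that layer


Key = Tuple[str, str]  # (layer, recommendation)


def reconcile(claims: List[Claim]) -> Dict[str, List[Key]]:
    """Bucket (layer, recommendation) pairs as convergent, divergent, or single-source."""
    supporters: Dict[Key, Set[str]] = defaultdict(set)
    recs_per_layer: Dict[str, Set[str]] = defaultdict(set)
    for c in claims:
        supporters[(c.layer, c.recommendation)].add(c.report)  # a report counts once
        recs_per_layer[c.layer].add(c.recommendation)

    buckets: Dict[str, List[Key]] = {"convergent": [], "divergent": [], "single_source": []}
    for key in sorted(supporters):
        layer = key[0]
        if len(recs_per_layer[layer]) > 1:
            # Reports disagree on this layer: document "use X today, switch to Y when Z".
            buckets["divergent"].append(key)
        elif len(supporters[key]) >= 2:
            # >=2 independent reports agree: candidate framework-level decision.
            buckets["convergent"].append(key)
        else:
            # Only one report says it: keep, but flag as possible model bias.
            buckets["single_source"].append(key)
    return buckets
```

For example, if Reports 02, 03, and 04 each contribute a GRPO+DAPO claim and no report recommends anything else for that layer, the pair lands in the convergent bucket. In this sketch, any layer with more than one distinct recommendation is labeled divergent even when one option also has ≥2 supporters; a real synthesis might instead keep the majority pick and record the dissent.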