MemRM: 1.7B QLoRA Reward Model

Description

MemRM is a QLoRA adapter trained on top of the 1.7B-parameter Qwen/Qwen3-1.7B-Base model to predict whether a memory compaction event in an LLM agent trajectory is HARMFUL (degrades task completion) or SAFE. It is trained on the memgym-rm-train dataset (15,630 IID training pairs from SWE-Gym trajectories) and evaluated on three held-out distributions.

This model is the paper's primary published model. The 8B model (rm_v2_8b_yn_cw3) was used as a development baseline and is NOT the published model.


CRITICAL: Checkpoint and Eval File Mapping

The paper reports headline metrics from checkpoint-500 evaluated on the IID held-out set. The repository contains two eval results files, and they describe different models:

| File | Model | AUROC | ECE | Safe-F1 | Use this? |
|---|---|---|---|---|---|
| training_output/lightweight_comparison/eval_results_1p7b_ckpt500_iid.json | 1.7B ckpt-500 (THIS MODEL) | 0.985 | 0.009 | 0.849 | YES (paper source) |
| training_output/reward_model_v2_run1/eval_results.json | 8B baseline | 0.985 | 0.011 | 0.828 | NO (different model) |

Both models happen to round to AUROC = 0.985, but secondary metrics diverge. All paper Table 3 values correspond to the 1.7B ckpt-500 file.
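
Because AUROC alone cannot tell the two files apart, a quick guard before quoting numbers can compare the secondary metrics as well. The sketch below is illustrative only: the key names (`auroc`, `ece`, `safe_f1`) are assumptions and may not match the actual schema of the eval files, and the reference values are copied from the table above.

```python
# Reference values for the 1.7B ckpt-500 IID file, copied from the table above.
PAPER_1P7B = {"auroc": 0.985, "ece": 0.009, "safe_f1": 0.849}


def matches_paper(metrics, reference=PAPER_1P7B, ndigits=3):
    """Return True if every reference metric matches after rounding.

    NOTE: the key names are hypothetical; adapt them to the real JSON schema.
    """
    return all(
        key in metrics and round(metrics[key], ndigits) == round(value, ndigits)
        for key, value in reference.items()
    )


# Values from the table: the 8B baseline shares the AUROC but not ECE or Safe-F1.
baseline_8b = {"auroc": 0.985, "ece": 0.011, "safe_f1": 0.828}

print(matches_paper(PAPER_1P7B))
print(matches_paper(baseline_8b))
```

Checking all three metrics together is what separates the files; a check on AUROC alone would accept both.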


Model Configuration

| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen3-1.7B-Base |
| PEFT type | LoRA (QLoRA with 4-bit quantization during training) |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 (paper text incorrectly states 0; see M3) |
| Target modules | q_proj, k_proj, v_proj, o_proj |
| Max sequence length | 32,768 tokens |
| Adapter size | 24.5 MB (paper text incorrectly states 25.7 MB; see M4) |
| Training checkpoints | 6 (steps 100–600, every 100 steps) |
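
As a rough sanity check on the adapter size, the number of LoRA weights can be estimated from the rank and target modules listed above. This is a back-of-the-envelope sketch, not the export procedure: the attention shapes of the base model (hidden size 2048, 16 query heads, 8 key/value heads, head dimension 128, 28 layers) and float32 storage are assumptions that do not appear in this card and should be checked against the base model's config.json.

```python
# ASSUMED Qwen3-1.7B attention shapes (not stated in this card; verify against
# the base model's config.json before relying on them).
HIDDEN = 2048        # hidden size
Q_OUT = 16 * 128     # query heads * head dim
KV_OUT = 8 * 128     # key/value heads * head dim
LAYERS = 28
RANK = 16            # LoRA rank from the configuration above

# (in_features, out_features) for each targeted projection.
shapes = {
    "q_proj": (HIDDEN, Q_OUT),
    "k_proj": (HIDDEN, KV_OUT),
    "v_proj": (HIDDEN, KV_OUT),
    "o_proj": (Q_OUT, HIDDEN),
}

# LoRA adds A (rank x in) and B (out x rank) per module: rank * (in + out) weights.
per_layer = sum(RANK * (fin + fout) for fin, fout in shapes.values())
total_params = per_layer * LAYERS
size_bytes = total_params * 4  # ASSUMES float32 storage; ignores file headers

print(total_params)
print(round(size_bytes / 1e6, 1), "MB (decimal)")
print(round(size_bytes / 2**20, 1), "MiB (binary)")
```

Under these assumptions the estimate comes to about 25.7 MB in decimal units and 24.5 MiB in binary units. That would be consistent with the two adapter sizes quoted in this card differing only by unit convention, but this is a hypothesis and has not been confirmed against the exported files.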

Performance

All metrics below are from checkpoint-500. The IID numbers come from the eval file eval_results_1p7b_ckpt500_iid.json:

IID (in-distribution, n=3,007 held-out rows, 7 held-out repos)

| Metric | Value |
|---|---|
| AUROC | 0.985 |
| ECE | 0.009 |
| Accuracy | 97.3% |
| Harmful-F1 | 0.985 |
| Safe-F1 | 0.849 |
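
For readers unfamiliar with ECE (expected calibration error), the sketch below shows one common way to compute it for a binary classifier. The binning scheme used for the numbers in this card is not stated here, so the 10 equal-width confidence bins and the 0.5 decision threshold are assumptions; treat this as an illustration of the metric, not a reproduction of the paper's evaluation code.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """ECE with equal-width confidence bins for a binary classifier.

    probs  : predicted P(HARMFUL) per example, each in [0, 1]
    labels : 1 for HARMFUL, 0 for SAFE
    n_bins : number of equal-width bins (ASSUMED; not taken from the paper)
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        pred = 1 if p >= 0.5 else 0
        conf = p if pred == 1 else 1.0 - p  # confidence in the predicted class
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, 1.0 if pred == y else 0.0))

    total = len(probs)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        avg_acc = sum(a for _, a in bucket) / len(bucket)
        # Weight each bin's |confidence - accuracy| gap by its share of examples.
        ece += (len(bucket) / total) * abs(avg_conf - avg_acc)
    return ece
```

A low ECE means that predicted confidences track observed accuracy; the value depends on the number of bins and the binning rule, so comparisons are only meaningful when both sides use the same scheme.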

Strategy-OOD (n=166 pre-rebuild bundle, coverage 26.5%)

| Metric | Value |
|---|---|
| Covered AUROC | 0.714 |
| Coverage rate | 26.5% (44/166) |

These numbers were computed on the pre-rebuild 166-row Strategy-OOD bundle, captured in evals/eval_results_1p7b_ckpt500_strategy_ood_v2.json shipped alongside this checkpoint. The numbers match what is reported in Paper Table 3.

Public dataset note: The Strategy-OOD dataset published as MemGym/memgym-rm-strategy-ood is a rebuilt 22-pair bundle (2026-05-06) that renders prompts in the long-context training-distribution shape (~22–38K tokens) rather than the short-template ~600-token form used for the historical eval above. The two are different artifacts:

  • Use the 166-row numbers above to compare against Paper Table 3.
  • Re-running this checkpoint on MemGym/memgym-rm-strategy-ood will yield different metrics: the dataset is intentionally smaller and prompt-distribution-matched, and is intended for downstream users to validate their own RM checkpoints rather than to reproduce the paper's headline numbers.

Note (Stage 1 M2): Paper Table 3 reports Strategy-OOD ECE = 0.850. Recomputing from the same 166-row file on disk gives 0.578. A recompute on the covered subset (n=44) is pending a cross-check against the paper.

Scenario-OOD β€” WebArena V2 (n=426, coverage 20.4%)

| Metric | Value |
|---|---|
| Covered AUROC | 0.748 |
| Coverage rate | 20.4% (87/426) |
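
The coverage rates reported for the two OOD splits are simple ratios of covered rows to total rows. The sketch below only reproduces that arithmetic from the counts given in this card; it does not define what makes a row "covered", which is not specified here, and the function name is illustrative.

```python
def coverage_rate(covered, total):
    """Fraction of evaluation rows that count as covered."""
    return covered / total


# Counts as reported in this card.
splits = {
    "Strategy-OOD (pre-rebuild)": (44, 166),
    "Scenario-OOD (WebArena V2)": (87, 426),
}

for name, (covered, total) in splits.items():
    print(f"{name}: {coverage_rate(covered, total):.1%} ({covered}/{total})")
```

Since the covered AUROC values are computed on 44 and 87 rows respectively, they carry more sampling noise than the IID figures computed on 3,007 rows.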

Usage

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load base model + adapter
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-1.7B-Base",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
model = PeftModel.from_pretrained(
    base_model,
    "MemGym/memgym-rm-1p7b",
    subfolder="checkpoint-500"  # paper checkpoint; stored as a subdirectory, not a git revision
)
tokenizer = AutoTokenizer.from_pretrained("MemGym/memgym-rm-1p7b")

# Format the prompt (see memgym.training.scripts.eval_three_distributions for full format)
prompt = "[System]\nYou are a helpful assistant...\n[User]\n<trajectory>\n[Assistant]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    logits = model(**inputs).logits
    # Token IDs for " Y" and " N"
    y_id = tokenizer.encode(" Y", add_special_tokens=False)[0]
    n_id = tokenizer.encode(" N", add_special_tokens=False)[0]
    last_logits = logits[0, -1, [y_id, n_id]]
    prob_harmful = torch.softmax(last_logits, dim=0)[0].item()

print(f"P(HARMFUL): {prob_harmful:.3f}")
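
The two-way softmax in the snippet above depends only on the gap between the " Y" and " N" logits, so it can equivalently be written as a sigmoid of that gap. The sketch below demonstrates the equivalence in plain Python; the function names and the example logits are hypothetical and chosen only for illustration.

```python
import math


def prob_harmful_from_logits(logit_y, logit_n):
    """P(HARMFUL) as a numerically stable sigmoid of the Y-minus-N logit gap."""
    gap = logit_y - logit_n
    if gap >= 0:
        return 1.0 / (1.0 + math.exp(-gap))
    e = math.exp(gap)
    return e / (1.0 + e)


def softmax2(logit_y, logit_n):
    """Reference two-way softmax (max-subtracted for stability), for comparison."""
    m = max(logit_y, logit_n)
    ey, en = math.exp(logit_y - m), math.exp(logit_n - m)
    return ey / (ey + en)


# Hypothetical logits, chosen only for illustration.
print(prob_harmful_from_logits(2.0, -1.0))
print(softmax2(2.0, -1.0))
```

One consequence is that any probability threshold corresponds to a fixed threshold on the logit gap, which can be convenient when sweeping decision thresholds.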

Training Recipe

See docs/release/provenance.md artifact #13. Requires 8× A100 GPUs and roughly 6 hours:

# Override VENV to your environment path
VENV=/your/venv bash memgym/training/scripts/launch_train.sh \
    --dataset data/world_model/training_output/reward_model_v2/reward_model_pairs_v2.jsonl \
    --output-dir checkpoints/rm_v2_1p7b_qlora_32k \
    --base-model Qwen/Qwen3-1.7B-Base

Note: launch_train.sh ships with a deploy-host-specific default for VENV. Always override it via the VENV= environment variable to point at your own virtualenv.

License

Apache 2.0 (code + model weights, inherited from Qwen3-1.7B-Base Apache 2.0 license).

Citation

@inproceedings{xu2026memgym,
  title     = {MemGym: a Long-Horizon Memory Environment for LLM Agents},
  author    = {Anonymous Authors},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2026},
  note      = {Under review}
}

Known Limitations and Paper-Text Discrepancies

  • M3 (LoRA dropout): adapter_config.json shows lora_dropout = 0.05. Paper Appendix §MemRM states dropout = 0. The config file is authoritative; the paper text needs a correction.
  • M4 (Adapter size): Disk file is 24.5 MB. Paper Appendix states 25.7 MB. The disk size is authoritative (4.7% discrepancy, likely from an earlier export).
  • Checkpoint choice: Paper reports checkpoint-500 metrics. Checkpoint-600 is also available and shows marginal improvement in some secondary metrics (see threshold_sweep_1p7b_ckpt600_iid.json). Use checkpoint-500 for paper reproducibility.
  • 8B confusion: The repository also contains reward_model_v2_run1/eval_results.json from an 8B development model. That file is NOT this model. Its AUROC happens to round to the same 0.985 but its other metrics differ.