MemRM — 1.7B QLoRA Reward Model

Description

MemRM is a QLoRA adapter trained on top of the 1.7B-parameter Qwen/Qwen3-1.7B-Base. It predicts whether a memory compaction event in an LLM agent trajectory is HARMFUL (degrades task completion) or SAFE. The model is trained on the memgym-rm-train dataset (15,630 IID training pairs from SWE-Gym trajectories) and evaluated on three held-out distributions.

This model is the paper's primary published model. The 8B model (rm_v2_8b_yn_cw3) was used as a development baseline and is NOT the published model.

CRITICAL — Checkpoint and Eval File Mapping

The paper reports headline metrics from checkpoint-500 evaluated on the IID held-out set. The repository contains two eval result files, and they describe different models:

File	Model	AUROC	ECE	Safe-F1	Use this?
`training_output/lightweight_comparison/eval_results_1p7b_ckpt500_iid.json`	1.7B ckpt-500 (THIS MODEL)	0.985	0.009	0.849	YES — paper source
`training_output/reward_model_v2_run1/eval_results.json`	8B baseline	0.985	0.011	0.828	NO — different model

Both models happen to round to AUROC = 0.985, but secondary metrics diverge. All paper Table 3 values correspond to the 1.7B ckpt-500 file.
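
Because AUROC alone cannot tell the two files apart, checking the secondary metrics is one way to confirm which file has been loaded. The sketch below assumes the JSON exposes keys named `auroc`, `ece`, and `safe_f1`; those key names are hypothetical and should be adapted to the actual file schema.

```python
# Reference values copied from the table above (rounded to 3 decimals).
EXPECTED = {
    "1.7B ckpt-500": {"auroc": 0.985, "ece": 0.009, "safe_f1": 0.849},
    "8B baseline": {"auroc": 0.985, "ece": 0.011, "safe_f1": 0.828},
}


def identify_model(metrics, tol=5e-4):
    """Return the model names whose rounded table values match `metrics`.

    NOTE: the key names ("auroc", "ece", "safe_f1") are assumptions,
    not the confirmed schema of the eval files.
    """
    matches = []
    for name, ref in EXPECTED.items():
        if all(abs(metrics[key] - value) <= tol for key, value in ref.items()):
            matches.append(name)
    return matches


# Illustrative unrounded metrics, not real file contents.
print(identify_model({"auroc": 0.9851, "ece": 0.0092, "safe_f1": 0.8488}))
```

In practice the `metrics` dict would come from `json.load` on one of the two files; an empty result means neither row of the table matches.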

Model Configuration

Parameter	Value
Base model	`Qwen/Qwen3-1.7B-Base`
PEFT type	LoRA (QLoRA with 4-bit quantization during training)
LoRA rank	16
LoRA alpha	32
LoRA dropout	0.05 (paper text incorrectly states 0 — M3)
Target modules	`q_proj`, `k_proj`, `v_proj`, `o_proj`
Max sequence length	32,768 tokens
Adapter size	24.5 MB (paper text incorrectly states 25.7 MB — M4)
Training checkpoints	6 (steps 100–600, every 100 steps)
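
As a sanity check on the adapter size, the number of LoRA parameters can be estimated from the rank and target modules in the table. The sketch below assumes the base model's dimensions (hidden size 2048, 28 layers, 16 query heads, 8 key/value heads, head dimension 128) and 32-bit storage; none of these are stated in this card, so verify them against the base model's config.json and the exported file.

```python
# Assumed Qwen3-1.7B dimensions (NOT stated in this card; check config.json).
HIDDEN = 2048        # hidden_size
N_LAYERS = 28        # num_hidden_layers
Q_OUT = 16 * 128     # num_attention_heads * head_dim
KV_OUT = 8 * 128     # num_key_value_heads * head_dim
RANK = 16            # LoRA rank from the table above

# (in_features, out_features) for each targeted projection.
PROJECTIONS = {
    "q_proj": (HIDDEN, Q_OUT),
    "k_proj": (HIDDEN, KV_OUT),
    "v_proj": (HIDDEN, KV_OUT),
    "o_proj": (Q_OUT, HIDDEN),
}


def lora_param_count(rank=RANK):
    """LoRA adds A (rank x in) and B (out x rank) per targeted module."""
    per_layer = sum(rank * (i + o) for i, o in PROJECTIONS.values())
    return per_layer * N_LAYERS


params = lora_param_count()
size_bytes = params * 4  # assumes weights are stored as 32-bit floats
print(f"{params:,} LoRA parameters")
print(f"{size_bytes / 1e6:.1f} MB (decimal), {size_bytes / 2**20:.1f} MiB (binary)")
```

If those assumptions hold, the estimate lands near both sizes quoted in the table depending on the unit (decimal MB versus binary MiB). That is only one possible reading and has not been checked against the exported file.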

Performance

All metrics below from checkpoint-500, eval file eval_results_1p7b_ckpt500_iid.json:

IID (in-distribution, n=3,007 held-out rows, 7 held-out repos)

Metric	Value
AUROC	0.985
ECE	0.009
Accuracy	97.3%
Harmful-F1	0.985
Safe-F1	0.849
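
ECE in the table measures calibration. The card does not state the binning scheme, so the sketch below uses a common equal-width formulation on the predicted P(HARMFUL) with 10 bins; both the bin count and this positive-class variant are assumptions, not necessarily what the eval script does.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Equal-width-bin ECE for binary predictions.

    probs:  predicted P(HARMFUL) per example, each in [0, 1]
    labels: 1 for HARMFUL, 0 for SAFE
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))

    total = len(probs)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        mean_prob = sum(p for p, _ in bucket) / len(bucket)
        frac_pos = sum(y for _, y in bucket) / len(bucket)
        # Weight each bin's calibration gap by its share of the examples.
        ece += (len(bucket) / total) * abs(mean_prob - frac_pos)
    return ece
```

A value near 0 means the predicted probabilities track the observed HARMFUL rate within each bin.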

Strategy-OOD (n=166 pre-rebuild bundle, coverage 26.5%)

Metric	Value
Covered AUROC	0.714
Coverage rate	26.5% (44/166)

These numbers were computed on the pre-rebuild 166-row Strategy-OOD bundle, captured in evals/eval_results_1p7b_ckpt500_strategy_ood_v2.json shipped alongside this checkpoint. The numbers match what is reported in Paper Table 3.

Public dataset note: The Strategy-OOD dataset published as MemGym/memgym-rm-strategy-ood is a rebuilt 22-pair bundle (2026-05-06). It renders prompts in the long-context training-distribution shape (~22–38K tokens) rather than the short-template form (~600 tokens) used for the historical eval above. The two are different artifacts:

Use the 166-row numbers above to compare against Paper Table 3.
Re-running this checkpoint on MemGym/memgym-rm-strategy-ood will yield different metrics — the dataset is intentionally smaller and prompt-distribution-matched, intended for downstream users to validate their own RM checkpoints rather than to reproduce paper headline numbers.

Note (Stage 1 M2): Paper Table 3 reports Strategy-OOD ECE = 0.850, but recomputing from the same 166-row file on disk gives 0.578. A recompute on the covered subset (n=44) is pending a cross-check against the paper.

Scenario-OOD — WebArena V2 (n=426, coverage 20.4%)

Metric	Value
Covered AUROC	0.748
Coverage rate	20.4% (87/426)
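
Both OOD tables pair a coverage rate with an AUROC computed only on covered rows. The card does not define what makes a row covered, so the sketch below takes a hypothetical per-row boolean `covered` flag as given and shows only the arithmetic: the coverage fraction, then a pairwise (Mann-Whitney style) AUROC on the covered subset.

```python
def auroc(scores, labels):
    """AUROC as the chance a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def covered_auroc(scores, labels, covered):
    """Return (coverage_rate, AUROC restricted to covered rows).

    `covered` is a hypothetical boolean flag per row; how it is derived
    is not specified in this card.
    """
    kept = [(s, y) for s, y, c in zip(scores, labels, covered) if c]
    coverage = len(kept) / len(covered)
    return coverage, auroc([s for s, _ in kept], [y for _, y in kept])
```

The pairwise form is quadratic in the number of rows, which is fine for subsets of the size reported here (44 and 87 rows).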

Usage

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load base model + adapter
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-1.7B-Base",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
model = PeftModel.from_pretrained(
    base_model,
    "MemGym/memgym-rm-1p7b",
    subfolder="checkpoint-500"  # paper checkpoint — stored as a subdir, not a git revision
)
tokenizer = AutoTokenizer.from_pretrained("MemGym/memgym-rm-1p7b")

# Format the prompt (see memgym.training.scripts.eval_three_distributions for full format)
prompt = "[System]\nYou are a helpful assistant...\n[User]\n<trajectory>\n[Assistant]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    logits = model(**inputs).logits
    # Token IDs for " Y" (HARMFUL) and " N" (SAFE)
    y_id = tokenizer.encode(" Y", add_special_tokens=False)[0]
    n_id = tokenizer.encode(" N", add_special_tokens=False)[0]
    last_logits = logits[0, -1, [y_id, n_id]]
    prob_harmful = torch.softmax(last_logits, dim=0)[0].item()

print(f"P(HARMFUL): {prob_harmful:.3f}")
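
The two-way softmax over the `Y` and `N` logits in the snippet above equals a sigmoid of their difference, which is handy when only the two logits are logged. The sketch below shows that identity in plain Python; the helper names are illustrative and the 0.5 decision threshold is an assumption, not a value taken from the paper.

```python
import math


def prob_harmful_from_logits(y_logit, n_logit):
    """P(HARMFUL) = softmax([y, n])[0] = sigmoid(y - n), computed stably."""
    d = y_logit - n_logit
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)  # d < 0, so exp(d) cannot overflow
    return e / (1.0 + e)


def classify(y_logit, n_logit, threshold=0.5):
    """Label an event HARMFUL when P(HARMFUL) reaches the assumed threshold."""
    p = prob_harmful_from_logits(y_logit, n_logit)
    return ("HARMFUL" if p >= threshold else "SAFE"), p
```

Raising the threshold trades HARMFUL recall for fewer false alarms on SAFE events, so it should be tuned on held-out data for the intended use.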

Training Recipe

See docs/release/provenance.md artifact #13. Training requires 8× A100 GPUs and takes ~6 hours:

# Override VENV to your environment path
VENV=/your/venv bash memgym/training/scripts/launch_train.sh \
    --dataset data/world_model/training_output/reward_model_v2/reward_model_pairs_v2.jsonl \
    --output-dir checkpoints/rm_v2_1p7b_qlora_32k \
    --base-model Qwen/Qwen3-1.7B-Base

Note: launch_train.sh ships with a deploy-host-specific default for VENV — always override via the VENV= environment variable to point at your own virtualenv.

License

Apache 2.0 (code + model weights, inherited from Qwen3-1.7B-Base Apache 2.0 license).

Citation

@inproceedings{xu2026memgym,
  title     = {MemGym: a Long-Horizon Memory Environment for LLM Agents},
  author    = {Anonymous Authors},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2026},
  note      = {Under review}
}

Known Limitations and Paper-Text Discrepancies

M3 — LoRA dropout: adapter_config.json shows lora_dropout = 0.05. Paper Appendix §MemRM states dropout = 0. The config file is authoritative; the paper text needs a correction.
M4 — Adapter size: Disk file is 24.5 MB. Paper Appendix states ~25.7 MB. The disk size is authoritative (~4.7% discrepancy, likely from an earlier export).
Checkpoint choice: Paper reports checkpoint-500 metrics. Checkpoint-600 is also available and shows marginal improvement in some secondary metrics (see threshold_sweep_1p7b_ckpt600_iid.json). Use checkpoint-500 for paper reproducibility.
8B confusion: The repository also contains reward_model_v2_run1/eval_results.json from an 8B development model. That file is NOT this model. Its AUROC happens to round to the same 0.985 but its other metrics differ.
