SpatialReward is a reward model for instruction-guided image editing that addresses the "Attention Collapse" problem through explicit spatial reasoning. It predicts bounding boxes for the edited regions and anchors its semantic judgments to them, so it can serve both as an evaluator and as an RL training signal. It reports state-of-the-art results on MER-Bench.

Visualizing the Attention Collapse problem vs. SpatialReward's spatial grounding.

🔥 News

2026-05-05: 🎉 We have open-sourced the SpatialReward-8B model weights, MER-Bench benchmark, and SpatialReward-Train (260k spatial-aware training data)!
2026-05-01: 🎉 SpatialReward has been accepted to ICML 2026!
2026-02-12: We have released the inference code, reward server, and training configurations!
2026-02-07: The paper is available on arXiv.

📌 Introduction

Online Reinforcement Learning (RL) holds immense potential for advancing instruction-guided image editing, but its progress has been severely hindered by a critical perception gap we term "Attention Collapse". Existing reward models frequently neglect cross-image comparisons and fail to capture fine-grained editing details, leading to inaccurate evaluations and unstable RL training.

To overcome this, we propose SpatialReward, which:

Introduces MER-Bench: A new benchmark featuring multi-edit scenarios and expert human annotations for measuring reward model quality.
Enforces spatial reasoning: Predicts bounding boxes for edit regions and anchors semantic judgments to pixel-level evidence.
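
To make "anchoring a judgment to an edit region" concrete, here is a minimal sketch that measures how much of a change falls inside a given bounding box versus outside it. It does not reproduce SpatialReward's internals: the box convention `(x0, y0, x1, y1)` with exclusive upper bounds, the grayscale nested-list image format, and the function names are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical conventions for this sketch (not taken from the SpatialReward code):
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), upper bounds exclusive
Image = List[List[int]]          # grayscale image as rows of pixel values


def crop(image: Image, box: Box) -> Image:
    """Return the sub-image covered by box (box is assumed to lie inside the image)."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


def count_changed(before: Image, after: Image) -> int:
    """Count pixels whose value differs between two same-sized images."""
    return sum(pb != pa for rb, ra in zip(before, after) for pb, pa in zip(rb, ra))


def edit_localization(before: Image, after: Image, box: Box) -> Tuple[float, float]:
    """Return (fraction of pixels changed inside box, fraction changed outside box)."""
    x0, y0, x1, y1 = box
    height = len(before)
    width = len(before[0]) if height else 0
    box_area = (x1 - x0) * (y1 - y0)
    outside_area = height * width - box_area
    inside_changed = count_changed(crop(before, box), crop(after, box))
    outside_changed = count_changed(before, after) - inside_changed
    inside = inside_changed / box_area if box_area else 0.0
    outside = outside_changed / outside_area if outside_area else 0.0
    return inside, outside


# Toy example: a 4x4 image where only the central 2x2 block is edited.
before = [[0] * 4 for _ in range(4)]
after = [row[:] for row in before]
for y in (1, 2):
    for x in (1, 2):
        after[y][x] = 9
box = (1, 1, 3, 3)
inside_frac, outside_frac = edit_localization(before, after, box)
```

A box that matches the edit gives a high inside fraction and a low outside fraction. A box that misses the edit gives the reverse, which is the kind of evidence a spatially grounded judgment can rely on.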

Comprehensive benchmark results. SpatialReward achieves SOTA performance, outperforming GPT-4.1 and GPT-5 on MER-Bench.

MER-Bench performance breakdown by editing category.

🚀 Quick Start

Installation

git clone https://github.com/Kwai-Keye/SpatialReward.git
cd SpatialReward

conda create -n spatialreward python=3.11 -y
conda activate spatialreward

pip install torch==2.8.0 torchvision --extra-index-url https://download.pytorch.org/whl/cu126
pip install -r requirements.txt

Reward Server

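# Start the reward servers and the proxy (shell)
cd example/reward/server
bash start_servers.sh
bash start_proxy.sh

# Query the proxy from a Python client
# (input_img and output_img are the source and edited images, loaded beforehand)
from example.reward.client.reward_client_edit import RewardClient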

client = RewardClient(proxy_host="127.0.0.1", proxy_port=23456)
scores, rewards, reasoning, meta_data = client.evaluate(
    input_images=[input_img],
    output_image=[output_img],
    meta_datas=[{"instruction": "Remove the dog"}]
)
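
When scoring many edits, it can help to assemble the arguments in one place. The helper below is a hypothetical convenience wrapper, not part of the repository. It only builds a dictionary using the keyword names from the call above (`input_images`, `output_image`, `meta_datas`) and assumes a single input image, a single output image, and one instruction per request.

```python
from typing import Any, Dict


def build_edit_request(input_img: Any, output_img: Any, instruction: str) -> Dict[str, list]:
    """Build keyword arguments for one evaluation request.

    Hypothetical helper: the key names mirror the client call shown above;
    the image objects are passed through untouched.
    """
    if not instruction.strip():
        raise ValueError("instruction must be a non-empty string")
    return {
        "input_images": [input_img],
        "output_image": [output_img],
        "meta_datas": [{"instruction": instruction}],
    }


# Placeholder strings stand in for real image objects in this sketch.
request = build_edit_request("input.png", "output.png", "Remove the dog")
# With a running server this could be passed on as: client.evaluate(**request)
```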

📊 Benchmark Evaluation

By default, the model and data are loaded directly from Hugging Face.

# MER-Bench
bash eval/MERBench/run.sh

# MMRB2
bash eval/MMRB2/run.sh

# EditReward-Bench
bash eval/EditReward-Bench/run.sh
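
This section does not say how the evaluation scripts score a reward model. One common metric for preference-style benchmarks is pairwise accuracy: the fraction of human-labelled pairs where the reward model ranks the preferred edit higher. The sketch below assumes that metric, a `(reward_a, reward_b, preferred)` tuple layout, and that ties count as errors. None of this is confirmed to match the scripts above.

```python
from typing import Iterable, Tuple


def pairwise_accuracy(pairs: Iterable[Tuple[float, float, str]]) -> float:
    """Fraction of pairs where the higher reward agrees with the human preference.

    Each item is (reward_a, reward_b, preferred) with preferred in {"a", "b"}.
    Assumption for this sketch: a tie in rewards is counted as incorrect.
    """
    total = 0
    correct = 0
    for reward_a, reward_b, preferred in pairs:
        if preferred not in ("a", "b"):
            raise ValueError(f"unknown preference label: {preferred!r}")
        total += 1
        if reward_a > reward_b and preferred == "a":
            correct += 1
        elif reward_b > reward_a and preferred == "b":
            correct += 1
    return correct / total if total else 0.0


# Toy data: two agreements, one disagreement, one tie.
toy_pairs = [
    (0.9, 0.2, "a"),
    (0.1, 0.4, "b"),
    (0.5, 0.7, "a"),
    (0.3, 0.3, "b"),
]
accuracy = pairwise_accuracy(toy_pairs)
```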

📚 Datasets

Dataset	Description	Link
SpatialReward-Train	260k spatial-aware training data (SFT + RL)	🤗 Hub
MER-Bench	MultiEditReward-Bench evaluation benchmark	🤗 Hub

🎯 Training

SFT (LLaMA-Factory)

llamafactory-cli train example/SpatialReward-train/sft/qwen3vl_lora_spatial_reward.yaml

RL (ms-swift / GRPO)

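# First replace ms-swift's ORM plugin with the SpatialReward one
# (<ms-swift> is the path to your local ms-swift checkout)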
cp example/SpatialReward-train/rl/orm.py <ms-swift>/swift/plugin/orm.py
bash example/SpatialReward-train/rl/run_mater.sh
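
For context on how a reward model's scores enter GRPO-style training: the rewards for a group of samples generated from the same prompt are typically normalized against the group mean and standard deviation to form advantages. The sketch below illustrates only that normalization step. It is not the ms-swift implementation, and the use of population standard deviation and the `eps` value are assumptions.

```python
import statistics
from typing import List, Sequence


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one group: (r - mean) / (std + eps).

    Assumptions for this sketch: population standard deviation, and a small
    eps so that a group with identical rewards yields all-zero advantages.
    """
    if not rewards:
        return []
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Toy group: rewards for four edits sampled from the same instruction.
group_rewards = [0.2, 0.8, 0.5, 0.5]
advantages = group_relative_advantages(group_rewards)
```

Edits scored above the group mean get a positive advantage and those below get a negative one, so noisy or inconsistent rewards translate directly into noisy policy updates.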

RL Results on OmniGen2

With SpatialReward as the reward signal, OmniGen2 gains +0.90 on GEdit-EN Overall, double the gain obtained with GPT-4.1 (+0.45).

Stable RL training dynamics with SpatialReward as reward signal.

🙏 Acknowledgements

We thank EditScore and EditReward for valuable references.

❤️ Citing Us

@article{long2026spatialreward,
  title={SpatialReward: Bridging the Perception Gap in Online RL for Image Editing via Explicit Spatial Reasoning},
  author={Long, Yancheng and Yang, Yankai and Wei, Hongyang and Chen, Wei and Zhang, Tianke and Fan, Haonan and Liu, Changyi and Jiang, Kaiyu and Chen, Jiankang and Tang, Kaiyu and Wen, Bin and Yang, Fan and Gao, Tingting and Li, Han and Yang, Shuo},
  journal={arXiv preprint arXiv:2602.07458},
  year={2026}
}

📄 License

Apache 2.0

Model size (Safetensors): 770k params
Tensor type: BF16
Base model: Qwen/Qwen3-VL-8B-Instruct

