ContextRL-Qwen2.5-VL-7B

This is the multimodal model released with the paper Context-Aware RL for Agentic and Multimodal LLMs. It is fine-tuned from Qwen2.5-VL-7B-Instruct using ContextRL, a context-aware reinforcement learning method that augments standard GRPO with an auxiliary context-selection objective to improve fine-grained visual grounding.

Results

Across 12 diverse multimodal benchmarks, ContextRL improves over the standard GRPO baseline by +2.0 points on average, while improving every individual benchmark.

Benchmark Base RL (GRPO) ContextRL (Ours)
MathVista 68.2 72.5 73.6
MathVerse 43.9 45.3 49.1
MathVision 22.8 25.5 26.8
MMMU-Pro 36.6 41.3 42.8
MMMU 50.7 53.3 54.6
V* 70.1 70.7 73.3
MMStar 62.6 64.1 65.1
BLINK 55.3 56.5 58.9
ScienceQA 88.2 91.0 95.4
PhyX 25.4 48.7 50.0
OlympiadBench Phy 1.5 3.1 4.6
MME-RealWorld Lite 38.4 45.1 46.7
Overall Avg. 47.0 51.4 53.4

The +2.0 average gain over GRPO also exceeds the +0.8 of PAPO, a method purpose-built for multimodal perception (see the paper for the full comparison).

Usage

This model follows the same interface as Qwen2.5-VL-7B-Instruct and can be loaded with transformers. Training and evaluation code, data construction pipelines, and detailed configurations are available in the repository: 👉 https://github.com/xupy2003/ContextAwareRL Please refer to the repo's README for environment setup, inference scripts, and reproduction instructions.

Downloads last month
28
Safetensors
Model size
8B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for xupy21/ContextRL_Qwen2.5_VL_7B

Finetuned
(1113)
this model

Collection including xupy21/ContextRL_Qwen2.5_VL_7B