SkyJM (RubricRM) is a reward model for visual generation, covering both text-to-image generation and image editing. Given a prompt and two candidate images, it predicts which image better satisfies the instruction. In a single forward pass, RubricRM:

  • Dynamically produces an evaluation rubric conditioned on the prompt, comprising evaluation dimensions, per-dimension weights, and graded scoring descriptors;
  • Scores both candidate images at the dimension level under that rubric;
  • Aggregates the dimension scores via the rubric weights to derive the final preference.
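The aggregation step can be illustrated with a minimal sketch. It assumes the final preference comes from a weight-normalized sum of per-dimension scores; the `Dimension` class, its field names, and the example rubric are hypothetical and are not taken from the released code.

```python
from dataclasses import dataclass


@dataclass
class Dimension:
    """One rubric dimension with the scores given to both candidates (hypothetical schema)."""
    name: str
    weight: float
    score_a: float
    score_b: float


def aggregate(dims):
    """Return the weight-normalized total score for image A and image B."""
    total_weight = sum(d.weight for d in dims)
    if total_weight <= 0:
        raise ValueError("rubric weights must sum to a positive value")
    total_a = sum(d.weight * d.score_a for d in dims) / total_weight
    total_b = sum(d.weight * d.score_b for d in dims) / total_weight
    return total_a, total_b


def preference(dims):
    """Pick the candidate with the higher aggregated score, or report a tie."""
    total_a, total_b = aggregate(dims)
    if total_a > total_b:
        return "A"
    if total_b > total_a:
        return "B"
    return "tie"


# Illustrative rubric only; real rubrics are generated per prompt by the model.
example_rubric = [
    Dimension("prompt alignment", 0.5, 4, 3),
    Dimension("visual quality", 0.3, 3, 4),
    Dimension("text rendering", 0.2, 5, 2),
]
print(preference(example_rubric))
```

Normalizing by the total weight keeps the result comparable across rubrics whose weights do not sum to exactly one.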

We release two model sizes (4B and 9B) for each task, all built on the Qwen3.5 backbone:

  • SkyJM-Gen-4B / SkyJM-Gen-9B — for text-to-image generation
  • SkyJM-Edit-4B / SkyJM-Edit-9B — for image editing

Performance

Text-to-image generation

| Model | MMRB2 | GenAI-Bench | GenAI-Bench-Verified |
|---|---|---|---|
| Proprietary MLLMs | | | |
| Claude Sonnet 4.6 | 70.8 | 65.8 | 75.3 |
| GPT-5.4 | 67.5 | 64.2 | 74.2 |
| Gemini 2.5 Pro | 70.5 | 67.8 | 77.4 |
| Gemini 3.1 Pro | 74.4 | 73.9 | 84.8 |
| Open-source MLLMs | | | |
| Qwen3-VL-8B | 61.2 | 63.3 | 72.5 |
| Qwen3-VL-235B-A22B | 66.6 | 61.5 | 69.7 |
| Qwen3.5-9B | 66.3 | 63.3 | 70.7 |
| Qwen3.5-397B-A17B | 72.7 | 66.2 | 77.0 |
| Reward Models | | | |
| HPSv2 | 55.0 | 68.8 | 78.1 |
| PickScore | 57.6 | 70.0 | 79.2 |
| HPSv3 | 60.2 | 70.9 | 81.0 |
| UnifiedReward-9B | 57.9 | 69.2 | 72.8 |
| UnifiedReward-Think-9B | 65.5 | 72.8 | 81.7 |
| UnifiedReward-Flex-8B | 69.2 | 73.4 | 84.2 |
| SkyJM-Gen-4B (Ours) | 70.5 | 73.2 | 83.1 |
| SkyJM-Gen-9B (Ours) | 72.0 | 74.1 | 84.5 |

Image editing

| Model | MMRB2 | EditReward-ERB Avg | EditScore-ERB Avg |
|---|---|---|---|
| Proprietary MLLMs | | | |
| Claude Sonnet 4.6 | 71.7 | 44.1 | 79.3 |
| GPT-5.4 | 68.5 | 42.5 | 74.6 |
| Gemini 2.5 Pro | 71.3 | 42.2 | 75.2 |
| Gemini 3.1 Pro | 74.9 | 45.0 | 81.6 |
| Open-source MLLMs | | | |
| Qwen3-VL-8B | 63.4 | 40.9 | 76.9 |
| Qwen3-VL-235B-A22B | 64.8 | 34.6 | 78.8 |
| Qwen3.5-9B | 64.4 | 37.4 | 72.0 |
| Qwen3.5-397B-A17B | 73.7 | 43.9 | 81.2 |
| Reward Models | | | |
| EditReward-7B | 67.2 | 38.4 | 78.3 |
| EditScore-7B | 55.6 | 28.8 | 61.9 |
| SkyJM-Edit-4B (Ours) | 73.2 | 45.5 | 85.5 |
| SkyJM-Edit-9B (Ours) | 75.4 | 46.4 | 85.6 |

Training Strategy

Stage 1: Rubric-trajectory SFT. We use Gemini 3.1 Pro to synthesize rubric-based evaluation trajectories conditioned on human preference labels, then filter them with structural and label-consistency checks for SFT.
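The filtering step might look roughly like the sketch below. The actual checks are not specified here, so everything in the code is an assumption: the trajectory schema (`dimensions`, `human_label`), the required keys, and the rule that a trajectory is label-consistent when its weighted scores imply the same winner as the human preference label.

```python
# Hypothetical per-dimension schema; the real trajectory format is not documented here.
REQUIRED_KEYS = {"name", "weight", "score_a", "score_b"}


def is_well_formed(traj):
    """Structural check: a non-empty list of dimensions with the expected numeric fields."""
    dims = traj.get("dimensions")
    if not isinstance(dims, list) or not dims:
        return False
    for d in dims:
        if not isinstance(d, dict) or not REQUIRED_KEYS <= d.keys():
            return False
        numeric = (d["weight"], d["score_a"], d["score_b"])
        if not all(isinstance(v, (int, float)) for v in numeric):
            return False
        if d["weight"] <= 0:
            return False
    return True


def implied_preference(traj):
    """Winner implied by the weighted dimension scores of a well-formed trajectory."""
    total_a = sum(d["weight"] * d["score_a"] for d in traj["dimensions"])
    total_b = sum(d["weight"] * d["score_b"] for d in traj["dimensions"])
    if total_a > total_b:
        return "A"
    if total_b > total_a:
        return "B"
    return "tie"


def keep_for_sft(traj):
    """Keep a trajectory only if it is well formed and agrees with the human label."""
    return is_well_formed(traj) and implied_preference(traj) == traj.get("human_label")


def filter_trajectories(trajectories):
    """Return the subset of synthesized trajectories that pass both checks."""
    return [t for t in trajectories if keep_for_sft(t)]
```

Running the structural check first means the consistency check never has to handle malformed rubrics.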

Stage 2: Dimension-level GRPO. During RL, we fix the rubric and optimize only the scoring process using rewards based on per-dimension score gaps, with saturated-group filtering to suppress noisy low-variance updates.
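As a rough illustration of Stage 2, the sketch below pairs a gap-based reward with saturated-group filtering. The exact reward is not given here, so `gap_reward` is a hypothetical stand-in that scores a rollout by how many per-dimension score gaps (image A minus image B) agree in sign with a reference. The advantage computation is the standard group-relative normalization used in GRPO, and the `eps` threshold is an assumed value.

```python
from statistics import mean, pstdev


def sign(x):
    """Return -1, 0, or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def gap_reward(pred_gaps, ref_gaps):
    """Hypothetical reward: fraction of dimensions whose predicted gap sign matches the reference."""
    if len(pred_gaps) != len(ref_gaps) or not ref_gaps:
        raise ValueError("gap lists must be non-empty and of equal length")
    matches = sum(sign(p) == sign(r) for p, r in zip(pred_gaps, ref_gaps))
    return matches / len(ref_gaps)


def group_advantages(rewards, eps=1e-6):
    """Group-relative advantages, or None when the group is saturated.

    A group is treated as saturated when its rewards are (nearly) identical,
    so it carries no useful learning signal and is skipped.
    """
    spread = pstdev(rewards)
    if spread < eps:
        return None
    center = mean(rewards)
    return [(r - center) / spread for r in rewards]


def filter_groups(groups, eps=1e-6):
    """Keep only groups that yield non-degenerate advantages."""
    kept = []
    for rewards in groups:
        advantages = group_advantages(rewards, eps)
        if advantages is not None:
            kept.append(advantages)
    return kept
```

Dropping saturated groups before the policy update avoids dividing by a near-zero standard deviation, which would otherwise amplify noise in the advantages.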

Quick Start

For installation steps, detailed usage instructions, and inference examples (with both vLLM and Transformers backends), please refer to the official inference framework:

SKYLENAGE-JUDGER: Unified inference framework for SkyJM judge models.

Link
