SkyJM (RubricRM) is a reward model for visual generation, covering both text-to-image generation and image editing. Given a prompt and two candidate images, it predicts which one better satisfies the instruction. RubricRM performs the following in a single forward pass:
- Dynamically produces an evaluation rubric conditioned on the prompt — including evaluation dimensions, per-dimension weights, and graded scoring descriptors;
- Scores both candidate images at the dimension level under that rubric;
- Aggregates the dimension scores via the rubric weights to derive the final preference.
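The aggregation step above can be sketched as a weighted mean over dimension scores. This is only an illustration: the `Dimension` class, the dimension names, the score scale, and the `aggregate` / `prefer` helpers are hypothetical and do not reflect the model's actual output schema.

```python
from dataclasses import dataclass


@dataclass
class Dimension:
    name: str      # hypothetical dimension label, e.g. "prompt_alignment"
    weight: float  # relative importance assigned by the rubric


def aggregate(rubric, scores):
    """Weighted mean of per-dimension scores (weights are normalised here)."""
    total = sum(d.weight for d in rubric)
    if total <= 0:
        raise ValueError("rubric weights must sum to a positive value")
    return sum(d.weight * scores[d.name] for d in rubric) / total


def prefer(rubric, scores_a, scores_b):
    """Return 'A', 'B', or 'tie' depending on the aggregated scores."""
    a = aggregate(rubric, scores_a)
    b = aggregate(rubric, scores_b)
    if a > b:
        return "A"
    if b > a:
        return "B"
    return "tie"


# Illustrative rubric and scores (all values are made up).
rubric = [
    Dimension("prompt_alignment", 5),
    Dimension("visual_quality", 3),
    Dimension("text_rendering", 2),
]
scores_a = {"prompt_alignment": 4, "visual_quality": 3, "text_rendering": 2}
scores_b = {"prompt_alignment": 3, "visual_quality": 4, "text_rendering": 4}
print(prefer(rubric, scores_a, scores_b))  # B (3.3 vs 3.5)
```

Normalising by the weight sum keeps the final score on the same scale as the per-dimension scores, whether or not the rubric weights already sum to one.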
We release two model sizes built on the Qwen3.5 backbone:
- SkyJM-Gen-4B / SkyJM-Gen-9B: for text-to-image generation
- SkyJM-Edit-4B / SkyJM-Edit-9B: for image editing
Performance
Text-to-image generation
| Model | MMRB2 | GenAI-Bench | GenAI-Bench-Verified |
|---|---|---|---|
| Proprietary MLLMs | | | |
| Claude Sonnet 4.6 | 70.8 | 65.8 | 75.3 |
| GPT-5.4 | 67.5 | 64.2 | 74.2 |
| Gemini 2.5 Pro | 70.5 | 67.8 | 77.4 |
| Gemini 3.1 Pro | 74.4 | 73.9 | 84.8 |
| Open-source MLLMs | | | |
| Qwen3-VL-8B | 61.2 | 63.3 | 72.5 |
| Qwen3-VL-235B-A22B | 66.6 | 61.5 | 69.7 |
| Qwen3.5-9B | 66.3 | 63.3 | 70.7 |
| Qwen3.5-397B-A17B | 72.7 | 66.2 | 77.0 |
| Reward Models | | | |
| HPSv2 | 55.0 | 68.8 | 78.1 |
| PickScore | 57.6 | 70.0 | 79.2 |
| HPSv3 | 60.2 | 70.9 | 81.0 |
| UnifiedReward-9B | 57.9 | 69.2 | 72.8 |
| UnifiedReward-Think-9B | 65.5 | 72.8 | 81.7 |
| UnifiedReward-Flex-8B | 69.2 | 73.4 | 84.2 |
| SkyJM-Gen-4B (Ours) | 70.5 | 73.2 | 83.1 |
| SkyJM-Gen-9B (Ours) | 72.0 | 74.1 | 84.5 |
Image editing
| Model | MMRB2 | EditReward-ERB Avg | EditScore-ERB Avg |
|---|---|---|---|
| Proprietary MLLMs | | | |
| Claude Sonnet 4.6 | 71.7 | 44.1 | 79.3 |
| GPT-5.4 | 68.5 | 42.5 | 74.6 |
| Gemini 2.5 Pro | 71.3 | 42.2 | 75.2 |
| Gemini 3.1 Pro | 74.9 | 45.0 | 81.6 |
| Open-source MLLMs | | | |
| Qwen3-VL-8B | 63.4 | 40.9 | 76.9 |
| Qwen3-VL-235B-A22B | 64.8 | 34.6 | 78.8 |
| Qwen3.5-9B | 64.4 | 37.4 | 72.0 |
| Qwen3.5-397B-A17B | 73.7 | 43.9 | 81.2 |
| Reward Models | | | |
| EditReward-7B | 67.2 | 38.4 | 78.3 |
| EditScore-7B | 55.6 | 28.8 | 61.9 |
| SkyJM-Edit-4B (Ours) | 73.2 | 45.5 | 85.5 |
| SkyJM-Edit-9B (Ours) | 75.4 | 46.4 | 85.6 |
Training Strategy
Stage 1: Rubric-trajectory SFT. We use Gemini 3.1 Pro to synthesize rubric-based evaluation trajectories conditioned on human preference labels, then filter them with structural and label-consistency checks for SFT.
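The Stage 1 filtering can be pictured with a small sketch. The exact checks are not described here, so everything below is an assumption: the trajectory is taken to be a dict with `rubric`, `scores_a`, `scores_b`, and `preference` keys, the structural check only verifies that the rubric is complete, and the label-consistency check only compares the predicted preference with the human label.

```python
def is_structurally_valid(traj):
    """Assumed structural check: non-empty rubric, positive weights, scores for both images."""
    rubric = traj.get("rubric")
    if not rubric:
        return False
    scores_a = traj.get("scores_a", {})
    scores_b = traj.get("scores_b", {})
    for dim in rubric:
        name = dim.get("name")
        weight = dim.get("weight")
        if not name or not isinstance(weight, (int, float)) or weight <= 0:
            return False
        if name not in scores_a or name not in scores_b:
            return False
    return traj.get("preference") in {"A", "B"}


def is_label_consistent(traj, human_label):
    """Assumed consistency check: the trajectory's verdict matches the human label."""
    return traj.get("preference") == human_label


def filter_trajectories(pairs):
    """Keep trajectories that pass both checks; `pairs` holds (trajectory, human_label) tuples."""
    return [
        traj
        for traj, label in pairs
        if is_structurally_valid(traj) and is_label_consistent(traj, label)
    ]
```

A trajectory that reaches the right verdict with a malformed rubric is dropped just like one that is well formed but disagrees with the human label, so only trajectories passing both checks would reach SFT.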
Stage 2: Dimension-level GRPO. During RL, we fix the rubric and optimize only the scoring process using rewards based on per-dimension score gaps, with saturated-group filtering to suppress noisy low-variance updates.
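A rough sketch of how a gap-based reward and saturated-group filtering might fit together follows. The reward formula, the `max_score` scale, and the `min_std` threshold are all assumptions made for illustration, not the released training recipe. The advantage normalisation is the usual GRPO group-relative form, and a group whose rewards are nearly identical is skipped because it carries almost no learning signal.

```python
from statistics import mean, pstdev


def gap_reward(weights, scores_a, scores_b, label, max_score=5.0):
    """Weighted per-dimension score gap, signed so that favouring the labelled image is positive.

    `weights` maps dimension name -> rubric weight (the rubric is held fixed);
    `max_score` is a hypothetical upper bound used to scale the reward into [-1, 1].
    """
    sign = 1.0 if label == "A" else -1.0
    total = sum(weights.values())
    gap = sum(w * (scores_a[d] - scores_b[d]) for d, w in weights.items())
    return sign * gap / (total * max_score)


def group_advantages(rewards, min_std=1e-6):
    """Group-relative advantages; returns None for a saturated (near-constant) group."""
    sigma = pstdev(rewards)
    if sigma < min_std:
        return None  # caller drops this group from the update
    mu = mean(rewards)
    return [(r - mu) / sigma for r in rewards]
```

In a training loop under these assumptions, each prompt would yield a group of sampled scorings, `gap_reward` would score each one against the human label, and groups for which `group_advantages` returns `None` would be excluded from the policy update.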
Quick Start
For detailed usage instructions, installation guide, and inference examples (supporting both vLLM and Transformers backends), please refer to the official inference framework:
SKYLENAGE-JUDGER — Unified inference framework for SkyJM judge models.
Links
- GitHub: SKYLENAGE-AI/SKYLENAGE-JUDGER
- Hugging Face Models:
- Hugging Face Dataset: skylenage-ai/RubricRM-Data
- ModelScope Models: