ImageWAM-FLUX.2-9B-LIBERO

This repository contains the ImageWAM FLUX.2 9B checkpoint for LIBERO, released with the paper "ImageWAM: Do World Action Models Really Need Video Generation, or Just Image Editing?"

ImageWAM is a family of world action models built on image-editing foundation models. This checkpoint is intended for evaluation and research use with the accompanying ImageWAM codebase.

Model Details

  • Model family: ImageWAM
  • Image-editing backbone: FLUX.2 [klein] base
  • Variant: FLUX.2 klein-base-9B
  • Benchmark: LIBERO
  • Training code: yuyangalin/ImageWAM
  • Base model weights: Users must separately prepare the FLUX.2 klein-base-9B weights and FLUX.2 autoencoder as described in the ImageWAM README.

Files

Expected file layout:

.
├── model.pt
├── dataset_stats.json
└── config.yaml

  • model.pt: ImageWAM checkpoint used by the evaluation scripts.
  • dataset_stats.json: normalization statistics required for policy evaluation.
  • config.yaml: original training configuration for provenance and reproducibility.
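As a quick sanity check after downloading, the layout above can be verified with a small shell function. This is only a sketch: `check_imagewam_layout` is a hypothetical helper, not part of the ImageWAM codebase, and it checks only that the three listed files exist, not that their contents are valid.

```shell
# Hypothetical helper (not shipped with ImageWAM): report which of the three
# files listed above are missing from a download directory.
# Returns 0 if all files are present, 1 otherwise.
check_imagewam_layout() {
  dir="${1:-.}"   # directory to inspect; defaults to the current directory
  missing=0
  for f in model.pt dataset_stats.json config.yaml; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example call (the path is the download directory used in the Usage section):
#   check_imagewam_layout checkpoints/imagewam_release/libero/flux2_klein_9b \
#     && echo "layout OK"
```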

Usage

Install and prepare the ImageWAM repository following the project README. Then download this model repository:

mkdir -p checkpoints/imagewam_release/libero/flux2_klein_9b

huggingface-cli download yuyangalin/ImageWAM-FLUX.2-9B-LIBERO \
  --repo-type model \
  --local-dir checkpoints/imagewam_release/libero/flux2_klein_9b

Prepare the FLUX.2 klein-base-9B weights and the FLUX.2 autoencoder, then set:

export FLUX2_VARIANT=9b
export FLUX2_MODEL_PATH=/path/to/flux-2-klein-base-9b.safetensors
export FLUX2_AE_MODEL_PATH=/path/to/ae.safetensors
export FLUX2_QWEN3_MODEL_SPEC=Qwen/Qwen3-8B
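
Because a mistyped path usually only surfaces once evaluation starts loading weights, it may help to validate these variables first. The function below is a minimal sketch under stated assumptions: `check_flux2_env` is a hypothetical name, and it assumes the two `*_PATH` variables point at local files while `FLUX2_QWEN3_MODEL_SPEC` only needs to be non-empty. How the ImageWAM scripts actually resolve these values is not verified here.

```shell
# Hypothetical pre-flight check (not part of the ImageWAM codebase).
# Returns 0 if the variables exported above look usable, 1 otherwise.
check_flux2_env() {
  status=0
  # This checkpoint is the 9B variant, so the variant flag should match.
  if [ "${FLUX2_VARIANT:-}" != "9b" ]; then
    echo "FLUX2_VARIANT should be 9b (got '${FLUX2_VARIANT:-}')" >&2
    status=1
  fi
  # Assumption: both weight paths refer to files on the local filesystem.
  for path in "${FLUX2_MODEL_PATH:-}" "${FLUX2_AE_MODEL_PATH:-}"; do
    if [ -z "$path" ] || [ ! -f "$path" ]; then
      echo "not an existing file: '$path'" >&2
      status=1
    fi
  done
  # The text-encoder spec is only checked for being set, not for validity.
  if [ -z "${FLUX2_QWEN3_MODEL_SPEC:-}" ]; then
    echo "FLUX2_QWEN3_MODEL_SPEC is not set" >&2
    status=1
  fi
  return "$status"
}

# Example call:
#   check_flux2_env && echo "FLUX.2 environment OK"
```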

Evaluate on LIBERO:

export CKPT_PATH="$(pwd)/checkpoints/imagewam_release/libero/flux2_klein_9b/model.pt"
export DATASET_STATS_PATH="$(pwd)/checkpoints/imagewam_release/libero/flux2_klein_9b/dataset_stats.json"

NUM_GPUS=8 FLUX2_VARIANT=9b bash scripts/flux2/run_eval_flux2_libero.sh

Intended Use

This checkpoint is intended for:

  • Reproducing ImageWAM LIBERO evaluations.
  • Research on robot policy learning, world action models, and image-editing-based action generation.
  • Comparison against other LIBERO policy models under the same evaluation setup.

This checkpoint is not intended for safety-critical or real-world robot deployment without additional validation.

Limitations

  • Evaluation requires the ImageWAM codebase and the LIBERO benchmark environment.
  • The checkpoint assumes the same model variant and configuration used during training; see config.yaml.
  • Users must separately prepare the matching FLUX.2 9B base model and autoencoder weights.
  • Performance may differ if the simulator version, dataset preprocessing, action normalization statistics, or evaluation settings differ from the release setup.
  • The 9B variant has higher GPU memory requirements than the 4B variant.

Citation

If you use this checkpoint, please cite the ImageWAM paper:

@misc{zhang2026imagewam,
      title={ImageWAM: Do World Action Models Really Need Video Generation, or Just Image Editing?}, 
      author={Yuyang Zhang and Wenyao Zhang and Zekun Qi and He Zhang and Haitao Lin and Jingbo Zhang and Yao Mu and Xiaokang Yang and Wenjun Zeng and Xin Jin},
      year={2026},
      eprint={2606.19531},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2606.19531}, 
}

Acknowledgements

ImageWAM builds on several open-source projects and model families, including FLUX.2, FastWAM, LIBERO, LIBERO-plus, and RoboTwin. Please also follow the licenses and citation requirements of the corresponding upstream projects.
