KinDER — Diffusion Policy + Environment States (DPES) Checkpoints

Trained DP + Environment States (DPES) checkpoints for the KinDER physical-reasoning benchmark (RSS 2026).

Each checkpoint is an imitation learning policy trained from ~100 human demonstrations per environment. The training code lives in kinder-diffusion-policy (a fork of diffusion_policy). Demonstrations are available at kinder-bench/kinder-datasets.


Checkpoints

| Path | KinDER environment | Trained epochs | Final train loss |
|---|---|---|---|
| `motion2d/epoch=1000-train_loss=0.000.ckpt` | Motion2D-p0 | 1000 | 0.000 |
| `stickbutton2d/epoch=2000-train_loss=0.001.ckpt` | StickButton2D-b1 | 2000 | 0.001 |
| `dynobstruction2d/epoch=2000-train_loss=0.000.ckpt` | DynObstruction2D-o1 | 2000 | 0.000 |
| `dynpushpullhook2d/epoch=0900-train_loss=0.001.ckpt` | DynPushPullHook2D-o5 | 900 | 0.001 |
| `basemotion3d/epoch=2000-train_loss=0.000.ckpt` | BaseMotion3D | 2000 | 0.000 |
| `shelf3d/epoch=0300-train_loss=0.000.ckpt` | Shelf3D | 300 | 0.000 |
| `sweep3d/epoch=0300-train_loss=0.001.ckpt` | SweepIntoDrawer3D | 300 | 0.001 |
| `transport3d/epoch=0100-train_loss=0.000.ckpt` | Transport3D-o2 | 100 | 0.000 |

Method

DP + Environment States (DPES) extends standard Diffusion Policy by incorporating additional low-level environment state vectors as input alongside RGB images. The environment states are encoded with MLPs before being fused with the image features and passed to the diffusion model.

Comparison with Diffusion Policy (DP)

| | DP | DPES |
|---|---|---|
| RGB image input | ✓ | ✓ |
| Environment state input | — | ✓ (MLP-encoded) |
| Robot state input | — | ✓ (MLP-encoded) |
| Action output | diffusion | diffusion |

Inputs

2D environments — single overhead RGB image (224 × 224) + flat state vector:

| State vector | Content |
|---|---|
| `robot_state` | Robot proprioception (joint positions, velocities) |
| `env_state` | Object / scene state (positions and orientations of all relevant entities) |

3D TidyBot environments — three RGB cameras (base 224 × 224, wrist 224 × 224, overview 224 × 224) + flat state vectors as above.
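For concreteness, a single observation could be assembled as a dictionary of arrays like the sketch below. The key names and state-vector dimensions here are illustrative assumptions, not the repo's actual schema.

```python
import numpy as np

# Hypothetical observation for a 3D TidyBot environment (key names and
# state dimensions are assumptions for illustration only).
obs = {
    "base_rgb": np.zeros((224, 224, 3), dtype=np.uint8),      # base camera
    "wrist_rgb": np.zeros((224, 224, 3), dtype=np.uint8),     # wrist camera
    "overview_rgb": np.zeros((224, 224, 3), dtype=np.uint8),  # overview camera
    "robot_state": np.zeros(14, dtype=np.float32),  # e.g. joint positions + velocities
    "env_state": np.zeros(21, dtype=np.float32),    # e.g. three 7-DoF object poses
}

for key, value in obs.items():
    print(key, value.shape)
```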

The environment and robot state vectors are each passed through a small MLP encoder before concatenation with the visual features, giving the policy direct access to precise geometric information that may be hard to extract from pixels alone.
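As a rough illustration of this fusion step, the sketch below passes each state vector through a tiny randomly initialized ReLU MLP and concatenates the encodings with a placeholder image-feature vector. All dimensions (14, 21, 64, 512) and the `mlp_forward` helper are assumptions for illustration, not the actual DPES architecture details.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_forward(x, weights):
    """Apply a stack of linear layers with ReLU in between.

    Random weights here stand in for the learned encoder parameters.
    """
    for i, W in enumerate(weights):
        x = x @ W
        if i < len(weights) - 1:
            x = np.maximum(x, 0.0)
    return x

# Illustrative dimensions: 14-D robot state, 21-D env state, 512-D image features.
robot_state = rng.normal(size=14)
env_state = rng.normal(size=21)
image_features = rng.normal(size=512)  # output of the visual backbone (dim assumed)

robot_enc = mlp_forward(robot_state, [rng.normal(size=(14, 64)), rng.normal(size=(64, 64))])
env_enc = mlp_forward(env_state, [rng.normal(size=(21, 64)), rng.normal(size=(64, 64))])

# Conditioning vector fed to the diffusion model: image + encoded states.
cond = np.concatenate([image_features, robot_enc, env_enc])
print(cond.shape)  # (640,)
```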

Output

An action chunk is predicted by iterative DDPM denoising, identical to standard DP.
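The denoising loop can be sketched as below. The noise schedule, horizon, step count, and the placeholder `eps_model` (which would be the trained noise-prediction network) are illustrative assumptions rather than the repo's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
horizon, action_dim, T = 16, 7, 50  # chunk length, action dim, diffusion steps (assumed)

# Linear beta schedule (illustrative; the real schedule comes from the config).
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def eps_model(x_t, t, cond):
    """Placeholder for the trained noise-prediction network."""
    return np.zeros_like(x_t)  # a real model predicts the noise that was added

cond = rng.normal(size=128)                 # fused image + state conditioning
x = rng.normal(size=(horizon, action_dim))  # start from pure Gaussian noise

for t in reversed(range(T)):
    eps = eps_model(x, t, cond)
    # DDPM posterior mean for step t
    x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
    if t > 0:
        x += np.sqrt(betas[t]) * rng.normal(size=x.shape)  # add posterior noise

action_chunk = x  # (horizon, action_dim) sequence of actions to execute
print(action_chunk.shape)
```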


Usage

1. Install dependencies

```bash
# Clone and set up kinder-diffusion-policy
git clone git@github.com:Princeton-Robot-Planning-and-Learning/kinder-diffusion-policy.git
cd kinder-diffusion-policy
# Follow the environment setup instructions in the repo README
mamba activate robodiff

# Install the kinder-imitation-learning inference utilities
cd kinder-baselines/kinder-imitation-learning
uv pip install -r prpl_requirements.txt
uv pip install -e ".[develop]"
```

2. Launch the policy server

```bash
cd ~/kinder-diffusion-policy
mamba activate robodiff
python policy_server.py --ckpt-path /path/to/sweep3d/epoch=0300-train_loss=0.001.ckpt
```

3. Run evaluation

```bash
cd kinder-baselines/kinder-models/scripts
python inference.py \
    --env-name kinder/SweepIntoDrawer3D-o5-v0 \
    --save-videos \
    --num-seeds 1 \
    --num-episodes 5 \
    --max-steps 200
```

Replace `--env-name` and `--ckpt-path` with the environment and checkpoint of your choice.


Training from scratch

```bash
# Convert raw teleoperation recordings to HDF5
cd kinder-baselines/kinder-models/scripts
python demos_to_hdf5.py \
    --teleop_data_dir $YOUR_DATA_DIR \
    --output_path $OUTPUT_HDF5_PATH \
    --render_images

# Train with the DPES config (includes state inputs)
cd ~/kinder-diffusion-policy
mamba activate robodiff
python train.py --config-name=train_sweep3d_image_state
```

Related resources

| Resource | Link |
|---|---|
| KinDER benchmark | kindergarden |
| Training code | kinder-diffusion-policy |
| Demonstration datasets | kinder-bench/kinder-datasets |
| DP (image-only) checkpoints | kinder-bench/kinder-DP-checkpoints |
| Finetuned π0.5 VLA checkpoints | kinder-openpi |

Citation

If you use these checkpoints, please cite the paper KinDER: A Physical Reasoning Benchmark for Robot Learning and Planning:

```bibtex
@inproceedings{huang2026kinder,
  title     = {KinDER: A Physical Reasoning Benchmark for Robot Learning and Planning},
  author    = {Huang, Yixuan and Li, Bowen and Saxena, Vaibhav and Liang, Yichao and Mishra, Utkarsh and Ji, Liang and Zha, Lihan and Wu, Jimmy and Kumar, Nishanth and Scherer, Sebastian and Xu, Danfei and Silver, Tom},
  booktitle = {Robotics: Science and Systems (RSS)},
  year      = {2026}
}
```