PhyCo: Learning Controllable Physical Priors for Generative Motion

CVPR 2026 · Sriram Narayanan¹², Ziyu Jiang², Srinivasa G. Narasimhan¹, Manmohan Chandraker²³ (¹ Carnegie Mellon University · ² NEC Labs America · ³ UC San Diego)

Project Page · arXiv · Paper (PDF) · Simulation Code (PhyCo-Sim) · Dataset

TL;DR. PhyCo learns controllable physical priors — friction, restitution, deformation, and force — from simple block-sliding and ball-bouncing simulations, enabling physically grounded and continuously controllable video generation without any simulator at inference.

Abstract

Modern video diffusion models excel at appearance synthesis but still struggle with physical consistency: objects drift, collisions lack realistic rebound, and material responses seldom match their underlying properties. We present PhyCo, a framework that introduces continuous, interpretable, and physically grounded control into video generation. Our approach integrates three key components: (i) a large-scale dataset of over 100K photorealistic simulation videos where friction, restitution, deformation, and force are systematically varied across diverse scenarios; (ii) physics-supervised fine-tuning of a pretrained diffusion model using a ControlNet conditioned on pixel-aligned physical property maps; and (iii) VLM-guided reward optimization, where a fine-tuned vision-language model evaluates generated videos with targeted physics queries and provides differentiable feedback. This combination enables a generative model to produce physically consistent and controllable outputs through variations in physical attributes — without any simulator or geometry reconstruction at inference. On the Physics-IQ benchmark, PhyCo significantly improves physical realism over strong baselines, and human studies confirm clearer and more faithful control over physical attributes. Our results demonstrate a scalable path toward physically consistent, controllable generative video models that generalize beyond synthetic training environments.

Model

phyco.pt is a single self-contained checkpoint built on the frozen NVIDIA Cosmos-Predict2-2B-Video2World DiT, with PhyCo's LoRA fine-tuning merged in and all three physics-conditioning ControlNet branches baked in:

| Branch | Physical property |
|---|---|
| `controlnet_1` | friction / restitution (bounciness) |
| `controlnet_2` | deformability |
| `controlnet_3` | applied force / motion direction |
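
As background on what "merged in" means for the LoRA weights: in the standard LoRA formulation, a frozen weight matrix receives a trainable low-rank update, and merging folds that update back into a single matrix so that no separate adapter has to be loaded. The symbols below are the usual LoRA notation, not values taken from PhyCo; the rank, the scaling factor, and which layers carry adapters are not stated in this card.

```latex
W_{\text{merged}} = W_0 + \frac{\alpha}{r}\, B A,
\qquad B \in \mathbb{R}^{d \times r},\quad A \in \mathbb{R}^{r \times k},\quad r \ll \min(d, k)
```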

Given a single input frame and pixel-aligned physical-property maps, PhyCo generates a short video whose motion follows the specified physics. No simulator or geometry reconstruction is used at inference.
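
To make "pixel-aligned physical-property map" concrete, here is a minimal sketch that rasterizes one scalar property as a circular blob on a grid the size of the input frame. It is only an illustration: the function name `circle_property_map`, the background value of 0.0, and the choice to store the raw property value per pixel are assumptions, and the actual encoding used by the PhyCo code may differ. Plain lists are used so the snippet runs without dependencies; an array library would be the natural choice in practice.

```python
def circle_property_map(height, width, center, radius, value, background=0.0):
    """Return a height x width grid (list of rows) holding `value` inside a
    circle and `background` elsewhere.

    Hypothetical helper for illustration; not part of the PhyCo code release.
    `center` is (row, column) in pixel coordinates.
    """
    cy, cx = center
    r2 = radius * radius
    return [
        [value if (y - cy) ** 2 + (x - cx) ** 2 <= r2 else background
         for x in range(width)]
        for y in range(height)
    ]


# Example: a friction value of 0.7 attached to an object centered at row 3, column 4.
friction_map = circle_property_map(6, 8, center=(3, 4), radius=2, value=0.7)
```

Because the map shares the frame's pixel grid, each object can carry its own value simply by drawing one blob per object at that object's location.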

Usage

This checkpoint is consumed by the PhyCo video-generation / evaluation code (see the project page for the code release). This repo contains:

phyco.pt: the model checkpoint (base DiT + 3 ControlNet branches).
physics_iq_segmentation_pkl/: 198 per-scenario segmentation masks (.pkl files) referenced by the Physics-IQ benchmark JSON (physiq_controlnet_v1.json).

Download both and run the canonical generation command:

# checkpoint -> checkpoints/phyco/phyco.pt
hf download nnsriram97/phyco phyco.pt --local-dir checkpoints/phyco

# Physics-IQ segmentation masks -> assets/physics_iq_segmentation_pkl/
hf download nnsriram97/phyco --include "physics_iq_segmentation_pkl/*" --local-dir assets

# generate (single-file checkpoint; ControlNet branches are inside phyco.pt)
PYTHONPATH=$(pwd) bash scripts/launch/run_physprop_kubric_mpgu.sh \
  --batch_input_json scripts/batch_jsons/benchmark/physiq_controlnet_v1.json \
  --dit_path checkpoints/phyco/phyco.pt \
  --controlnet_branch_names controlnet_1,controlnet_2,controlnet_3 \
  --active_controlnets   controlnet_1,controlnet_2,controlnet_3 \
  --controlnet_branch_scales "1.0,1.0,1.0" --dynamic_controlnet \
  --pipeline_config controlnet_multi_24fps_57frames \
  --conditioning_type image_blob --blob_type circle \
  --resolution 480 --fps 24 --total_seconds 5.0 \
  --physprop_type friction_bounciness_deformable_force_move_dir \
  --seed 42 --post_dit_subfolder physics-IQ
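
Before launching a long generation run, it can help to confirm that both downloads landed where the command expects them. The sketch below checks only the paths and the mask count stated in this card (checkpoints/phyco/phyco.pt and 198 .pkl files under assets/physics_iq_segmentation_pkl/). The helper name `check_assets` is hypothetical, and the check does not open or validate the files themselves.

```python
from pathlib import Path

EXPECTED_MASKS = 198  # number of .pkl masks stated in this model card


def check_assets(root="."):
    """Return a list of human-readable problems; an empty list means all good.

    Hypothetical helper, not part of the PhyCo code release.
    """
    root = Path(root)
    problems = []

    ckpt = root / "checkpoints" / "phyco" / "phyco.pt"
    if not ckpt.is_file():
        problems.append(f"missing checkpoint: {ckpt}")

    mask_dir = root / "assets" / "physics_iq_segmentation_pkl"
    n_masks = len(list(mask_dir.glob("*.pkl"))) if mask_dir.is_dir() else 0
    if n_masks != EXPECTED_MASKS:
        problems.append(
            f"expected {EXPECTED_MASKS} .pkl masks in {mask_dir}, found {n_masks}"
        )
    return problems
```

Calling `check_assets()` from the repository root returns an empty list when both downloads are in place.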

--dynamic_controlnet selects the relevant branch per scenario (friction/restitution → controlnet_1, deformability → controlnet_2, force → controlnet_3).
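
The per-scenario selection described above can be pictured as a lookup from physical property to branch. The sketch below encodes only the mapping given in this card; the function name `branch_scales`, the property keys, and the idea of giving unused branches a scale of 0.0 are assumptions made for illustration, and the launcher may disable branches in a different way.

```python
# Property -> branch mapping as stated in this model card.
PROPERTY_TO_BRANCH = {
    "friction": "controlnet_1",
    "restitution": "controlnet_1",
    "deformability": "controlnet_2",
    "force": "controlnet_3",
}
BRANCH_NAMES = ("controlnet_1", "controlnet_2", "controlnet_3")


def branch_scales(scenario_properties, scale=1.0):
    """Return a per-branch scale dict for one scenario.

    Branches tied to a property the scenario varies get `scale`; the others
    get 0.0 (an assumed way of switching a branch off). Unknown property
    names raise KeyError. Hypothetical helper, not the PhyCo implementation.
    """
    active = {PROPERTY_TO_BRANCH[p] for p in scenario_properties}
    return {name: (scale if name in active else 0.0) for name in BRANCH_NAMES}


# A scenario that varies only friction activates controlnet_1 alone.
friction_only = branch_scales(["friction"])
```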

Intended use & limitations

Intended use: research on physically grounded, controllable video generation; reproducing the paper's Physics-IQ results.
Non-commercial: released under CC BY-NC-ND 4.0 (see License).
Limitations: trained on Kubric-style simulation scenarios (block sliding, ball bouncing, etc.); behavior on out-of-distribution scenes is not guaranteed. The model inherits the capabilities and biases of its Cosmos-Predict2 base.

License

This model is released under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) license. It is built on top of NVIDIA's Cosmos-Predict2, whose code/weights remain subject to their original Apache-2.0 / NVIDIA Open Model License terms.

Citation

@InProceedings{Narayanan_2026_CVPR,
    author    = {Narayanan, Sriram and Jiang, Ziyu and Narasimhan, Srinivasa and Chandraker, Manmohan},
    title     = {PhyCo: Learning Controllable Physical Priors for Generative Motion},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {41892-41902}
}

Paper: PhyCo: Learning Controllable Physical Priors for Generative Motion (2604.28169, published Apr 30)