U0 Final — Underwater Robot VLA Model

Model ID: Vincent2025hello/u0_final
Base Model: nvidia/GR00T-N1.5-3B
License: Apache 2.0
Paper: USIM and U0: A Vision-Language-Action Dataset and Model for General Underwater Robots


Model Description

This model is a Vision-Language-Action (VLA) policy fine-tuned from NVIDIA GR00T N1.5 (3B parameters) for the U0 underwater robot (based on BlueROV2). It takes dual-camera visual observations and multi-sensor state inputs, and outputs 16-step action trajectories for autonomous underwater tasks.

Fine-Tuning Details

Item                 Value
Base Model           GR00T-N1.5-3B
Fine-Tuning Method   Full Fine-Tuning (with visual tuning)
Action Horizon       16 steps
Denoising Steps      4 (inference)
Embodiment Tag       new_embodiment
Data Config          u0_bot
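
To make the "Denoising Steps" and "Action Horizon" entries more concrete, here is a minimal sketch of how a 16-step action chunk could be produced in 4 denoising steps. It assumes a flow-matching-style action head integrated with fixed-step Euler updates, which this card does not confirm. The names denoise_chunk, make_toy_velocity, and velocity_fn are hypothetical, and the toy velocity field only stands in for the learned network.

```python
import random

ACTION_HORIZON = 16   # "Action Horizon" value from the table above
DENOISING_STEPS = 4   # "Denoising Steps" value from the table above


def denoise_chunk(velocity_fn, action_dim, horizon=ACTION_HORIZON,
                  steps=DENOISING_STEPS, seed=0):
    """Integrate a velocity field from noise (t=0) to an action chunk (t=1).

    velocity_fn(chunk, t) is a hypothetical stand-in for the learned action
    head; it must return a list of rows with the same shape as `chunk`.
    """
    rng = random.Random(seed)
    # Start from Gaussian noise with shape (horizon, action_dim).
    chunk = [[rng.gauss(0.0, 1.0) for _ in range(action_dim)]
             for _ in range(horizon)]
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        velocity = velocity_fn(chunk, t)
        # One explicit Euler update per denoising step.
        chunk = [[a + dt * v for a, v in zip(a_row, v_row)]
                 for a_row, v_row in zip(chunk, velocity)]
    return chunk


def make_toy_velocity(target):
    """Toy field that moves every sample in a straight line toward `target`."""
    def velocity_fn(chunk, t):
        # t never reaches 1.0 inside denoise_chunk, so 1 - t stays positive.
        return [[(g - a) / (1.0 - t) for a, g in zip(a_row, g_row)]
                for a_row, g_row in zip(chunk, target)]
    return velocity_fn
```

More denoising steps mean more forward passes of the action head per chunk, so a small step count such as 4 keeps inference latency low at the cost of a coarser integration.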

Input / Output

Inputs

  • Video (dual camera): ego-view + wrist-view images (224×224)
  • State (29-dim):
    • joint_pos (6): joint positions
    • pwm (8): thruster PWM values
    • joint_v (5): joint velocities
    • dvl_v (3): DVL velocity
    • imu_av (3): IMU angular velocity
    • imu_la (3): IMU linear acceleration
    • pressure (1): depth pressure
    • dvl_h (1): DVL altitude
  • Language: natural language task description

Outputs

  • Action (13-dim × 16 steps):
    • joint_pos (6): target joint positions
    • pwm (8): target thruster PWM values
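
Because each inference call returns a chunk of 16 actions, a controller has to decide how many of them to execute before querying the model again. The sketch below shows one common pattern (execute part or all of a chunk, then re-plan). It is an illustration only, not the control loop from the U0 codebase; run_chunked, policy, get_observation, and send_action are hypothetical names.

```python
from collections import deque


def run_chunked(policy, get_observation, send_action, total_steps,
                execute_per_chunk=None):
    """Run `total_steps` control steps, re-querying the policy when needed.

    policy(obs) is assumed to return a list of per-step actions (a chunk).
    execute_per_chunk limits how many actions of each chunk are executed
    before re-planning; None means the whole chunk is executed.
    Returns the number of policy queries that were made.
    """
    pending = deque()
    queries = 0
    for _ in range(total_steps):
        if not pending:
            chunk = policy(get_observation())
            limit = len(chunk) if execute_per_chunk is None else execute_per_chunk
            pending.extend(chunk[:limit])
            queries += 1
        # Send exactly one action per control step.
        send_action(pending.popleft())
    return queries
```

Executing fewer actions per chunk (for example 8 of 16) makes the robot react to new observations sooner, at the cost of more frequent inference calls.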

Download Model

pip install huggingface_hub
hf download Vincent2025hello/u0_final --local-dir ./u0_final
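
The same download can also be scripted from Python. The following is a minimal sketch using snapshot_download from the huggingface_hub package; the wrapper function download_u0 and the constants are hypothetical names chosen for this example, and the local directory simply mirrors the command above.

```python
REPO_ID = "Vincent2025hello/u0_final"   # model ID from this card
LOCAL_DIR = "./u0_final"                # same target directory as the CLI command


def download_u0(repo_id=REPO_ID, local_dir=LOCAL_DIR):
    """Download all files of the model repository and return the local path."""
    # Imported lazily so this module can be loaded without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, local_dir=local_dir)


if __name__ == "__main__":
    print(download_u0())
```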

Training Code

The complete fine-tuning and evaluation framework is available at: https://github.com/VincentGu2000/u0

Citation

@misc{gu2025usimu0visionlanguageactiondataset,
      title={USIM and U0: A Vision-Language-Action Dataset and Model for General Underwater Robots}, 
      author={Junwen Gu and Zhiheng Wu and Pengxuan Si and Shuang Qiu and Yukai Feng and Luoyang Sun and Laien Luo and Lianyi Yu and Jian Wang and Zhengxing Wu},
      year={2025},
      eprint={2510.07869},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2510.07869}, 
}

Acknowledgments

This model is fine-tuned from NVIDIA GR00T N1.5. We thank the NVIDIA GEAR team for open-sourcing the GR00T model and framework.
