USIM and U0: A Vision-Language-Action Dataset and Model for General Underwater Robots
- **Model ID:** Vincent2025hello/u0_final
- **Base Model:** nvidia/GR00T-N1.5-3B
- **License:** Apache 2.0
- **Paper:** [USIM and U0: A Vision-Language-Action Dataset and Model for General Underwater Robots](https://arxiv.org/abs/2510.07869)
This model is a Vision-Language-Action (VLA) policy fine-tuned from NVIDIA GR00T N1.5 (3B parameters) for the U0 underwater robot (based on BlueROV2). It takes dual-camera visual observations and multi-sensor state inputs, and outputs 16-step action trajectories for autonomous underwater tasks.
| Item | Value |
|---|---|
| Base Model | GR00T-N1.5-3B |
| Fine-Tuning Method | Full Fine-Tuning (with visual tuning) |
| Action Horizon | 16 steps |
| Denoising Steps | 4 (inference) |
| Embodiment Tag | new_embodiment |
| Data Config | u0_bot |
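To illustrate what the "Action Horizon: 16" and "Denoising Steps: 4" settings mean at inference time, here is a toy Euler-style iterative denoising loop. This is purely illustrative: the velocity model below is a stand-in, not the actual GR00T N1.5 action head, which conditions on vision, language, and state.

```python
import numpy as np

# Inference settings from the table above; action dim (14) is the sum
# of the two action outputs listed in this card (6 + 8).
ACTION_HORIZON, ACTION_DIM, DENOISE_STEPS = 16, 14, 4

def toy_velocity_model(actions, t):
    # Stand-in for the learned denoiser: simply pulls the chunk toward zero.
    # The real model predicts a velocity field conditioned on observations.
    return -actions

def denoise_action_chunk(rng):
    """Start from Gaussian noise and integrate for DENOISE_STEPS Euler steps."""
    actions = rng.standard_normal((ACTION_HORIZON, ACTION_DIM))
    dt = 1.0 / DENOISE_STEPS
    for step in range(DENOISE_STEPS):
        t = step * dt
        actions = actions + dt * toy_velocity_model(actions, t)
    return actions

chunk = denoise_action_chunk(np.random.default_rng(0))
print(chunk.shape)  # (16, 14)
```

The point is only that a full 16-step action trajectory is produced in a handful of denoising iterations, which keeps inference latency low on the robot.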
**State inputs:**

- `joint_pos` (6): joint positions
- `pwm` (8): thruster PWM values
- `joint_v` (5): joint velocities
- `dvl_v` (3): DVL velocity
- `imu_av` (3): IMU angular velocity
- `imu_la` (3): IMU linear acceleration
- `pressure` (1): depth pressure
- `dvl_h` (1): DVL altitude

**Action outputs:**

- `joint_pos` (6): target joint positions
- `pwm` (8): target thruster PWM values

**Download:**

```shell
pip install huggingface_hub
hf download Vincent2025hello/u0_final --local-dir ./u0_final
```
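As a sanity check on the interface described above, the following sketch builds a dummy observation matching the state keys and dimensions listed in this card. The dictionary layout is an assumption for illustration; consult the released data config (`u0_bot`) for the exact schema.

```python
import numpy as np

# State keys and dimensions as listed in this card (30-D in total).
STATE_DIMS = {
    "joint_pos": 6,   # joint positions
    "pwm": 8,         # thruster PWM values
    "joint_v": 5,     # joint velocities
    "dvl_v": 3,       # DVL velocity
    "imu_av": 3,      # IMU angular velocity
    "imu_la": 3,      # IMU linear acceleration
    "pressure": 1,    # depth pressure
    "dvl_h": 1,       # DVL altitude
}
ACTION_DIMS = {"joint_pos": 6, "pwm": 8}  # target positions / target PWM
ACTION_HORIZON = 16

def make_observation(rng):
    """Build a dummy observation dict matching the state spec above."""
    return {k: rng.standard_normal(d).astype(np.float32)
            for k, d in STATE_DIMS.items()}

def action_trajectory_shape():
    """The policy outputs a 16-step chunk over the 14-D action space."""
    return (ACTION_HORIZON, sum(ACTION_DIMS.values()))

obs = make_observation(np.random.default_rng(0))
assert sum(v.shape[0] for v in obs.values()) == 30
print(action_trajectory_shape())  # (16, 14)
```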
The complete fine-tuning and evaluation framework is available at: https://github.com/VincentGu2000/u0
```bibtex
@misc{gu2025usimu0visionlanguageactiondataset,
  title={USIM and U0: A Vision-Language-Action Dataset and Model for General Underwater Robots},
  author={Junwen Gu and Zhiheng Wu and Pengxuan Si and Shuang Qiu and Yukai Feng and Luoyang Sun and Laien Luo and Lianyi Yu and Jian Wang and Zhengxing Wu},
  year={2025},
  eprint={2510.07869},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2510.07869},
}
```
This model is fine-tuned from NVIDIA GR00T N1.5. We thank the NVIDIA GEAR team for open-sourcing the GR00T model and framework.