---
license: other
license_name: nvidia-open-model-license
license_link: https://developer.nvidia.com/open-model-license
language:
- en
library_name: transformers
tags:
- robotics
- vision-language-action
- manipulation
- gr00t
- nvidia
- physical-ai
- humanoid
- reachy2
- lerobot
datasets:
- ganatrask/NOVA
base_model:
- nvidia/GR00T-N1.6-3B
pipeline_tag: robotics
---

# NOVA Model - GR00T N1.6 Fine-tuned for Reachy 2

<p align="center">
  <img src="https://img.shields.io/badge/NVIDIA-GR00T%20N1.6-76B900?style=for-the-badge&logo=nvidia" alt="GR00T N1.6"/>
  <img src="https://img.shields.io/badge/Robot-Reachy%202-0066CC?style=for-the-badge" alt="Reachy 2"/>
  <img src="https://img.shields.io/badge/Task-Pick%20%26%20Place-green?style=for-the-badge" alt="Pick & Place"/>
</p>

**NOVA** (Neural Open Vision Actions) is a fine-tuned version of NVIDIA's GR00T N1.6 vision-language-action model, trained specifically for [Pollen Robotics' Reachy 2](https://www.pollen-robotics.com/reachy/) humanoid robot.

## Model Description

This model is part of an end-to-end Physical AI pipeline that combines:

- **Voice Input**: Parakeet CTC 0.6B for speech-to-text
- **Scene Reasoning**: Cosmos Reason 2 for object detection and spatial understanding
- **Action Policy**: This fine-tuned GR00T N1.6 model for manipulation

### Model Details

| Property | Value |
|----------|-------|
| **Base Model** | [nvidia/GR00T-N1.6-3B](https://huggingface.co/nvidia/GR00T-N1.6-3B) |
| **Parameters** | ~3B |
| **Embodiment** | Reachy 2 (custom embodiment tag) |
| **Action Space** | 8-DOF (7 arm joints + gripper) |
| **Training Steps** | 30,000 |
| **Final Loss** | ~0.008-0.01 |

### Action Space

```python
# Order of the 8 action dimensions, with their ranges.
ACTION_ORDER = [
    "shoulder_pitch",  # -180° to 90°
    "shoulder_roll",   # -180° to 10°
    "elbow_yaw",       # -90° to 90°
    "elbow_pitch",     # -125° to 0°
    "wrist_roll",      # -100° to 100°
    "wrist_pitch",     # -45° to 45°
    "wrist_yaw",       # -30° to 30°
    "gripper",         # 0 (closed) to 1 (open)
]
```

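
The joint ranges listed above can be enforced on the client side before commands reach the robot. The following is a minimal sketch and is not part of the NOVA codebase. It assumes that one timestep of policy output is an 8-element vector in the order shown, with the seven arm joints in degrees and the gripper normalized to [0, 1]. The names `JOINT_LIMITS` and `clip_action` are hypothetical.

```python
import numpy as np

# Hypothetical limits table taken from the action-space listing.
# Assumption: arm joints are in degrees; gripper is normalized (0 = closed, 1 = open).
JOINT_LIMITS = np.array([
    [-180.0, 90.0],   # shoulder_pitch
    [-180.0, 10.0],   # shoulder_roll
    [-90.0, 90.0],    # elbow_yaw
    [-125.0, 0.0],    # elbow_pitch
    [-100.0, 100.0],  # wrist_roll
    [-45.0, 45.0],    # wrist_pitch
    [-30.0, 30.0],    # wrist_yaw
    [0.0, 1.0],       # gripper
])


def clip_action(action):
    """Clip one 8-DOF action vector to the documented ranges.

    Assumes `action` uses the order above: 7 arm joints, then the gripper.
    """
    action = np.asarray(action, dtype=np.float64)
    if action.shape != (8,):
        raise ValueError(f"expected shape (8,), got {action.shape}")
    return np.clip(action, JOINT_LIMITS[:, 0], JOINT_LIMITS[:, 1])


# Example: out-of-range shoulder_roll and gripper values are clamped.
print(clip_action([0, 20, 0, -10, 0, 0, 0, 1.5]))
```

Clipping is a last-resort safety net. It does not replace checking whether the policy's output units match the robot SDK's expected units.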
## Intended Use

This model is designed for:

- **Pick-and-place manipulation** tasks on the Reachy 2 robot
- **Language-conditioned control** ("Pick up the red cube")
- **Research** in vision-language-action models and robotic manipulation

### Supported Tasks

- Pick up objects (cube, cylinder, capsule, rectangular box)
- Place objects in target locations
- Handle 8 color variations (red, green, blue, yellow, cyan, magenta, orange, purple)

## Training

### Training Data

Trained on the [ganatrask/NOVA dataset](https://huggingface.co/datasets/ganatrask/NOVA):

- **100 episodes** of expert demonstrations
- **32 task variations** (4 objects × 8 colors)
- Domain randomization (position, lighting, camera jitter)
- LeRobot v2.1 format

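
The 32 task variations are the Cartesian product of the 4 object shapes and 8 colors listed under Supported Tasks. The sketch below enumerates them as language instructions. The prompt template `"Pick up the {color} {shape}"` is an assumption modeled on the example instruction in this card, and the wording of the actual dataset annotations may differ.

```python
from itertools import product

# Object shapes and colors as listed in this model card.
SHAPES = ["cube", "cylinder", "capsule", "rectangular box"]
COLORS = ["red", "green", "blue", "yellow", "cyan", "magenta", "orange", "purple"]


def task_instructions(template="Pick up the {color} {shape}"):
    """Return one instruction per (shape, color) pair.

    The default template is a hypothetical example, not a verified
    copy of the dataset's annotation format.
    """
    return [template.format(color=c, shape=s) for s, c in product(SHAPES, COLORS)]


instructions = task_instructions()
print(len(instructions))  # -> 32
print(instructions[0])    # -> Pick up the red cube
```

With 100 episodes spread over 32 variations, each variation has roughly 3 demonstrations on average, assuming an even split.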
### Training Configuration

| Parameter | Value |
|-----------|-------|
| GPU Model | NVIDIA A100-SXM4-80GB |
| Number of GPUs | 2 |
| Batch Size | 64 |
| Max Steps | 30,000 |
| Save Steps | 3,000 |
| Video Backend | decord |

### Training Command

```bash
python -m gr00t.train \
    --dataset_repo_id ganatrask/NOVA \
    --embodiment_tag reachy2 \
    --video_backend decord \
    --num_gpus 2 \
    --batch_size 64 \
    --max_steps 30000 \
    --save_steps 3000 \
    --output_dir ./checkpoints/groot-reachy2
```

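
With `--max_steps 30000` and `--save_steps 3000`, a checkpoint is written every 3,000 steps. The sketch below works out that schedule and the number of training samples consumed. It makes two assumptions about the trainer that this card does not confirm. First, it assumes a checkpoint is saved at every multiple of `save_steps` up to and including `max_steps`. Second, it assumes `--batch_size 64` is the global batch size across both GPUs. If the batch size is per GPU instead, the sample count doubles.

```python
def checkpoint_steps(max_steps, save_steps):
    """Steps at which a checkpoint is expected (assumes saves at every multiple of save_steps)."""
    return list(range(save_steps, max_steps + 1, save_steps))


def samples_seen(max_steps, batch_size, num_gpus=1, per_gpu=False):
    """Total training samples processed; scales by num_gpus only if batch_size is per GPU."""
    effective_batch = batch_size * num_gpus if per_gpu else batch_size
    return max_steps * effective_batch


steps = checkpoint_steps(30_000, 3_000)
print(len(steps), steps[0], steps[-1])                     # -> 10 3000 30000
print(samples_seen(30_000, 64, num_gpus=2))                # -> 1920000 (global batch)
print(samples_seen(30_000, 64, num_gpus=2, per_gpu=True))  # -> 3840000 (per-GPU batch)
```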
## Usage

### Prerequisites

You need to apply a patch to Isaac-GR00T to add the Reachy 2 embodiment tag:

```bash
cd Isaac-GR00T
patch -p1 < ../patches/add_reachy2_embodiment.patch
```

### Inference

```python
import importlib.util

import numpy as np

from gr00t.data.embodiment_tags import EmbodimentTag
from gr00t.policy.gr00t_policy import Gr00tPolicy

# Load the Reachy 2 modality config first. The file is executed for its side
# effect (it is expected to register the config); `module` is not used afterwards.
spec = importlib.util.spec_from_file_location(
    "modality_config",
    "configs/reachy2_modality_config.py",
)
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)

# Load policy (requires the Reachy 2 embodiment patch above)
policy = Gr00tPolicy(
    embodiment_tag=EmbodimentTag.REACHY2,
    model_path="ganatrask/NOVA",  # or local checkpoint path
    device="cuda",
    strict=True,
)

# Placeholder inputs: replace with a real front-camera frame and joint reading.
image = np.zeros((224, 224, 3), dtype=np.uint8)  # (H, W, 3) RGB frame
joints = np.zeros(7, dtype=np.float32)           # 7 arm joint positions

# Run inference
obs = {
    "video": {"front_cam": image[None, None, :, :, :]},  # (1, 1, H, W, 3)
    "state": {"arm_joints": joints[None, None, :]},      # (1, 1, 7)
    "language": {"annotation.human.task_description": [["Pick up the red cube"]]},
}
action, _ = policy.get_action(obs)
```

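
The observation dictionary places each input under a batch axis and a time axis, which is why the example indexes with `[None, None, ...]`. The helper below is a hypothetical convenience wrapper and is not part of Isaac-GR00T. It builds the same dictionary from one camera frame, a 7-element joint vector, and an instruction, and it checks the shapes noted in the inference comments. It depends only on NumPy, so it can be tested without a GPU. The 224×224 size check is an assumption based on the Limitations section.

```python
import numpy as np


def make_obs(image, joints, instruction, size=224):
    """Build a single-step Reachy 2 observation dict (hypothetical helper).

    image:  (size, size, 3) RGB frame from the front camera
    joints: (7,) arm joint positions
    """
    image = np.asarray(image)
    joints = np.asarray(joints, dtype=np.float32)
    if image.shape != (size, size, 3):
        raise ValueError(f"expected image of shape ({size}, {size}, 3), got {image.shape}")
    if joints.shape != (7,):
        raise ValueError(f"expected 7 joint values, got shape {joints.shape}")
    return {
        # Add batch (B=1) and time (T=1) axes -> (1, 1, H, W, 3)
        "video": {"front_cam": image[None, None, ...]},
        # -> (1, 1, 7)
        "state": {"arm_joints": joints[None, None, :]},
        # One list per batch element, one string per timestep
        "language": {"annotation.human.task_description": [[instruction]]},
    }


obs = make_obs(np.zeros((224, 224, 3), dtype=np.uint8), np.zeros(7), "Pick up the red cube")
print(obs["video"]["front_cam"].shape, obs["state"]["arm_joints"].shape)
# -> (1, 1, 224, 224, 3) (1, 1, 7)
```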
## Performance

| Metric | Value |
|--------|-------|
| Inference Speed | ~40 ms/step (A100) |
| VRAM Usage | ~44 GB / 80 GB |
| Training Time | ~6 hours (30K steps) |

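
Two rough figures can be derived from this table. At ~40 ms per inference step, the policy can be queried at most about 25 times per second on an A100. This upper bound excludes camera, network, and robot-control latency. Dividing ~6 hours by 30,000 training steps gives about 0.72 s per step. The snippet below only performs this arithmetic. Because it treats the table's approximate values as exact, its results are estimates.

```python
def max_query_rate_hz(latency_ms):
    """Upper bound on policy queries per second for a given per-step latency."""
    return 1000.0 / latency_ms


def seconds_per_step(total_hours, steps):
    """Average wall-clock seconds per training step."""
    return total_hours * 3600.0 / steps


print(max_query_rate_hz(40))        # -> 25.0
print(seconds_per_step(6, 30_000))  # -> 0.72
```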
## Limitations

- **Simulation-trained**: Trained primarily on MuJoCo simulation data
- **Single-arm**: Currently supports right-arm manipulation only
- **Fixed camera setup**: Expects front camera input at 224×224 resolution
- **Task scope**: Optimized for pick-and-place; may not generalize to other manipulation tasks

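
The model expects 224×224 front-camera frames, so frames from a camera at a different resolution must be resized first. The sketch below uses nearest-neighbor sampling in NumPy to avoid extra dependencies. This interpolation is an illustrative choice and is not necessarily the preprocessing used in training, which may have used a different interpolation method or cropping. Match the training pipeline where that matters.

```python
import numpy as np


def resize_nearest(frame, size=224):
    """Nearest-neighbor resize of an (H, W, C) frame to (size, size, C).

    Aspect ratio is not preserved; non-square frames are stretched.
    """
    frame = np.asarray(frame)
    h, w = frame.shape[:2]
    # For each output row/column, pick the source index it maps onto.
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return frame[rows[:, None], cols]


frame = np.random.randint(0, 256, size=(480, 640, 3), dtype=np.uint8)
small = resize_nearest(frame)
print(small.shape, small.dtype)  # -> (224, 224, 3) uint8
```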
## Ethical Considerations

- This model is intended for research purposes
- Human supervision is recommended for real-robot deployment
- Not intended for safety-critical applications without extensive testing

## Citation

If you use this model, please cite:

```bibtex
@misc{nova2025,
  title={NOVA: Neural Open Vision Actions},
  author={ganatrask},
  year={2025},
  publisher={HuggingFace},
  url={https://huggingface.co/ganatrask/NOVA}
}
```

## Acknowledgments

- **[NVIDIA](https://developer.nvidia.com/)** - GR00T N1.6 base model
- **[Pollen Robotics](https://www.pollen-robotics.com/)** - Reachy 2 robot
- **[HuggingFace](https://huggingface.co/)** - LeRobot framework
- **[VESSL AI](https://vessl.ai/)** - GPU compute for training

## License

This model inherits the [NVIDIA Open Model License](https://developer.nvidia.com/open-model-license) from the base GR00T N1.6 model.

## Links

- **GitHub**: [ganatrask/NOVA](https://github.com/ganatrask/NOVA)
- **Dataset**: [ganatrask/NOVA](https://huggingface.co/datasets/ganatrask/NOVA)
- **Base Model**: [nvidia/GR00T-N1.6-3B](https://huggingface.co/nvidia/GR00T-N1.6-3B)