Physical AI Studio - VLA model fine-tuning for robotics

Action Chunking with Transformers (ACT)

Action Chunking with Transformers (ACT) is an imitation-learning policy that predicts short action chunks from robot state and visual observations. The robot can execute those chunks as a sequence of real-world movements.

This model was trained and exported with Physical AI Studio for local or Hugging Face-hosted robot inference.

Model Details

Policy: act
Runtime library: physicalai
Generated by: Physical AI Studio

Intended Use

Use this model for robot imitation-learning inference in setups matching the training dataset, robot embodiment, camera viewpoints, and task instructions. Validate behavior in simulation or a safe test cell before running on hardware.

Dataset

This model was trained from the Physical AI Studio dataset named dice-combined.

Model Package

Load the model from the root directory when possible. The root manifest.json is the package entry point, and backend-specific manifests live under exports/<backend>/manifest.json.
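The package layout described above can be inspected before loading anything. The following is a minimal sketch using only the Python standard library; the helper names list_manifests and top_level_keys are hypothetical, and no manifest schema is assumed beyond the files being JSON.

```python
import json
from pathlib import Path


def list_manifests(model_root):
    """Return the root manifest path and a {backend: manifest path} mapping."""
    root = Path(model_root)
    root_manifest = root / "manifest.json"
    if not root_manifest.is_file():
        raise FileNotFoundError(f"No manifest.json found in {root}")
    # Backend manifests follow the exports/<backend>/manifest.json layout.
    backends = {
        path.parent.name: path
        for path in sorted(root.glob("exports/*/manifest.json"))
    }
    return root_manifest, backends


def top_level_keys(manifest_path):
    """Load a manifest and return its top-level keys without assuming a schema."""
    with open(manifest_path, encoding="utf-8") as handle:
        data = json.load(handle)
    return sorted(data) if isinstance(data, dict) else []
```

For this package, list_manifests("path/to/model") would be expected to report a single torch backend, matching the table below.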

Backend	Artifact	Intended Use
torch	`exports/torch/act.pt`	Canonical checkpoint and Python inference

Training Environment

Environment: Empty

name: Empty
robots: []
cameras: []

I/O Specification

torch

Inputs

Name	Type	Shape	Dtype
state	STATE	[6]	float32
images.overview	VISUAL	[3, 480, 640]	float32
images.gripper	VISUAL	[3, 480, 640]	float32

Outputs

Name	Type	Shape	Dtype
action	ACTION	[100, 6]	float32
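
Checking an observation against the input table before calling the model can catch shape and dtype mistakes early. The sketch below uses only NumPy; check_observation and INPUT_SPEC are hypothetical names, and the leading batch axis is an assumption based on the smoke test in the Running Inference section.

```python
import numpy as np

# Per-sample input shapes copied from the I/O specification table.
INPUT_SPEC = {
    "state": (6,),
    "images.overview": (3, 480, 640),
    "images.gripper": (3, 480, 640),
}


def check_observation(observation, batched=True):
    """Return a list of problems; an empty list means the observation matches."""
    problems = []
    for name, shape in INPUT_SPEC.items():
        if name not in observation:
            problems.append(f"missing input: {name}")
            continue
        value = np.asarray(observation[name])
        # Drop the leading batch axis before comparing against the spec.
        actual = value.shape[1:] if batched else value.shape
        if actual != shape:
            problems.append(f"{name}: expected {shape}, got {actual}")
        if value.dtype != np.float32:
            problems.append(f"{name}: expected float32, got {value.dtype}")
    return problems
```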

Running Inference

Installation

uv pip install physicalai numpy

The following smoke test verifies that the package loads and accepts tensors with the declared shapes. Replace the dummy values with observations from your robot runtime before using the model for control.

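import numpy as np
from physicalai.inference import InferenceModel

# Local model directory or Hugging Face repository id.
MODEL_PATH = "path/to/model"
model = InferenceModel.load(MODEL_PATH, device="CPU")

# Dummy inputs: the leading 1 is a batch axis; the remaining dimensions
# and the float32 dtype match the I/O specification.
observation = {
    "state": np.random.rand(1, 6).astype(np.float32),
    "images.overview": np.random.rand(1, 3, 480, 640).astype(np.float32),
    "images.gripper": np.random.rand(1, 3, 480, 640).astype(np.float32),
}

# Predict one action chunk for this observation.
chunk = model.predict_action_chunk(observation)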

Set MODEL_PATH to this local model directory or to the Hugging Face repository id after upload.
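
When replacing the dummy values with real camera data, frames usually need to be rearranged into the declared [3, 480, 640] float32 layout. The sketch below assumes a frame arrives as a height x width x channel uint8 array and that the model expects values scaled to [0, 1]; neither the scaling nor the channel order (RGB or BGR) is stated in this card, so verify both against the training pipeline. frame_to_input is a hypothetical helper name.

```python
import numpy as np


def frame_to_input(frame):
    """Convert one 480x640 HWC uint8 frame to a [1, 3, 480, 640] float32 array."""
    frame = np.asarray(frame)
    if frame.shape != (480, 640, 3) or frame.dtype != np.uint8:
        raise ValueError(
            f"expected (480, 640, 3) uint8, got {frame.shape} {frame.dtype}"
        )
    # Assumption: the model expects pixel values scaled to [0, 1].
    scaled = frame.astype(np.float32) / 255.0
    # Move channels first (HWC -> CHW), then add the batch axis.
    return np.ascontiguousarray(scaled.transpose(2, 0, 1))[np.newaxis]
```

The result can be used as the value for images.overview or images.gripper in the observation dictionary.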

Running A Robot Control Loop

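For a blocking control loop similar to PhysicalAI's examples/runtime/sync_inference.py, start from the command below. The training environment exported above is empty (no robots or cameras), so the robot type, camera name, and task string in the command are placeholders to replace with your own setup. Note that the model expects two camera inputs, images.overview and images.gripper, while the command shows a single camera named overhead. Local device handles are also placeholders because ports, camera paths, and stream URLs are not included in published model metadata.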

python examples/runtime/sync_inference.py \
  --robot so101 \
  --port /dev/ttyACM0 \
  --calibration ./calibration.json \
  --model path/to/model \
  --camera overhead:uvc:/dev/video0 \
  --task "Move the dice into the cup" \
  --device CPU
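
Independently of that script, the core of a blocking loop is to play back each predicted chunk one action at a time. The sketch below is one possible way to do that and is not taken from the PhysicalAI example: execute_chunk, send_action, and the 30 Hz default are hypothetical, and the control frequency should match the one used during data collection. It assumes the chunk is array-like with shape [100, 6], optionally with a leading batch axis of size 1.

```python
import time

import numpy as np


def execute_chunk(chunk, send_action, control_hz=30.0, max_steps=None,
                  sleep=time.sleep):
    """Send each action in a chunk at a fixed rate; return the number sent."""
    actions = np.asarray(chunk, dtype=np.float32)
    # Accept [1, steps, 6] as well as [steps, 6] by dropping a batch axis of 1.
    if actions.ndim == 3 and actions.shape[0] == 1:
        actions = actions[0]
    if actions.ndim != 2 or actions.shape[1] != 6:
        raise ValueError(f"expected [steps, 6] actions, got {actions.shape}")
    if max_steps is not None:
        # Optionally execute only the first part of the chunk.
        actions = actions[:max_steps]
    period = 1.0 / control_hz
    for action in actions:
        send_action(action)  # hypothetical callback that commands the robot
        sleep(period)
    return len(actions)
```

Passing max_steps lets the loop request a fresh prediction before the whole chunk has been consumed.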

Training / Reproducing Training

Import this model into Physical AI Studio and start a new training job that uses it as the base model. Studio preserves the training lineage through the parent model relationship.

To reproduce behavior on your own hardware, match the exported I/O specification, robot type, camera viewpoints, control frequency, and calibration values from environment.json as closely as possible.

Evaluation

No task-specific evaluation metrics were exported with this generated card. Add validation results, success rates, and hardware test conditions before publishing externally.

Limitations And Safety

Robot policies can behave unpredictably outside their training distribution. Validate camera viewpoints, lighting, object placement, calibration values, robot embodiment, and task wording before autonomous operation. Use hardware limits, emergency stops, supervision, and staged validation.
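
During staged validation, a software-side bound on commanded actions can complement hardware limits and emergency stops. The following is a minimal sketch, assuming actions are absolute 6-dimensional targets; limit_action and all limit values are hypothetical and must come from your robot's documentation and calibration.

```python
import numpy as np


def limit_action(action, previous, lower, upper, max_delta):
    """Clip an action to [lower, upper] and bound its change from the last command."""
    action = np.asarray(action, dtype=np.float32)
    previous = np.asarray(previous, dtype=np.float32)
    # Bound the per-step change first, then enforce the absolute limits.
    step = np.clip(action - previous, -max_delta, max_delta)
    return np.clip(previous + step, lower, upper)
```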


Paper for MarkRedeman/dice-cleanup-combined-act-pas

Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware (arXiv:2304.13705, published Apr 23, 2023)