# TST: Temporal Segment Transformer for Action Segmentation

Pre-trained DiffAct+TST checkpoints for temporal action segmentation on three benchmark datasets: GTEA, 50Salads, and Breakfast.
## Model Description

TST (Temporal Segment Transformer) is a plug-in, segment-level refinement module for frame-level action segmentation backbones. Segment queries refine the backbone's predictions via cross-attention and self-attention, and are assigned to ground-truth segments with DETR-style Hungarian matching during training.

These checkpoints use DiffAct (ICCV'23) as the backbone, with TST applied as a Stage 2 refinement head on top of a frozen backbone.
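For intuition, below is a minimal sketch of DETR-style Hungarian matching applied to segments, assuming per-query class probabilities and normalized (start, end) spans. The function name and cost terms are illustrative, not TST's exact formulation (see the main repo for that).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def hungarian_match(pred_probs, pred_spans, gt_labels, gt_spans):
    """Illustrative DETR-style matching of segment queries to ground truth.

    pred_probs: (Q, C) class probabilities per segment query
    pred_spans: (Q, 2) predicted (start, end), normalized to [0, 1]
    gt_labels:  (G,)   ground-truth class ids
    gt_spans:   (G, 2) ground-truth (start, end), normalized to [0, 1]
    """
    # Classification cost: negative probability of each ground-truth class.
    cost_class = -pred_probs[:, gt_labels]                       # (Q, G)
    # Temporal IoU cost between every query span and every ground-truth span.
    inter = (np.minimum(pred_spans[:, None, 1], gt_spans[None, :, 1])
             - np.maximum(pred_spans[:, None, 0], gt_spans[None, :, 0])).clip(min=0)
    union = ((pred_spans[:, 1] - pred_spans[:, 0])[:, None]
             + (gt_spans[:, 1] - gt_spans[:, 0])[None, :] - inter)
    cost_iou = -inter / np.maximum(union, 1e-8)                  # (Q, G)
    # Optimal one-to-one assignment over the combined cost matrix.
    rows, cols = linear_sum_assignment(cost_class + cost_iou)
    return rows, cols  # query rows[k] is matched to ground-truth segment cols[k]
```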
## Results

### GTEA
| Method | F1@10 | F1@25 | F1@50 | Edit | Acc |
|---|---|---|---|---|---|
| DiffAct | 92.5 | 91.5 | 84.7 | 89.6 | 80.3 |
| DiffAct+TST | 94.2 | 93.0 | 87.1 | 90.9 | 81.4 |
### 50Salads
| Method | F1@10 | F1@25 | F1@50 | Edit | Acc |
|---|---|---|---|---|---|
| DiffAct | 90.1 | 89.2 | 83.7 | 85.0 | 88.9 |
| DiffAct+TST | 92.3 | 91.8 | 87.4 | 87.4 | 89.7 |
### Breakfast
| Method | F1@10 | F1@25 | F1@50 | Edit | Acc |
|---|---|---|---|---|---|
| DiffAct | 80.3 | 75.9 | 64.6 | 78.4 | 76.4 |
| DiffAct+TST | 81.2 | 77.1 | 65.9 | 79.0 | 76.9 |
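F1@{10,25,50} is the segmental F1 score at temporal IoU thresholds of 0.10/0.25/0.50, Edit is the segmental edit score, and Acc is frame-wise accuracy. For reference, here is a simplified sketch of segmental F1@τ following the standard definition from Lea et al.; use the repo's evaluation code to reproduce the reported numbers.

```python
def get_segments(labels):
    # Collapse a frame-level label sequence into (label, start, end) segments.
    segments, start = [], 0
    for t in range(1, len(labels) + 1):
        if t == len(labels) or labels[t] != labels[start]:
            segments.append((labels[start], start, t))
            start = t
    return segments

def f1_at_tau(pred_labels, gt_labels, tau):
    # Segmental F1: a predicted segment is a true positive if it has IoU >= tau
    # with a same-class, not-yet-matched ground-truth segment.
    pred_segs, gt_segs = get_segments(pred_labels), get_segments(gt_labels)
    tp, matched = 0, set()
    for label, s, e in pred_segs:
        best_iou, best_j = 0.0, None
        for j, (g_label, gs, ge) in enumerate(gt_segs):
            if g_label != label or j in matched:
                continue
            inter = max(0, min(e, ge) - max(s, gs))
            iou = inter / (max(e, ge) - min(s, gs))
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_iou >= tau:
            tp += 1
            matched.add(best_j)
    fp, fn = len(pred_segs) - tp, len(gt_segs) - tp
    precision, recall = tp / max(tp + fp, 1), tp / max(tp + fn, 1)
    return 2 * precision * recall / max(precision + recall, 1e-8)

# F1@10 corresponds to tau = 0.10:
# f1_at_tau(pred_labels, gt_labels, tau=0.10)
```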
## Checkpoints
```
gtea/
├── split_1_best.pth (21 MB)
├── split_2_best.pth (21 MB)
├── split_3_best.pth (21 MB)
└── split_4_best.pth (21 MB)
50salads/
├── split_1_best.pth (21 MB)
├── split_2_best.pth (21 MB)
├── split_3_best.pth (21 MB)
├── split_4_best.pth (21 MB)
└── split_5_best.pth (21 MB)
breakfast/
├── split_1_best.pth (65 MB)
├── split_2_best.pth (65 MB)
├── split_3_best.pth (65 MB)
└── split_4_best.pth (65 MB)
```
Total: ~450 MB
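The per-split checkpoints follow the standard cross-validation protocol (4 splits for GTEA and Breakfast, 5 for 50Salads). A hypothetical helper (not part of the repo) for loading them in a loop:

```python
from pathlib import Path

import torch

def load_split_checkpoints(dataset_dir):
    # Map split name -> state dict for every checkpoint in a dataset directory.
    return {
        path.stem: torch.load(path, map_location="cpu")
        for path in sorted(Path(dataset_dir).glob("split_*_best.pth"))
    }

ckpts = load_split_checkpoints("50salads")  # five checkpoints, one per split
```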
## Usage

```python
import torch

from tst.tst_refiner import TSTRefiner
from tst.wrapper import BackboneWithTST, DiffActAdapter

# Load a checkpoint (here: GTEA, split 1)
ckpt = torch.load("gtea/split_1_best.pth", map_location="cpu")

# Build the model (see the main repo for the full setup; DiffActAdapter wraps
# a pretrained DiffAct backbone, and its constructor arguments are documented there)
backbone = ...  # a pretrained DiffAct model, built as in the main repo
adapter = DiffActAdapter(backbone)
refiner = TSTRefiner(n_classes=11, feat_dim=192, inner_dim=64)  # GTEA has 11 action classes
model = BackboneWithTST(adapter, refiner, freeze_backbone=True)
model.load_state_dict(ckpt, strict=False)
```
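After loading, inference might look like the following; the forward signature and feature shape below are assumptions, since the actual interface is defined in the main repo.

```python
# Hypothetical inference sketch. The wrapper's real forward signature is
# defined in the main repo; here we assume it takes precomputed frame
# features of shape (batch, feature_dim, num_frames) and returns per-frame
# class logits of shape (batch, n_classes, num_frames).
model.eval()
with torch.no_grad():
    features = torch.randn(1, 2048, 1000)  # dummy stand-in for frame features (assumed shape)
    logits = model(features)
    frame_preds = logits.argmax(dim=1)     # per-frame action labels
```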
See the GitHub repository for full training and evaluation code.
## Training Details
| Dataset | lr | lr_transformer | inner_dim | Epochs | Backbone |
|---|---|---|---|---|---|
| GTEA | 5e-4 | 5e-5 | 64 | 60 | DiffAct |
| 50Salads | 5e-4 | 5e-5 | 64 | 60 | DiffAct |
| Breakfast | 5e-5 | 5e-6 | 128 | 60 | DiffAct |
All models were trained with the Adam optimizer, a cosine-annealing learning-rate schedule, and a batch size of 1.
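The separate lr and lr_transformer values map naturally onto Adam parameter groups. A sketch assuming the transformer parameters can be selected by name; the `"transformer"` filter is an assumption about the refiner's submodule naming, not the repo's actual setup.

```python
import torch

# Split the refiner's parameters into transformer and non-transformer groups.
# The name-based filter below is an assumption, not the repo's actual code.
transformer_params = [p for n, p in refiner.named_parameters() if "transformer" in n]
other_params = [p for n, p in refiner.named_parameters() if "transformer" not in n]

optimizer = torch.optim.Adam([
    {"params": other_params, "lr": 5e-4},        # lr (GTEA / 50Salads values)
    {"params": transformer_params, "lr": 5e-5},  # lr_transformer
])
# Cosine annealing over the 60 training epochs.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=60)
```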
## Citation

```bibtex
@article{tst2024,
  title={Temporal Segment Transformer for Action Segmentation},
  author={},
  year={2024}
}
```