# TST: Temporal Segment Transformer for Action Segmentation

Pre-trained DiffAct+TST checkpoints for temporal action segmentation on three benchmark datasets: GTEA, 50Salads, and Breakfast.
## Model Description

TST (Temporal Segment Transformer) is a plug-in, segment-level refinement module for frame-level action segmentation backbones. Segment queries refine the backbone's predictions via cross-attention and self-attention, and are assigned to ground-truth segments with DETR-style Hungarian matching during training.

These checkpoints use DiffAct (ICCV'23) as the backbone, with TST applied as a Stage 2 refinement head on top of a frozen backbone.
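For intuition, below is a minimal sketch of DETR-style Hungarian matching applied to segments, assuming per-query class probabilities and normalized (start, end) spans. The function name and cost terms are illustrative, not TST's exact formulation (see the main repo for that).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def hungarian_match(pred_probs, pred_spans, gt_labels, gt_spans):
    """Illustrative DETR-style matching of segment queries to ground truth.

    pred_probs: (Q, C) class probabilities per segment query
    pred_spans: (Q, 2) predicted (start, end), normalized to [0, 1]
    gt_labels:  (G,)   ground-truth class ids
    gt_spans:   (G, 2) ground-truth (start, end), normalized to [0, 1]
    """
    # Classification cost: negative probability of each ground-truth class.
    cost_class = -pred_probs[:, gt_labels]                       # (Q, G)
    # Temporal IoU cost between every query span and every ground-truth span.
    inter = (np.minimum(pred_spans[:, None, 1], gt_spans[None, :, 1])
             - np.maximum(pred_spans[:, None, 0], gt_spans[None, :, 0])).clip(min=0)
    union = ((pred_spans[:, 1] - pred_spans[:, 0])[:, None]
             + (gt_spans[:, 1] - gt_spans[:, 0])[None, :] - inter)
    cost_iou = -inter / np.maximum(union, 1e-8)                  # (Q, G)
    # Optimal one-to-one assignment over the combined cost matrix.
    rows, cols = linear_sum_assignment(cost_class + cost_iou)
    return rows, cols  # query rows[k] is matched to ground-truth segment cols[k]
```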
## Results

### GTEA
| Method | F1@10 | F1@25 | F1@50 | Edit | Acc |
|---|---|---|---|---|---|
| DiffAct | 92.5 | 91.5 | 84.7 | 89.6 | 80.3 |
| DiffAct+TST | 94.2 | 93.0 | 87.1 | 90.9 | 81.4 |
### 50Salads
| Method | F1@10 | F1@25 | F1@50 | Edit | Acc |
|---|---|---|---|---|---|
| DiffAct | 90.1 | 89.2 | 83.7 | 85.0 | 88.9 |
| DiffAct+TST | 92.3 | 91.8 | 87.4 | 87.4 | 89.7 |
### Breakfast
| Method | F1@10 | F1@25 | F1@50 | Edit | Acc |
|---|---|---|---|---|---|
| DiffAct | 80.3 | 75.9 | 64.6 | 78.4 | 76.4 |
| DiffAct+TST | 81.2 | 77.1 | 65.9 | 79.0 | 76.9 |
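F1@{10,25,50} is the segmental F1 score at temporal IoU thresholds of 0.10/0.25/0.50, Edit is the segmental edit score, and Acc is frame-wise accuracy. For reference, here is a simplified sketch of segmental F1@τ following the standard definition from Lea et al.; use the repo's evaluation code to reproduce the reported numbers.

```python
def get_segments(labels):
    # Collapse a frame-level label sequence into (label, start, end) segments.
    segments, start = [], 0
    for t in range(1, len(labels) + 1):
        if t == len(labels) or labels[t] != labels[start]:
            segments.append((labels[start], start, t))
            start = t
    return segments

def f1_at_tau(pred_labels, gt_labels, tau):
    # Segmental F1: a predicted segment is a true positive if it has IoU >= tau
    # with a same-class, not-yet-matched ground-truth segment.
    pred_segs, gt_segs = get_segments(pred_labels), get_segments(gt_labels)
    tp, matched = 0, set()
    for label, s, e in pred_segs:
        best_iou, best_j = 0.0, None
        for j, (g_label, gs, ge) in enumerate(gt_segs):
            if g_label != label or j in matched:
                continue
            inter = max(0, min(e, ge) - max(s, gs))
            iou = inter / (max(e, ge) - min(s, gs))
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_iou >= tau:
            tp += 1
            matched.add(best_j)
    fp, fn = len(pred_segs) - tp, len(gt_segs) - tp
    precision, recall = tp / max(tp + fp, 1), tp / max(tp + fn, 1)
    return 2 * precision * recall / max(precision + recall, 1e-8)

# F1@10 corresponds to tau = 0.10:
# f1_at_tau(pred_labels, gt_labels, tau=0.10)
```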
## Checkpoints
```
gtea/
├── split_1_best.pth (21 MB)
├── split_2_best.pth (21 MB)
├── split_3_best.pth (21 MB)
└── split_4_best.pth (21 MB)
50salads/
├── split_1_best.pth (21 MB)
├── split_2_best.pth (21 MB)
├── split_3_best.pth (21 MB)
├── split_4_best.pth (21 MB)
└── split_5_best.pth (21 MB)
breakfast/
├── split_1_best.pth (65 MB)
├── split_2_best.pth (65 MB)
├── split_3_best.pth (65 MB)
└── split_4_best.pth (65 MB)
```
Total: ~450 MB
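The per-split checkpoints follow the standard cross-validation protocol (4 splits for GTEA and Breakfast, 5 for 50Salads). A hypothetical helper (not part of the repo) for loading them in a loop:

```python
from pathlib import Path

import torch

def load_split_checkpoints(dataset_dir):
    # Map split name -> state dict for every checkpoint in a dataset directory.
    return {
        path.stem: torch.load(path, map_location="cpu")
        for path in sorted(Path(dataset_dir).glob("split_*_best.pth"))
    }

ckpts = load_split_checkpoints("50salads")  # five checkpoints, one per split
```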
## Usage

```python
import torch

from tst.tst_refiner import TSTRefiner
from tst.wrapper import BackboneWithTST, DiffActAdapter

# Load a checkpoint (here: GTEA, split 1)
ckpt = torch.load("gtea/split_1_best.pth", map_location="cpu")

# Build the model (see the main repo for the full setup; DiffActAdapter wraps
# a pretrained DiffAct backbone, and its constructor arguments are documented there)
backbone = ...  # a pretrained DiffAct model, built as in the main repo
adapter = DiffActAdapter(backbone)
refiner = TSTRefiner(n_classes=11, feat_dim=192, inner_dim=64)  # GTEA has 11 action classes
model = BackboneWithTST(adapter, refiner, freeze_backbone=True)
model.load_state_dict(ckpt, strict=False)
```
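After loading, inference might look like the following; the forward signature and feature shape below are assumptions, since the actual interface is defined in the main repo.

```python
# Hypothetical inference sketch. The wrapper's real forward signature is
# defined in the main repo; here we assume it takes precomputed frame
# features of shape (batch, feature_dim, num_frames) and returns per-frame
# class logits of shape (batch, n_classes, num_frames).
model.eval()
with torch.no_grad():
    features = torch.randn(1, 2048, 1000)  # dummy stand-in for frame features (assumed shape)
    logits = model(features)
    frame_preds = logits.argmax(dim=1)     # per-frame action labels
```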
See the GitHub repository for full training and evaluation code.
## Training Details
| Dataset | lr | lr_transformer | inner_dim | Epochs | Backbone |
|---|---|---|---|---|---|
| GTEA | 5e-4 | 5e-5 | 64 | 60 | DiffAct |
| 50Salads | 5e-4 | 5e-5 | 64 | 60 | DiffAct |
| Breakfast | 5e-5 | 5e-6 | 128 | 60 | DiffAct |
All models were trained with the Adam optimizer, a cosine-annealing learning-rate schedule, and a batch size of 1.
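The separate lr and lr_transformer values map naturally onto Adam parameter groups. A sketch assuming the transformer parameters can be selected by name; the `"transformer"` filter is an assumption about the refiner's submodule naming, not the repo's actual setup.

```python
import torch

# Split the refiner's parameters into transformer and non-transformer groups.
# The name-based filter below is an assumption, not the repo's actual code.
transformer_params = [p for n, p in refiner.named_parameters() if "transformer" in n]
other_params = [p for n, p in refiner.named_parameters() if "transformer" not in n]

optimizer = torch.optim.Adam([
    {"params": other_params, "lr": 5e-4},        # lr (GTEA / 50Salads values)
    {"params": transformer_params, "lr": 5e-5},  # lr_transformer
])
# Cosine annealing over the 60 training epochs.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=60)
```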
## Citation

```bibtex
@article{tst2024,
  title={Temporal Segment Transformer for Action Segmentation},
  author={},
  year={2024}
}
```