# TST: Temporal Segment Transformer for Action Segmentation

Pre-trained DiffAct+TST checkpoints for temporal action segmentation on three benchmark datasets.

## Model Description

TST (Temporal Segment Transformer) is a plug-in, segment-level refinement module for frame-level action segmentation backbones. It refines backbone predictions with segment-level queries that attend to frame features via cross-attention and to each other via self-attention, and it is trained with DETR-style Hungarian matching between predicted and ground-truth segments.
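The mechanism can be sketched roughly as follows. This is a minimal, illustrative sketch only: the layer structure, query count, and all names are our assumptions, not the released TST code.

```python
import torch
import torch.nn as nn

class SegmentRefinerSketch(nn.Module):
    """Illustrative DETR-style segment refiner (not the actual TST module)."""

    def __init__(self, n_classes, feat_dim, n_queries=32, n_heads=4):
        super().__init__()
        # Learnable segment queries, as in DETR-style decoders.
        self.queries = nn.Parameter(torch.randn(n_queries, feat_dim))
        self.cross_attn = nn.MultiheadAttention(feat_dim, n_heads, batch_first=True)
        self.self_attn = nn.MultiheadAttention(feat_dim, n_heads, batch_first=True)
        self.cls_head = nn.Linear(feat_dim, n_classes)

    def forward(self, frame_feats):
        # frame_feats: (B, T, feat_dim) frame-level backbone features.
        B = frame_feats.size(0)
        q = self.queries.unsqueeze(0).expand(B, -1, -1)      # (B, Q, D)
        q, _ = self.cross_attn(q, frame_feats, frame_feats)  # segments attend to frames
        q, _ = self.self_attn(q, q, q)                       # segments attend to each other
        return self.cls_head(q)                              # (B, Q, n_classes) segment logits
```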

These checkpoints use DiffAct (ICCV'23) as the backbone with TST applied as a Stage 2 refinement head (frozen backbone).

## Results

### GTEA

| Method      | F1@10 | F1@25 | F1@50 | Edit | Acc  |
|-------------|-------|-------|-------|------|------|
| DiffAct     | 92.5  | 91.5  | 84.7  | 89.6 | 80.3 |
| DiffAct+TST | 94.2  | 93.0  | 87.1  | 90.9 | 81.4 |

### 50Salads

| Method      | F1@10 | F1@25 | F1@50 | Edit | Acc  |
|-------------|-------|-------|-------|------|------|
| DiffAct     | 90.1  | 89.2  | 83.7  | 85.0 | 88.9 |
| DiffAct+TST | 92.3  | 91.8  | 87.4  | 87.4 | 89.7 |

### Breakfast

| Method      | F1@10 | F1@25 | F1@50 | Edit | Acc  |
|-------------|-------|-------|-------|------|------|
| DiffAct     | 80.3  | 75.9  | 64.6  | 78.4 | 76.4 |
| DiffAct+TST | 81.2  | 77.1  | 65.9  | 79.0 | 76.9 |
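
The reported metrics are the standard ones for temporal action segmentation: frame-wise accuracy (Acc), segmental edit score (Edit), and segmental F1 at IoU thresholds of 10/25/50%. As an illustration, here is a minimal, unoptimized sketch of F1@k over frame-level label sequences; the helper names are our own, not from the TST codebase.

```python
def get_segments(labels):
    """Split a frame-level label sequence into (label, start, end) segments."""
    segments, start = [], 0
    for t in range(1, len(labels) + 1):
        if t == len(labels) or labels[t] != labels[start]:
            segments.append((labels[start], start, t))
            start = t
    return segments

def f1_at_k(pred, gt, iou_threshold):
    """Segmental F1 at an IoU threshold, e.g. f1_at_k(pred, gt, 0.5) for F1@50."""
    pred_segs, gt_segs = get_segments(pred), get_segments(gt)
    matched = [False] * len(gt_segs)
    tp = 0
    for p_label, p_s, p_e in pred_segs:
        # Find the best-overlapping, same-class, not-yet-matched GT segment.
        best_iou, best_j = 0.0, -1
        for j, (g_label, g_s, g_e) in enumerate(gt_segs):
            if g_label != p_label or matched[j]:
                continue
            inter = max(0, min(p_e, g_e) - max(p_s, g_s))
            union = max(p_e, g_e) - min(p_s, g_s)
            if inter / union > best_iou:
                best_iou, best_j = inter / union, j
        if best_iou >= iou_threshold:
            tp += 1
            matched[best_j] = True
    precision = tp / max(len(pred_segs), 1)
    recall = tp / max(len(gt_segs), 1)
    return 2 * precision * recall / max(precision + recall, 1e-8)
```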

## Checkpoints

```
gtea/
├── split_1_best.pth   (21 MB)
├── split_2_best.pth   (21 MB)
├── split_3_best.pth   (21 MB)
└── split_4_best.pth   (21 MB)

50salads/
├── split_1_best.pth   (21 MB)
├── split_2_best.pth   (21 MB)
├── split_3_best.pth   (21 MB)
├── split_4_best.pth   (21 MB)
└── split_5_best.pth   (21 MB)

breakfast/
├── split_1_best.pth   (65 MB)
├── split_2_best.pth   (65 MB)
├── split_3_best.pth   (65 MB)
└── split_4_best.pth   (65 MB)
```

Total: ~450 MB (13 files)
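
If the checkpoints are hosted on the Hugging Face Hub, individual files can be fetched with `huggingface_hub`. The repo id below is a placeholder, not the actual repository name.

```python
from huggingface_hub import hf_hub_download

# Placeholder repo id; substitute the Hub repo that hosts these files.
ckpt_path = hf_hub_download(
    repo_id="<user>/tst-checkpoints",
    filename="gtea/split_1_best.pth",
)
```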

## Usage

```python
import torch
from tst.wrapper import BackboneWithTST, DiffActAdapter
from tst.tst_refiner import TSTRefiner

# Load a pre-trained checkpoint
ckpt = torch.load("gtea/split_1_best.pth", map_location="cpu")

# Build the model (see the main repo for full setup, including how to
# construct the DiffAct backbone wrapped by the adapter)
adapter = DiffActAdapter(diffact_backbone)  # `diffact_backbone` built per the main repo
refiner = TSTRefiner(n_classes=11, feat_dim=192, inner_dim=64)  # 11 classes for GTEA
model = BackboneWithTST(adapter, refiner, freeze_backbone=True)
model.load_state_dict(ckpt, strict=False)
```

See the GitHub repository for full training and evaluation code.
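
For a quick smoke test, something like the following may work; the tensor layout is an assumption based on DiffAct's 2048-d I3D features, and the exact input/output format is defined in the main repo.

```python
import torch

model.eval()
dummy = torch.randn(1, 2048, 1000)  # (batch, feature_dim, num_frames); layout assumed
with torch.no_grad():
    out = model(dummy)  # refined predictions; exact format defined in the main repo
```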

## Training Details

| Dataset   | lr   | lr_transformer | inner_dim | Epochs | Backbone |
|-----------|------|----------------|-----------|--------|----------|
| GTEA      | 5e-4 | 5e-5           | 64        | 60     | DiffAct  |
| 50Salads  | 5e-4 | 5e-5           | 64        | 60     | DiffAct  |
| Breakfast | 5e-5 | 5e-6           | 128       | 60     | DiffAct  |

All models were trained with the Adam optimizer, a cosine-annealing learning-rate schedule, and batch size 1.
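
The two learning rates in the table suggest separate parameter groups. Below is a minimal sketch of that setup, assuming the transformer parameters can be selected by name; the actual grouping is defined in the main repo.

```python
import torch

# `model` is the BackboneWithTST from the Usage section; the backbone is
# frozen, so only refiner parameters require gradients. Name-based grouping
# is an assumption.
transformer_params = [p for n, p in model.named_parameters()
                      if p.requires_grad and "transformer" in n]
other_params = [p for n, p in model.named_parameters()
                if p.requires_grad and "transformer" not in n]

optimizer = torch.optim.Adam([
    {"params": other_params, "lr": 5e-4},        # `lr` (GTEA / 50Salads)
    {"params": transformer_params, "lr": 5e-5},  # `lr_transformer`
])
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=60)

for epoch in range(60):
    # ... one pass over the training videos (batch size 1) ...
    scheduler.step()
```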

## Citation

```bibtex
@article{tst2024,
  title={Temporal Segment Transformer for Action Segmentation},
  author={},
  year={2024}
}
```