mascot-pose-detect

Two-stage mascot pose detector for chibi, kemono, and other stylized mascot characters.

This repository provides ONNX artifacts for portable inference:

Stage 1: a 7-class RTMDet-tiny bounding-box detector.
Stage 2: a DINOv2-Large backbone with a ViTPose-style COCO-17 heatmap head.

The keypoint model is fine-tuned for stylized mascot bodies, whose proportions differ strongly from those in real human pose datasets. The model itself outputs COCO-17 keypoints; the consumer is expected to lift them to the DWPose-25 / POSE_KEYPOINT format and to derive toe points from the foot bounding boxes. Hand keypoints, when required, are expected to come from a separate hand-template fitter.
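
The exact DWPose-25 layout is defined by the consumer and is not reproduced here. As a minimal sketch of the first part of that lift, the snippet below reorders COCO-17 keypoints into the 18-point OpenPose body order and synthesizes the neck as the midpoint of the two shoulders. The function name, the `(x, y, score)` tuple format, and the choice of the lower shoulder score for the neck are assumptions for illustration; toe and hand points are not covered.

```python
# COCO-17 order: nose, l_eye, r_eye, l_ear, r_ear, l_shoulder, r_shoulder,
# l_elbow, r_elbow, l_wrist, r_wrist, l_hip, r_hip, l_knee, r_knee,
# l_ankle, r_ankle.
# OpenPose-18 body order: nose, neck, r_shoulder, r_elbow, r_wrist,
# l_shoulder, l_elbow, l_wrist, r_hip, r_knee, r_ankle, l_hip, l_knee,
# l_ankle, r_eye, l_eye, r_ear, l_ear.
NECK = -1  # placeholder: the neck has no COCO-17 source index
COCO_TO_OPENPOSE18 = [0, NECK, 6, 8, 10, 5, 7, 9, 12, 14, 16, 11, 13, 15, 2, 1, 4, 3]


def coco17_to_openpose18(kpts):
    """Reorder 17 (x, y, score) tuples into the 18-point OpenPose body order."""
    if len(kpts) != 17:
        raise ValueError(f"expected 17 keypoints, got {len(kpts)}")
    l_sh, r_sh = kpts[5], kpts[6]
    # Assumption: neck = shoulder midpoint, scored by the weaker shoulder.
    neck = (
        (l_sh[0] + r_sh[0]) / 2,
        (l_sh[1] + r_sh[1]) / 2,
        min(l_sh[2], r_sh[2]),
    )
    return [neck if i == NECK else tuple(kpts[i]) for i in COCO_TO_OPENPOSE18]
```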

License

This model package is released under the Apache License 2.0.

The Stage 2 keypoint model is based on facebook/dinov2-large, which is also released under Apache 2.0. It does not use the older MAE-pretrained ViTPose-L checkpoint that constrained the previous test bundle to non-commercial use.

The training annotations and source images are not included in this repository.

Repository Contents

grmchn/mascot-pose-detect/
├── bbox/
│   ├── model.onnx
│   ├── classes.json
│   └── decode_params.json
└── keypoint/
    └── dinov2_vitpose_l/
        ├── model.onnx
        └── meta.json

Stage 1: BBox Detector

The bbox model detects seven mascot body regions:

index	name
0	`full`
1	`head`
2	`body`
3	`hand_left`
4	`hand_right`
5	`foot_left`
6	`foot_right`

Left and right are named from the character's point of view (anatomical naming), not the viewer's. For a front-facing character, the character's right side therefore usually appears on the left of the screen.
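
To make the class table concrete, here is a minimal sketch that keeps the highest-scoring detection per class and picks a crop source for Stage 2. It assumes detections have already been decoded into `(class_index, score, (x1, y1, x2, y2))` tuples; that tuple format, the function names, and the `full`-before-`body` preference are illustrative assumptions, and `classes.json` remains the authoritative name list.

```python
# Class order copied from the table above; classes.json is authoritative.
CLASS_NAMES = [
    "full", "head", "body",
    "hand_left", "hand_right",
    "foot_left", "foot_right",
]


def best_per_class(detections, min_score=0.0):
    """Return {class_name: (score, box)} with the top-scoring box per class."""
    best = {}
    for class_index, score, box in detections:
        if score < min_score:
            continue
        name = CLASS_NAMES[class_index]
        if name not in best or score > best[name][0]:
            best[name] = (score, box)
    return best


def pick_crop_box(best):
    """Prefer the `full` box, fall back to `body`, else None (assumed policy)."""
    for name in ("full", "body"):
        if name in best:
            return best[name][1]
    return None
```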

Stage 2: Keypoint Detector

The keypoint model takes a top-down crop of the Stage 1 `full` or `body` bounding box as input.
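
A common way to build a top-down crop is to grow the detected box to the model's aspect ratio (168:224, width to height) around its center before resizing. The sketch below shows only that box arithmetic; the function name and the `padding` factor of 1.25 are assumptions, not values taken from this package.

```python
def fit_box_to_aspect(box, target_w=168, target_h=224, padding=1.25):
    """Expand (x1, y1, x2, y2) around its center to the target aspect ratio."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    if w <= 0 or h <= 0:
        raise ValueError("box must have positive width and height")
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2

    aspect = target_w / target_h  # width / height
    if w > h * aspect:
        h = w / aspect  # box is too wide: grow the height
    else:
        w = h * aspect  # box is too tall: grow the width

    # Assumed context margin around the character.
    w *= padding
    h *= padding
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
```

The resulting box can extend past the image border, so the caller needs to pad or clamp when sampling pixels.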

field	value
Architecture	`dinov2_vitpose_l`
Backbone	`facebook/dinov2-large`
Input	`1x3x224x168` NCHW RGB, ImageNet-normalized
Output	`heatmap`
Keypoint layout	COCO-17
Post-process layout	DWPose-25 / POSE_KEYPOINT-compatible

See keypoint/dinov2_vitpose_l/meta.json for exact input size, normalization values, output names, and post-processing notes.
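
For orientation, the sketch below normalizes an already-resized 224x168 RGB crop into the `1x3x224x168` tensor and decodes a heatmap tensor by per-channel argmax. The mean and std are the standard ImageNet values, and the plain argmax decode (no sub-pixel refinement) is a simplification; both function names are hypothetical, and `meta.json` should be treated as the source of truth for the real values and output shape.

```python
import numpy as np

# Standard ImageNet statistics (assumed; confirm against meta.json).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def preprocess(crop_rgb_uint8):
    """HxWx3 uint8 RGB crop (224x168) -> 1x3x224x168 float32 tensor."""
    if crop_rgb_uint8.shape != (224, 168, 3):
        raise ValueError(f"expected (224, 168, 3), got {crop_rgb_uint8.shape}")
    x = crop_rgb_uint8.astype(np.float32) / 255.0
    x = (x - IMAGENET_MEAN) / IMAGENET_STD
    return np.ascontiguousarray(x.transpose(2, 0, 1)[None])


def decode_heatmaps(heatmaps, input_w=168, input_h=224):
    """1xKxHxW heatmaps -> K tuples (x, y, score) in crop pixel coordinates."""
    _, k, h, w = heatmaps.shape
    flat = heatmaps[0].reshape(k, -1)
    idx = flat.argmax(axis=1)
    scores = flat.max(axis=1)
    ys, xs = np.divmod(idx, w)  # row and column of each peak
    # Map heatmap cell centers back to input-crop pixels.
    px = (xs + 0.5) * input_w / w
    py = (ys + 0.5) * input_h / h
    return [(float(x), float(y), float(s)) for x, y, s in zip(px, py, scores)]
```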

Download

from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="grmchn/mascot-pose-detect",
    allow_patterns=[
        "bbox/*",
        "keypoint/dinov2_vitpose_l/*",
    ],
)
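
Once the snapshot is on disk, the JSON sidecars can be read with the standard library. The sketch below only resolves the paths from the repository layout and parses the files; it makes no assumption about their schemas. `load_sidecars` is a hypothetical helper name.

```python
import json
from pathlib import Path


def load_sidecars(local_dir):
    """Parse the JSON files that accompany the two ONNX models."""
    root = Path(local_dir)
    paths = {
        "classes": root / "bbox" / "classes.json",
        "decode_params": root / "bbox" / "decode_params.json",
        "keypoint_meta": root / "keypoint" / "dinov2_vitpose_l" / "meta.json",
    }
    missing = [str(p) for p in paths.values() if not p.is_file()]
    if missing:
        raise FileNotFoundError("missing sidecar files: " + ", ".join(missing))
    return {
        name: json.loads(path.read_text(encoding="utf-8"))
        for name, path in paths.items()
    }
```

The ONNX files next to these sidecars (`bbox/model.onnx` and `keypoint/dinov2_vitpose_l/model.onnx`) can then be opened with an ONNX runtime of your choice.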

Notes

This is not an OpenPose implementation and does not include OpenPose weights. It produces keypoints that can be converted into an OpenPose-compatible JSON schema for downstream tools.
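
As an illustration of that conversion, the sketch below packs one person's body keypoints into the `people` / `pose_keypoints_2d` structure used by OpenPose JSON output, where each keypoint is flattened to `x, y, confidence`. The `canvas_width` and `canvas_height` fields follow the POSE_KEYPOINT convention as an assumption; the function name is hypothetical, and the exact set of fields a downstream tool requires should be checked against that tool.

```python
import json


def to_openpose_json(keypoints, canvas_width, canvas_height):
    """Pack (x, y, score) tuples for one person into an OpenPose-style dict."""
    flat = []
    for x, y, score in keypoints:
        flat.extend([float(x), float(y), float(score)])
    return {
        "people": [
            {
                "pose_keypoints_2d": flat,
                # Left empty here: face and hands come from other components.
                "face_keypoints_2d": [],
                "hand_left_keypoints_2d": [],
                "hand_right_keypoints_2d": [],
            }
        ],
        # Assumed POSE_KEYPOINT-style canvas fields.
        "canvas_width": int(canvas_width),
        "canvas_height": int(canvas_height),
    }


doc = to_openpose_json([(10, 20, 0.9), (30.5, 40.0, 0.5)], 512, 768)
print(json.dumps(doc))
```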

The model was trained for stylized mascot characters. It may not generalize to realistic human photos without additional fine-tuning.
