mascot-pose-detect
Two-stage mascot pose detector for chibi, kemono, and other stylized mascot characters.
This repository provides ONNX artifacts for portable inference:
- Stage 1: a 7-class RTMDet-tiny bounding-box detector.
- Stage 2: a DINOv2-Large backbone with a ViTPose-style COCO-17 heatmap head.
The keypoint model is fine-tuned for stylized mascot bodies, whose proportions differ strongly from those in real-human pose datasets. Consumers are expected to lift the COCO-17 keypoints to the DWPose-25 / POSE_KEYPOINT format and to derive toe points from the Stage 1 foot bounding boxes. Hand keypoints, when required, are expected to come from a separate hand-template fitter.
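As a rough illustration of the lifting step, the sketch below lists the COCO-17 keypoints in their standard order and synthesizes a neck point as the midpoint of the two shoulders, which is how OpenPose-style layouts commonly obtain it. The full DWPose-25 index order and the toe derivation are not specified in this README, so `lift_neck` and its score rule (the minimum of the two shoulder scores) are assumptions and not this package's actual post-processing.

```python
# Standard COCO-17 keypoint order (index -> name).
COCO17 = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]


def lift_neck(keypoints):
    """Return (x, y, score) for a synthesized neck point.

    `keypoints` is a list of 17 (x, y, score) tuples in COCO-17 order.
    Hypothetical helper: the midpoint rule and the min-score rule are
    assumptions, not values taken from this repository.
    """
    lx, ly, ls = keypoints[COCO17.index("left_shoulder")]
    rx, ry, rs = keypoints[COCO17.index("right_shoulder")]
    return ((lx + rx) / 2.0, (ly + ry) / 2.0, min(ls, rs))
```

Taking the lower of the two shoulder scores keeps the synthesized point from looking more reliable than the points it was built from.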
License
This model package is released under the Apache License 2.0.
The Stage 2 keypoint model is based on facebook/dinov2-large, which is also released under Apache 2.0.
It does not use the older MAE-pretrained ViTPose-L checkpoint that constrained the previous test bundle to non-commercial use.
The training annotations and source images are not included in this repository.
Repository Contents
grmchn/mascot-pose-detect/
├── bbox/
│   ├── model.onnx
│   ├── classes.json
│   └── decode_params.json
└── keypoint/
    └── dinov2_vitpose_l/
        ├── model.onnx
        └── meta.json
Stage 1: BBox Detector
The bbox model detects seven mascot body regions:
| index | name |
|---|---|
| 0 | full |
| 1 | head |
| 2 | body |
| 3 | hand_left |
| 4 | hand_right |
| 5 | foot_left |
| 6 | foot_right |
Left and right are named from the character's point of view (anatomical naming). For a front-facing character, the character's right side, for example hand_right, usually appears on the left of the image.
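The class table can be mirrored in code to choose which detection feeds Stage 2, which takes its crop from the full or body box. This is a minimal sketch: the `(class_index, score, box)` tuple layout, the `min_score` threshold, and the "full first, then body" preference are assumptions for illustration; the real decoding is governed by `bbox/decode_params.json`.

```python
# Class indices as listed in the table above.
BBOX_CLASSES = [
    "full", "head", "body",
    "hand_left", "hand_right", "foot_left", "foot_right",
]


def pick_crop_box(detections, min_score=0.3):
    """Pick the bbox to crop for Stage 2.

    `detections` is a list of (class_index, score, (x1, y1, x2, y2)) tuples.
    That layout and `min_score` are hypothetical. The best-scoring "full"
    box is preferred, with the best "body" box as a fallback.
    Returns (class_name, box), or None if neither class is present.
    """
    for name in ("full", "body"):
        idx = BBOX_CLASSES.index(name)
        candidates = [d for d in detections if d[0] == idx and d[1] >= min_score]
        if candidates:
            best = max(candidates, key=lambda d: d[1])
            return name, best[2]
    return None
```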
Stage 2: Keypoint Detector
The keypoint model input is a top-down crop from the Stage 1 full or body bbox.
| field | value |
|---|---|
| Architecture | dinov2_vitpose_l |
| Backbone | facebook/dinov2-large |
| Input | 1x3x224x168 NCHW RGB, ImageNet-normalized |
| Output | heatmap |
| Keypoint layout | COCO-17 |
| Post-process layout | DWPose-25 / POSE_KEYPOINT-compatible |
See keypoint/dinov2_vitpose_l/meta.json for exact input size, normalization values, output names, and post-processing notes.
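A top-down pipeline usually grows the detected box to the model's aspect ratio before resizing, then maps each heatmap peak back into image coordinates. The sketch below shows both steps in plain Python. The `padding` factor, the plain argmax decoding, and the half-cell offset are assumptions; `meta.json` remains the authority on the actual post-processing.

```python
INPUT_H, INPUT_W = 224, 168  # from the table above; confirm against meta.json


def expand_to_aspect(box, padding=1.25):
    """Grow (x1, y1, x2, y2) to the model's W:H ratio around its center.

    `padding` is a hypothetical margin factor, not a value from meta.json.
    """
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    w, h = x2 - x1, y2 - y1
    aspect = INPUT_W / INPUT_H
    if w > h * aspect:
        h = w / aspect  # box is too wide for the target ratio: grow height
    else:
        w = h * aspect  # box is too tall for the target ratio: grow width
    w, h = w * padding, h * padding
    return (cx - w / 2.0, cy - h / 2.0, cx + w / 2.0, cy + h / 2.0)


def heatmap_peak_to_image(heatmap, crop_box):
    """Map the argmax of one 2D heatmap (a list of rows) to image coordinates.

    Returns (x, y, score), where score is the peak heatmap value.
    """
    hm_h, hm_w = len(heatmap), len(heatmap[0])
    score, px, py = max(
        (value, x, y)
        for y, row in enumerate(heatmap)
        for x, value in enumerate(row)
    )
    x1, y1, x2, y2 = crop_box
    # Use the cell center so the estimate is not biased toward the top-left.
    img_x = x1 + (px + 0.5) / hm_w * (x2 - x1)
    img_y = y1 + (py + 0.5) / hm_h * (y2 - y1)
    return img_x, img_y, score
```

Passing the expanded box from `expand_to_aspect` as `crop_box` keeps the two steps consistent, because the heatmap covers the expanded crop and not the raw detection.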
Download
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="grmchn/mascot-pose-detect",
    allow_patterns=[
        "bbox/*",
        "keypoint/dinov2_vitpose_l/*",
    ],
)
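Once the snapshot is on disk, the files listed under Repository Contents can be located relative to `local_dir`. The helpers below are a hypothetical convenience and not part of this package. The keys inside `meta.json` are not documented here, so `load_meta` simply returns the parsed JSON for inspection.

```python
import json
from pathlib import Path


def artifact_paths(local_dir):
    """Return the files listed under Repository Contents as Path objects."""
    root = Path(local_dir)
    kp = root / "keypoint" / "dinov2_vitpose_l"
    return {
        "bbox_model": root / "bbox" / "model.onnx",
        "bbox_classes": root / "bbox" / "classes.json",
        "bbox_decode_params": root / "bbox" / "decode_params.json",
        "keypoint_model": kp / "model.onnx",
        "keypoint_meta": kp / "meta.json",
    }


def load_meta(local_dir):
    """Read keypoint meta.json; inspect the result for the actual keys."""
    meta_path = artifact_paths(local_dir)["keypoint_meta"]
    with open(meta_path, encoding="utf-8") as f:
        return json.load(f)
```

The two `model.onnx` paths can then be handed to an ONNX runtime of your choice, for example `onnxruntime.InferenceSession(str(path))` if ONNX Runtime is installed.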
Notes
This is not an OpenPose implementation and does not include OpenPose weights. It produces keypoints that can be converted into an OpenPose-compatible JSON schema for downstream tools.
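For reference, OpenPose-style JSON stores each person's body keypoints as one flat `pose_keypoints_2d` list of `x, y, confidence` values. The sketch below performs only that flattening, assuming the keypoints have already been reordered into the target layout. The function name is hypothetical, and the `canvas_width` / `canvas_height` fields are an assumption about what some downstream tools expect, so check the schema of the tool you are targeting.

```python
import json


def to_openpose_json(keypoints, canvas_width, canvas_height):
    """Flatten (x, y, score) triples into an OpenPose-style JSON string.

    `keypoints` must already be in the target layout order; this helper
    does not reorder or add points.
    """
    flat = []
    for x, y, score in keypoints:
        flat.extend([float(x), float(y), float(score)])
    doc = {
        "people": [{"pose_keypoints_2d": flat}],
        # Assumed fields: some consumers read the canvas size from here.
        "canvas_width": canvas_width,
        "canvas_height": canvas_height,
    }
    return json.dumps(doc)
```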
The model was trained for stylized mascot characters. It may not generalize to realistic human photos without additional fine-tuning.