mascot-pose-detect

Two-stage mascot pose detector for chibi, kemono, and other stylized mascot characters.

This repository provides ONNX artifacts for portable inference:

  1. Stage 1: a 7-class RTMDet-tiny bounding-box detector.
  2. Stage 2: a DINOv2-Large backbone with a ViTPose-style COCO-17 heatmap head.

The keypoint model is fine-tuned for stylized mascot bodies, whose proportions differ strongly from those of the people in real-human pose datasets. Consumers are expected to lift the COCO-17 keypoints to the DWPose-25 / POSE_KEYPOINT format and to derive toe points from the foot bounding boxes. Hand keypoints, when required, are expected to come from a separate hand-template fitter.
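As a rough illustration of the lifting step, the sketch below derives a neck point as the midpoint of the two shoulders, a common first step when converting COCO-17 to OpenPose-style layouts. The function name `add_neck`, the `(x, y, score)` tuple format, and the choice to append the neck at index 17 are assumptions made for illustration. The DWPose-25 index order expected downstream is not specified here, and the toe-point derivation is left out.

```python
# COCO-17 indices for the two shoulders (standard COCO keypoint order).
LEFT_SHOULDER = 5
RIGHT_SHOULDER = 6


def add_neck(coco17):
    """Return the 17 input keypoints plus a derived neck point at the end.

    `coco17` is a list of 17 (x, y, score) tuples. Appending the neck at
    index 17 is an illustrative choice, not the package's required layout.
    """
    if len(coco17) != 17:
        raise ValueError("expected 17 COCO keypoints")
    lx, ly, ls = coco17[LEFT_SHOULDER]
    rx, ry, rs = coco17[RIGHT_SHOULDER]
    # Midpoint of the shoulders; confidence is limited by the weaker shoulder.
    neck = ((lx + rx) / 2.0, (ly + ry) / 2.0, min(ls, rs))
    return list(coco17) + [neck]
```

Taking the minimum of the two shoulder scores keeps the derived point from looking more reliable than the keypoints it was computed from.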

License

This model package is released under the Apache License 2.0.

The Stage 2 keypoint model is based on facebook/dinov2-large, which is also released under Apache 2.0. It does not use the older MAE-pretrained ViTPose-L checkpoint that constrained the previous test bundle to non-commercial use.

The training annotations and source images are not included in this repository.

Repository Contents

grmchn/mascot-pose-detect/
├── bbox/
│   ├── model.onnx
│   ├── classes.json
│   └── decode_params.json
└── keypoint/
    └── dinov2_vitpose_l/
        ├── model.onnx
        └── meta.json

Stage 1: BBox Detector

The bbox model detects seven mascot body regions:

index  name
0      full
1      head
2      body
3      hand_left
4      hand_right
5      foot_left
6      foot_right

Left and right are named from the character's own point of view (anatomical naming), not the viewer's. For a front-facing character, the character's right side usually appears on the left side of the screen.

Stage 2: Keypoint Detector

The keypoint model input is a top-down crop from the Stage 1 full or body bbox.
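One way to build that crop, sketched under assumptions: prefer the full box, fall back to body, then grow the box around its center to the model's 168:224 width-to-height ratio so that resizing does not distort the character. The detection dict format, the `pick_crop_box` name, and the 1.25 padding factor are hypothetical; the package's own pipeline may use different rules.

```python
INPUT_W, INPUT_H = 168, 224  # Stage 2 input width and height


def pick_crop_box(detections, padding=1.25):
    """Choose a Stage 1 box and fit it to the Stage 2 aspect ratio.

    `detections` maps class name -> (x1, y1, x2, y2, score); this format is
    an assumption. Returns (x1, y1, x2, y2) or None if no usable box exists.
    """
    box = detections.get("full") or detections.get("body")
    if box is None:
        return None
    x1, y1, x2, y2 = box[:4]
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    w, h = x2 - x1, y2 - y1
    aspect = INPUT_W / INPUT_H
    # Grow the shorter side so that w / h matches the model input ratio.
    if w > h * aspect:
        h = w / aspect
    else:
        w = h * aspect
    # Add some context around the character (hypothetical padding factor).
    w, h = w * padding, h * padding
    return (cx - w / 2.0, cy - h / 2.0, cx + w / 2.0, cy + h / 2.0)
```

The returned box can extend past the image borders, so the cropping step would need to pad the missing area before resizing to 168x224.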

field                value
Architecture         dinov2_vitpose_l
Backbone             facebook/dinov2-large
Input                1x3x224x168 NCHW RGB, ImageNet-normalized
Output               heatmap
Keypoint layout      COCO-17
Post-process layout  DWPose-25 / POSE_KEYPOINT-compatible

See keypoint/dinov2_vitpose_l/meta.json for exact input size, normalization values, output names, and post-processing notes.
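To make the heatmap output concrete, here is a minimal sketch of decoding one keypoint heatmap back to image coordinates with a plain argmax. The heatmap resolution, the `decode_heatmap` name, and the absence of sub-pixel refinement are assumptions; meta.json remains the reference for the actual post-processing. The function takes nested lists, for example one channel of the output tensor converted with `tolist()`.

```python
def decode_heatmap(heatmap, crop_box):
    """Map the peak of one keypoint heatmap back to image coordinates.

    `heatmap` is a 2-D list (rows of floats) for a single keypoint and
    `crop_box` is the (x1, y1, x2, y2) region that was resized into the
    model input. Returns (x, y, score).
    """
    heat_h, heat_w = len(heatmap), len(heatmap[0])
    # Find the highest-scoring cell (plain argmax, no sub-pixel refinement).
    score, row, col = max(
        (value, r, c)
        for r, line in enumerate(heatmap)
        for c, value in enumerate(line)
    )
    x1, y1, x2, y2 = crop_box
    # Use cell centers so the mapping is symmetric across the crop.
    x = x1 + (col + 0.5) * (x2 - x1) / heat_w
    y = y1 + (row + 0.5) * (y2 - y1) / heat_h
    return x, y, score
```

Running this once per channel would yield the 17 COCO keypoints as `(x, y, score)` triples in the original image frame.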

Download

from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="grmchn/mascot-pose-detect",
    allow_patterns=[
        "bbox/*",
        "keypoint/dinov2_vitpose_l/*",
    ],
)
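After the download, the files listed under Repository Contents can be opened roughly as follows. This is a sketch, not the package's official loader: `load_metadata` and `open_sessions` are hypothetical helper names, the JSON schemas are deliberately not assumed, and the CPU execution provider is just a portable default.

```python
import json
from pathlib import Path


def load_metadata(local_dir):
    """Read the JSON sidecar files that ship next to the ONNX models.

    Returns a dict of parsed JSON objects; the schema of each file is not
    assumed here, so inspect the result before relying on specific keys.
    """
    root = Path(local_dir)
    files = {
        "classes": root / "bbox" / "classes.json",
        "decode_params": root / "bbox" / "decode_params.json",
        "keypoint_meta": root / "keypoint" / "dinov2_vitpose_l" / "meta.json",
    }
    return {
        name: json.loads(path.read_text(encoding="utf-8"))
        for name, path in files.items()
    }


def open_sessions(local_dir):
    """Create ONNX Runtime sessions for both stages (CPU provider assumed)."""
    import onnxruntime as ort  # imported lazily; only needed for inference

    root = Path(local_dir)
    providers = ["CPUExecutionProvider"]
    bbox = ort.InferenceSession(
        str(root / "bbox" / "model.onnx"), providers=providers
    )
    keypoint = ort.InferenceSession(
        str(root / "keypoint" / "dinov2_vitpose_l" / "model.onnx"),
        providers=providers,
    )
    return bbox, keypoint
```

Calling `session.get_inputs()` and `session.get_outputs()` on each session is a quick way to confirm tensor names and shapes against meta.json before wiring up inference.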

Notes

This is not an OpenPose implementation and does not include OpenPose weights. It produces keypoints that can be converted into an OpenPose-compatible JSON schema for downstream tools.
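A minimal sketch of such a conversion is shown below. The `people` and `pose_keypoints_2d` field names follow OpenPose's JSON output, where keypoints are stored as a flat `x, y, confidence` sequence. The `canvas_width` and `canvas_height` fields and both function names are assumptions; check what your downstream tool actually expects.

```python
import json


def to_openpose_person(keypoints):
    """Flatten (x, y, score) triples into an OpenPose-style person entry."""
    flat = []
    for x, y, score in keypoints:
        flat.extend([float(x), float(y), float(score)])
    return {"pose_keypoints_2d": flat}


def to_openpose_json(people_keypoints, canvas_width, canvas_height):
    """Build a JSON string holding one entry per detected character.

    `canvas_width` / `canvas_height` are extra fields some downstream
    tools expect; whether yours needs them is an assumption to verify.
    """
    doc = {
        "people": [to_openpose_person(kps) for kps in people_keypoints],
        "canvas_width": canvas_width,
        "canvas_height": canvas_height,
    }
    return json.dumps(doc)
```

The keypoints passed in should already be in the target layout's index order; this sketch only handles serialization.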

The model was trained for stylized mascot characters. It may not generalize to realistic human photos without additional fine-tuning.
