Tibetan page orientation classifier (DINOv3 ViT-S)

Predicts whether a Tibetan manuscript page image is upright (non_flipped) or upside-down (flipped, rotated 180°).

Dataset (HF): BDRC/tibetan-page-orientation-classifier-dataset
Experiment: dinov3_binary_flip_center_cropped
Checkpoint: final_model.pt (best validation macro-F1 across stages A + B + C)

Preprocessing (inference)

Split	Mode
train	`center_crop`
val	`center_crop`
test	`center_crop`

At inference, apply the same mode that was used in training (`center_crop`) before passing the image to the DINO image processor (size 448).
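
The card does not spell out what `center_crop` does internally. As a rough illustration only, the sketch below assumes it means "take the largest centered square and resize it to 448 × 448". The helper name `center_crop_square` and the bicubic resampling are assumptions, not taken from the bundled script.

```python
from PIL import Image


def center_crop_square(img: Image.Image, size: int = 448) -> Image.Image:
    """Crop the largest centered square, then resize it to size x size.

    Assumed behaviour; the real `center_crop` mode may differ.
    """
    w, h = img.size
    side = min(w, h)
    left = (w - side) // 2
    top = (h - side) // 2
    cropped = img.crop((left, top, left + side, top + side))
    return cropped.resize((size, size), Image.BICUBIC)


# Synthetic wide page standing in for a real manuscript scan.
page = Image.new("RGB", (1200, 300), "white")
out = center_crop_square(page)
print(out.size)
```

For authoritative behaviour, check `inference_classifier.py` and `config.yaml` rather than relying on this sketch.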

Test metrics (n=854)

Metric	Value
Accuracy	100.0%
Macro F1	1.000
AUC-ROC	1.000
Loss	0.1237
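
A loss of 0.1237 alongside 100% accuracy can look contradictory. One plausible explanation, assuming the test loss is computed with the same label smoothing as training (0.05) and with the PyTorch convention of mixing the one-hot target with a uniform distribution, is that smoothed cross-entropy has a non-zero floor even for a perfect model. The function name below is hypothetical.

```python
import math


def smoothed_ce_floor(eps: float, num_classes: int = 2) -> float:
    """Lowest reachable cross-entropy against label-smoothed targets.

    Smoothed target: (1 - eps) * one_hot + eps / num_classes.
    The minimum is the entropy of that target distribution.
    """
    on = 1.0 - eps + eps / num_classes   # probability mass on the true class
    off = eps / num_classes              # mass on each other class
    return -(on * math.log(on) + (num_classes - 1) * off * math.log(off))


floor = smoothed_ce_floor(0.05)
print(round(floor, 4))  # 0.1169
```

Under these assumptions the reported loss sits only slightly above the floor, which is consistent with near-saturated predictions.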

Training config

Setting	Value
Stages	A + B + C
Epochs A / B	5 / 10
LR head A	0.0005
LR backbone B	5e-06
LR head B	5e-05
Unfreeze blocks B	4
Scheduler	`cosine_warmup`
Class weights	`none`
Label smoothing	0.05
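
The exact form of the `cosine_warmup` scheduler is not given here. A minimal sketch of the common pattern (linear warmup followed by cosine decay to zero) is shown below; the warmup length, total step count, and function name are illustrative assumptions, and only the base learning rate 0.0005 comes from the table.

```python
import math


def cosine_warmup_lr(step: int, total_steps: int, warmup_steps: int, base_lr: float) -> float:
    """Linear warmup to base_lr, then cosine decay towards zero."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Hypothetical run: 100 optimizer steps, 10 of them warmup, stage A head LR.
schedule = [cosine_warmup_lr(s, 100, 10, 5e-4) for s in range(100)]
print(schedule[0], max(schedule), schedule[-1])
```

The actual schedule parameters should be read from `config.yaml` or `results.json`.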

Training history

Per-epoch metrics: training_history.json and results.json → history.

Files

File	Description
`final_model.pt`	Weights + `idx_to_label` + test metrics
`results.json`	Full training config, history, report
`training_history.json`	Stage A/B epoch logs
`confusion_matrix.json`	Machine-readable CM
`confusion_matrix.png`	Plot
`training_history.png`	Loss / F1 curves
`model_card.json`	Summary metadata
`config.yaml`	Training hyperparameters (copy)
`inference_classifier.py`	CLI inference on image paths

Load weights

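import torch

# weights_only=False unpickles arbitrary Python objects, so only load checkpoints you trust.
ckpt = torch.load("final_model.pt", map_location="cpu", weights_only=False)
print(ckpt["test_metrics"])   # metrics on the held-out test split
print(ckpt["idx_to_label"])   # class index to label name mapping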
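
Once the model produces two logits for a page, `idx_to_label` turns the winning index into a class name. The sketch below shows that decoding step in plain Python. The mapping order and the example logits are hypothetical; use the mapping stored in the checkpoint, whose keys may be integers or strings.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits, idx_to_label):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return idx_to_label[idx], probs[idx]


# Hypothetical mapping and logits, for illustration only.
idx_to_label = {0: "non_flipped", 1: "flipped"}
label, confidence = decode([0.3, 2.1], idx_to_label)
print(label, round(confidence, 3))
```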

Inference

Use the same preprocessing as training (center_crop, size 448). The bundled script reads defaults from model_card.json when flags are omitted.

pip install -r requirements-inference.txt
python inference_classifier.py --checkpoint final_model.pt --image page.jpg

Explicit flags (recommended for reproducibility):

python inference_classifier.py \
  --checkpoint final_model.pt \
  --image page.jpg \
  --preprocess center_crop \
  --preprocess-size 448
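
To run the same explicit command over several pages from Python, one option is to build the argument list per image and hand it to `subprocess.run`. This sketch uses only the flags shown above and assumes one image per invocation. The helper name and file names are hypothetical, and the actual call is left commented out.

```python
import sys


def build_command(image_path, checkpoint="final_model.pt",
                  mode="center_crop", size=448):
    """Build the argument list for one inference_classifier.py call."""
    return [
        sys.executable, "inference_classifier.py",
        "--checkpoint", checkpoint,
        "--image", str(image_path),
        "--preprocess", mode,
        "--preprocess-size", str(size),
    ]


# Hypothetical page files.
commands = [build_command(p) for p in ["page_001.jpg", "page_002.jpg"]]
for cmd in commands:
    print(" ".join(cmd))
    # import subprocess; subprocess.run(cmd, check=True)  # uncomment to execute
```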

Classes: non_flipped = upright page; flipped = upside-down (180° rotation).
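
A typical downstream use is to rotate pages predicted as `flipped` back to upright. The following is a minimal sketch with Pillow; the helper name `make_upright` is hypothetical and the tiny synthetic image stands in for a real scan.

```python
from PIL import Image


def make_upright(img: Image.Image, label: str) -> Image.Image:
    """Rotate the page by 180 degrees if it was classified as flipped."""
    if label == "flipped":
        return img.rotate(180)
    return img


# Tiny synthetic image: a single red pixel in the top-left corner.
page = Image.new("RGB", (4, 2), "white")
page.putpixel((0, 0), (255, 0, 0))

fixed = make_upright(page, "flipped")
print(fixed.size, fixed.getpixel((3, 1)))
```

After the rotation the marked pixel ends up in the bottom-right corner, and the image size is unchanged.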

Citation

@misc{bdrcpageorientationmodel,
  title      = {Tibetan Page Orientation Classifier (DINOv3)},
  author     = {Buddhist Digital Resource Center and OpenPecha},
  year       = {2026},
  url        = {https://huggingface.co/BDRC/tibetan-page-orientation-classifier},
  dataset    = {https://huggingface.co/BDRC/tibetan-page-orientation-classifier-dataset},
  experiment = {center-cropped},
  note       = {Trained on BDRC manuscript images}
}

License

Model weights and inference code: Apache License 2.0.

Acknowledgements

All images are provided by the Buddhist Digital Resource Center (BDRC). This dataset was developed by Dharmaduta from specifications provided by BDRC for the project "The BDRC Etext Corpus", with funding from the Khyentse Foundation. Developed by Dharmaduta / OpenPecha.

Model tree for BDRC/tibetan-page-orientation-classifier

Base model: facebook/dinov3-vit7b16-pretrain-lvd1689m
Finetuned: facebook/dinov3-vits16-pretrain-lvd1689m