MiewID
MiewID is a multi-species individual animal re-identification model. Given a cropped photo of an animal it returns a 2,152-dimensional embedding vector that captures the animal's identity. Compare two embeddings with cosine similarity to decide whether they show the same individual.
It covers 64 terrestrial and aquatic species and runs in production on the Wildbook wildlife monitoring platform.
This repository ships both the PyTorch model (loadable via HuggingFace
transformers) and an ONNX export for runtimes that can't pull in PyTorch.
| Try it live | miewid-onnx · miewid-pytorch |
| Paper | arXiv:2412.05602 |
| Finetune | finetune.ipynb |
| Code | wbia-plugin-miew-id |
| Data sources | DATA_SOURCES.md |
What the paper found
Otarashvili et al. (2024) trained a single embedding network on 49 species, 37K individuals, and 225K expert-curated annotations and measured it against three baselines:
Multispecies beats single-species. The joint model outperformed 49 separate models each trained on one species. The gap was largest for species with the least data; averaged across all species the multispecies model gained 12.5% top-1 accuracy.
Zero-shot on unseen species beats MegaDescriptor. Tested on 33 species never seen during training, MiewID beat MegaDescriptor‑L‑384 by an average of 19.2 percentage points per species.
Fine-tuning works with few examples. When only a handful of annotated individuals are available for a new species, fine-tuning the pretrained model consistently outperforms training from scratch. Incorporating the few examples directly into full multispecies retraining does even better.
Vision transformers didn't help. SwinV2‑B was 4.5% worse than EfficientNetV2‑M on this task.
The production model extends the paper's approach: it uses a GeM pooling layer and BatchNorm head (yielding 2,152‑dim embeddings instead of 2,048), operates at 440×440 resolution (vs 256 in the paper experiments), and covers 64 species.
How it works
- A detector (YOLO, MegaDetector, etc.) finds animals and crops out each chip.
- Each chip is resized to 440×440 and ImageNet‑normalised.
- MiewID maps the chip to a 2,152‑dimensional embedding.
- Cosine similarity against a database of known individuals returns the closest match.
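The final matching step can be sketched in a few lines of NumPy. This is a minimal illustration, not the Wildbook implementation: the function name `match_against_gallery`, the individual IDs, and the random toy embeddings are all hypothetical stand-ins for real MiewID outputs.

```python
import numpy as np


def match_against_gallery(query, gallery, gallery_ids, top_k=5):
    """Return the top_k (individual_id, cosine_similarity) pairs for one query."""
    # Re-normalise defensively so the dot product equals cosine similarity.
    query = query / np.linalg.norm(query)
    gallery = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    scores = gallery @ query                      # shape: (num_gallery,)
    order = np.argsort(-scores)[:top_k]           # highest similarity first
    return [(gallery_ids[i], float(scores[i])) for i in order]


# Toy data standing in for real MiewID embeddings (hypothetical IDs).
rng = np.random.default_rng(0)
gallery = rng.normal(size=(4, 2152)).astype(np.float32)
gallery_ids = ["zebra_A", "zebra_B", "zebra_C", "zebra_D"]
# A query that is a slightly perturbed copy of zebra_C's embedding.
query = gallery[2] + 0.05 * rng.normal(size=2152).astype(np.float32)

matches = match_against_gallery(query, gallery, gallery_ids, top_k=2)
print(matches[0][0])
```

Because the model output already has unit L2 norm, the re-normalisation is redundant for raw MiewID embeddings; it is kept here so the helper also behaves correctly on averaged or otherwise post-processed vectors.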
Architecture
| Component | Detail |
|---|---|
| Backbone | EfficientNetV2‑M, ~51M parameters, ImageNet‑1K pretrained |
| Pooling | GeM (Generalised Mean Pooling, p=3) |
| Head | BatchNorm1d → L2‑normalised output |
| Loss (training) | Sub-center ArcFace (k=3 sub-centers), dynamic margins |
| Input | (B, 3, 440, 440) float32, ImageNet‑normalised |
| Output | (B, 2152) float32, unit L2 norm |
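As a rough illustration of the pooling and output rows in the table, the NumPy sketch below applies GeM pooling with p=3 to a toy feature map and then L2-normalises the result. It deliberately skips the BatchNorm1d step, and the function names, the `eps` values, and the tiny tensor shape are assumptions made for the example rather than details of the released model.

```python
import numpy as np


def gem_pool(features, p=3.0, eps=1e-6):
    """Generalised mean pooling over the spatial axes of a (B, C, H, W) array."""
    clamped = np.clip(features, eps, None)        # keep values positive before the power
    return np.mean(clamped ** p, axis=(2, 3)) ** (1.0 / p)   # -> (B, C)


def l2_normalise(x, eps=1e-12):
    """Scale each row to unit L2 norm."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


# Toy feature map: 2 images, 8 channels, 4x4 spatial grid (sizes are arbitrary).
features = np.abs(np.random.default_rng(0).normal(size=(2, 8, 4, 4)))
pooled = gem_pool(features)          # (2, 8)
embedding = l2_normalise(pooled)     # each row now has norm 1
```

With p=1 GeM reduces to average pooling, and as p grows it approaches max pooling, so p=3 sits between the two and gives more weight to strongly activated locations.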
Loss design
The model was trained with sub-center ArcFace, which gives each class k=3 sub-centers instead of a single class center. This improves robustness to label noise: mislabeled or atypical samples can gather around a secondary sub-center rather than distorting the main one. The loss is combined with dynamic margins, which adapt the angular penalty per class based on sample count, to handle the heavy class imbalance typical of wildlife data.
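The logit computation behind this loss can be sketched in NumPy as follows. This is a simplified illustration, not the training code: `subcenter_arcface_logits` and `dynamic_margins` are hypothetical names, the scale of 30 and the constants `a` and `b` are placeholders, and the count-to-margin rule (larger margins for rarer classes via a negative power of the sample count) is one common choice that the source does not confirm.

```python
import numpy as np


def dynamic_margins(class_counts, a=0.5, b=0.05):
    """Hypothetical schedule: classes with fewer samples get larger margins."""
    return a * np.asarray(class_counts, dtype=np.float64) ** -0.25 + b


def subcenter_arcface_logits(embeddings, centers, labels, margins, scale=30.0):
    """embeddings: (B, D) unit-norm; centers: (C, K, D) unit-norm sub-centers;
    labels: (B,) class indices; margins: (C,) per-class margins in radians."""
    # Cosine similarity to every sub-center, then keep the closest one per class.
    cos_all = np.einsum("bd,ckd->bck", embeddings, centers)
    cos = cos_all.max(axis=2)                                   # (B, C)
    theta = np.arccos(np.clip(cos, -1.0, 1.0))
    # Add the class-specific margin to the true-class angle only.
    # (This sketch ignores the theta + margin > pi edge case.)
    rows = np.arange(len(labels))
    theta[rows, labels] += margins[labels]
    return scale * np.cos(theta)


rng = np.random.default_rng(0)
centers = rng.normal(size=(3, 3, 8))                 # 3 classes, k=3 sub-centers, dim 8
centers /= np.linalg.norm(centers, axis=2, keepdims=True)
embeddings = centers[[0, 2], 1, :]                   # each sample sits on a sub-center
labels = np.array([0, 2])
margins = dynamic_margins([1, 16, 256])              # rare class 0, common class 2
logits = subcenter_arcface_logits(embeddings, centers, labels, margins)
plain = subcenter_arcface_logits(embeddings, centers, labels, np.zeros(3))
```

The logits would then be passed to an ordinary softmax cross-entropy. Comparing `logits` with `plain` shows the effect of the margin: only the true-class entry is lowered, and by more for the rarer class.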
Data split
Annotations from each individual are split so roughly half the individuals appear in both train and test (with different images) and the other half appear only in test. Evaluation uses a one-vs-all scheme on the test set: each annotation is a query matched against all other test annotations (excluding itself), avoiding the soft data leak of using the training set as a reference gallery.
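The one-vs-all scheme can be sketched as below, assuming test embeddings and their individual IDs are already in memory. The function name `one_vs_all_rank_k` and the 2-D toy vectors are hypothetical; the paper's evaluation code may differ in details such as how singleton individuals are handled.

```python
import numpy as np


def one_vs_all_rank_k(embeddings, ids, k=1):
    """Rank-k accuracy where every annotation queries all other annotations."""
    ids = np.asarray(ids)
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T
    np.fill_diagonal(sims, -np.inf)               # a query never matches itself
    top_k = np.argsort(-sims, axis=1)[:, :k]      # indices of the k nearest annotations
    hits = (ids[top_k] == ids[:, None]).any(axis=1)
    return float(hits.mean())


# Two individuals with two annotations each, in a toy 2-D embedding space.
embeddings = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
ids = ["a", "a", "b", "b"]
rank1 = one_vs_all_rank_k(embeddings, ids, k=1)

# Adding an individual with a single annotation: it can never find a match.
embeddings_with_singleton = np.vstack([embeddings, [[-1.0, -1.0]]])
rank1_with_singleton = one_vs_all_rank_k(embeddings_with_singleton, ids + ["c"], k=1)
```

The second call illustrates why individuals with only one test annotation need care: as queries they are guaranteed misses, so an evaluation would typically exclude them from the query set while keeping them as distractors.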
Training data
49 species, 59 source datasets, 37K individuals, 225K annotations. Sources fall into three groups:
Contributed through Wildbook — species experts manually curate individual identities on Wildbook-managed platforms. Each annotation carries a per-sighting ID verified by a human. Data partners include NOAA, Sarasota Dolphin Research Project, Botswana Predator Conservation Trust, ECOCEAN, Giraffe Conservation Foundation, Norwegian Orca Survey, Cascadia Research Collective, African Parks, and many others.
Public re-ID datasets — DogFaceNet, PrimFace, ChimpFace, MacaqueFaces, LemurFace, THoDBRL2015, SeaTurtleID, SealID, C-Tai, C-Zoo, Lomas Capuchin, wildlife-datasets.
Community science — the Happywhale Kaggle competition dataset covers multiple cetacean species, including blue whales, dusky dolphins, orcas, and spinner dolphins.
A public subset is available through LILA.science.
The full per-species breakdown is in DATA_SOURCES.md.
Viewpoints matter
For species where markings differ across the body, annotations from left and right views are treated as different individuals during training. For species identified by outline shape (e.g., dorsal fins), opposite-side views can match.
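One simple way to express this rule when building training labels is shown below. It is only a sketch: the function name, the `side_specific` flag, and the example IDs are hypothetical and not taken from the MiewID codebase.

```python
def training_label(individual_id, viewpoint, side_specific):
    """Build the class label used for training from an individual ID and viewpoint."""
    if side_specific:
        # Left and right flanks carry different markings, so they get separate classes.
        return f"{individual_id}:{viewpoint}"
    # Outline-based species: both sides share one class.
    return individual_id


zebra_left = training_label("zebra_017", "left", side_specific=True)
zebra_right = training_label("zebra_017", "right", side_specific=True)
fin_left = training_label("dolphin_042", "left", side_specific=False)
fin_right = training_label("dolphin_042", "right", side_specific=False)
```

Here the two zebra labels differ while the two dolphin labels are identical, which mirrors the distinction between pattern-based and outline-based species.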
Training recipe
The paper reports these hyperparameters (optimised with Optuna):
| Parameter | Value |
|---|---|
| Image size (experiments) | 256×256 |
| Image size (production) | 440×440 |
| Batch size | 112 |
| Warmup | 15 epochs, linear 1.5e‑5 → 1.5e‑3 |
| Decay | exponential, 0.8 per epoch |
| Augmentations | random colour sharpening, CLAHE, shift ±25%, scale ±20%, rotation ±15°, colour jitter |
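The warmup and decay rows can be read as a per-epoch learning-rate schedule. The sketch below assumes 0-indexed epochs, that the peak rate is reached exactly when warmup ends, and that decay starts from that peak; the table does not spell out these boundary details, and `learning_rate` is a hypothetical helper.

```python
def learning_rate(epoch, warmup_epochs=15, lr_start=1.5e-5, lr_max=1.5e-3, decay=0.8):
    """Learning rate for a 0-indexed epoch: linear warmup, then exponential decay."""
    if epoch < warmup_epochs:
        # Linear interpolation from lr_start towards lr_max across the warmup epochs.
        return lr_start + (lr_max - lr_start) * epoch / warmup_epochs
    # After warmup, multiply by the decay factor once per epoch.
    return lr_max * decay ** (epoch - warmup_epochs)


schedule = [learning_rate(epoch) for epoch in range(20)]
```

Under these assumptions the rate climbs from 1.5e-5 at epoch 0 to 1.5e-3 at epoch 15, then drops to 1.2e-3 at epoch 16 and keeps shrinking by 20% per epoch.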
Performance
Conservation X Labs evaluated MiewID as an individual-retrieval model using rank‑k accuracy. Selected results:
| Species | Rank‑1 | Species | Rank‑1 |
|---|---|---|---|
| Zebra (Grevy's) | 96.1% | Giraffe (Reticulated) | 98.8% |
| Cheetah | 70.8% | Lion | 93.2% |
| Leopard | 77.6% | Wild Dog | 86.1% |
| Humpback Whale | 70.3% | Orca | 86.0% |
| Whale Shark | 65.2% | Green Turtle | 89.0% |
| Bottlenose Dolphin | 92.4% | Spinner Dolphin | 98.8% |
The full 64‑species evaluation table (mAP, rank‑1 through rank‑20, data source)
lives in DATA_SOURCES.md.
Quick start
PyTorch + Transformers
from transformers import AutoModel
from PIL import Image
import numpy as np
import torch
# trust_remote_code=True lets Transformers run the custom model code shipped in the repository
model = AutoModel.from_pretrained("james-burgess/miewid", trust_remote_code=True)
# Load a cropped chip and resize it to the 440×440 input size
chip = Image.open("zebra_chip.jpg").convert("RGB").resize((440, 440))
tensor = torch.from_numpy(np.array(chip, dtype=np.float32))
# ImageNet normalisation on the 0-255 pixel scale
tensor = (tensor - torch.tensor([0.485, 0.456, 0.406]) * 255) \
    / (torch.tensor([0.229, 0.224, 0.225]) * 255)
# HWC -> CHW, then add the batch dimension
tensor = tensor.permute(2, 0, 1).unsqueeze(0)
with torch.no_grad():
    embedding = model(tensor).numpy()
# -> (1, 2152)
ONNX Runtime
pip install onnxruntime huggingface_hub pillow numpy
import numpy as np
import onnxruntime as ort
from huggingface_hub import hf_hub_download
from PIL import Image
model_path = hf_hub_download("james-burgess/miewid", "miewid.onnx")
session = ort.InferenceSession(
    model_path,
    providers=["CPUExecutionProvider"],
)
# To run on GPU, install onnxruntime-gpu and use ["CUDAExecutionProvider"]
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)
chip = Image.open("zebra_chip.jpg").convert("RGB")
chip = chip.resize((440, 440), Image.BILINEAR)
chip = np.array(chip, dtype=np.float32)
chip = (chip - MEAN * 255.0) / (STD * 255.0)
chip = np.transpose(chip, (2, 0, 1))
chip = np.expand_dims(chip, axis=0)
embedding = session.run(None, {"input": chip})[0]
# -> (1, 2152)
OpenCV DNN
import cv2
import numpy as np
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32) * 255.0
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32) * 255.0
net = cv2.dnn.readNetFromONNX("miewid.onnx")
chip = cv2.imread("zebra_chip.jpg")
chip = cv2.cvtColor(chip, cv2.COLOR_BGR2RGB)
chip = cv2.resize(chip, (440, 440)).astype(np.float32)
chip = (chip - MEAN) / STD
net.setInput(np.transpose(chip, (2, 0, 1))[np.newaxis, ...])
embedding = net.forward()
Preprocessing
MiewID expects ImageNet normalisation:
pixel_normalised = (pixel − mean × 255) / (std × 255)
| | R | G | B |
|---|---|---|---|
| mean | 0.485 | 0.456 | 0.406 |
| std | 0.229 | 0.224 | 0.225 |
Order: resize → (chip − mean × 255) / (std × 255) → CHW transpose → batch dim.
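For a quick sanity check of this order of operations, the normalisation can be wrapped in a small helper and run on a synthetic chip. The function name `preprocess` and the uniform grey test image are assumptions made for illustration; the helper expects the chip to be resized already.

```python
import numpy as np

MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def preprocess(chip_rgb):
    """Turn an already-resized (H, W, 3) uint8 RGB chip into a (1, 3, H, W) float32 batch."""
    chip = chip_rgb.astype(np.float32)
    chip = (chip - MEAN * 255.0) / (STD * 255.0)   # per-channel ImageNet normalisation
    chip = np.transpose(chip, (2, 0, 1))           # HWC -> CHW
    return np.ascontiguousarray(chip[np.newaxis])  # add the batch dimension


# A uniform mid-grey chip stands in for a real 440x440 crop.
grey = np.full((440, 440, 3), 128, dtype=np.uint8)
batch = preprocess(grey)
```

Checking `batch.shape` and `batch.dtype` before inference is a cheap way to catch a missing transpose or an accidental float64 input, both of which are easy mistakes when porting the pipeline to a new runtime.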
Finetuning
finetune.ipynb adapts MiewID to a new species, region, or
camera-trap setup. It covers:
- Loading the base model and freezing the backbone
- Attaching an ArcFace head for individual‑level training
- Training loop with data augmentation
- ONNX export of the finetuned model
Export
miewid.onnx was exported from the PyTorch checkpoint with torch.onnx.export()
(opset 14). To recreate it:
python scripts/export.py --upload
License
MIT
Citation
If you use MiewID in your work, cite both the paper and the software:
@misc{otarashvili2024multispecies,
title={Multispecies Animal Re-ID Using a Large Community-Curated Dataset},
author={Lasha Otarashvili and Tamilselvan Subramanian and Jason Holmberg
and J.J. Levenson and Charles V. Stewart},
year={2024},
eprint={2412.05602},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2412.05602},
}
@misc{WildMe2023,
author={Otarashvili, Lasha and Holmberg, Jason and Abidi, Collin
and Subramanian, Tamilselvan},
title={MiewID},
year={2024},
publisher={Zenodo},
doi={10.5281/zenodo.13647526},
url={https://github.com/WildMeOrg/wbia-plugin-miew-id},
}
Credits
Conservation X Labs developed MiewID with the Wild Me community and data partners. Funding came from the Gordon and Betty Moore Foundation, the Bureau of Ocean Energy Management (BOEM), and the US National Science Foundation (Award 2118240).