# Milo - CCG Card Embedder
A MobileViT-XXS backbone trained with a multitask ArcFace loss (labels: illustration_id + set_code), producing 128-dimensional L2-normalised embeddings of CCG card images for nearest-neighbour retrieval.
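The exact model code is not published in this card; the following is a minimal sketch of what such an embedder could look like, assuming the `timm` `mobilevit_xxs` backbone and a plain linear projection head (both assumptions), with L2 normalisation so cosine similarity later reduces to a dot product.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import timm

# Hedged sketch, not the released implementation: MobileViT-XXS features
# -> linear projection to 128-d -> L2 normalisation.
class CardEmbedder(nn.Module):
    def __init__(self, embed_dim: int = 128):
        super().__init__()
        # num_classes=0 makes timm return pooled features instead of logits
        self.backbone = timm.create_model("mobilevit_xxs", pretrained=True, num_classes=0)
        self.proj = nn.Linear(self.backbone.num_features, embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(x)          # (B, num_features) pooled features
        emb = self.proj(feats)            # (B, 128)
        return F.normalize(emb, dim=-1)   # unit-length embeddings

model = CardEmbedder().eval()
with torch.no_grad():
    emb = model(torch.randn(1, 3, 448, 448))  # (1, 128)
```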
## Model details
| Property | Value |
|---|---|
| Architecture | MobileViT-XXS + linear projection |
| Input | 448×448 RGB, ImageNet-normalised |
| Output | 128-d L2-normalised embedding vector |
| Parameters | ~1.0M |
| File size | 5.2 MB (fp32 ONNX) |
| Codename | milo |
| Version | 1.0.0 (epoch 15) |
| Training labels | illustration_id + set_code (multitask ArcFace) |
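The training code behind the "multitask ArcFace" row is not included in this card. As a hedged sketch only, one additive-angular-margin (ArcFace) head per label set could be combined as below; the class counts, scale `s`, margin `m`, and summing of the two losses are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ArcFaceHead(nn.Module):
    """Additive angular margin (ArcFace) classifier over unit embeddings."""
    def __init__(self, embed_dim: int, num_classes: int, s: float = 30.0, m: float = 0.3):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, embed_dim))
        self.s, self.m = s, m

    def forward(self, emb: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # cosine similarity between unit embeddings and unit class centres
        cos = F.linear(F.normalize(emb), F.normalize(self.weight)).clamp(-1 + 1e-7, 1 - 1e-7)
        target = torch.cos(torch.acos(cos) + self.m)       # add angular margin on the true class
        onehot = F.one_hot(labels, cos.size(1)).to(cos.dtype)
        logits = self.s * (onehot * target + (1 - onehot) * cos)
        return F.cross_entropy(logits, labels)

# Multitask: one head per label set; class counts below are placeholders.
illu_head = ArcFaceHead(128, num_classes=30000)   # illustration_id classes (assumed count)
set_head = ArcFaceHead(128, num_classes=600)      # set_code classes (assumed count)

emb = F.normalize(torch.randn(8, 128), dim=-1)    # stand-in batch of embeddings
illu_labels = torch.randint(0, 30000, (8,))
set_labels = torch.randint(0, 600, (8,))
loss = illu_head(emb, illu_labels) + set_head(emb, set_labels)
```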
## Usage
The easiest way to use Milo is through the CollectorVision library, which handles corner detection, dewarping, gallery loading, and nearest-neighbour search:
```python
import collector_vision as cvg

cvid = cvg.Identifier(cvg.HFD("HanClinto/milo", "scryfall-mtg"))
result = cvid.identify("photo.jpg")

print(result.ids)         # {"scryfall_id": "..."}
print(result.confidence)  # 0.94
```
### Direct ONNX usage
```python
import onnxruntime as ort
import numpy as np
from PIL import Image

session = ort.InferenceSession("model.onnx")

# Preprocess: resize to 448x448, ImageNet-normalise, NCHW float32
img = Image.open("card_crop.jpg").convert("RGB").resize((448, 448))
x = np.array(img, dtype=np.float32) / 255.0
x = (x - [0.485, 0.456, 0.406]) / [0.229, 0.224, 0.225]
x = x.transpose(2, 0, 1)[None].astype(np.float32)  # (1, 3, 448, 448); cast back to float32 after float64 broadcasting

emb = session.run(None, {"pixel_values": x})[0]  # (1, 128) float32, L2-normalised
```
Cosine similarity between two embeddings is just their dot product (both are unit vectors).
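Nearest-neighbour lookup against a gallery of reference embeddings therefore reduces to a matrix-vector product. The sketch below assumes hypothetical `gallery.npy` / `gallery_ids.npy` files; this is not the CollectorVision on-disk gallery format.

```python
import numpy as np

# Hypothetical gallery: (N, 128) unit-length embeddings plus matching card IDs.
gallery = np.load("gallery.npy")          # (N, 128) float32
gallery_ids = np.load("gallery_ids.npy")  # (N,) card identifiers

query = emb[0]                            # (128,) embedding from the ONNX snippet above
scores = gallery @ query                  # dot products == cosine similarities (unit vectors)
best = int(np.argmax(scores))
print(gallery_ids[best], float(scores[best]))
```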
## Gallery compatibility
Gallery files built with Milo v1.0.0 use `milo1` in their filename. Embeddings from different Milo versions are not compatible; rebuild the gallery when upgrading.
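As a hedged sketch of what a rebuild could look like (the directory layout, file names, and placement of the `milo1` suffix are assumptions; the real CollectorVision gallery builder may differ), reusing the ONNX `session` from the usage section:

```python
import glob
import numpy as np
from PIL import Image

def embed(path: str) -> np.ndarray:
    """Embed one pre-cropped card image with the ONNX session defined above."""
    img = Image.open(path).convert("RGB").resize((448, 448))
    x = np.array(img, dtype=np.float32) / 255.0
    x = (x - [0.485, 0.456, 0.406]) / [0.229, 0.224, 0.225]
    x = x.transpose(2, 0, 1)[None].astype(np.float32)
    return session.run(None, {"pixel_values": x})[0][0]   # (128,)

paths = sorted(glob.glob("reference_cards/*.jpg"))         # hypothetical reference-image folder
embs = np.stack([embed(p) for p in paths])                 # (N, 128)
ids = np.array([p.rsplit("/", 1)[-1].rsplit(".", 1)[0] for p in paths])
np.save("gallery.milo1.npy", embs)                         # version tag in the filename, per the note above
np.save("gallery_ids.milo1.npy", ids)
```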
## Part of CollectorVision
Used together with HanClinto/cornelius in the CollectorVision inference library.