# MagikaDocumentFromPixel — Lightweight Blur Detector
A Magika-inspired image quality gate that classifies images as sharp, blurred, or uncertain in a few milliseconds on CPU. Built to sit at the front of vision pipelines so expensive downstream models (OCR, detection, classification, VLMs) never waste compute on unusable input.
GitHub repo (training code, Dockerfile, full README): [bradduy/MagikaDocumentFromPixel](https://github.com/bradduy/MagikaDocumentFromPixel)
## Results on the GoPro Large test split
| Metric | Value |
|---|---|
| F1 | 0.9803 |
| Accuracy | 0.9806 |
| Precision | 0.9981 |
| Recall | 0.9631 |
| AUC | 0.9989 |
| Model size | 17 MB |
| Inference latency | ~17 ms / image (CPU, single-scale) |
## Recipe
- Backbone: MobileNetV3-Large, ImageNet-pretrained, 2-class softmax head (~3.3M parameters).
- Frequency-domain auxiliary channel (Freq-Aux): a per-image-standardized Laplacian magnitude map is concatenated to the RGB tensor as a 4th input channel. The first conv is expanded from 3→4 channels (pretrained RGB weights preserved; the new slice is initialized from the mean of the RGB kernels). The Laplacian gives the network an explicit, scale-invariant edge-energy cue (see the sketch after this list).
- Training: 384×384 input, AdamW lr=1e-4, CosineAnnealing, CrossEntropy, 25 epochs, medium augmentation, mixed-precision, GoPro Large with `blur_gamma` extra positives.
- Inference: 5-scale multi-scale TTA at 256, 320, 384, 448, 512 (see the routing sketch below).
- Routing: return `sharp` or `blurred` when max softmax ≥ 0.60, otherwise return `uncertain`.
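The Freq-Aux construction is easy to picture in code. Below is a minimal sketch of the two pieces described above, assuming standard PyTorch/torchvision; the helper names and the luminance-mean grayscale step are illustrative, not the repo's exact implementation (the repo's `FreqAuxModel` may differ in detail).

```python
import torch
import torch.nn.functional as F
from torchvision.models import mobilenet_v3_large

# 3x3 Laplacian kernel, shaped (out_ch, in_ch, kH, kW) for conv2d.
LAPLACIAN = torch.tensor([[0., 1., 0.],
                          [1., -4., 1.],
                          [0., 1., 0.]]).view(1, 1, 3, 3)

def add_freq_aux_channel(rgb: torch.Tensor) -> torch.Tensor:
    """(B, 3, H, W) -> (B, 4, H, W): append a per-image-standardized
    Laplacian magnitude map as the 4th channel."""
    gray = rgb.mean(dim=1, keepdim=True)              # simple luminance proxy
    lap = F.conv2d(gray, LAPLACIAN, padding=1).abs()  # edge-energy magnitude
    mean = lap.mean(dim=(2, 3), keepdim=True)
    std = lap.std(dim=(2, 3), keepdim=True)
    return torch.cat([rgb, (lap - mean) / (std + 1e-6)], dim=1)

def expand_first_conv(model: torch.nn.Module) -> torch.nn.Module:
    """Grow the stem conv from 3 to 4 input channels; the new slice is
    initialized from the mean of the pretrained RGB kernels."""
    old = model.features[0][0]  # first Conv2d of torchvision MobileNetV3
    new = torch.nn.Conv2d(4, old.out_channels, old.kernel_size,
                          old.stride, old.padding, bias=old.bias is not None)
    with torch.no_grad():
        new.weight[:, :3] = old.weight
        new.weight[:, 3:] = old.weight.mean(dim=1, keepdim=True)
    model.features[0][0] = new
    return model

model = expand_first_conv(mobilenet_v3_large(weights="IMAGENET1K_V1"))
# (Swapping the 1000-class head for a 2-class one is omitted for brevity.)
```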
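Likewise, the multi-scale TTA plus confidence routing can be sketched as follows, reusing the imports and helper from the sketch above. The `["sharp", "blurred"]` class order is an assumption; the repo's `BlurPredictor` is the authoritative version.

```python
SCALES = [256, 320, 384, 448, 512]
THRESHOLD = 0.60
LABELS = ["sharp", "blurred"]  # assumed class order

@torch.no_grad()
def route(model: torch.nn.Module, rgb: torch.Tensor) -> tuple[str, float]:
    """Average softmax over the five scales, then apply the 0.60 gate.
    rgb: (1, 3, H, W) in [0, 1]."""
    probs = []
    for s in SCALES:
        x = F.interpolate(rgb, size=(s, s), mode="bilinear", align_corners=False)
        probs.append(model(add_freq_aux_channel(x)).softmax(dim=1))
    p = torch.stack(probs).mean(dim=0)  # (1, 2) averaged over scales
    conf, idx = p.max(dim=1)
    if conf.item() < THRESHOLD:
        return "uncertain", conf.item()
    return LABELS[idx.item()], conf.item()
```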
## Files
- `best.pt` — PyTorch state dict for the `FreqAuxModel` (MobileNetV3-Large) 4-channel-input model.
## Usage
Clone the GitHub repo for the inference scripts, then load this checkpoint.
```bash
git clone https://github.com/bradduy/MagikaDocumentFromPixel.git
cd MagikaDocumentFromPixel
pip install -r blur_detector/requirements.txt

# Download this checkpoint
pip install huggingface_hub
python -c "from huggingface_hub import hf_hub_download; \
hf_hub_download('bradduy/MagikaDocumentFromPixel', 'best.pt', \
local_dir='blur_detector/outputs/checkpoints/champion')"

# Run inference
python blur_detector/scripts/predict.py \
    --checkpoint blur_detector/outputs/checkpoints/champion/best.pt --freq_aux \
    path/to/image.jpg
```
Or in Python:
```python
import torch

from blur_detector.src.models.blur_detector import build_model
from blur_detector.src.datasets.freq_aux import FreqAuxModel
from blur_detector.src.inference.predictor import BlurPredictor

# Rebuild the 4-channel MobileNetV3-Large and load the checkpoint.
backbone = build_model("mobilenet_v3_large", pretrained=False, in_channels=4)
model = FreqAuxModel(backbone)
model.load_state_dict(torch.load("best.pt", map_location="cpu"))

# Multi-scale TTA over the five inference resolutions.
predictor = BlurPredictor(model, image_size=[256, 320, 384, 448, 512])
pred = predictor.predict("receipt.jpg")
print(pred.label, pred.confidence)
```
## Intended use
- Pre-check before OCR / VLM / paid vision API calls.
- Upload-time quality filter ("please retake the photo").
- Dataset curation for ML programs.
- Edge / on-device inference (single-scale 384px → ONNX → mobile/browser); see the export sketch below.
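For the edge path, a plausible single-scale export, continuing from the Python snippet above, looks like this. It assumes the exported graph takes the 4-channel Freq-Aux tensor directly (i.e., the Laplacian channel is computed by the caller); the `torch.onnx.export` arguments are standard, and the output file name is arbitrary.

```python
import torch

model.eval()
dummy = torch.randn(1, 4, 384, 384)  # RGB + Laplacian channel, single scale
torch.onnx.export(
    model, dummy, "blur_gate.onnx",
    input_names=["image"], output_names=["logits"],
    dynamic_axes={"image": {0: "batch"}, "logits": {0: "batch"}},
    opset_version=17,
)
```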
## Limitations
- Trained on GoPro motion blur. Domain-shift retraining is recommended for defocus blur, low-light, scanner skew, or compression artifacts.
- Threshold (0.60) is a product-level knob — sweep it on a small hand-labeled slice of your traffic to set the precision/recall trade-off (a minimal sweep sketch follows).
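One minimal way to run that sweep, assuming you have collected per-image blur probabilities and hand labels as NumPy arrays (all names here are illustrative):

```python
import numpy as np

def sweep_threshold(probs: np.ndarray, labels: np.ndarray) -> None:
    """probs: P(blurred) per image; labels: 1 = truly blurred, 0 = sharp."""
    for t in np.arange(0.50, 0.96, 0.05):
        pred = probs >= t
        tp = int(np.sum(pred & (labels == 1)))
        precision = tp / max(int(pred.sum()), 1)
        recall = tp / max(int((labels == 1).sum()), 1)
        print(f"t={t:.2f}  precision={precision:.3f}  recall={recall:.3f}")
```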
## Citation
If you use this work in research or production, please cite:
Duy, Tran Thanh (2026). Edges Before Embeddings: A Confidence-Aware Blur Gate for Vision-Language Pipelines. Zenodo. https://doi.org/10.5281/zenodo.19765336
BibTeX:

```bibtex
@misc{duy2026edges,
  author    = {Duy, Tran Thanh},
  title     = {Edges Before Embeddings: A Confidence-Aware Blur Gate for Vision-Language Pipelines},
  year      = {2026},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.19765336},
  url       = {https://doi.org/10.5281/zenodo.19765336}
}
```
## License
MIT — see LICENSE. Copyright © 2026 Duy Tran Thanh (Brad Duy).
## Author
Duy Tran Thanh (Brad Duy) — Sr. Applied AI Engineer