LocateAnything-3B CoreML
CoreML packages and a lightweight Python runner for image localization on Apple hardware.
Author: devin-lai (markauto75@gmail.com)
CoreML Inference Performance
This CoreML build is tuned for local macOS inference, where fast repeated runs matter more than model startup time. On an M5 Mac with 32GB of memory, the optimized CoreML path shortens inference once the model is loaded, with the largest gains in the decoder's prefill and generation stages.
Benchmark setup:
- Device: Apple M5 Mac (macOS), 32GB memory
- Model: LocateAnything-3B
- Input image: 1536x1024
- Categories: person, car
- Comparison focus: inference time after model loading
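The README does not say how the post-load timings were collected (number of runs, warm-up, or which statistic is reported). As a hedged illustration of measuring "inference time after model loading", the sketch below loads a model once, runs untimed warm-up calls, and reports the median of several timed calls. `time_post_load`, `load`, and `infer` are hypothetical names; plug in whatever loading and inference code you actually use.

```python
import statistics
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def time_post_load(
    load: Callable[[], T],
    infer: Callable[[T], object],
    repeats: int = 3,
    warmup: int = 1,
) -> dict:
    """Time loading once, then time repeated inference calls separately (hypothetical harness)."""
    start = time.perf_counter()
    model = load()  # one-time cost, excluded from the post-load numbers
    load_s = time.perf_counter() - start

    for _ in range(warmup):  # untimed warm-up calls
        infer(model)

    runs = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        infer(model)
        runs.append(time.perf_counter() - t0)

    return {
        "load_s": load_s,
        "runs_s": runs,
        "median_post_load_s": statistics.median(runs),
    }


if __name__ == "__main__":
    # Dummy stand-ins; replace with real loading and inference code.
    result = time_post_load(lambda: "model", lambda m: sum(range(100_000)))
    print(f"post-load median: {result['median_post_load_s']:.4f}s")
```

Reporting the median rather than the mean keeps a single slow call (for example, one-off caching work on the first request) from skewing the result.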
| Metric | CoreML Optimized | PyTorch MPS bf16 | Improvement |
|---|---|---|---|
| Post-load inference time | 11.7s | 12.7s | ~1.1x faster |
| Generation time | 7.64s | 12.56s | ~1.6x faster |
| Prefill time | 1.72s | 7.97s | ~4.6x faster |
| Tokens per second | 17.55 TPS | 10.35 TPS | ~1.7x higher throughput |
For anyone running vision-language localization directly on a Mac, the practical win is a shorter wait once the packages are loaded. End to end the gain is modest: post-load inference drops from 12.7s to 11.7s, about 1s per query. The generation path improves more, from 12.56s to 7.64s.
The standout improvement is prefill: 7.97s drops to 1.72s, a roughly 4.6x speedup. Throughput also rises from 10.35 TPS to 17.55 TPS, making local decoding noticeably smoother for repeated image queries.
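The Improvement column can be reproduced from the raw numbers: for times it is the PyTorch value divided by the CoreML value, and for throughput it is the CoreML TPS divided by the PyTorch TPS. The sketch also assumes, since the README does not define it, that tokens per second equals generated tokens divided by generation time; under that assumption both runs imply roughly 130 to 134 generated tokens, suggesting the throughput gap is not due to one backend emitting far fewer tokens.

```python
# Raw figures copied from the benchmark table: (CoreML, PyTorch MPS bf16).
TIMES_S = {
    "post_load": (11.7, 12.7),
    "generation": (7.64, 12.56),
    "prefill": (1.72, 7.97),
}
TPS = (17.55, 10.35)


def speedups() -> dict:
    """Time metrics: baseline / CoreML. Throughput: CoreML / baseline."""
    ratios = {name: torch_s / coreml_s for name, (coreml_s, torch_s) in TIMES_S.items()}
    ratios["throughput"] = TPS[0] / TPS[1]  # higher TPS is better, so the ratio is inverted
    return ratios


def implied_tokens() -> tuple:
    """ASSUMPTION: TPS = generated tokens / generation time (not defined in the README)."""
    coreml_gen_s, torch_gen_s = TIMES_S["generation"]
    return (TPS[0] * coreml_gen_s, TPS[1] * torch_gen_s)


if __name__ == "__main__":
    for name, ratio in speedups().items():
        print(f"{name}: {ratio:.2f}x")
    print("implied generated tokens (CoreML, PyTorch): %.0f, %.0f" % implied_tokens())
```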
In short, this CoreML version delivers modestly faster end-to-end inference, much faster prefill, and higher decoding throughput for macOS users who want the model running close to the metal.
Contents
- `LocateAnything-vision.mlpackage` - image encoder package
- `LocateAnything-embed.mlpackage` - token embedding package
- `LocateAnything-decoder.mlpackage` - decoder package
- `LocateAnything-assets/` - tokenizer and runtime configuration
- `run_locateanything_image_coreml.py` - still-image runner
- `test.png` - sample input
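A quick way to confirm a checkout matches the Contents list is to check that each entry exists side by side. This sketch assumes all entries sit in one directory; `missing_files` and `load_packages` are hypothetical helpers, and the `ct.ComputeUnit.ALL` choice is an assumption, not necessarily what the bundled runner uses. Loading relies on `coremltools.models.MLModel`, which is imported only when `load_packages` is called, so the layout check works without `coremltools` installed.

```python
from pathlib import Path

# Entry names taken from the Contents list; assumed to sit in one directory.
EXPECTED = [
    "LocateAnything-vision.mlpackage",
    "LocateAnything-embed.mlpackage",
    "LocateAnything-decoder.mlpackage",
    "LocateAnything-assets",
    "run_locateanything_image_coreml.py",
]


def missing_files(repo_dir: str) -> list:
    """Return the expected entries that are not present under repo_dir."""
    root = Path(repo_dir)
    return [name for name in EXPECTED if not (root / name).exists()]


def load_packages(repo_dir: str) -> dict:
    """Load the three packages with coremltools (hypothetical helper; macOS needed to run them)."""
    import coremltools as ct  # lazy import so the layout check works without coremltools

    root = Path(repo_dir)
    return {
        part: ct.models.MLModel(
            str(root / f"LocateAnything-{part}.mlpackage"),
            compute_units=ct.ComputeUnit.ALL,  # assumption: let Core ML pick CPU/GPU/ANE
        )
        for part in ("vision", "embed", "decoder")
    }


if __name__ == "__main__":
    gaps = missing_files(".")
    print("missing:", gaps or "none")
```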
Setup
```bash
pip install -r requirements.txt
```
Example
```bash
python run_locateanything_image_coreml.py \
--input test.png \
--categories "person,car"
```
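If you want to drive the runner from Python (for example, over a folder of images), a small wrapper can build the same command. It uses only the two flags shown in the example, `--input` and `--categories`; `build_command` and `run_locateanything` are hypothetical names, and any other runner options are not covered here.

```python
import shlex
import subprocess
import sys

# Script name from the Contents list; assumes you run from the repo directory.
RUNNER = "run_locateanything_image_coreml.py"


def build_command(image: str, categories: list, script: str = RUNNER) -> list:
    """Build the runner's argv using only the two flags shown in the README example."""
    # --categories takes a single comma-separated string, e.g. "person,car".
    cleaned = [c.strip() for c in categories if c.strip()]
    if not cleaned:
        raise ValueError("at least one category is required")
    return [sys.executable, script, "--input", image, "--categories", ",".join(cleaned)]


def run_locateanything(image: str, categories: list) -> None:
    """Invoke the runner as a subprocess; raises CalledProcessError on failure."""
    subprocess.run(build_command(image, categories), check=True)


if __name__ == "__main__":
    # Print the shell-equivalent command for the README example.
    print(shlex.join(build_command("test.png", ["person", "car"])))
```

Using `sys.executable` instead of a bare `python` keeps the subprocess in the same interpreter or virtual environment where `requirements.txt` was installed.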
By default, the script writes:
- `test.coreml.annotated.png`
- `test.coreml.detections.json`
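The default output names appear to follow `<stem>.coreml.annotated.png` and `<stem>.coreml.detections.json`; that pattern is inferred from the single `test.png` example and is an assumption. The JSON schema is not documented in this README, so `count_labels` assumes a list of objects with a `label` field, a hypothetical shape to check against a real detections file before relying on it.

```python
import json
from collections import Counter
from pathlib import Path


def default_outputs(image_path: str) -> tuple:
    """Derive output paths from the input name (pattern assumed from the test.png example)."""
    p = Path(image_path)
    return (
        p.with_name(f"{p.stem}.coreml.annotated.png"),
        p.with_name(f"{p.stem}.coreml.detections.json"),
    )


def count_labels(detections_path, label_key: str = "label") -> Counter:
    """Count detections per category.

    ASSUMPTION: the file is a JSON list of objects, each carrying its category
    under `label_key`. The real schema is not documented here; adjust as needed.
    """
    with open(detections_path, encoding="utf-8") as f:
        data = json.load(f)
    return Counter(item[label_key] for item in data)


if __name__ == "__main__":
    annotated, detections = default_outputs("test.png")
    print(annotated, detections)
    if detections.exists():
        print(count_labels(detections))
```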
Notes
The packages are configured for the image grid stored in the vision package metadata. Use the bundled assets directory with these packages so token IDs and runtime limits stay aligned.
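To see what the vision package actually records, you can dump its user-defined metadata. The key that holds the image grid is not named in this README, so this sketch simply lists every entry. It assumes `coremltools` is installed; `format_metadata` and `vision_metadata` are hypothetical helper names, and `skip_model_load=True` is used so the spec can be read without compiling the model.

```python
from pathlib import Path

VISION_PACKAGE = "LocateAnything-vision.mlpackage"


def format_metadata(meta: dict) -> str:
    """Render metadata as sorted 'key = value' lines for quick inspection."""
    return "\n".join(f"{key} = {meta[key]}" for key in sorted(meta))


def vision_metadata(path: str = VISION_PACKAGE) -> dict:
    """Return the user-defined metadata stored in a Core ML package."""
    import coremltools as ct  # lazy import: only needed when a real package is read

    # skip_model_load=True reads the spec without preparing the model for execution.
    model = ct.models.MLModel(path, skip_model_load=True)
    return dict(model.get_spec().description.metadata.userDefined)


if __name__ == "__main__":
    if Path(VISION_PACKAGE).exists():
        print(format_metadata(vision_metadata()))
    else:
        print(f"{VISION_PACKAGE} not found in the current directory")
```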
The license follows the upstream NVIDIA LocateAnything-3B terms linked in the model card metadata.