LocateAnything-3B CoreML

CoreML packages and a lightweight Python runner for image localization on Apple hardware.

CoreML Inference Performance

This CoreML build is tuned for local macOS inference, where fast repeat runs matter more than model startup time. On an M5 Mac with 32GB memory, the optimized CoreML path shortens post-load inference, with the largest gain in the prefill stage.

Benchmark setup:

Device: M5 Mac running macOS, 32GB memory
Model: LocateAnything-3B
Input image: 1536x1024
Categories: person, car
Comparison focus: inference time after model loading

Metric	CoreML Optimized	PyTorch MPS bf16	Improvement
Post-load inference time	11.7s	12.7s	~1.1x faster
Generation time	7.64s	12.56s	~1.6x faster
Prefill time	1.72s	7.97s	~4.6x faster
Tokens per second	17.55 TPS	10.35 TPS	~1.7x higher throughput
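
As a quick sanity check, the Improvement column can be recomputed from the raw numbers in the table. The snippet below is only a sketch of that arithmetic; the dictionary keys are labels chosen here for illustration, not names used by the runner.

```python
# Benchmark values copied from the table above: (CoreML, PyTorch MPS bf16).
timings_s = {
    "post_load_inference": (11.7, 12.7),
    "generation": (7.64, 12.56),
    "prefill": (1.72, 7.97),
}
tokens_per_s = (17.55, 10.35)


def speedup(coreml_s, pytorch_s):
    """Lower is better for times, so the speedup is baseline / optimized."""
    return pytorch_s / coreml_s


ratios = {name: round(speedup(c, p), 1) for name, (c, p) in timings_s.items()}
# Higher is better for throughput, so the ratio is optimized / baseline.
ratios["tokens_per_second"] = round(tokens_per_s[0] / tokens_per_s[1], 1)

for name, ratio in ratios.items():
    print(f"{name}: ~{ratio}x")
```

The rounded ratios come out to 1.1, 1.6, 4.6, and 1.7, matching the table.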

For anyone running vision-language localization directly on a Mac, the practical win is lower wait time after the packages are loaded. CoreML reduces post-load inference from 12.7s to 11.7s, while the generation path improves from 12.56s to 7.64s.

The standout improvement is prefill: 7.97s drops to 1.72s, a roughly 4.6x speedup. Throughput also rises from 10.35 TPS to 17.55 TPS, making local decoding noticeably smoother for repeated image queries.

In short, on this benchmark the CoreML version delivers slightly faster post-load inference, much faster prefill, and higher decoding throughput for macOS users who want to run the model locally.

Contents

LocateAnything-vision.mlpackage - image encoder package
LocateAnything-embed.mlpackage - token embedding package
LocateAnything-decoder.mlpackage - decoder package
LocateAnything-assets/ - tokenizer and runtime configuration
run_locateanything_image_coreml.py - still-image runner
test.png - sample input

Setup

pip install -r requirements.txt

Example

python run_locateanything_image_coreml.py \
  --input test.png \
  --categories "person,car"
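
To process several images, one option is to call the runner once per file from a small Python wrapper. The sketch below assumes only the two flags shown above (`--input` and `--categories`); `build_command`, `run_batch`, and the `images` directory are hypothetical names introduced for illustration, and it is not known here whether the runner itself accepts multiple inputs.

```python
import subprocess
import sys
from pathlib import Path

RUNNER = "run_locateanything_image_coreml.py"


def build_command(image, categories):
    """Build the argv list for one image, using only the flags shown above."""
    return [
        sys.executable,
        RUNNER,
        "--input",
        str(image),
        "--categories",
        ",".join(categories),
    ]


def run_batch(image_dir, categories):
    """Run the runner once per PNG in image_dir, stopping on the first failure."""
    for image in sorted(Path(image_dir).glob("*.png")):
        subprocess.run(build_command(image, categories), check=True)


# Usage (hypothetical directory name):
#   run_batch("images", ["person", "car"])
```

Because every call starts a new process, each image pays the package load cost again, so this wrapper does not benefit from the post-load speedups reported above.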

By default, the script writes:

test.coreml.annotated.png
test.coreml.detections.json
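
The default output names appear to be the input file's stem plus a fixed suffix. The sketch below assumes that pattern generalizes beyond `test.png`, and it inspects the detections file without assuming any particular JSON schema, since the schema is not documented here. `default_outputs` and `summarize_json` are hypothetical helper names.

```python
import json
from pathlib import Path


def default_outputs(input_path):
    """Derive the two default output paths next to the input image.

    Assumes the naming pattern seen for test.png applies to any input.
    """
    p = Path(input_path)
    annotated = p.with_name(p.stem + ".coreml.annotated.png")
    detections = p.with_name(p.stem + ".coreml.detections.json")
    return annotated, detections


def summarize_json(path):
    """Return the top-level JSON type name and its length, if it has one."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    size = len(data) if isinstance(data, (list, dict)) else None
    return type(data).__name__, size


# Usage, after running the example command:
#   _, detections = default_outputs("test.png")
#   print(summarize_json(detections))
```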

Notes

The packages are configured for the image grid stored in the vision package metadata. Use the bundled assets directory with these packages so token ids and runtime limits stay aligned.
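
To see what the vision package metadata actually contains, the package can be opened with coremltools and its user-defined metadata printed. This is a sketch under two assumptions: that coremltools is installed (it may or may not be listed in requirements.txt), and that the grid settings are stored as user-defined metadata. The key names are not documented here, so the code simply lists whatever is present. `format_metadata` and `vision_metadata` are hypothetical helper names.

```python
from pathlib import Path


def format_metadata(metadata):
    """Render a metadata mapping as sorted 'key = value' lines."""
    return [f"{key} = {metadata[key]}" for key in sorted(metadata)]


def vision_metadata(package="LocateAnything-vision.mlpackage"):
    """Read the vision package's user-defined metadata without compiling it."""
    import coremltools as ct  # assumed to be installed

    # skip_model_load=True reads the spec only, so no prediction is possible
    # on the returned object, but the metadata is still available.
    model = ct.models.MLModel(str(Path(package)), skip_model_load=True)
    return dict(model.user_defined_metadata)


# Usage, from the directory that holds the packages:
#   for line in format_metadata(vision_metadata()):
#       print(line)
```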

The license follows the upstream NVIDIA LocateAnything-3B terms linked in the metadata above.
