---
license: apache-2.0
base_model: roboflow/rf-detr
tags:
- coreai
- aimodel
- object-detection
- rf-detr
- detr
- apple
- ios
- macos
pipeline_tag: object-detection
---

# RF-DETR for Core AI (`.aimodel`)
|
|
[RF-DETR](https://github.com/roboflow/rf-detr) (Roboflow's real-time detection
transformer, COCO-pretrained) converted to Apple **Core AI** for iOS 27 / macOS 27,
in response to [apple/coreai-models#14](https://github.com/apple/coreai-models/issues/14).
As a **DETR-family model, it needs no NMS**: post-processing is a single sigmoid plus top-k.
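
The "sigmoid plus top-k" post-processing can be sketched in NumPy as below. It selects the k highest-scoring (query, class) pairs across the flattened `[300, 91]` score grid, in the style of Deformable-DETR post-processors. This is an illustrative sketch rather than RF-DETR's or the converter's own code; the function name `topk_decode` and the default `k` are assumptions.

```python
import numpy as np

def topk_decode(logits: np.ndarray, boxes: np.ndarray, k: int = 100):
    """Sigmoid + top-k over all (query, class) pairs; no NMS.

    logits: [300, 91] raw class logits (column index = COCO id).
    boxes:  [300, 4] cxcywh boxes normalized to [0, 1].
    k:      detections to keep (illustrative default, an assumption).
    """
    num_classes = logits.shape[1]
    prob = 1.0 / (1.0 + np.exp(-logits))          # elementwise sigmoid
    flat = prob.reshape(-1)                       # [300 * 91] scores
    k = min(k, flat.size)
    idx = np.argpartition(-flat, k - 1)[:k]       # unordered indices of the k largest scores
    idx = idx[np.argsort(-flat[idx])]             # order them by descending score
    query, coco_id = np.divmod(idx, num_classes)  # flat index -> (query, class column)
    return boxes[query], coco_id, flat[idx]
```

Because selection runs over pairs rather than queries, one query can appear more than once with different classes; a score threshold can then be applied to the k results.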
|
|
<p align="center"><img src="demo_coco_cats.jpg" width="440" alt="RF-DETR medium on Core AI"></p>
|
|
## Files
|
|
| file | input (px) | params | M4 Max GPU latency | iPhone 17 Pro GPU (live camera) |
|---|---|---|---|---|
| `rfdetr-nano_float32.aimodel` | 384×384 | 30.5M | **8.6 ms** (~116 FPS) | **~25 ms (33–39 FPS)** |
| `rfdetr-small_float32.aimodel` | 512×512 | 32.1M | **12.0 ms** (~83 FPS) | not measured |
| `rfdetr-medium_float32.aimodel` | 576×576 | 33.7M | **14.8 ms** (~68 FPS) | **56–63 ms (15–17 FPS)** |
| `rfdetr-large_float32.aimodel` | 704×704 | 33.9M | **19.1 ms** (~52 FPS) | not measured |
|
|
iPhone numbers are end-to-end live-camera measurements from the
[CoreAIKit DetectCamera example](https://github.com/john-rocky/coreai-kit)
(Release build; zero-copy capture pipeline with AVCaptureVideoPreviewLayer display,
hardware-scaled 32BGRA buffers, and vImage preprocessing overlapped with GPU
inference). The peak measured rate was 39.6 FPS, the ceiling for the nano model;
sustained throughput under maximum load drops once the chassis heats up (thermal throttling).
|
|
fp32 is the shipped dtype: it passes a **detection-set-exact** gate against the PyTorch fp32
reference on CPU and GPU (for each confident detection: same class, measured IoU ≥ 0.999, score
within 2e-3), whereas fp16 bought only ~7% lower latency on M4 Max while adding ranking noise
between near-tied scores.
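
A detection-set gate of this kind can be sketched as follows. Only the criteria (same class, IoU ≥ 0.999, score within 2e-3) come from this card; the greedy one-to-one matching, the 0.5 "confident" cutoff, and the helper names `cxcywh_to_xyxy`, `iou_xyxy`, and `detection_sets_match` are assumptions for illustration, not the exporter's actual test harness.

```python
import numpy as np

def cxcywh_to_xyxy(b: np.ndarray) -> np.ndarray:
    """Convert [N, 4] (cx, cy, w, h) boxes to (x0, y0, x1, y1)."""
    cx, cy, w, h = b[:, 0], b[:, 1], b[:, 2], b[:, 3]
    return np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=1)

def iou_xyxy(a: np.ndarray, b: np.ndarray) -> float:
    """IoU of two single xyxy boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return float(inter / union) if union > 0 else 0.0

def detection_sets_match(ref, test, conf=0.5, iou_min=0.999, score_tol=2e-3) -> bool:
    """ref/test: (boxes_cxcywh [N, 4], classes [N], scores [N]) after decoding.

    The confident detections must agree in count, and each confident reference
    detection must match a distinct test detection with the same class,
    IoU >= iou_min and |score difference| <= score_tol.
    """
    rb, rc, rs = ref
    tb, tc, ts = test
    rmask, tmask = rs > conf, ts > conf  # assumed definition of "confident"
    if rmask.sum() != tmask.sum():
        return False
    rx, tx = cxcywh_to_xyxy(rb[rmask]), cxcywh_to_xyxy(tb[tmask])
    rc, rs, tc, ts = rc[rmask], rs[rmask], tc[tmask], ts[tmask]
    used = set()
    for i in range(len(rx)):
        for j in range(len(tx)):
            if j in used or tc[j] != rc[i]:
                continue
            if iou_xyxy(rx[i], tx[j]) >= iou_min and abs(rs[i] - ts[j]) <= score_tol:
                used.add(j)  # greedy: each test detection matches at most once
                break
        else:
            return False  # no acceptable partner for reference detection i
    return True
```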
|
|
## Graph contract
|
|
```
input "image" [1, 3, R, R] float32, RGB in [0, 1] (ImageNet mean/std folded in-graph)
output "dets" [1, 300, 4] boxes, cxcywh normalized to [0, 1]
output "labels" [1, 300, 91] raw class logits; column index = ORIGINAL COCO id (0 unused, 1=person … 17=cat … 90)
```
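
Preparing the `"image"` tensor can be sketched in NumPy as below. Since the ImageNet mean/std are folded into the graph, the host only rescales to [0, 1] and reorders to NCHW. The simple point-sampling resize (stretching the frame to R×R), the `bgra` flag for 32BGRA camera buffers, and the name `to_model_input` are assumptions; a real app would use a proper resampler such as vImage.

```python
import numpy as np

def to_model_input(img: np.ndarray, R: int, bgra: bool = False) -> np.ndarray:
    """Pack an HxWxC uint8 image into [1, 3, R, R] float32 RGB in [0, 1].

    Assumptions: the frame is stretched to R x R by point sampling, and
    bgra=True means 4-channel BGRA input. No mean/std here: the graph does it.
    """
    h, w = img.shape[:2]
    rows = (np.arange(R) * h) // R             # source row sampled for each output row
    cols = (np.arange(R) * w) // R             # source column sampled for each output column
    small = img[rows[:, None], cols[None, :]]  # [R, R, C]
    rgb = small[..., [2, 1, 0]] if bgra else small[..., :3]  # BGRA -> RGB, else drop any alpha
    chw = rgb.astype(np.float32).transpose(2, 0, 1) / 255.0  # HWC -> CHW, scale to [0, 1]
    return chw[None]                           # add batch dim -> [1, 3, R, R]
```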
|
|
Python decode sketch (Swift follows the same three steps):
|
|
```python
import asyncio
import numpy as np
import coreai.runtime as rt

async def detect(path, rgb01, threshold=0.5):
    """rgb01: float32 [1, 3, R, R], RGB in [0, 1]; no mean/std (folded in-graph)."""
    model = await rt.AIModel.load(path, rt.SpecializationOptions.default())
    fn = model.load_function("main")
    out = await fn({"image": rt.NDArray(rgb01)})
    logits = out["labels"].numpy()[0]                # [300, 91] raw logits
    prob = 1 / (1 + np.exp(-logits))                 # sigmoid
    scores, classes = prob.max(-1), prob.argmax(-1)  # column index IS the COCO id
    boxes = out["dets"].numpy()[0]                   # cxcywh, multiply by image W/H
    keep = scores > threshold                        # done: no NMS
    return boxes[keep], classes[keep], scores[keep]

# boxes, classes, scores = asyncio.run(detect("rfdetr-nano_float32.aimodel", rgb01))
```
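
The "multiply by image W/H" step can be sketched as a conversion from normalized cxcywh to pixel-space corner boxes. It assumes the model input was a plain stretch of the full frame (no letterboxing), so the normalized coordinates map directly onto the original image; `to_pixel_xyxy` is a hypothetical helper name.

```python
import numpy as np

def to_pixel_xyxy(boxes_cxcywh: np.ndarray, width: int, height: int) -> np.ndarray:
    """[N, 4] normalized (cx, cy, w, h) -> [N, 4] pixel (x0, y0, x1, y1), clipped to the image.

    Assumes the frame was stretched (not letterboxed) to the model input.
    """
    scale = np.array([width, height, width, height], dtype=np.float32)
    cx, cy, w, h = (boxes_cxcywh * scale).T  # scale to pixels, unpack columns
    xyxy = np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=1)
    return np.clip(xyxy, 0, scale)           # upper bounds broadcast as (W, H, W, H)
```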
|
|
## RF-DETR-Seg (instance segmentation)
|
|
`rfdetr-seg-{nano,small,medium,large,xlarge,2xlarge}_float32.aimodel` follow the same
contract plus `masks [1, Q, R/4, R/4]`: one FULL-FRAME logit plane per query at
stride 4. On the host, apply sigmoid > 0.5; there is no ROI plumbing and no NMS. All six
variants pass the gate on CPU and GPU with binary-mask IoU 1.000 on stable scenes. M4 Max GPU
latency ranges from **10.7 ms** for seg-nano (312²) to **59.1 ms** for seg-2xlarge (768²).
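
Host-side mask decoding can be sketched as follows. Thresholding the logit at 0 is mathematically the same as sigmoid > 0.5, which avoids computing the sigmoid. Upsampling by pixel replication back to the R×R input resolution is an assumption; upsampling the logits bilinearly before thresholding is a common alternative that gives smoother edges. `decode_masks` is a hypothetical helper name.

```python
import numpy as np

def decode_masks(mask_logits: np.ndarray, keep: np.ndarray) -> np.ndarray:
    """mask_logits: [Q, R/4, R/4] per-query full-frame logit planes (batch dim removed).
    keep: boolean [Q] mask or integer indices of the detections kept after thresholding.
    Returns [K, R, R] boolean masks at model-input resolution.
    """
    planes = mask_logits[keep]  # [K, R/4, R/4]
    binary = planes > 0.0       # sigmoid(x) > 0.5  <=>  x > 0
    # Stride 4 -> input resolution by pixel replication (nearest-neighbour upsampling).
    return binary.repeat(4, axis=1).repeat(4, axis=2)
```

The resulting masks are in model-input coordinates; resizing them to the original frame size for display follows the same assumption as the boxes (a plain stretch of the full frame).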
|
|
<p align="center"><img src="demo_seg_coco_cats.jpg" width="440" alt="RF-DETR-Seg nano on Core AI"></p>
|
|
## Split deployment (`split/`)
|
|
`split/rfdetr-{nano,medium}_{backbone,head}.aimodel` separate the pure-ViT
backbone (image → features) from the deformable head (features → dets/labels;
position encodings baked in). The chained pair is bit-exact against the monolith. The
purpose is per-stage compute-unit preferences, for example running the backbone on the
Neural Engine. In practice, on the iOS 27 beta the runtime still executes the backbone on
the GPU delegate even under a `.neuralEngine` preference (identical detection
fingerprint, no ANE-compile pause), so today the monolith on GPU is the fastest
configuration. The split exists so that ANE placement can be adopted as soon as the
runtime honors it. Regenerate with `export_rf_detr.py --variant <v> --split`.
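
The backbone → head chain and its bit-exactness check can be sketched generically in Python. This card does not list the split models' intermediate tensor names, so the sketch assumes each stage is an async callable that takes and returns a dict of arrays, and that the head's input names equal the backbone's output names. With the real runtime, outputs may first need converting (for example via `.numpy()`, as in the decode sketch) before comparison. `run_split` and `bit_exact` are hypothetical helpers.

```python
import asyncio
import numpy as np

async def run_split(backbone, head, inputs: dict) -> dict:
    """Chain backbone -> head: the backbone's output dict becomes the head's input dict.

    Assumption: both stages are async callables over dicts of arrays and the
    head's input names match the backbone's output names.
    """
    features = await backbone(inputs)
    return await head(features)

def bit_exact(a: dict, b: dict, keys=("dets", "labels")) -> bool:
    """True when every named output is identical element for element (NumPy arrays)."""
    return all(np.array_equal(a[k], b[k]) for k in keys)
```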
|
|
## Conversion
|
|
Exported with
[`conversion/export_rf_detr.py`](https://github.com/john-rocky/coreai-model-zoo/blob/main/conversion/export_rf_detr.py)
from `rfdetr==1.7.1` weights. The port surfaced four Core AI converter/runtime bugs:
an abort on `arange` with float arguments, an int64-comparison buffer clobber, the GPU
delegate treating floor/trunc/ceil as identity, and cast-pair cancellation. Each was worked
around with a numerically identical rewrite; details and minimal repros are in
[zoo/rf-detr.md](https://github.com/john-rocky/coreai-model-zoo/blob/main/zoo/rf-detr.md).
|
|
License: Apache-2.0 (upstream RF-DETR code and COCO-pretrained weights are Apache-2.0).
|
|