	---
	license: apache-2.0
	base_model: roboflow/rf-detr
	tags:
	- coreai
	- aimodel
	- object-detection
	- rf-detr
	- detr
	- apple
	- ios
	- macos
	pipeline_tag: object-detection
	---

	# RF-DETR — Core AI (`.aimodel`)

	[RF-DETR](https://github.com/roboflow/rf-detr) (Roboflow's real-time detection
	transformer, COCO-pretrained) converted to Apple Core AI for iOS 27 / macOS 27 —
	the answer to [apple/coreai-models#14](https://github.com/apple/coreai-models/issues/14).
	As a DETR-family model it needs no NMS: post-processing is one sigmoid plus top-k.

	<p align="center"><img src="demo_coco_cats.jpg" width="440" alt="RF-DETR medium on Core AI"></p>

	## Files

	| file | input | params | M4 Max GPU | iPhone 17 Pro GPU |
	|---|---|---|---|---|
	| `rfdetr-nano_float32.aimodel` | 384×384 | 30.5M | 8.6 ms (~116 FPS) | ~25 ms (33–39 FPS live) |
	| `rfdetr-small_float32.aimodel` | 512×512 | 32.1M | 12.0 ms (~83 FPS) | — |
	| `rfdetr-medium_float32.aimodel` | 576×576 | 33.7M | 14.8 ms (~68 FPS) | 56–63 ms (15–17 FPS live) |
	| `rfdetr-large_float32.aimodel` | 704×704 | 33.9M | 19.1 ms (~52 FPS) | — |

	iPhone numbers are end-to-end live-camera measurements from the
	[CoreAIKit DetectCamera example](https://github.com/john-rocky/coreai-kit)
	(Release build; zero-copy capture pipeline: AVCaptureVideoPreviewLayer display,
	hardware-scaled 32BGRA buffers, vImage preprocessing overlapped with GPU
	inference). The measured peak of 39.6 FPS is roughly the nano model's ceiling
	(~25 ms per frame); sustained throughput under maximum load drops once the
	chassis gets hot (thermal).

	fp32 is the shipped dtype: it passes the parity gate against the PyTorch fp32 reference on
	both CPU and GPU with an exact detection-set match (per confident detection: same class,
	measured IoU ≥ 0.999, score within 2e-3). fp16 bought only ~7% latency on M4 Max while
	adding ranking noise between near-tied scores.
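
The gate described above can be sketched as a per-detection comparison. This is a minimal illustration, not the zoo's actual gate script: the names `iou_cxcywh` and `passes_gate`, the 0.5 confidence cut, and the assumption that detections are compared query-by-query (same row index in both outputs) are all hypothetical choices made here.

```python
import numpy as np

def iou_cxcywh(a, b):
    """IoU of paired boxes in cxcywh format; a and b are [N, 4]."""
    ax1, ay1 = a[:, 0] - a[:, 2] / 2, a[:, 1] - a[:, 3] / 2
    ax2, ay2 = a[:, 0] + a[:, 2] / 2, a[:, 1] + a[:, 3] / 2
    bx1, by1 = b[:, 0] - b[:, 2] / 2, b[:, 1] - b[:, 3] / 2
    bx2, by2 = b[:, 0] + b[:, 2] / 2, b[:, 1] + b[:, 3] / 2
    iw = np.clip(np.minimum(ax2, bx2) - np.maximum(ax1, bx1), 0, None)
    ih = np.clip(np.minimum(ay2, by2) - np.maximum(ay1, by1), 0, None)
    inter = iw * ih
    union = a[:, 2] * a[:, 3] + b[:, 2] * b[:, 3] - inter
    return inter / np.maximum(union, 1e-12)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def passes_gate(ref_boxes, ref_logits, boxes, logits,
                conf=0.5, min_iou=0.999, score_tol=2e-3):
    """Compare a converted model's output against the reference output.

    Assumption: row i of both outputs refers to the same query.
    """
    ref_prob, prob = sigmoid(ref_logits), sigmoid(logits)
    ref_keep = ref_prob.max(-1) > conf
    keep = prob.max(-1) > conf
    if not np.array_equal(ref_keep, keep):  # confident sets must be identical
        return False
    if not keep.any():
        return True
    same_class = np.array_equal(ref_prob.argmax(-1)[keep], prob.argmax(-1)[keep])
    iou_ok = np.all(iou_cxcywh(ref_boxes[keep], boxes[keep]) >= min_iou)
    score_ok = np.all(np.abs(ref_prob.max(-1)[keep] - prob.max(-1)[keep]) <= score_tol)
    return bool(same_class and iou_ok and score_ok)
```

Comparing only confident detections keeps the gate insensitive to reordering among low-score queries, which is where near-tie noise tends to appear.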

	## Graph contract

	```
	input "image" [1, 3, R, R] float32, RGB in [0, 1] (ImageNet mean/std folded in-graph)
	output "dets" [1, 300, 4] boxes, cxcywh normalized to [0, 1]
	output "labels" [1, 300, 91] raw class logits; column index = ORIGINAL COCO id (0 unused, 1=person … 17=cat … 90)
	```
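
A minimal host-side preprocessing sketch for the `image` input, assuming the frame has already been resized to R×R and arrives as an HWC uint8 RGB array. The helper name `to_model_input` is hypothetical. Because the ImageNet mean/std normalization is folded into the graph, the host only rescales to [0, 1] and reorders the axes.

```python
import numpy as np

def to_model_input(rgb_u8):
    """HWC uint8 RGB (R x R x 3) -> [1, 3, R, R] float32 in [0, 1]."""
    if rgb_u8.ndim != 3 or rgb_u8.shape[2] != 3 or rgb_u8.shape[0] != rgb_u8.shape[1]:
        raise ValueError("expected a square HWC RGB image")
    # Scale to [0, 1], move channels first, then add the batch dimension.
    chw = rgb_u8.astype(np.float32).transpose(2, 0, 1) / 255.0
    return np.ascontiguousarray(chw[None])
```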

	Python decode sketch (Swift is the same three steps):

	```python
	import asyncio
	import numpy as np
	import coreai.runtime as rt

	async def detect(path, rgb01, threshold=0.5):
	    # rgb01: float32 array [1, 3, R, R], RGB in [0, 1]
	    model = await rt.AIModel.load(path, rt.SpecializationOptions.default())
	    fn = model.load_function("main")
	    out = await fn({"image": rt.NDArray(rgb01)})
	    prob = 1 / (1 + np.exp(-out["labels"].numpy()[0]))  # sigmoid, [300, 91]
	    scores, classes = prob.max(-1), prob.argmax(-1)     # column index IS the COCO id
	    boxes = out["dets"].numpy()[0]                      # cxcywh in [0, 1]; multiply by image W/H
	    keep = scores > threshold                           # done, no NMS
	    return boxes[keep], scores[keep], classes[keep]

	# boxes, scores, classes = asyncio.run(detect(path, rgb01))
	```
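
The thresholded decode above can also be written as a top-k selection over all query/class pairs, with boxes converted to pixel corner coordinates. The sketch below is NumPy-only and runs on any arrays with the contract's shapes. `decode_topk`, the default `k`, and the joint query-class flattening are assumptions for illustration: this is the common DETR-style post-processing, not necessarily what the CoreAIKit example does.

```python
import numpy as np

def decode_topk(dets, labels, img_w, img_h, k=100, threshold=0.5):
    """dets: [300, 4] cxcywh in [0, 1]; labels: [300, 91] raw logits.

    Returns (xyxy pixel boxes [K, 4], scores [K], COCO ids [K]), best first.
    """
    prob = 1.0 / (1.0 + np.exp(-labels))       # one sigmoid over all logits
    flat = prob.ravel()
    k = min(k, flat.size)
    top = np.argpartition(-flat, k - 1)[:k]    # k largest, unordered
    top = top[np.argsort(-flat[top])]          # sort those k by score
    top = top[flat[top] > threshold]
    query, coco_id = np.divmod(top, prob.shape[1])
    cx, cy, w, h = dets[query].T
    xyxy = np.stack([(cx - w / 2) * img_w, (cy - h / 2) * img_h,
                     (cx + w / 2) * img_w, (cy + h / 2) * img_h], axis=1)
    return xyxy, flat[top], coco_id
```

Flattening queries and classes together lets one query contribute more than one class if both score highly; taking `prob.max(-1)` per query first, as in the sketch above, restricts each query to a single label.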

	## RF-DETR-Seg (instance segmentation)

	`rfdetr-seg-{nano,small,medium,large,xlarge,2xlarge}_float32.aimodel` use the same
	contract plus `masks [1, Q, R/4, R/4]`, where Q is the query count: per-query
	full-frame logit planes at stride 4 (on the host: sigmoid > 0.5; no ROI plumbing,
	no NMS). All six pass the parity gate on CPU and GPU with binary-mask IoU 1.000 on
	stable scenes. M4 Max GPU latency ranges from 10.7 ms (seg-nano, 312×312) to
	59.1 ms (seg-2xlarge, 768×768).
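
Host-side mask decode for the segmentation variants, as a minimal sketch: it thresholds the kept queries' logit planes and upsamples them from stride 4 back to the R×R input grid. The helper name `decode_masks` and the nearest-neighbor upsampling via `repeat` are assumptions; a production pipeline might instead interpolate the logits bilinearly before thresholding.

```python
import numpy as np

def decode_masks(masks, keep, stride=4):
    """masks: [Q, R/4, R/4] logits; keep: [Q] bool. Returns [K, R, R] bool."""
    # sigmoid(x) > 0.5 is equivalent to x > 0, so no exp is needed.
    binary = masks[keep] > 0.0
    # Nearest-neighbor upsample: repeat each cell `stride` times per axis.
    return binary.repeat(stride, axis=1).repeat(stride, axis=2)
```

`keep` is the same boolean vector produced by the detection decode, so boxes, classes, and masks stay aligned by query index.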

	<p align="center"><img src="demo_seg_coco_cats.jpg" width="440" alt="RF-DETR-Seg nano on Core AI"></p>

	## Split deployment (`split/`)

	`split/rfdetr-{nano,medium}_{backbone,head}.aimodel` separate the pure-ViT
	backbone (image → features) from the deformable head (features → dets/labels;
	position encodings baked in). The chained pair is bit-exact against the monolith.
	The purpose is per-stage compute-unit preferences, e.g. backbone on the Neural Engine.
	Measured honestly: on the iOS 27 beta the runtime still executes the backbone on
	the GPU delegate even under the `.neuralEngine` preference (identical detection
	fingerprint, no ANE-compile pause), so today the monolith on GPU is the
	fastest configuration. The split exists so that ANE placement can be adopted as
	soon as the runtime honors it. Regenerate with `export_rf_detr.py --variant <v> --split`.
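
The bit-exactness claim can be checked with a small harness like the one below. It is a sketch under assumptions: `monolith`, `backbone`, and `head` are hypothetical callables standing in for the loaded models, and the feature object passed between stages is whatever the backbone emits. Only the output names `dets` and `labels` come from the graph contract.

```python
import numpy as np

def chain_is_bit_exact(monolith, backbone, head, image):
    """True if head(backbone(image)) reproduces monolith(image) exactly.

    Each callable is assumed to return NumPy arrays; monolith and head
    return a dict with the keys "dets" and "labels".
    """
    ref = monolith(image)
    out = head(backbone(image))
    # array_equal demands identical shape and element values, no tolerance.
    return all(np.array_equal(ref[name], out[name]) for name in ("dets", "labels"))
```

Using `np.array_equal` rather than `np.allclose` is deliberate: the claim is bit-exactness, so any tolerance would weaken the check.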

	## Conversion

	Exported with
	[`conversion/export_rf_detr.py`](https://github.com/john-rocky/coreai-model-zoo/blob/main/conversion/export_rf_detr.py)
	from `rfdetr==1.7.1` weights. The port surfaced four Core AI converter/runtime bugs
	(float-arg `arange` abort, int64-comparison buffer clobber, GPU-delegate
	floor/trunc/ceil acting as identity, cast-pair cancellation). Each is worked around
	with a numerically identical rewrite; details and minimal repros are in
	[zoo/rf-detr.md](https://github.com/john-rocky/coreai-model-zoo/blob/main/zoo/rf-detr.md).

	License: Apache-2.0 (upstream RF-DETR code and COCO-pretrained weights are Apache-2.0).