YuNet — LiteRT (on-device face detection, fully-GPU)
YuNet (ShiqiYu/libfacedetection) is a tiny, fast face detector (face boxes plus 5 landmarks per face),
converted to LiteRT and running entirely on the GPU through the CompiledModel API (ML Drift) on
Android. The model has 0.076 M parameters and is 0.3 MB in fp16.
On-device (Pixel 8a, Tensor G3 — verified)
| metric | value |
|---|---|
| nodes on GPU | 146 / 146 LITERT_CL (full residency) |
| inference | ~4 ms (640×640) |
| size | 0.3 MB (fp16) |
| accuracy | device-vs-PyTorch corr 0.9999 (all 12 outputs) |
image[1,3,640,640] (BGR, 0-255) →[GPU: YuNet]→ 12 outputs: cls/obj/bbox/kps × strides {8,16,32}
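As a quick sanity check on the I/O contract, the expected output shapes can be derived from the input size and strides. This is a minimal sketch in plain Python; the labels such as `cls_8` are hypothetical names used only for printing, and the per-head channel counts (1, 1, 4, 10) are taken from the usage comments in this card.

```python
INPUT_SIZE = 640
STRIDES = (8, 16, 32)
# Channels per head as listed in this card: cls 1, obj 1, bbox 4, kps 10 (5 landmarks x 2).
HEADS = (("cls", 1), ("obj", 1), ("bbox", 4), ("kps", 10))


def expected_output_shapes(input_size=INPUT_SIZE, strides=STRIDES):
    """Return [(label, shape)] in the card's output order: each head across all strides."""
    shapes = []
    for name, channels in HEADS:
        for s in strides:
            n = (input_size // s) ** 2  # one prior per feature-map cell
            shapes.append((f"{name}_{s}", (1, n, channels)))
    return shapes


for label, shape in expected_output_shapes():
    print(label, shape)
```

Comparing these shapes against what the runtime reports is a cheap way to catch a wrong input size or an unexpected output ordering before decoding.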
How it converts (litert-torch) — clean, no re-authoring
Pure CNN (depthwise-separable ConvDPUnit) with a nearest-upsample neck (F.interpolate(mode="nearest") →
RESIZE_NEAREST_NEIGHBOR, no transposed conv) and non-padded MaxPool2d (no PADV2). The head's per-stride
permute/reshape/sigmoid is baked into the graph, so the model emits 12 decode-ready outputs. No banned ops,
all tensors ≤4D, tflite-vs-torch corr 1.0, device-vs-torch corr 0.9999.
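The card does not say how the correlation figures were computed. One plausible reading is a Pearson correlation between the flattened reference output and the converted model's output, and the sketch below works under that assumption. The sample values are made up for illustration and are not measurements from this model.

```python
import math


def pearson_corr(a, b):
    """Pearson correlation of two equal-length sequences of floats."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return float("nan")  # correlation is undefined for a constant sequence
    return cov / math.sqrt(var_a * var_b)


# Made-up values standing in for one flattened output tensor from each runtime.
reference = [0.10, 0.80, 0.05, 0.93, 0.40]
candidate = [0.11, 0.79, 0.05, 0.92, 0.41]
print(f"corr = {pearson_corr(reference, candidate):.4f}")
```

In practice the same function would be applied to each of the 12 outputs separately, since a single low-correlation head can hide behind a good average.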
Decode (host-side) & preprocessing
Preprocessing: letterbox to 640×640, BGR, 0-255, no normalization. Decoding uses anchor-free priors
(px=col·s, py=row·s, offset 0) at each stride s: score = cls·obj; box center = bbox_xy·s + prior;
box size = exp(bbox_wh)·s; each of the 5 landmarks = kps·s + prior; then NMS.
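The letterbox step can be sketched as a small geometry helper. The card does not specify where the padding goes, so this assumes centered padding; `letterbox_params` and `to_source_xy` are hypothetical helper names, not part of any library. The actual pixel resize and paste are left to whichever image library is in use.

```python
def letterbox_params(src_w, src_h, dst=640):
    """Scale and padding that fit a src_w x src_h image into a dst x dst canvas."""
    scale = min(dst / src_w, dst / src_h)  # uniform scale keeps the aspect ratio
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    # Assumption: padding is split evenly on both sides (centered image).
    pad_x, pad_y = (dst - new_w) // 2, (dst - new_h) // 2
    return scale, new_w, new_h, pad_x, pad_y


def to_source_xy(x, y, scale, pad_x, pad_y):
    """Map a point from letterboxed model space back to source-image pixels."""
    return (x - pad_x) / scale, (y - pad_y) / scale


scale, new_w, new_h, pad_x, pad_y = letterbox_params(1280, 720)
print(scale, new_w, new_h, pad_x, pad_y)
print(to_source_xy(320, 320, scale, pad_x, pad_y))
```

The same `scale` and padding values are needed after decoding, so it is convenient to keep them alongside the input tensor rather than recomputing them.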
Minimal usage
Android (Kotlin, CompiledModel GPU)
val model = CompiledModel.create(
    context.assets, "yunet_fp16.tflite",
    CompiledModel.Options(Accelerator.GPU), null)
val inputs = model.createInputBuffers()
val outputs = model.createOutputBuffers()
inputs[0].writeFloat(bgr)  // [1,3,640,640] NCHW BGR, 0-255 (no normalization)
model.run(inputs, outputs)
// 12 outputs in order: cls x3 [1,N,1], obj x3 [1,N,1], bbox x3 [1,N,4], kps x3 [1,N,10]
// for strides {8,16,32}, N = (640/stride)^2 = 6400/1600/400.
// Decoding follows the Python script below.
val cls8 = outputs[0].readFloat()  // cls scores for stride 8
Python (desktop verification)
import math

import numpy as np
from PIL import Image
from ai_edge_litert.interpreter import Interpreter

# Note: this is a plain resize, not a letterbox, so detections are in 640x640 model coordinates.
im = Image.open("faces.jpg").convert("RGB").resize((640, 640))
bgr = np.asarray(im, np.float32)[:, :, ::-1]  # RGB -> BGR, 0-255
x = bgr.transpose(2, 0, 1)[None].copy()       # [1,3,640,640] NCHW

it = Interpreter(model_path="yunet_fp16.tflite")
it.allocate_tensors()
it.set_tensor(it.get_input_details()[0]["index"], x)
it.invoke()
o = [it.get_tensor(d["index"])[0] for d in it.get_output_details()]

# output order: cls x3, obj x3, bbox x3, kps x3 (strides 8, 16, 32)
dets = []
for li, s in enumerate([8, 16, 32]):
    cls, obj, bb, kp = o[li][:, 0], o[3 + li][:, 0], o[6 + li], o[9 + li]
    fw = 640 // s  # feature-map width at this stride
    for i in np.where(cls * obj > 0.6)[0]:  # score threshold
        px, py = (i % fw) * s, (i // fw) * s  # prior position in pixels
        cx, cy = bb[i, 0] * s + px, bb[i, 1] * s + py
        w, h = math.exp(bb[i, 2]) * s, math.exp(bb[i, 3]) * s
        lm = [(kp[i, 2 * j] * s + px, kp[i, 2 * j + 1] * s + py) for j in range(5)]
        dets.append(([cx - w/2, cy - h/2, cx + w/2, cy + h/2], float(cls[i] * obj[i]), lm))

def iou(a, b):  # intersection over union of two [x1, y1, x2, y2] boxes
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    u = (a[2]-a[0])*(a[3]-a[1]) + (b[2]-b[0])*(b[3]-b[1]) - ix*iy
    return ix * iy / u if u > 0 else 0

# greedy NMS, IoU 0.45: keep a detection only if it does not overlap a higher-scoring kept one
dets.sort(key=lambda d: -d[1])
faces = []
for d in dets:
    if all(iou(d[0], f[0]) < 0.45 for f in faces):
        faces.append(d)

for box, score, lm in faces:
    print(f"face {score:.2f}", np.round(box, 1), "landmarks", np.round(lm, 1))
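The script above resizes the image to 640×640 without preserving aspect ratio, so the printed boxes and landmarks are in model coordinates. A minimal sketch for mapping them back to the original image follows; `rescale_detection` is a hypothetical helper, and it assumes the plain-resize preprocessing used in that script (a letterboxed input would need the padding removed instead).

```python
def rescale_detection(box, landmarks, orig_w, orig_h, model_size=640):
    """Map one detection from model_size x model_size space to original-image pixels."""
    sx, sy = orig_w / model_size, orig_h / model_size  # separate x/y scales for a plain resize
    x1, y1, x2, y2 = box
    scaled_box = [x1 * sx, y1 * sy, x2 * sx, y2 * sy]
    scaled_lm = [(x * sx, y * sy) for x, y in landmarks]
    return scaled_box, scaled_lm


# Example with made-up coordinates for a 1280x720 source image.
box, lm = rescale_detection([160, 160, 320, 320], [(200, 200)], orig_w=1280, orig_h=720)
print(box, lm)
```

With the script's variables, the original size would come from the image before the `.resize(...)` call, for example by keeping `Image.open("faces.jpg").size` around.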
License
BSD-3-Clause. Upstream: ShiqiYu/libfacedetection.
