SupertonicTTS-3 — CoreML (.mlpackage, iOS)

First-party CoreML export of Supertonic-3's four non-autoregressive flow-matching graphs for on-device inference on iOS / Apple Neural Engine. Built by our own pipeline (speech-models/stmodels): weights are lifted from the Supertone/supertonic-3 ONNX initializers into PyTorch nn.Module definitions, then converted with coremltools (mlprogram, FP32, iOS 18+).

Graphs & parity (FP32, vs ONNX Runtime)

Module	mlpackage	Parity max |Δ|
Duration predictor	`DurationPredictor.mlpackage`	7.2e-06 ✓
Vector estimator (ODE denoiser)	`VectorEstimator.mlpackage`	2.5e-03 ✓
Vocoder	`Vocoder.mlpackage`	3.0e-04 ✓
Text encoder	`TextEncoder.mlpackage`	mean 2.5e-04 (max 2.5e-02 at isolated positions)
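
The parity figures are elementwise absolute differences between the ONNX Runtime and CoreML outputs for the same input. A minimal sketch of how such numbers can be computed is below; `parity_stats` and the toy values are illustrative, not part of the export pipeline, and real outputs would first be flattened to a 1-D list.

```python
from typing import Dict, Sequence


def parity_stats(reference: Sequence[float], candidate: Sequence[float]) -> Dict[str, float]:
    """Return the max and mean absolute difference between two flattened outputs."""
    if len(reference) != len(candidate):
        raise ValueError("outputs must have the same number of elements")
    if not reference:
        raise ValueError("outputs must not be empty")
    diffs = [abs(r - c) for r, c in zip(reference, candidate)]
    return {"max_abs": max(diffs), "mean_abs": sum(diffs) / len(diffs)}


# Toy values standing in for flattened ONNX Runtime and CoreML outputs.
onnx_out = [0.10, -0.25, 0.50, 1.00]
coreml_out = [0.10, -0.25, 0.75, 1.00]
stats = parity_stats(onnx_out, coreml_out)
print(stats)
```

Reporting both statistics matters for a case like the text encoder row, where a small mean can coexist with a larger maximum at a few isolated positions.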

The text encoder and duration predictor use a fixed sequence length T=128, because the relative-position attention has T-dependent pad widths; pad or segment input text to length 128. The vocoder and vector estimator accept a dynamic latent length via a RangeDim. The graphs contain no control flow, so the host runs the flow-matching ODE loop itself, calling vector_estimator total_steps times. Assets needed to drive the graphs: tts.json, unicode_indexer.json (G2P-free tokenizer table), and voice_styles/*.json.
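
The two host-side responsibilities described above, fitting text to T=128 and driving the ODE loop, can be sketched as follows. This is a sketch under stated assumptions, not the shipped runtime: the pad id of 0, the fixed-boundary segmentation, the fixed-step Euler solver, and the t=0 to t=1 time range are all assumptions, and `toy_velocity` is a stand-in for a real vector_estimator call.

```python
from typing import Callable, List, Tuple

T_FIXED = 128  # fixed text length stated in the model card
PAD_ID = 0     # ASSUMPTION: the real pad id comes from the tokenizer assets


def pad_or_segment(ids: List[int], t: int = T_FIXED,
                   pad_id: int = PAD_ID) -> List[Tuple[List[int], List[int]]]:
    """Split ids into chunks of at most t and right-pad each chunk to exactly t.

    Returns (padded_ids, mask) pairs, where mask is 1 for real tokens and 0 for padding.
    Splitting at fixed offsets is naive; a real host would prefer sentence boundaries.
    """
    chunks = [ids[i:i + t] for i in range(0, len(ids), t)] or [[]]
    segments = []
    for chunk in chunks:
        n = len(chunk)
        segments.append((chunk + [pad_id] * (t - n), [1] * n + [0] * (t - n)))
    return segments


def euler_flow(x: List[float],
               velocity: Callable[[List[float], float], List[float]],
               total_steps: int) -> List[float]:
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed-step Euler.

    ASSUMPTION: solver and time range are illustrative; the card only says the
    host calls the vector estimator total_steps times.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    dt = 1.0 / total_steps
    for step in range(total_steps):
        v = velocity(x, step * dt)  # one vector-estimator call per step
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# 130 token ids do not fit in one window, so they become two padded segments.
segments = pad_or_segment(list(range(1, 131)))

# Toy straight-line velocity field standing in for the vector estimator.
x0 = [0.0, 0.0]
target = [2.0, -1.0]


def toy_velocity(x: List[float], t: float) -> List[float]:
    return [tj - sj for tj, sj in zip(target, x0)]


x1 = euler_flow(x0, toy_velocity, total_steps=4)
print(len(segments), x1)
```

In a real host, each `velocity` call would wrap a CoreML prediction on VectorEstimator.mlpackage, with the conditioning inputs held constant across steps.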

This FP32 export is the parity reference. For ANE residency, use the mixed-precision Supertonic-3-CoreML-FP16 variant (vocoder and duration predictor in FP16, text encoder and vector estimator in FP32), measured as transparent at 47–51 dB magnitude-STFT SNR.
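
The card does not state the exact formula behind the magnitude-STFT SNR figure. The sketch below assumes one common definition: the energy of the reference magnitude spectrogram divided by the energy of the magnitude difference, in dB. The frame length, hop, and Hann window are illustrative choices, and the naive DFT is only suitable for short demo signals.

```python
import cmath
import math
from typing import List, Sequence


def stft_magnitudes(signal: Sequence[float], frame_len: int = 64,
                    hop: int = 32) -> List[List[float]]:
    """Hann-windowed magnitude STFT computed with a naive DFT (demo only)."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        mags = []
        for k in range(frame_len // 2 + 1):  # non-negative frequency bins only
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            mags.append(abs(acc))
        frames.append(mags)
    return frames


def mag_stft_snr_db(reference: Sequence[float], candidate: Sequence[float]) -> float:
    """ASSUMED definition: 10*log10(sum |S_ref|^2 / sum (|S_ref| - |S_cand|)^2)."""
    if len(reference) != len(candidate):
        raise ValueError("signals must have the same length")
    ref = stft_magnitudes(reference)
    cand = stft_magnitudes(candidate)
    signal_power = sum(m * m for frame in ref for m in frame)
    noise_power = sum((a - b) ** 2
                      for fr, fc in zip(ref, cand) for a, b in zip(fr, fc))
    if signal_power == 0:
        raise ValueError("reference is silent or shorter than one frame")
    if noise_power == 0:
        return math.inf
    return 10 * math.log10(signal_power / noise_power)


# Toy check: a 1% amplitude error corresponds to 40 dB under this definition.
ref_audio = [math.sin(2 * math.pi * 5 * n / 64) for n in range(256)]
cand_audio = [0.99 * s for s in ref_audio]
snr = mag_stft_snr_db(ref_audio, cand_audio)
print(round(snr, 3))
```

Under this definition, comparing FP32 and mixed-precision waveforms for the same text, voice, and noise seed would yield a figure comparable in kind to the one quoted, though the exact analysis parameters used for the card are not known.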

Attribution & license

Weights: derivative of Supertone/supertonic-3 (commit 3cadd1ee6394adea1bd021217a0e650ede09a323) by Supertone Inc. (arXiv:2503.23108), licensed under OpenRAIL-M. The use-based restrictions carry over: no non-consensual impersonation/deepfakes, etc.

Other Supertonic-3 formats

Supertonic-3 — CoreML (FP16) — mixed-precision ANE variant (47–51 dB).
Supertonic-3 — ONNX (INT8) — server / desktop (ONNX Runtime).
Supertonic-3 — LiteRT — Android / Qualcomm NPU (.tflite).

Ecosystem

soniqo.audio — website / use-case explorer (transcription, voice cloning, live ASR, voice agents).
speech-core — C++ orchestration library; Supertonic plugs in as a TTSInterface CoreML model.
speech-swift — Apple Silicon MLX + CoreML runtime.
speech-android — Android SDK consuming on-device LiteRT bundles.

Other CoreML models

Kokoro-82M — CoreML · VoxCPM2 — MLX · full CoreML Speech Models collection.

Paper

SupertonicTTS: Towards Highly Scalable and Efficient Text-to-Speech System (arXiv:2503.23108, published Mar 29, 2025).