How to use aufklarer/Supertonic-3-CoreML with Supertonic:
```python
from supertonic import TTS

tts = TTS(auto_download=True)
style = tts.get_voice_style(voice_name="M1")
text = "The train delay was announced at 4:45 PM on Wed, Apr 3, 2024 due to track maintenance."
wav, duration = tts.synthesize(text, voice_style=style)
tts.save_audio(wav, "output.wav")
```
SupertonicTTS-3 - CoreML (.mlpackage, iOS)
First-party CoreML export of Supertonic-3's four non-autoregressive flow-matching graphs, for
on-device iOS / Apple Neural Engine. Built by our own pipeline (speech-models/stmodels): weights
lifted from the Supertone/supertonic-3 ONNX
initializers → PyTorch nn.Module → coremltools (mlprogram, FP32, iOS18+).
Graphs & parity (FP32, vs ONNX Runtime)
| Module | mlpackage | Parity (max \|Δ\|) |
|---|---|---|
| Duration predictor | DurationPredictor.mlpackage | 7.2e-06 ✓ |
| Vector estimator (ODE denoiser) | VectorEstimator.mlpackage | 2.5e-03 ✓ |
| Vocoder | Vocoder.mlpackage | 3.0e-04 ✓ |
| Text encoder | TextEncoder.mlpackage | mean 2.5e-04 (max 2.5e-2 at isolated positions) |
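
The parity figures above are elementwise absolute differences between the CoreML and ONNX Runtime outputs. The following is a minimal sketch of how such a comparison could be computed, assuming both outputs have already been converted to NumPy arrays of the same shape. `parity_report` is a hypothetical helper for illustration, not part of the export pipeline.

```python
import numpy as np


def parity_report(reference, candidate):
    """Compare two output tensors elementwise.

    `reference` would be the ONNX Runtime output and `candidate` the CoreML
    output (assumption: both are array-like with identical shapes).
    """
    ref = np.asarray(reference, dtype=np.float64)
    cand = np.asarray(candidate, dtype=np.float64)
    if ref.shape != cand.shape:
        raise ValueError(f"shape mismatch: {ref.shape} vs {cand.shape}")

    diff = np.abs(ref - cand)
    # Index of the single worst element, useful for spotting isolated outliers.
    worst = np.unravel_index(np.argmax(diff), diff.shape)
    return {
        "max_abs": float(diff.max()),
        "mean_abs": float(diff.mean()),
        "argmax": tuple(int(i) for i in worst),
    }
```

Reporting the mean alongside the maximum, as the text-encoder row does, helps distinguish a uniformly small error from a few isolated positions with a larger one.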
The text encoder and duration predictor use a fixed T=128 (relative-position attention has T-dependent
pad widths, so pad or segment text to 128); the vocoder and vector estimator use a dynamic
latent-length RangeDim. The host runs the flow-matching ODE loop (vector_estimator × total_steps);
the graphs contain no control flow. Assets to drive them:
tts.json, unicode_indexer.json (G2P-free tokenizer table), voice_styles/*.json.
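
The two host-side responsibilities described above (fitting text to T=128 and driving the ODE loop) can be sketched as follows. This is an illustration under stated assumptions, not the shipped runtime: `segment_and_pad`, `run_ode_loop`, and `toy_euler_step` are hypothetical names, the pad id of 0 and the mask convention are assumptions, and the real vector-estimator call takes additional inputs (text embedding, style, masks) that are omitted here. The sketch also assumes the wrapped model call returns the updated latent for each step.

```python
import numpy as np

T_TEXT = 128  # fixed text length stated in this card


def segment_and_pad(token_ids, length=T_TEXT, pad_id=0):
    """Split token ids into chunks of at most `length` and right-pad each.

    Returns a list of (ids, mask) pairs; mask is 1 for real tokens and 0 for
    padding. pad_id=0 and this mask convention are assumptions.
    """
    if not token_ids:
        raise ValueError("token_ids must not be empty")
    chunks = []
    for start in range(0, len(token_ids), length):
        chunk = list(token_ids[start:start + length])
        n_real = len(chunk)
        n_pad = length - n_real
        chunks.append((chunk + [pad_id] * n_pad, [1] * n_real + [0] * n_pad))
    return chunks


def run_ode_loop(step_fn, noise, total_steps):
    """Host-side loop: call the stateless step function `total_steps` times.

    `step_fn(x, step, total_steps)` stands in for one vector-estimator call
    and is assumed to return the updated latent.
    """
    x = np.array(noise, dtype=np.float32)
    for step in range(total_steps):
        x = step_fn(x, step, total_steps)
    return x


def toy_euler_step(x, step, total_steps, target=1.0):
    """Stand-in for the model: one Euler step of a straight-line flow."""
    t = step / total_steps
    velocity = (target - x) / (1.0 - t)  # velocity that reaches target at t=1
    return x + velocity / total_steps


chunks = segment_and_pad(list(range(1, 131)))  # 130 tokens -> 2 chunks of 128
latent = run_ode_loop(toy_euler_step, np.zeros((1, 4, 8)), total_steps=5)
```

Splitting on raw token offsets is the simplest option; a real front end would more likely cut at sentence or phrase boundaries so that each segment is synthesized with natural prosody.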
This FP32 export is the parity reference. For ANE residency, use the mixed-precision
Supertonic-3-CoreML-FP16: vocoder + duration predictor in FP16, text encoder + vector estimator in FP32; measured transparent at 47–51 dB mag-STFT SNR.
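
This card does not define how the mag-STFT SNR is computed. One common formulation, shown here only as an assumed sketch, compares STFT magnitudes of the FP32 and mixed-precision waveforms and reports signal energy over error energy in dB. The window, `n_fft`, and `hop` values are illustrative choices, and `mag_stft_snr_db` is a hypothetical helper.

```python
import numpy as np


def stft_magnitude(x, n_fft=1024, hop=256):
    """Magnitude STFT with a Hann window; a trailing partial frame is dropped."""
    x = np.asarray(x, dtype=np.float64)
    if len(x) < n_fft:
        raise ValueError("signal is shorter than one analysis frame")
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop:i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))


def mag_stft_snr_db(reference, test, n_fft=1024, hop=256):
    """SNR in dB between STFT magnitudes (assumed definition, see lead-in)."""
    if len(reference) != len(test):
        raise ValueError("signals must have the same length")
    ref_mag = stft_magnitude(reference, n_fft, hop)
    test_mag = stft_magnitude(test, n_fft, hop)
    signal = np.sum(ref_mag ** 2)
    noise = np.sum((ref_mag - test_mag) ** 2)
    if noise == 0.0:
        return float("inf")  # identical magnitudes
    return float(10.0 * np.log10(signal / noise))
```

Because only magnitudes are compared, this measure ignores phase differences between the two waveforms.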
Attribution & license
- Weights: derivative of Supertone/supertonic-3 (commit 3cadd1ee6394adea1bd021217a0e650ede09a323), Supertone Inc., arXiv:2503.23108. License: OpenRAIL-M (use-based restrictions carry over: no non-consensual impersonation/deepfakes, etc.).
Other Supertonic-3 formats
- Supertonic-3 - CoreML (FP16): mixed-precision ANE variant (47–51 dB).
- Supertonic-3 - ONNX (INT8): server / desktop (ONNX Runtime).
- Supertonic-3 - LiteRT: Android / Qualcomm NPU (.tflite).
Ecosystem
- soniqo.audio: website / use-case explorer (transcription, voice cloning, live ASR, voice agents).
- speech-core: C++ orchestration library; Supertonic plugs in as a TTSInterfaceCoreML model.
- speech-swift: Apple Silicon MLX + CoreML runtime.
- speech-android: Android SDK consuming on-device LiteRT bundles.