Kokoro-82M CoreML
On-device CoreML export of Kokoro-82M for iOS TTS. Runs locally on iPhone: no network, no API calls.
This repo hosts a two-model split (acoustic + decoder) that sidesteps the limitations of converting the full Kokoro pipeline directly.
Files
| File | Size | Purpose |
|---|---|---|
| `kokoro_acoustic.mlpackage` | ~107 MB | Phoneme → acoustic features |
| `kokoro_decoder.mlpackage` | ~203 MB | Acoustic features → waveform |
Validation
Against the original unpatched PyTorch output on the same test phrases:
- Waveform correlation: 0.9826
- SNR: 15 dB
The output is perceptually identical to the PyTorch reference.
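The two metrics above can be reproduced along the following lines. This is a minimal sketch, not the script used for this repo: it assumes "waveform correlation" means the Pearson correlation of time-aligned samples and that SNR treats the PyTorch output as the signal and the sample-wise difference as the noise. The function names `waveform_correlation` and `snr_db` are illustrative.

```python
import math


def waveform_correlation(reference, candidate):
    """Pearson correlation between two equal-length sample sequences."""
    n = len(reference)
    if n == 0 or n != len(candidate):
        raise ValueError("waveforms must be non-empty and the same length")
    mean_r = sum(reference) / n
    mean_c = sum(candidate) / n
    cov = sum((r - mean_r) * (c - mean_c) for r, c in zip(reference, candidate))
    var_r = sum((r - mean_r) ** 2 for r in reference)
    var_c = sum((c - mean_c) ** 2 for c in candidate)
    return cov / math.sqrt(var_r * var_c)


def snr_db(reference, candidate):
    """SNR in dB, with the reference as signal and (reference - candidate) as noise."""
    if len(reference) != len(candidate):
        raise ValueError("waveforms must be the same length")
    signal_power = sum(r * r for r in reference)
    noise_power = sum((r - c) ** 2 for r, c in zip(reference, candidate))
    if noise_power == 0:
        return math.inf  # identical waveforms
    return 10.0 * math.log10(signal_power / noise_power)


# Toy check: a 440 Hz tone at an assumed 24 kHz rate versus a copy scaled by 0.9.
reference = [math.sin(2 * math.pi * 440 * t / 24000) for t in range(2400)]
candidate = [0.9 * s for s in reference]
corr = waveform_correlation(reference, candidate)
snr = snr_db(reference, candidate)
```

In the toy check, a pure gain change leaves the correlation at essentially 1 while the SNR drops to about 20 dB, which is why the two numbers are worth reporting together.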
Usage
Download
pip install huggingface_hub
hf download maxpar1/kokoro-82m-coreml --local-dir ./models
In an iOS app
Drop the two .mlpackage files into your Xcode project as folder references, then load with CoreML:
import CoreML
let acousticURL = Bundle.main.url(forResource: "kokoro_acoustic", withExtension: "mlpackage")!
let compiled = try await MLModel.compileModel(at: acousticURL)
let model = try MLModel(contentsOf: compiled)
A complete reference implementation, including phoneme generation via espeak-ng, text frontend, and TTS service wiring, is in the voice-assistant repo.
How this was converted
Converting Kokoro-82M to CoreML required solving several problems that aren't in the standard TTS export path:
- Bidirectional LSTM splitting: `pack_padded_sequence` is untraceable by `torch.jit.trace`, so bidirectional LSTMs were split into forward/backward unidirectional passes.
- iSTFT reconstruction: the original iSTFT implementation was missing one-sided spectrum doubling and COLA normalization.
- Dynamic interpolation: `F.interpolate` with dynamic sizes fails coremltools export. Replaced with manual `arange` + `gather` + `lerp`.
- Deterministic noise: `torch.randn` is non-deterministic at export. Replaced with Box-Muller Gaussian approximation from a seed.
- Two-model split: exporting as a single unified model exceeded coremltools' memory budget during conversion; splitting into acoustic + decoder stages avoids this.
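To illustrate the dynamic interpolation item, here is a plain-Python sketch of linear resizing built from the same three steps: enumerate output positions (the `arange`), look up the two neighbouring input samples (the `gather`), and blend them (the `lerp`). It is not the conversion code from this repo. It assumes 1-D linear interpolation with the half-pixel mapping that PyTorch uses when `align_corners=False`; the mode actually replaced in the export is not stated here, and `lerp_resize` is a hypothetical name.

```python
import math


def lerp_resize(values, out_len):
    """Linearly resample a 1-D sequence to out_len samples."""
    in_len = len(values)
    if in_len == 0 or out_len <= 0:
        raise ValueError("need a non-empty input and a positive output length")
    scale = in_len / out_len
    out = []
    for i in range(out_len):  # "arange" over output positions
        # Half-pixel source coordinate, clamped to the valid input range.
        pos = (i + 0.5) * scale - 0.5
        pos = min(max(pos, 0.0), in_len - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, in_len - 1)
        weight = pos - lo
        # "gather" the two neighbours, then "lerp" between them.
        out.append(values[lo] + weight * (values[hi] - values[lo]))
    return out


upsampled = lerp_resize([0.0, 10.0], 4)
unchanged = lerp_resize([1.0, 2.0, 3.0], 3)
```

Because every step is an index computation, a lookup, or elementwise arithmetic, the same structure can be written with tensor ops whose output length is a runtime value rather than a constant baked in at trace time.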
Full debugging diary and conversion scripts: voice-assistant/scripts.
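For the deterministic noise item, the Box-Muller transform turns pairs of uniform samples into pairs of standard normal samples, so a fixed seed gives the same noise on every run. The sketch below shows the transform only. It uses Python's `random.Random` as a stand-in for the seeded uniform source, which is an assumption; how the uniform samples are produced inside the exported model is not described here, and `seeded_gaussian` is a hypothetical name.

```python
import math
import random


def seeded_gaussian(n, seed):
    """Return n standard-normal samples that depend only on the seed."""
    rng = random.Random(seed)  # stand-in for a seeded uniform source
    out = []
    while len(out) < n:
        u1 = 1.0 - rng.random()  # in (0, 1], so log(u1) is always defined
        u2 = rng.random()        # in [0, 1)
        radius = math.sqrt(-2.0 * math.log(u1))
        angle = 2.0 * math.pi * u2
        # Each pair of uniforms yields two independent Gaussian samples.
        out.append(radius * math.cos(angle))
        out.append(radius * math.sin(angle))
    return out[:n]


noise_a = seeded_gaussian(4000, seed=1234)
noise_b = seeded_gaussian(4000, seed=1234)
mean = sum(noise_a) / len(noise_a)
variance = sum((x - mean) ** 2 for x in noise_a) / len(noise_a)
```

Calling the function twice with the same seed returns identical sequences, and the sample mean and variance land close to 0 and 1.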
Target platform
- iOS 26.0+
- Apple Silicon (A14 Bionic or newer); M-series Macs also work
Tested on iPhone 15 Pro and M4 Max Mac Studio.
License
Apache 2.0, matching the upstream Kokoro-82M license.
Credits
- Upstream model: hexgrad/Kokoro-82M
- Converted and packaged by Maxwell Parsons