Kokoro-82M CoreML

On-device CoreML export of Kokoro-82M for iOS TTS. Runs locally on iPhone — no network, no API calls.

This repo hosts a two-model split (acoustic + decoder) that sidesteps the limitations of converting the full Kokoro pipeline directly.

Files

File	Size	Purpose
`kokoro_acoustic.mlpackage`	~107 MB	Phoneme → acoustic features
`kokoro_decoder.mlpackage`	~203 MB	Acoustic features → waveform

Validation

Against the original unpatched PyTorch output on the same test phrases:

Waveform correlation: 0.9826
SNR: 15 dB

The output is perceptually identical to the PyTorch reference.
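
The two figures above can be reproduced with a few lines of Python. This is a minimal sketch under stated assumptions: the README does not give the exact formulas, so it assumes "waveform correlation" means the Pearson correlation of the two sample sequences and "SNR" means reference power over residual power in decibels. The function names are illustrative and do not come from the conversion scripts.

```python
import math


def waveform_correlation(ref, out):
    """Pearson correlation of two waveforms (assumed definition)."""
    n = min(len(ref), len(out))  # compare only the overlapping samples
    if n == 0:
        raise ValueError("waveforms must not be empty")
    a, b = ref[:n], out[:n]
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant waveform")
    return cov / math.sqrt(var_a * var_b)


def snr_db(ref, out):
    """SNR in dB, treating (ref - out) as the noise (assumed definition)."""
    n = min(len(ref), len(out))
    signal = sum(x * x for x in ref[:n])
    noise = sum((x - y) ** 2 for x, y in zip(ref[:n], out[:n]))
    if noise == 0:
        return math.inf  # identical waveforms
    return 10.0 * math.log10(signal / noise)
```

Under this definition, an SNR of 15 dB means the residual carries roughly 3% of the reference signal's power. Both metrics are sensitive to time offsets, so the two waveforms should be sample-aligned before comparing.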

Usage

Download

pip install huggingface_hub
hf download maxpar1/kokoro-82m-coreml --local-dir ./models

In an iOS app

Drop the two .mlpackage files into your Xcode project as folder references, then load with CoreML:

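import CoreML

// Run this from an async, throwing context.
let acousticURL = Bundle.main.url(forResource: "kokoro_acoustic", withExtension: "mlpackage")!
// compileModel(at:) returns the URL of a compiled .mlmodelc in a temporary location.
let compiled = try await MLModel.compileModel(at: acousticURL)
let model = try MLModel(contentsOf: compiled)
// Load kokoro_decoder.mlpackage the same way.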

A complete reference implementation — including phoneme generation via espeak-ng, text frontend, and TTS service wiring — is in the voice-assistant repo.

How this was converted

Converting Kokoro-82M to CoreML required solving several problems that aren't in the standard TTS export path:

Bidirectional LSTM splitting: pack_padded_sequence cannot be traced by torch.jit.trace, so each bidirectional LSTM was split into separate forward and backward unidirectional passes.
iSTFT reconstruction: the original iSTFT implementation was missing one-sided spectrum doubling and COLA normalization.
Dynamic interpolation: F.interpolate with dynamic sizes fails during coremltools export, so it was replaced with a manual arange + gather + lerp.
Deterministic noise: torch.randn is non-deterministic at export, so it was replaced with seeded Gaussian noise generated by the Box-Muller transform.
Two-model split: exporting a single unified model exceeded coremltools' memory budget during conversion; splitting into acoustic and decoder stages avoids this.
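
To illustrate the dynamic-interpolation item, here is the arange + gather + lerp recipe in plain Python for a 1-D sequence. It is a sketch of the arithmetic only, not the code from the conversion scripts: the real replacement works on tensors, and the endpoint-to-endpoint position mapping (align_corners=True style) is an assumption, since the README does not say which mapping was used.

```python
import math


def interpolate_linear(values, out_len):
    """Resample a 1-D sequence to out_len points by linear interpolation.

    Assumes the first and last input samples map to the first and last
    output samples (align_corners=True style).
    """
    in_len = len(values)
    if in_len == 0 or out_len <= 0:
        raise ValueError("need a non-empty input and a positive output length")
    if in_len == 1 or out_len == 1:
        return [float(values[0])] * out_len

    scale = (in_len - 1) / (out_len - 1)
    out = []
    for i in range(out_len):                 # "arange": output indices
        pos = i * scale                      # fractional source position
        lo = min(int(math.floor(pos)), in_len - 1)
        hi = min(lo + 1, in_len - 1)         # clamp at the right edge
        weight = pos - lo
        left, right = values[lo], values[hi]         # "gather": two neighbours
        out.append(left + weight * (right - left))   # "lerp": blend them
    return out
```

For example, interpolate_linear([0.0, 10.0], 5) yields evenly spaced values from 0.0 to 10.0. Every step is an index computation, a lookup, or a multiply-add, which is the kind of operation that can be expressed with explicit tensor ops instead of a resize with a runtime-dependent output size.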

Full debugging diary and conversion scripts: voice-assistant/scripts.
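
As a companion to the deterministic-noise item in the conversion notes, the sketch below shows how the Box-Muller transform turns seeded uniform values into Gaussian samples. The uniform source here is a hypothetical stand-in (a common textbook 32-bit linear congruential generator); the README does not say how the real scripts derive uniform values from the seed.

```python
import math


def seeded_gaussian(n, seed=0):
    """Return n standard-normal samples that depend only on the seed."""
    state = seed & 0xFFFFFFFF

    def next_uniform():
        # Hypothetical uniform source: 32-bit LCG, mapped into (0, 1)
        # so that log(u) below is always finite.
        nonlocal state
        state = (1664525 * state + 1013904223) & 0xFFFFFFFF
        return (state + 1) / 4294967297.0

    samples = []
    while len(samples) < n:
        u1, u2 = next_uniform(), next_uniform()
        radius = math.sqrt(-2.0 * math.log(u1))   # Box-Muller radius
        angle = 2.0 * math.pi * u2                # Box-Muller angle
        samples.append(radius * math.cos(angle))
        samples.append(radius * math.sin(angle))
    return samples[:n]
```

Calling seeded_gaussian twice with the same seed returns the same list, so a fixed-seed noise input gives the same result at export time and at inference time. Box-Muller itself is an exact transform; the quality of the resulting noise depends on the uniform generator feeding it.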

Target platform

iOS 26.0+
Apple Silicon (A14 Bionic or newer); M-series Macs also work

Tested on iPhone 15 Pro and M4 Max Mac Studio.

License

Apache 2.0, matching the upstream Kokoro-82M license.

Credits

Upstream model: hexgrad/Kokoro-82M
Converted and packaged by Maxwell Parsons
