Kokoro-82M CoreML

On-device CoreML export of Kokoro-82M for iOS TTS. Runs locally on iPhone: no network, no API calls.

This repo hosts a two-model split (acoustic + decoder). Exporting the full Kokoro pipeline as a single model exceeded coremltools' memory budget during conversion, and the split sidesteps that limit.

Files

File                        Size      Purpose
kokoro_acoustic.mlpackage   ~107 MB   Phoneme → acoustic features
kokoro_decoder.mlpackage    ~203 MB   Acoustic features → waveform

Validation

Against the original unpatched PyTorch output on the same test phrases:

  • Waveform correlation: 0.9826
  • SNR: 15 dB

The output is perceptually identical to the PyTorch reference.
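
For readers who want to reproduce figures like these on their own clips, the sketch below shows one common way to compute them. It is not the validation script used for this repo: it assumes Pearson correlation over time-aligned samples and an SNR defined as reference power over error power, and the function names are hypothetical.

```python
import math

def waveform_correlation(ref, test):
    """Pearson correlation between two equal-length sample sequences."""
    n = len(ref)
    mean_r = sum(ref) / n
    mean_t = sum(test) / n
    cov = sum((r - mean_r) * (t - mean_t) for r, t in zip(ref, test))
    var_r = sum((r - mean_r) ** 2 for r in ref)
    var_t = sum((t - mean_t) ** 2 for t in test)
    return cov / math.sqrt(var_r * var_t)

def snr_db(ref, test):
    """SNR in dB, treating (test - ref) as noise. Assumes ref != test."""
    signal_power = sum(r * r for r in ref)
    noise_power = sum((t - r) ** 2 for r, t in zip(ref, test))
    return 10.0 * math.log10(signal_power / noise_power)

# Toy example: a sine wave and a slightly perturbed copy of it.
ref = [math.sin(2 * math.pi * 5 * i / 200) for i in range(200)]
test = [r + 0.01 * math.cos(2 * math.pi * 37 * i / 200) for i, r in enumerate(ref)]
print(round(waveform_correlation(ref, test), 4), round(snr_db(ref, test), 1))
```

Under this definition, an SNR of 15 dB means the error power is roughly 3% of the reference power. Both metrics are sensitive to small time offsets, so the two waveforms should be aligned before comparing.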

Usage

Download

pip install huggingface_hub
hf download maxpar1/kokoro-82m-coreml --local-dir ./models

In an iOS app

Drop the two .mlpackage files into your Xcode project as folder references, then load with CoreML:

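import CoreML

// Run inside an async throwing context (for example, a Task or an async function).
// The force unwrap assumes the package is present in the app bundle.
let acousticURL = Bundle.main.url(forResource: "kokoro_acoustic", withExtension: "mlpackage")!
// Compile the .mlpackage on device, then load the compiled model.
let compiled = try await MLModel.compileModel(at: acousticURL)
let model = try MLModel(contentsOf: compiled)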

A complete reference implementation (including phoneme generation via espeak-ng, text frontend, and TTS service wiring) is in the voice-assistant repo.

How this was converted

Converting Kokoro-82M to CoreML required solving several problems that the standard TTS export path does not cover:

  • Bidirectional LSTM splitting: pack_padded_sequence cannot be traced by torch.jit.trace, so each bidirectional LSTM was split into separate forward and backward unidirectional passes.
  • iSTFT reconstruction: the original iSTFT implementation was missing one-sided spectrum doubling and COLA normalization.
  • Dynamic interpolation: F.interpolate with dynamic sizes fails during coremltools export, so it was replaced with a manual arange + gather + lerp.
  • Deterministic noise: torch.randn is non-deterministic at export, so it was replaced with Gaussian noise generated from a seed via the Box-Muller transform.
  • Two-model split: exporting a single unified model exceeded coremltools' memory budget during conversion; splitting into acoustic and decoder stages avoids this.
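
The deterministic-noise item can be illustrated in plain Python. This is a sketch of the general idea, not the repo's conversion code: it draws uniform values from a seeded generator (`random.Random`, standing in for whatever seeded uniform source the export actually uses) and maps them to standard normal samples with the Box-Muller transform. The name `seeded_gaussian` is hypothetical.

```python
import math
import random

def seeded_gaussian(n, seed):
    """Return n standard-normal samples, reproducible for a given seed."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        # u1 lies in (0, 1] so log(u1) is finite; u2 lies in [0, 1).
        u1 = 1.0 - rng.random()
        u2 = rng.random()
        radius = math.sqrt(-2.0 * math.log(u1))
        angle = 2.0 * math.pi * u2
        # Each pair of uniforms yields two independent normal samples.
        out.append(radius * math.cos(angle))
        out.append(radius * math.sin(angle))
    return out[:n]

noise_a = seeded_gaussian(10_000, seed=0)
noise_b = seeded_gaussian(10_000, seed=0)
mean = sum(noise_a) / len(noise_a)
var = sum((x - mean) ** 2 for x in noise_a) / len(noise_a)
print(noise_a == noise_b, round(mean, 2), round(var, 2))
```

The same seed always yields the same samples, which is what makes the exported model comparable against a reference run. In a traced graph the uniform source would itself need to be built from traceable operations; that part is outside this sketch.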

Full debugging diary and conversion scripts: voice-assistant/scripts.
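
As a rough illustration of the dynamic-interpolation item in the list above, the pattern is: build the output index grid, map each output position to a fractional source position, gather the two neighbouring samples, and blend them. The plain-Python sketch below assumes 1-D linear interpolation with a half-pixel coordinate mapping; the repo's tensor implementation may use a different convention, and `lerp_resize` is a hypothetical name.

```python
import math

def lerp_resize(values, out_len):
    """Linearly resample a 1-D list to out_len points (half-pixel mapping)."""
    in_len = len(values)
    scale = in_len / out_len
    out = []
    for dst in range(out_len):                      # "arange" over output positions
        src = max((dst + 0.5) * scale - 0.5, 0.0)   # fractional source position
        lo = min(int(math.floor(src)), in_len - 1)  # left neighbour index
        hi = min(lo + 1, in_len - 1)                # right neighbour index, clamped
        w = src - lo                                # blend weight in [0, 1)
        left, right = values[lo], values[hi]        # "gather" the two neighbours
        out.append(left + w * (right - left))       # "lerp"
    return out

print(lerp_resize([0.0, 10.0], 4))  # [0.0, 2.5, 7.5, 10.0]
```

In tensor code the loop becomes an index tensor, the two lookups become gathers, and the blend is elementwise, so the output length can be a runtime value rather than a constant baked in at trace time.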

Target platform

  • iOS 26.0+
  • Apple Silicon (A14 Bionic or newer); M-series Macs also work

Tested on iPhone 15 Pro and M4 Max Mac Studio.

License

Apache 2.0, matching the upstream Kokoro-82M license.

Credits
