Real-time speech enhancement for Apple Silicon. Removes background noise from speech audio. Runs on the Apple Neural Engine via CoreML.
Quality was measured on 30 VoiceBank-DEMAND test clips using the Python CoreMLBackend, which replaces only the neural-network forward pass and keeps the PyTorch STFT, ERB, and deep-filter post-processing intact.
| Variant | PESQ | STOI | SI-SDR (dB) | Size |
|---|---|---|---|---|
| PyTorch FP32 (reference) | 2.900 | 0.947 | 18.19 | — |
| CoreML FP16 | 2.901 | 0.947 | 18.19 | 4.2 MB |
| CoreML INT8 (this repo) | 2.907 | 0.947 | 18.11 | 2.2 MB |
INT8 matches FP16 within run-to-run noise (ΔPESQ +0.006, ΔSI-SDR −0.07 dB, STOI identical) while cutting size by 48%.
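For readers unfamiliar with the SI-SDR column, the following is a minimal sketch of the standard scale-invariant SDR definition in plain Swift. It is not the evaluation script behind the numbers above; the function name `siSDR` and the toy signals are illustrative assumptions.

```swift
import Foundation

/// Scale-invariant SDR in dB between an estimate and a clean reference.
/// Illustrative helper, not part of speech-swift.
func siSDR(estimate: [Double], reference: [Double]) -> Double {
    precondition(estimate.count == reference.count && !reference.isEmpty,
                 "signals must be non-empty and equal in length")

    // Remove the mean of each signal before comparing them.
    func zeroMean(_ x: [Double]) -> [Double] {
        let mean = x.reduce(0, +) / Double(x.count)
        return x.map { $0 - mean }
    }
    func dot(_ a: [Double], _ b: [Double]) -> Double {
        return zip(a, b).reduce(0) { $0 + $1.0 * $1.1 }
    }

    let est = zeroMean(estimate)
    let ref = zeroMean(reference)

    // Project the estimate onto the reference to get the scaled target.
    let scale = dot(est, ref) / dot(ref, ref)
    let target = ref.map { scale * $0 }

    // Everything that is not the scaled target counts as error.
    let error = zip(est, target).map { $0 - $1 }
    return 10 * log10(dot(target, target) / dot(error, error))
}

// Toy example: noise orthogonal to the reference, with 1/100 of its energy.
let reference: [Double] = [1, -1, 1, -1]
let noise: [Double] = [0.1, 0.1, -0.1, -0.1]
let estimate = zip(reference, noise).map { $0 + $1 }
print(siSDR(estimate: estimate, reference: reference)) // roughly 20 dB
```

Because the estimate is projected onto the reference first, rescaling the estimate leaves the score unchanged, which is why the metric is insensitive to output gain.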
| Audio duration | Processing time | RTF (real-time factor) |
|---|---|---|
| 5 s | 0.65 s | 0.13 |
| 10 s | 1.2 s | 0.12 |
| 20 s | 4.8 s | 0.24 |
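The RTF column is the processing time divided by the audio duration, so values below 1 mean faster than real time. A small sketch that reproduces the column from the table rows (the helper name `realTimeFactor` is an illustrative assumption, not part of the package):

```swift
/// Real-time factor: processing time divided by audio duration.
/// Values below 1.0 mean the audio is processed faster than it plays.
func realTimeFactor(processingSeconds: Double, audioSeconds: Double) -> Double {
    precondition(audioSeconds > 0, "audio duration must be positive")
    return processingSeconds / audioSeconds
}

// Rows copied from the table above: (audio duration, processing time) in seconds.
let runs: [(audio: Double, processing: Double)] = [(5, 0.65), (10, 1.2), (20, 4.8)]

for run in runs {
    let rtf = realTimeFactor(processingSeconds: run.processing, audioSeconds: run.audio)
    print("\(run.audio) s of audio: RTF \(rtf), about \(1 / rtf)x real time")
}
```

By this measure the 20 s clip runs at roughly twice the RTF of the shorter clips, while still staying well under 1.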
| File | Size | Description |
|---|---|---|
| DeepFilterNet3.mlmodelc | 2.2 MB | Pre-compiled CoreML model (runs on Neural Engine) |
| auxiliary.npz | 126 KB | ERB filterbank, Vorbis window, normalization states |
Add speech-swift to Package.swift:

    .package(url: "https://github.com/soniqo/speech-swift", branch: "main")
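For context, here is a sketch of where that dependency line sits in a complete manifest. Only the `.package(url:branch:)` line comes from this README. The package and target name `DenoiseDemo` are hypothetical, the tools version and platform minimums are assumptions, and the product name `SpeechEnhancement` is inferred from the `import SpeechEnhancement` statement in the usage example rather than confirmed.

```swift
// swift-tools-version:5.9
// Sketch of a manifest for a hypothetical app target named "DenoiseDemo".
import PackageDescription

let package = Package(
    name: "DenoiseDemo",
    // Assumed minimums; match whatever platforms speech-swift itself declares.
    platforms: [.macOS(.v14), .iOS(.v17)],
    dependencies: [
        // Dependency line from this README.
        .package(url: "https://github.com/soniqo/speech-swift", branch: "main")
    ],
    targets: [
        .executableTarget(
            name: "DenoiseDemo",
            dependencies: [
                // Product name assumed from `import SpeechEnhancement`.
                .product(name: "SpeechEnhancement", package: "speech-swift")
            ]
        )
    ]
)
```

Tracking `branch: "main"` means each dependency update can pull in new commits; pinning to a revision in `Package.resolved` keeps builds reproducible.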
Then denoise:

    import SpeechEnhancement

    // Load the pretrained model (async and throwing).
    let enhancer = try await SpeechEnhancer.fromPretrained()
    // Enhance noisy speech sampled at 48 kHz.
    let clean = try enhancer.enhance(audio: noisyAudio, sampleRate: 48000)
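The README does not say what type `noisyAudio` is. Assuming `enhance(audio:sampleRate:)` accepts mono `[Float]` samples in the range -1 to 1 (an assumption, so check the package's own signature), a sketch of preparing such a buffer from interleaved 16-bit PCM could look like this. The helper `monoFloatSamples` is hypothetical and not part of speech-swift.

```swift
/// Converts interleaved 16-bit PCM into mono Float samples in [-1, 1).
/// Hypothetical helper; assumes the enhancer wants mono Float input.
func monoFloatSamples(fromInterleaved pcm: [Int16], channels: Int) -> [Float] {
    precondition(channels > 0 && pcm.count % channels == 0,
                 "sample count must be a multiple of the channel count")
    let frameCount = pcm.count / channels
    var mono = [Float](repeating: 0, count: frameCount)
    for frame in 0..<frameCount {
        var sum: Float = 0
        for channel in 0..<channels {
            // Scale Int16 (-32768...32767) into [-1, 1).
            sum += Float(pcm[frame * channels + channel]) / 32768
        }
        // Average the channels to downmix to mono.
        mono[frame] = sum / Float(channels)
    }
    return mono
}

// Two stereo frames: (0.5, 0.0) and (-1.0, -1.0).
let stereo: [Int16] = [16384, 0, -32768, -32768]
let noisyAudio = monoFloatSamples(fromInterleaved: stereo, channels: 2)
print(noisyAudio) // [0.25, -1.0]
```

The helper does not resample. The usage example passes `sampleRate: 48000`, so audio recorded at another rate would need to be resampled first, or passed with its true rate if the library handles conversion.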
CLI:

    swift run audio denoise noisy.wav --output clean.wav