Cocktail-Fork-MRX (MLX)

Apple MLX port of MERL's MRX (Multi-Resolution CrossNet), which separates a soundtrack mixture into three stems: music, speech, and sound effects (sfx). It runs natively on Apple Silicon and does not need PyTorch at inference time.

Upstream: merlresearch/cocktail-fork-separation — The Cocktail Fork Problem: Three-Stem Audio Separation for Real-World Soundtracks (ICASSP 2022).
Checkpoint: default_ (SNR-loss trained — the upstream default inference weights).
Variants: -paper (SI-SNR, ICASSP reproduction) · -adapted-loudness · -adapted-eq (cinematic-tuned for real movie stems).
Collection: Cocktail-Fork MRX (MLX).
License: MIT.
Parity: matches the PyTorch reference to within float32 rounding error (full-pipeline max_abs ≈ 9e-8; per-stem SI-SDR of 107–139 dB measured against the torch output).
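
The two parity figures above (maximum absolute difference and SI-SDR) can be reproduced for any pair of reference and ported outputs. The sketch below uses the standard SI-SDR definition without mean removal; it is not the script used for this port, and the function names are illustrative only.

```python
import numpy as np


def max_abs_diff(ref, est):
    """Largest element-wise absolute difference between two signals."""
    ref = np.asarray(ref, dtype=np.float64)
    est = np.asarray(est, dtype=np.float64)
    return float(np.max(np.abs(ref - est)))


def si_sdr_db(ref, est, eps=1e-12):
    """Scale-invariant SDR in dB of `est` measured against `ref`."""
    ref = np.asarray(ref, dtype=np.float64).ravel()
    est = np.asarray(est, dtype=np.float64).ravel()
    # Project the estimate onto the reference to get the scaled target.
    alpha = np.dot(est, ref) / (np.dot(ref, ref) + eps)
    target = alpha * ref
    noise = est - target
    ratio = (np.dot(target, target) + eps) / (np.dot(noise, noise) + eps)
    return float(10.0 * np.log10(ratio))


# Toy signals: the error is orthogonal to the reference.
ref = np.array([1.0, 0.0, 1.0, 0.0])
est = np.array([1.0, 0.1, 1.0, 0.1])
toy_max_abs = max_abs_diff(ref, est)
toy_si_sdr = si_sdr_db(ref, est)
```

To compare real outputs, pass a stem from the PyTorch reference as `ref` and the matching MLX stem (converted with `np.array`) as `est`.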

Usage

pip install cocktail-fork-mlx   # or: pip install git+https://github.com/xocialize/cocktail-fork-mlx
cocktail-fork-mlx --audio-path soundtrack.wav --out-dir ./out
# -> out/music.wav  out/speech.wav  out/sfx.wav

import mlx.core as mx
import numpy as np
import soundfile as sf

from cocktail_fork_mlx.separate import separate_soundtrack
from cocktail_fork_mlx.weights import from_pretrained

# sf.read returns (frames, channels); the model expects 44.1 kHz audio.
audio, fs = sf.read("soundtrack.wav", always_2d=True)
model = from_pretrained("mlx-community/Cocktail-Fork-MRX")

# Transpose to (channels, frames) and cast to float32 before separating.
stems = separate_soundtrack(mx.array(audio.T.astype("float32")), model)
for name, x in stems.items():
    sf.write(f"{name}.wav", np.array(x).T, fs)  # write at the input sample rate
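
The transposes in the snippet above convert between soundfile's (frames, channels) layout and a channels-first layout for the model. The sketch below isolates that conversion so it can be checked without loading the model. The channels-first input shape is inferred from the `.T` in the usage example rather than from documented API, and both helper names are hypothetical.

```python
import numpy as np


def to_channels_first(audio):
    """Convert (frames,) or (frames, channels) audio to (channels, frames) float32."""
    audio = np.asarray(audio)
    if audio.ndim == 1:
        # Treat a 1-D array as a single-channel signal.
        audio = audio[:, None]
    return np.ascontiguousarray(audio.T, dtype=np.float32)


def to_frames_first(stem):
    """Convert a (channels, frames) stem back to (frames, channels) for writing."""
    return np.ascontiguousarray(np.asarray(stem).T)


stereo = np.zeros((44100, 2))   # one second of stereo silence
mono = np.zeros(44100)          # one second of mono silence

stereo_cf = to_channels_first(stereo)
mono_cf = to_channels_first(mono)
round_trip = to_frames_first(stereo_cf)
```

Keeping the conversion in one place makes it harder to write a stem with the axes swapped, which soundfile would interpret as a very short file with thousands of channels.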

Model

44.1 kHz, any channel count. ~30.6M params, fp32 (122 MB).
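
The quoted weight size follows from the parameter count and the 4-byte width of fp32. A quick check of that arithmetic, assuming "MB" here means 10^6 bytes:

```python
N_PARAMS = 30.6e6          # parameter count quoted above
BYTES_PER_FP32 = 4         # one float32 value


def weights_size_bytes(n_params, bytes_per_param=BYTES_PER_FP32):
    """Raw tensor storage, ignoring file headers and metadata."""
    return n_params * bytes_per_param


size_mb = weights_size_bytes(N_PARAMS) / 1e6       # decimal megabytes
size_mib = weights_size_bytes(N_PARAMS) / 2**20    # binary mebibytes
```

This gives roughly 122.4 MB (about 116.7 MiB), consistent with the 122 MB figure; the file on disk will be slightly larger because of its header.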
Multi-resolution STFT (windows 1024/2048/8192, hop 256) → per-resolution magnitude encoders → 3 parallel bidirectional CrossNet LSTMs → per-source/per-resolution mask decoders → masked iSTFT summed across resolutions.
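
The front end of this pipeline can be illustrated in plain NumPy. Because all three windows share a hop of 256, each resolution yields the same number of time frames and differs only in the number of frequency bins, which is what lets the per-resolution encodings be combined frame by frame. This is a sketch only: the Hann window, centered framing, and reflect padding are assumptions, not confirmed details of the upstream implementation, and `stft_magnitude` is a hypothetical helper.

```python
import numpy as np

WINDOWS = (1024, 2048, 8192)   # window lengths from the model description
HOP = 256                      # hop shared by all resolutions


def stft_magnitude(x, win, hop=HOP):
    """Magnitude STFT of a 1-D signal, shape (frames, win // 2 + 1)."""
    pad = win // 2
    # Center the frames by reflect-padding half a window on each side.
    xp = np.pad(x, pad, mode="reflect")
    n_frames = 1 + (len(xp) - win) // hop
    # Build an index matrix so each row selects one frame.
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = xp[idx] * np.hanning(win)
    return np.abs(np.fft.rfft(frames, axis=1))


rng = np.random.default_rng(0)
signal = rng.standard_normal(44100)   # one second of noise at 44.1 kHz

shapes = {win: stft_magnitude(signal, win).shape for win in WINDOWS}
```

With centered framing the frame count depends only on the signal length and the hop, so `shapes` has the same first dimension for every window, while the bin count grows as `win // 2 + 1`.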
The model is LSTM-bound, so it runs faster on the CPU than on the GPU; the CLI uses the CPU by default.

Ported by MVS Collective (xocialize). MIT, © MERL for the original model/weights.
