LFM2.5-Embedding-350M — MLX (8-bit)

MLX build of LiquidAI/LFM2.5-Embedding-350M, a multilingual dense bi-encoder (1024-dim CLS embedding, cosine similarity), for local inference on Apple Silicon with MLX.
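
As a minimal illustration of how a dense bi-encoder scores text, the sketch below ranks passages by cosine similarity against a query vector. The 4-dimensional vectors are made-up stand-ins for the model's 1024-dim CLS embeddings, and `cosine` and `rank` are hypothetical helper names, not part of this repository.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank(query, passages):
    """Return (passage_index, score) pairs, best match first."""
    scores = [(i, cosine(query, p)) for i, p in enumerate(passages)]
    return sorted(scores, key=lambda s: s[1], reverse=True)

# Toy vectors standing in for real embeddings (assumption for illustration).
query = [1.0, 0.0, 1.0, 0.0]
passages = [
    [0.9, 0.1, 0.8, 0.0],  # points almost the same way as the query
    [0.0, 1.0, 0.0, 1.0],  # orthogonal to the query
    [0.5, 0.5, 0.5, 0.5],  # partially aligned
]
ranking = rank(query, passages)
```

In practice the query and passage vectors would come from the model's encoder, and embeddings are often L2-normalized once so that the cosine reduces to a dot product.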

All weights, architecture, and behavior are LiquidAI's. This repository changes the file format (PyTorch/safetensors → MLX) and applies post-training quantization to 8-bit (affine, group size 64), starting from the bf16 MLX conversion. Every Linear and embedding layer is quantized; the remaining layers (conv, norms) stay in bf16. See the original model card for training details and intended use.

Quantization details

  • Quantized with mlx.nn.quantize(mode='affine', bits=8, group_size=64) — the exact configuration benchmarked below.
  • Verified bit-exact: the reloaded checkpoint's encodings are identical (max abs diff 0) to the in-memory-quantized model used for the benchmark (verify_export.py) — the shipped artifact is the model measured below.
  • Reload applies the quant from config.json["quantization"] before loading weights (see retrieval.load_model).
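
To make "affine, 8-bit, group size 64" concrete, here is a pure-Python sketch of group-wise affine quantization: each group of 64 weights gets its own scale and bias, and every weight is stored as an integer code in 0..255. This is an illustration of the idea only. The helper names are hypothetical, and MLX's actual kernels (which also pack the codes into 32-bit words) may differ in detail.

```python
import random

def quantize_group(values, bits=8):
    """Map one group of floats to integer codes plus a (scale, bias) pair."""
    levels = (1 << bits) - 1           # 255 for 8-bit
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels or 1.0  # fall back to 1.0 for a constant group
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo            # bias is the group minimum

def dequantize_group(codes, scale, bias):
    """Reconstruct approximate floats: w ~= code * scale + bias."""
    return [c * scale + bias for c in codes]

def quantize_row(row, group_size=64, bits=8):
    """Quantize a weight row group by group."""
    return [quantize_group(row[i:i + group_size], bits)
            for i in range(0, len(row), group_size)]

# Toy weight row (random values, not real model weights).
random.seed(0)
row = [random.gauss(0.0, 0.02) for _ in range(128)]
groups = quantize_row(row)
restored = [v for codes, scale, bias in groups
            for v in dequantize_group(codes, scale, bias)]
max_err = max(abs(a - b) for a, b in zip(row, restored))
```

The reconstruction error of each weight is bounded by half of its group's scale, which is why small groups and 8-bit codes leave the encodings nearly unchanged.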

Evaluation

Retrieval quality of this checkpoint and its sibling precisions, measured as NDCG@10 and Recall@10 on judged pools. Retention is the metric divided by the bf16 metric, computed per dataset and then averaged over the datasets.
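
For readers unfamiliar with the two metrics, the sketch below computes NDCG@10 and Recall@10 for a single query. It assumes the common linear-gain form of DCG and a recall denominator equal to the number of judged relevant documents; the evaluation harness behind the numbers in this card may use a slightly different variant. The function names and the toy judgments are hypothetical.

```python
import math

def dcg(gains):
    """Discounted cumulative gain with a log2 position discount."""
    return sum(g / math.log2(pos + 2) for pos, g in enumerate(gains))

def ndcg_at_k(ranked_ids, relevance, k=10):
    """NDCG@k: DCG of the ranking divided by the DCG of an ideal ranking."""
    gains = [relevance.get(doc_id, 0) for doc_id in ranked_ids[:k]]
    ideal = sorted(relevance.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0

def recall_at_k(ranked_ids, relevance, k=10):
    """Recall@k: fraction of relevant documents found in the top k."""
    relevant = {doc_id for doc_id, rel in relevance.items() if rel > 0}
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant)
    return hits / len(relevant)

# Toy judgments: two relevant documents, one of them ranked third.
relevance = {"d1": 1, "d2": 1}
ranked = ["d1", "d3", "d2"]
ndcg = ndcg_at_k(ranked, relevance)
recall = recall_at_k(ranked, relevance)
```

Per-dataset scores are then the mean of these per-query values.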

Setup. English = the four NanoBEIR sets (full small corpora, ~2–5k passages, 50 queries each). Multilingual = MIRACL dev (the real queries and relevance judgments) for Spanish, German, Japanese, Arabic, each scored over a reduced pool of ~6k passages (judged positives + hard-mined negatives + sampled distractors, from mteb/MIRACLRetrievalHardNegatives), 100 queries each. Reduced pools make absolute scores easier than full-corpus MIRACL and not leaderboard-comparable — but every precision searches the identical pool, so the retention numbers (the point of this table) are sound. ColBERT uses brute-force MaxSim with no query augmentation, so its absolute scores sit a touch below a full PLAID setup.

Summary (mean over 8 datasets)

precision  NDCG@10  NDCG retention  Recall@10  Recall retention  size
bf16       0.728    100.0%          0.775      100.0%            709 MB
8-bit      0.729    100.1%          0.775      100.0%            377 MB
4-bit      0.730    100.0%          0.766      98.6%             200 MB
mxfp4      0.725    99.8%           0.764      98.4%
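
The reported sizes can be sanity-checked with a back-of-the-envelope estimate. The sketch below assumes each group of 64 weights carries one 16-bit scale and one 16-bit bias, treats the whole model as quantized (ignoring the small bf16 conv and norm layers), and assumes the 4-bit sibling also uses group size 64, which this card does not state. The function names are hypothetical.

```python
def bits_per_weight(bits, group_size, meta_bits=16):
    """Effective storage per weight: the code plus a share of scale and bias."""
    return bits + 2 * meta_bits / group_size

def estimated_size_mb(bf16_size_mb, bits, group_size):
    """Scale the bf16 size (16 bits per weight) to the quantized format."""
    return bf16_size_mb * bits_per_weight(bits, group_size) / 16

est_8bit = estimated_size_mb(709, bits=8, group_size=64)  # 8.5 bits per weight
est_4bit = estimated_size_mb(709, bits=4, group_size=64)  # 4.5 bits per weight
```

Under these assumptions the estimates land within about 1 MB of the 377 MB and 200 MB listed for the 8-bit and 4-bit checkpoints.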

NDCG@10 by dataset

dataset            bf16   8-bit  4-bit  mxfp4
NanoNQ · en        0.704  0.704  0.703  0.703
NanoFiQA2018 · en  0.504  0.511  0.502  0.498
NanoSciFact · en   0.716  0.717  0.714  0.712
NanoNFCorpus · en  0.342  0.340  0.335  0.345
MIRACL · es        0.891  0.892  0.895  0.893
MIRACL · de        0.809  0.810  0.819  0.812
MIRACL · ja        0.929  0.928  0.940  0.922
MIRACL · ar        0.926  0.926  0.928  0.916
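
The NDCG retention figures in the summary can be approximately reproduced from the per-dataset scores: take the ratio to bf16 for each dataset, then average the ratios. The sketch below does this with the rounded three-decimal values from the table, so the result can differ from the summary in the last digit. `mean_retention` is a hypothetical helper name.

```python
# NDCG@10 per dataset, copied from the table above.
ndcg = {
    "NanoNQ":       {"bf16": 0.704, "8-bit": 0.704, "4-bit": 0.703, "mxfp4": 0.703},
    "NanoFiQA2018": {"bf16": 0.504, "8-bit": 0.511, "4-bit": 0.502, "mxfp4": 0.498},
    "NanoSciFact":  {"bf16": 0.716, "8-bit": 0.717, "4-bit": 0.714, "mxfp4": 0.712},
    "NanoNFCorpus": {"bf16": 0.342, "8-bit": 0.340, "4-bit": 0.335, "mxfp4": 0.345},
    "MIRACL-es":    {"bf16": 0.891, "8-bit": 0.892, "4-bit": 0.895, "mxfp4": 0.893},
    "MIRACL-de":    {"bf16": 0.809, "8-bit": 0.810, "4-bit": 0.819, "mxfp4": 0.812},
    "MIRACL-ja":    {"bf16": 0.929, "8-bit": 0.928, "4-bit": 0.940, "mxfp4": 0.922},
    "MIRACL-ar":    {"bf16": 0.926, "8-bit": 0.926, "4-bit": 0.928, "mxfp4": 0.916},
}

def mean_retention(table, precision, baseline="bf16"):
    """Average of per-dataset ratios precision / baseline."""
    ratios = [row[precision] / row[baseline] for row in table.values()]
    return sum(ratios) / len(ratios)

retention = {p: mean_retention(ndcg, p) for p in ("8-bit", "4-bit", "mxfp4")}
for precision, value in retention.items():
    print(f"{precision}: {value:.1%}")
```

Averaging ratios rather than dividing the averaged scores keeps a single easy or hard dataset from dominating the retention figure.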

License & attribution

Redistributed under the LFM Open License v1.0 (LICENSE) — the same license as the original model. Per Section 4, this notice records that the files were modified (format conversion to MLX + 8-bit quantization). The original work is by Liquid AI; this repository is an independent conversion, not affiliated with or endorsed by Liquid AI. The license includes a commercial-use threshold (Section 5) — review it for your use case.

Base model: LiquidAI/LFM2.5-Embedding-350M
