AIST-87M GGUF

This repository contains GGUF quantizations of augmem/AIST-87M.

Current published artifact: matryoshka32-prefixrot-20260625. This refresh keeps the canonical AIST-87M name because the architecture and parameter scale are unchanged, but the released weights now include the shared 128d orthogonal prefix rotation used to support native 64d and 32d Matryoshka slices. The old and new artifacts are retrieval-equivalent at 128d and above on the local held-out SALT gate, while the new artifact improves mean R@1 at 64d and 32d.

Base model:

augmem/AIST-87M

Quantizations:

AIST-87M_q8_0.gguf
AIST-87M_q5_1.gguf
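
Before handing either file to a runtime, it can be useful to confirm that a download is a well-formed GGUF container. The sketch below reads only the fixed-size header with the standard library. It assumes a little-endian GGUF v2 or v3 file, and `read_gguf_header` is a hypothetical helper name, not part of this release.

```python
import struct

def read_gguf_header(path):
    """Return (version, tensor_count, metadata_kv_count) from a GGUF file."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"not a GGUF file: magic={magic!r}")
        # Assumption: GGUF v2/v3 layout, little-endian:
        # uint32 version, uint64 tensor_count, uint64 metadata_kv_count.
        raw = f.read(20)
        if len(raw) != 20:
            raise ValueError("truncated GGUF header")
        version, tensor_count, kv_count = struct.unpack("<IQQ", raw)
    return version, tensor_count, kv_count
```

For example, `read_gguf_header("AIST-87M_q8_0.gguf")` returns the three header fields for a locally downloaded copy. It does not validate tensor data or metadata contents.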

The source model is a compact audio + image + speech + text embedding model for human-memory augmentation workloads. It is the single-audio evolution of the earlier dual-audio tower line and uses a merged native mn20_as EfficientAT audio encoder with no separate runtime LoRA pass.

Matryoshka slices: [1280, 768, 512, 256, 128, 64, 32]. Exact loaded params: 87,186,755.
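
As a minimal sketch of how a consumer might use these slice widths, the helper below keeps the leading components of a full embedding and re-normalizes them. This assumes the slices are plain prefixes of the 1280d vector followed by L2 normalization, which is the usual Matryoshka convention but is not spelled out in this card; `slice_embedding` is a hypothetical name.

```python
import math

# Slice widths listed in this card.
MATRYOSHKA_DIMS = [1280, 768, 512, 256, 128, 64, 32]

def slice_embedding(vec, dim):
    """Keep the first `dim` components and L2-normalize the result."""
    if dim not in MATRYOSHKA_DIMS:
        raise ValueError(f"unsupported slice width: {dim}")
    if len(vec) < dim:
        raise ValueError("embedding is shorter than the requested slice")
    prefix = vec[:dim]
    norm = math.sqrt(sum(x * x for x in prefix))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in prefix]
```

Queries and stored items should be sliced to the same width before comparison, since similarities across different widths are not meaningful.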

Evaluation Scope

The quantized files correspond to the same release checkpoint and human-memory evaluation slice as the base repo.

Dim	Tasks	Text continuity	Image recall	Audio recall	Overall
1280	8 / 8	0.763	0.425	0.104	0.349
768	8 / 8	0.762	0.424	0.104	0.349
512	8 / 8	0.762	0.424	0.104	0.349

Primary metrics are main_score for text continuity tasks and NDCG@10 for image/audio retrieval tasks.
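
For readers unfamiliar with the retrieval metric, here is a minimal sketch of NDCG@10 for a single query. It uses the linear-gain formulation and assumes the ranked list contains every relevant item; the exact variant used by the evaluation harness is not stated in this card, so treat this as illustrative only.

```python
import math

def ndcg_at_k(ranked_relevance, k=10):
    """NDCG@k for one query.

    ranked_relevance: relevance grades of the retrieved items, in rank order.
    Assumption: the list covers every relevant item, so the ideal ordering
    can be obtained by sorting it.
    """
    def dcg(grades):
        # Rank is 0-based here, so the discount is log2(rank + 2).
        return sum(g / math.log2(rank + 2) for rank, g in enumerate(grades[:k]))

    ideal = dcg(sorted(ranked_relevance, reverse=True))
    return dcg(ranked_relevance) / ideal if ideal > 0 else 0.0
```

A reported score would then be the mean of this value over all queries in a task.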

Matryoshka32 No-Regression Gate

Source: aist87m_salt_matryoshka32_summary_20260626.json.

The current artifact was compared against the previous published AIST-87M projection checkpoint on the local held-out SALT cached retrieval gate. Mean R@1 averages all six paired directions: audio-text, image-text, and image-audio in both directions.

Dim	Previous mean R@1	Current mean R@1	Delta
1280	0.193905	0.193905	+0.000000
768	0.193072	0.193072	+0.000000
512	0.193105	0.193105	+0.000000
256	0.192472	0.192472	+0.000000
128	0.190671	0.190671	+0.000000
64	0.111089	0.142028	+0.030940
32	0.056745	0.102354	+0.045609
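
The mean R@1 column can be read as an average over both retrieval directions of each modality pair. The sketch below shows one way to compute it from similarity matrices, assuming a one-to-one pairing where query i matches candidate i. The actual gate script may handle pairing differently, and the function names here are hypothetical.

```python
def recall_at_1(sim):
    """Fraction of rows whose best-scoring column is the paired one.

    sim[i][j] is the similarity of query i to candidate j; the true pair
    for query i is assumed to sit at index i.
    """
    hits = 0
    for i, row in enumerate(sim):
        best = max(range(len(row)), key=row.__getitem__)
        hits += best == i
    return hits / len(sim)

def transpose(sim):
    return [list(col) for col in zip(*sim)]

def mean_r_at_1(pair_sims):
    """Average R@1 over both directions of every modality pair."""
    scores = []
    for sim in pair_sims.values():
        scores.append(recall_at_1(sim))             # e.g. audio -> text
        scores.append(recall_at_1(transpose(sim)))  # e.g. text -> audio
    return sum(scores) / len(scores)
```

With three entries in `pair_sims` (audio-text, image-text, image-audio), this averages the six directions described above.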

Runtime Footprint vs Dual-Audio Tower

The base AIST-87M release replaces the dual-audio tower's separate EfficientAT + Whisper-Tiny branches with one merged native mn20_as EfficientAT encoder.

Runtime surface	AIST-87M	AIST-95M dual-audio tower	Delta
Loaded parameters	87,186,755	95,315,959	-8.5%
Safetensors artifact	348.9 MB	381.9 MB	-8.6%
Audio encoders	1	2	removes Whisper branch
Audio path parameters incl. projection	32,193,126	40,390,311	-20.3%
Audio projection input width	1,280	2,304	-44.4%
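
The percentage deltas in the table above are relative changes against the dual-audio tower. A short sketch that recomputes them from the tabulated values (`percent_delta` is a hypothetical helper):

```python
def percent_delta(new, old):
    """Relative change of `new` versus `old`, in percent."""
    return (new - old) / old * 100.0

# Values copied from the table above: (AIST-87M, AIST-95M dual-audio tower).
rows = {
    "Loaded parameters": (87_186_755, 95_315_959),
    "Safetensors artifact (MB)": (348.9, 381.9),
    "Audio path parameters incl. projection": (32_193_126, 40_390_311),
    "Audio projection input width": (1_280, 2_304),
}

for name, (aist87m, dual) in rows.items():
    print(f"{name}: {percent_delta(aist87m, dual):+.1f}%")
```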

Exact-gate tradeoff at 1280d against the same dual-audio local baseline:

Slice	AIST-87M	AIST-95M dual-audio tower	Delta
Speech holdout audio-text R@1 avg	0.724	0.582	+0.142
WavCaps FSD audio-text R@1 avg	0.097	0.105	-0.009
SALT audio-text R@1 avg	0.008	0.007	flat
SALT image-audio R@1 avg	0.138	0.148	-0.010

Reference PyTorch audio-stack throughput for the base release was measured on an NVIDIA L4 with synthetic 10 s, 32 kHz CPU waveforms passed through waveform -> audio encoder -> projection -> normalized embedding. Median wall time is taken over 50 timed iterations after 20 warmup iterations. The measurement excludes audio file decode, dataset download, and MTEB result serialization.

Batch	AIST-87M median ms	AIST-87M throughput	AIST-95M median ms	AIST-95M throughput	Speedup
1	5.36	186.7 clips/s; 1,867 audio-s/s	10.50	95.2 clips/s; 952 audio-s/s	1.96x
8	16.46	486.0 clips/s; 4,860 audio-s/s	60.29	132.7 clips/s; 1,327 audio-s/s	3.66x
16	41.19	388.5 clips/s; 3,885 audio-s/s	133.95	119.4 clips/s; 1,194 audio-s/s	3.25x
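
The measurement protocol and the throughput columns can be sketched as follows. This is not the benchmark script that produced the table; it is a minimal stand-in that assumes `fn` runs one full batch through the audio stack and blocks until the result is ready (for GPU work that typically means synchronizing the device inside `fn`, which this card does not describe).

```python
import statistics
import time

def median_ms(fn, warmup=20, iters=50):
    """Median wall time of `fn()` in milliseconds after warmup calls."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)

def throughput(batch, med_ms, clip_seconds=10.0):
    """Return (clips per second, audio seconds per second)."""
    clips_per_s = batch / (med_ms / 1000.0)
    return clips_per_s, clips_per_s * clip_seconds
```

For instance, `throughput(8, 16.46)` gives roughly 486 clips/s and 4,860 audio-s/s, in line with the batch-8 AIST-87M row.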

The GGUF files are quantized distribution artifacts and were not separately rebenchmarked in a GGUF runtime. Raw PyTorch benchmark output is included as aist87m_vs_dual_audio_throughput_l4_20260504.json.

Files

File	Purpose
`AIST-87M_q8_0.gguf`	Higher-accuracy GGUF
`AIST-87M_q5_1.gguf`	Smaller GGUF
`manifest.json`	Release manifest
`parameter_breakdown.json`	Exact parameter accounting
`aist87m_memory_slice_release_report.md`	Human-memory slice report
`aist87m_memory_slice_release_report.json`	Machine-readable evaluation summary
`aist87m_vs_dual_audio_throughput_l4_20260504.json`	Reference L4 throughput benchmark vs dual-audio tower
`aist87m_salt_published_retrieval_20260626.json`	Previous published AIST-87M SALT gate output
`aist87m_salt_matryoshka32_retrieval_20260626.json`	Current artifact SALT gate output
`aist87m_salt_matryoshka32_summary_20260626.json`	SALT no-regression summary

Notes

These are GGUF exports of the same merged-audio release artifact.
This is not a generic MTEB/MIEB/MAEB leaderboard claim; the reported gate is selected for human-memory embedding workloads.
Consumers should treat this hash as a new embedding-space artifact and regenerate caches rather than mixing embeddings from earlier AIST-87M hashes.
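
One way to avoid mixing embedding spaces is to make the artifact identifier part of every cache key, so a new release never reads vectors written by an older one. The sketch below is a hypothetical in-memory cache. It uses the artifact tag from this card as the identifier, which is an assumption; a file hash would serve the same role.

```python
import hashlib

def cache_key(artifact_id, dim, modality, content):
    """Key that changes whenever the artifact, slice width, or input changes."""
    digest = hashlib.sha256(content).hexdigest()
    return f"{artifact_id}:{dim}:{modality}:{digest}"

def get_or_embed(store, artifact_id, dim, modality, content, embed_fn):
    """Return a cached vector, computing it with `embed_fn` on a miss."""
    key = cache_key(artifact_id, dim, modality, content)
    if key not in store:
        store[key] = embed_fn(content)
    return store[key]

# Hypothetical usage: `embed_fn` stands in for whatever runtime produces vectors.
store = {}
vec = get_or_embed(store, "matryoshka32-prefixrot-20260625", 64, "text",
                   b"hello", lambda content: [0.0] * 64)
```

Entries written under an earlier artifact identifier are simply never matched, so they can be purged at leisure.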
