AIST-87M GGUF
This repository contains GGUF quantizations of augmem/AIST-87M.
Current published artifact: matryoshka32-prefixrot-20260625. This refresh
keeps the canonical AIST-87M name because the architecture and parameter
scale are unchanged, but the released weights now include the shared 128d
orthogonal prefix rotation used to support native 64d and 32d Matryoshka
slices. The old and new artifacts are retrieval-equivalent at 128d and above on
the local held-out SALT gate, while the new artifact improves mean R@1 at 64d
and 32d.
Base model:
augmem/AIST-87M
Quantizations:
- AIST-87M_q8_0.gguf
- AIST-87M_q5_1.gguf
The source model is a compact audio + image + speech + text embedding model for
human-memory augmentation workloads. It is the single-audio evolution of the
earlier dual-audio tower line and uses a merged native mn20_as EfficientAT
audio encoder with no separate runtime LoRA pass.
Matryoshka slices: [1280, 768, 512, 256, 128, 64, 32].
Exact loaded params: 87,186,755.
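
As a rough illustration of how a consumer might use the listed slice widths, the sketch below keeps the first `dim` components of a full 1280d embedding and L2-normalizes the result. It assumes the slices are plain prefixes of the full vector that need renormalizing, which is the usual Matryoshka convention but is not spelled out in this card. The helper names `matryoshka_slice` and `cosine` are hypothetical and not part of any released API.

```python
import math

# Slice widths listed in this model card.
MATRYOSHKA_DIMS = (1280, 768, 512, 256, 128, 64, 32)


def matryoshka_slice(embedding, dim):
    """Keep the first `dim` components and L2-normalize them.

    Assumption: slices are prefixes of the full embedding.
    """
    if dim not in MATRYOSHKA_DIMS:
        raise ValueError(f"unsupported slice width: {dim}")
    if len(embedding) < dim:
        raise ValueError("embedding is shorter than the requested slice")
    prefix = [float(x) for x in embedding[:dim]]
    norm = math.sqrt(sum(x * x for x in prefix))
    if norm == 0.0:
        return prefix  # leave an all-zero prefix unchanged
    return [x / norm for x in prefix]


def cosine(a, b):
    """Cosine similarity of two already-normalized vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))
```

Both sides of a comparison should be sliced to the same width before scoring, for example `cosine(matryoshka_slice(q, 64), matryoshka_slice(d, 64))`.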
Evaluation Scope
The quantized files correspond to the same release checkpoint and human-memory evaluation slice as the base repo.
| Dim | Tasks | Text continuity | Image recall | Audio recall | Overall |
|---|---|---|---|---|---|
| 1280 | 8 / 8 | 0.763 | 0.425 | 0.104 | 0.349 |
| 768 | 8 / 8 | 0.762 | 0.424 | 0.104 | 0.349 |
| 512 | 8 / 8 | 0.762 | 0.424 | 0.104 | 0.349 |
Primary metrics are main_score for text continuity tasks and NDCG@10 for
image/audio retrieval tasks.
Matryoshka32 No-Regression Gate
Source: aist87m_salt_matryoshka32_summary_20260626.json.
The current artifact was compared against the previous published AIST-87M projection checkpoint on the local held-out SALT cached retrieval gate. Mean R@1 averages all six paired directions: audio-text, image-text, and image-audio in both directions.
| Dim | Previous mean R@1 | Current mean R@1 | Delta |
|---|---|---|---|
| 1280 | 0.193905 | 0.193905 | +0.000000 |
| 768 | 0.193072 | 0.193072 | +0.000000 |
| 512 | 0.193105 | 0.193105 | +0.000000 |
| 256 | 0.192472 | 0.192472 | +0.000000 |
| 128 | 0.190671 | 0.190671 | +0.000000 |
| 64 | 0.111089 | 0.142028 | +0.030940 |
| 32 | 0.056745 | 0.102354 | +0.045609 |
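
The mean R@1 figure in this table can be read as follows. The sketch assumes index-aligned paired embeddings per modality, cosine scoring, and prefix slicing with renormalization. It is not the gate's actual evaluation code, and the function names are hypothetical.

```python
import math
from itertools import permutations


def _unit(vec):
    """L2-normalize a vector, leaving an all-zero vector unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def recall_at_1(queries, targets):
    """Fraction of queries whose top-scoring target has the same index."""
    hits = 0
    for i, q in enumerate(queries):
        scores = [sum(a * b for a, b in zip(q, t)) for t in targets]
        best = max(range(len(scores)), key=scores.__getitem__)
        hits += int(best == i)
    return hits / len(queries)


def mean_r_at_1(embeddings, dim):
    """Average R@1 over every ordered modality pair at one slice width.

    `embeddings` maps a modality name to a list of vectors, where item i
    in each modality describes the same underlying sample.
    """
    sliced = {
        name: [_unit(vec[:dim]) for vec in vecs]
        for name, vecs in embeddings.items()
    }
    # Three modalities give six ordered (query, target) directions.
    directions = list(permutations(sliced, 2))
    total = sum(recall_at_1(sliced[q], sliced[t]) for q, t in directions)
    return total / len(directions)
```

With the keys `audio`, `image`, and `text`, `permutations` yields the six directions described above, so the result for a given `dim` corresponds to one cell of the table.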
Runtime Footprint vs Dual-Audio Tower
The base AIST-87M release replaces the dual-audio tower's separate
EfficientAT + Whisper-Tiny branches with one merged native mn20_as
EfficientAT encoder.
| Runtime surface | AIST-87M | AIST-95M dual-audio tower | Delta |
|---|---|---|---|
| Loaded parameters | 87,186,755 | 95,315,959 | -8.5% |
| Safetensors artifact | 348.9 MB | 381.9 MB | -8.6% |
| Audio encoders | 1 | 2 | removes Whisper branch |
| Audio path parameters incl. projection | 32,193,126 | 40,390,311 | -20.3% |
| Audio projection input width | 1,280 | 2,304 | -44.4% |
Exact-gate tradeoff at 1280d against the same dual-audio local baseline:
| Slice | AIST-87M | AIST-95M dual-audio tower | Delta |
|---|---|---|---|
| Speech holdout audio-text R@1 avg | 0.724 | 0.582 | +0.142 |
| WavCaps FSD audio-text R@1 avg | 0.097 | 0.105 | -0.009 |
| SALT audio-text R@1 avg | 0.008 | 0.007 | flat |
| SALT image-audio R@1 avg | 0.138 | 0.148 | -0.010 |
Reference PyTorch audio-stack throughput for the base release was measured on an NVIDIA L4 with synthetic 10-second, 32 kHz CPU waveforms passed through waveform -> audio encoder -> projection -> normalized embedding. Median wall time is taken over 50 timed iterations after 20 warmup iterations. The measurement excludes audio file decode, dataset download, and MTEB result serialization.
| Batch | AIST-87M median ms | AIST-87M throughput | AIST-95M median ms | AIST-95M throughput | Speedup |
|---|---|---|---|---|---|
| 1 | 5.36 | 186.7 clips/s; 1,867 audio-s/s | 10.50 | 95.2 clips/s; 952 audio-s/s | 1.96x |
| 8 | 16.46 | 486.0 clips/s; 4,860 audio-s/s | 60.29 | 132.7 clips/s; 1,327 audio-s/s | 3.66x |
| 16 | 41.19 | 388.5 clips/s; 3,885 audio-s/s | 133.95 | 119.4 clips/s; 1,194 audio-s/s | 3.25x |
The GGUF files are quantized distribution artifacts and were not separately
rebenchmarked in a GGUF runtime. Raw PyTorch benchmark output is included as
aist87m_vs_dual_audio_throughput_l4_20260504.json.
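
The throughput and speedup columns follow from the median wall times and the 10-second clip length. The sketch below reproduces the batch-8 row from the published medians. The function names are illustrative only, and other rows can differ in the last digit, presumably because the table's medians are already rounded.

```python
CLIP_SECONDS = 10.0  # synthetic clip length stated in the benchmark description


def throughput(batch_size, median_ms):
    """Return (clips per second, audio seconds per second) for one batch."""
    clips_per_s = batch_size / (median_ms / 1000.0)
    return clips_per_s, clips_per_s * CLIP_SECONDS


def speedup(baseline_ms, candidate_ms):
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


# Batch 8: AIST-87M median 16.46 ms, AIST-95M median 60.29 ms.
clips, audio = throughput(8, 16.46)
print(f"{clips:.1f} clips/s; {audio:,.0f} audio-s/s; "
      f"{speedup(60.29, 16.46):.2f}x")
# prints: 486.0 clips/s; 4,860 audio-s/s; 3.66x
```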
Files
| File | Purpose |
|---|---|
| AIST-87M_q8_0.gguf | Higher-accuracy GGUF |
| AIST-87M_q5_1.gguf | Smaller GGUF |
| manifest.json | Release manifest |
| parameter_breakdown.json | Exact parameter accounting |
| aist87m_memory_slice_release_report.md | Human-memory slice report |
| aist87m_memory_slice_release_report.json | Machine-readable evaluation summary |
| aist87m_vs_dual_audio_throughput_l4_20260504.json | Reference L4 throughput benchmark vs dual-audio tower |
| aist87m_salt_published_retrieval_20260626.json | Previous published AIST-87M SALT gate output |
| aist87m_salt_matryoshka32_retrieval_20260626.json | Current artifact SALT gate output |
| aist87m_salt_matryoshka32_summary_20260626.json | SALT no-regression summary |
Notes
- These are GGUF exports of the same merged-audio release artifact.
- This is not a generic MTEB/MIEB/MAEB leaderboard claim; the reported gate is selected for human-memory embedding workloads.
- Consumers should treat this hash as a new embedding-space artifact and regenerate caches rather than mixing embeddings from earlier AIST-87M hashes.
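
One way to follow the cache-regeneration note is to namespace embedding caches by a digest of the artifact file, so that embeddings from different releases never share a directory. The sketch below uses SHA-256 of the GGUF file as that digest. This is an assumption: the card does not say which hash identifies the artifact, and `artifact_sha256` and `cache_dir_for` are hypothetical helper names.

```python
import hashlib
from pathlib import Path


def artifact_sha256(path, chunk_size=1 << 20):
    """Stream the file in chunks and return its SHA-256 hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cache_dir_for(cache_root, artifact_path, dim):
    """Build a cache directory path unique to the artifact bytes and width.

    A refreshed artifact with the same filename yields a different digest,
    so stale embeddings end up in a separate directory instead of mixing.
    """
    tag = artifact_sha256(artifact_path)[:16]
    stem = Path(artifact_path).stem
    return Path(cache_root) / f"{stem}-{tag}-{dim}d"
```

Including the slice width in the directory name also keeps 64d and 128d caches apart, since vectors of different widths are not comparable.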