AIST-87M GGUF

This repository contains GGUF quantizations of augmem/AIST-87M.

Current published artifact: matryoshka32-prefixrot-20260625. This refresh keeps the canonical AIST-87M name because the architecture and parameter count are unchanged. The released weights now include the shared 128d orthogonal prefix rotation that supports native 64d and 32d Matryoshka slices. On the local held-out SALT gate, the old and new artifacts are retrieval-equivalent at 128d and above (identical mean R@1), while the new artifact improves the 64d and 32d slices.

Base model:

  • augmem/AIST-87M

Quantizations:

  • AIST-87M_q8_0.gguf
  • AIST-87M_q5_1.gguf
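
Before loading either file, a quick sanity check of a download is to read its fixed-size GGUF header. The sketch below assumes the standard little-endian GGUF layout for format versions 2 and later (magic bytes, a 32-bit version, then 64-bit tensor and metadata counts). The function names are illustrative and not part of any published AIST tooling.

```python
import struct

GGUF_MAGIC = b"GGUF"
HEADER_SIZE = 24  # 4-byte magic + uint32 version + uint64 tensor count + uint64 KV count


def parse_gguf_header(header: bytes) -> dict:
    """Parse the fixed-size GGUF header (assumes little-endian, version >= 2)."""
    if len(header) < HEADER_SIZE or header[:4] != GGUF_MAGIC:
        raise ValueError("not a GGUF file: bad magic or truncated header")
    # "<" selects little-endian and disables struct padding: 4 + 8 + 8 bytes.
    version, tensor_count, kv_count = struct.unpack("<IQQ", header[4:HEADER_SIZE])
    if version < 2:
        # Version 1 used 32-bit counts, which this sketch does not handle.
        raise ValueError(f"unsupported GGUF version {version}")
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }


def read_gguf_header(path: str) -> dict:
    """Read and parse the header of a GGUF file on disk."""
    with open(path, "rb") as f:
        return parse_gguf_header(f.read(HEADER_SIZE))
```

For example, `read_gguf_header("AIST-87M_q8_0.gguf")` would report the format version and tensor count, assuming the file has been downloaded to the working directory.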

The source model is a compact audio + image + speech + text embedding model for human-memory augmentation workloads. It is the single-audio evolution of the earlier dual-audio tower line and uses a merged native mn20_as EfficientAT audio encoder with no separate runtime LoRA pass.

Matryoshka slices: [1280, 768, 512, 256, 128, 64, 32]. Exact loaded params: 87,186,755.
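
A minimal sketch of consuming one of these slices, assuming (as is common for Matryoshka embeddings, but not confirmed by this card) that a slice is the first `dim` components of the full 1280d vector, L2-renormalized. `slice_embedding` is a hypothetical helper, not a function shipped with the model.

```python
import math

# Slice widths listed in this card.
MATRYOSHKA_DIMS = (1280, 768, 512, 256, 128, 64, 32)


def slice_embedding(embedding, dim):
    """Keep the first `dim` components and L2-renormalize (assumed slicing rule)."""
    if dim not in MATRYOSHKA_DIMS:
        raise ValueError(f"{dim} is not a published Matryoshka slice")
    if len(embedding) < dim:
        raise ValueError("embedding is shorter than the requested slice")
    prefix = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in prefix))
    if norm == 0.0:
        raise ValueError("cannot normalize an all-zero prefix")
    return [x / norm for x in prefix]
```

Renormalizing after truncation keeps dot products interpretable as cosine similarities at every slice width.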

Evaluation Scope

The quantized files correspond to the same release checkpoint and human-memory evaluation slice as the base repo.

| Dim | Tasks | Text continuity | Image recall | Audio recall | Overall |
| --- | --- | --- | --- | --- | --- |
| 1280 | 8 / 8 | 0.763 | 0.425 | 0.104 | 0.349 |
| 768 | 8 / 8 | 0.762 | 0.424 | 0.104 | 0.349 |
| 512 | 8 / 8 | 0.762 | 0.424 | 0.104 | 0.349 |

Primary metrics are main_score for text continuity tasks and NDCG@10 for image/audio retrieval tasks.
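
For readers unfamiliar with the retrieval metric, here is a minimal NDCG@10 sketch. It assumes linear gain and that every relevant item appears somewhere in the ranked list passed in. The evaluation harness behind the numbers above may differ in those details.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k results (linear gain)."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(ranked_relevances, k=10):
    """DCG of the given ranking divided by the DCG of the ideal ordering."""
    ideal = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal if ideal > 0 else 0.0
```

With a single relevant item, the score is 1.0 when it is ranked first and `1 / log2(3)` when it is ranked second.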

Matryoshka32 No-Regression Gate

Source: aist87m_salt_matryoshka32_summary_20260626.json.

The current artifact was compared against the previous published AIST-87M projection checkpoint on the local held-out SALT cached retrieval gate. Mean R@1 averages all six paired directions: audio-text, image-text, and image-audio in both directions.
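
The averaging described above can be sketched as follows. This assumes items are paired by index across modalities and scored by dot product on already-normalized embeddings. The function names are illustrative and do not reflect the actual gate script.

```python
def recall_at_1(queries, gallery):
    """Fraction of queries whose top-scoring gallery item shares their index."""
    if not queries:
        raise ValueError("need at least one query")
    hits = 0
    for i, q in enumerate(queries):
        scores = [sum(a * b for a, b in zip(q, g)) for g in gallery]
        best = max(range(len(scores)), key=scores.__getitem__)
        hits += int(best == i)
    return hits / len(queries)


def mean_r_at_1(audio, image, text):
    """Average R@1 over the six directions: three modality pairs, both ways."""
    directions = [
        (audio, text), (text, audio),
        (image, text), (text, image),
        (image, audio), (audio, image),
    ]
    return sum(recall_at_1(q, g) for q, g in directions) / len(directions)
```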

| Dim | Previous mean R@1 | Current mean R@1 | Delta |
| --- | --- | --- | --- |
| 1280 | 0.193905 | 0.193905 | +0.000000 |
| 768 | 0.193072 | 0.193072 | +0.000000 |
| 512 | 0.193105 | 0.193105 | +0.000000 |
| 256 | 0.192472 | 0.192472 | +0.000000 |
| 128 | 0.190671 | 0.190671 | +0.000000 |
| 64 | 0.111089 | 0.142028 | +0.030940 |
| 32 | 0.056745 | 0.102354 | +0.045609 |
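
The no-regression condition can be checked mechanically from the table values. The sketch below hard-codes the rounded numbers from this card rather than parsing the summary JSON, whose schema is not documented here. The zero default tolerance is an assumption.

```python
# dim -> (previous mean R@1, current mean R@1), copied from the table above.
GATE_ROWS = {
    1280: (0.193905, 0.193905),
    768: (0.193072, 0.193072),
    512: (0.193105, 0.193105),
    256: (0.192472, 0.192472),
    128: (0.190671, 0.190671),
    64: (0.111089, 0.142028),
    32: (0.056745, 0.102354),
}


def gate_deltas(rows):
    """Current minus previous mean R@1 for each slice width."""
    return {dim: current - previous for dim, (previous, current) in rows.items()}


def passes_no_regression(rows, tolerance=0.0):
    """True if no slice drops by more than `tolerance` (assumed threshold)."""
    return all(delta >= -tolerance for delta in gate_deltas(rows).values())
```

Differences of the rounded table values can disagree with the published delta in the last digit (for example at 64d), presumably because the published deltas were computed from unrounded scores.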

Runtime Footprint vs Dual-Audio Tower

The base AIST-87M release replaces the dual-audio tower's separate EfficientAT + Whisper-Tiny branches with one merged native mn20_as EfficientAT encoder.

| Runtime surface | AIST-87M | AIST-95M dual-audio tower | Delta |
| --- | --- | --- | --- |
| Loaded parameters | 87,186,755 | 95,315,959 | -8.5% |
| Safetensors artifact | 348.9 MB | 381.9 MB | -8.6% |
| Audio encoders | 1 | 2 | removes Whisper branch |
| Audio path parameters incl. projection | 32,193,126 | 40,390,311 | -20.3% |
| Audio projection input width | 1,280 | 2,304 | -44.4% |
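
The percentage deltas are relative to the dual-audio tower. A short sketch of the arithmetic, using only figures from the table (the helper name is illustrative):

```python
def relative_delta_percent(new, old):
    """Signed change from `old` to `new`, as a percentage of `old`."""
    return (new - old) / old * 100.0


# Values copied from the table: (AIST-87M, AIST-95M dual-audio tower).
FOOTPRINT = {
    "loaded_parameters": (87_186_755, 95_315_959),
    "safetensors_mb": (348.9, 381.9),
    "audio_path_parameters": (32_193_126, 40_390_311),
    "audio_projection_input_width": (1_280, 2_304),
}

for name, (new, old) in FOOTPRINT.items():
    print(f"{name}: {relative_delta_percent(new, old):+.1f}%")
```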

Exact-gate tradeoff at 1280d against the same dual-audio local baseline:

| Slice | AIST-87M | AIST-95M dual-audio tower | Delta |
| --- | --- | --- | --- |
| Speech holdout audio-text R@1 avg | 0.724 | 0.582 | +0.142 |
| WavCaps FSD audio-text R@1 avg | 0.097 | 0.105 | -0.009 |
| SALT audio-text R@1 avg | 0.008 | 0.007 | flat |
| SALT image-audio R@1 avg | 0.138 | 0.148 | -0.010 |

Reference PyTorch audio-stack throughput for the base release was measured on an NVIDIA L4 using synthetic 10 s, 32 kHz waveforms held on the CPU and passed through waveform -> audio encoder -> projection -> normalized embedding. Each median wall time is taken over 50 timed iterations after 20 warmup iterations. The measurement excludes audio file decoding, dataset download, and MTEB result serialization.
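
The warmup-then-median protocol can be sketched with the standard library alone. This is not the benchmark script itself: `fn` stands in for one full forward pass, and on a GPU the callable would also need to wait for the device to finish (for example via `torch.cuda.synchronize()`) before returning. The card does not state how synchronization was handled.

```python
import statistics
import time


def median_wall_ms(fn, warmup=20, iters=50):
    """Median wall-clock time of `fn` in milliseconds after untimed warmup calls."""
    for _ in range(warmup):
        fn()  # warmup iterations are executed but not recorded
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)
```

Using the median rather than the mean makes the figure less sensitive to occasional slow iterations.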

| Batch | AIST-87M median ms | AIST-87M throughput | AIST-95M median ms | AIST-95M throughput | Speedup |
| --- | --- | --- | --- | --- | --- |
| 1 | 5.36 | 186.7 clips/s; 1,867 audio-s/s | 10.50 | 95.2 clips/s; 952 audio-s/s | 1.96x |
| 8 | 16.46 | 486.0 clips/s; 4,860 audio-s/s | 60.29 | 132.7 clips/s; 1,327 audio-s/s | 3.66x |
| 16 | 41.19 | 388.5 clips/s; 3,885 audio-s/s | 133.95 | 119.4 clips/s; 1,194 audio-s/s | 3.25x |
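
The throughput and speedup columns follow from the median times, the batch size, and the 10 s clip length. A short sketch of the conversion (helper names are illustrative):

```python
def throughput(batch_size, median_ms, clip_seconds=10.0):
    """Return (clips per second, seconds of audio processed per second)."""
    clips_per_s = batch_size / (median_ms / 1000.0)
    return clips_per_s, clips_per_s * clip_seconds


def speedup(baseline_ms, candidate_ms):
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


# Batch 8 row: AIST-87M at 16.46 ms vs the dual-audio tower at 60.29 ms.
clips, audio_seconds = throughput(8, 16.46)
print(f"{clips:.1f} clips/s; {audio_seconds:.0f} audio-s/s; {speedup(60.29, 16.46):.2f}x")
```

Recomputing from the rounded medians can differ from the table in the last digit for some rows, which suggests the published throughput values were derived from unrounded timings.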

The GGUF files are quantized distribution artifacts and were not separately rebenchmarked in a GGUF runtime. Raw PyTorch benchmark output is included as aist87m_vs_dual_audio_throughput_l4_20260504.json.

Files

| File | Purpose |
| --- | --- |
| AIST-87M_q8_0.gguf | Higher-accuracy GGUF |
| AIST-87M_q5_1.gguf | Smaller GGUF |
| manifest.json | Release manifest |
| parameter_breakdown.json | Exact parameter accounting |
| aist87m_memory_slice_release_report.md | Human-memory slice report |
| aist87m_memory_slice_release_report.json | Machine-readable evaluation summary |
| aist87m_vs_dual_audio_throughput_l4_20260504.json | Reference L4 throughput benchmark vs dual-audio tower |
| aist87m_salt_published_retrieval_20260626.json | Previous published AIST-87M SALT gate output |
| aist87m_salt_matryoshka32_retrieval_20260626.json | Current artifact SALT gate output |
| aist87m_salt_matryoshka32_summary_20260626.json | SALT no-regression summary |

Notes

  • These are GGUF exports of the same merged-audio release artifact.
  • This is not a generic MTEB/MIEB/MAEB leaderboard claim; the reported gate is selected for human-memory embedding workloads.
  • Consumers should treat this hash as a new embedding-space artifact and regenerate caches rather than mixing embeddings from earlier AIST-87M hashes.
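
One way to avoid mixing embedding spaces is to key every cache by a digest of the model file that produced it. The sketch below assumes a SHA-256 of the local GGUF file is an acceptable stand-in for the artifact hash mentioned above. The namespace format and function names are hypothetical.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large artifacts are not loaded at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cache_namespace(artifact_sha256, dim):
    """Cache key prefix that changes whenever the artifact or slice width changes."""
    return f"aist87m/{artifact_sha256[:16]}/d{dim}"


def needs_regeneration(stored_namespace, artifact_sha256, dim):
    """True if cached embeddings were produced by a different artifact or width."""
    return stored_namespace != cache_namespace(artifact_sha256, dim)
```

A consumer would store `cache_namespace(file_sha256(model_path), dim)` alongside its vectors and rebuild the cache whenever `needs_regeneration` returns true.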