WriteSAE: Sparse Autoencoders for Recurrent State

A sparse autoencoder for the matrix updates that Gated DeltaNet, Mamba-2, and RWKV-7 write into their recurrent cache at each token. WriteSAE atoms are rank-1 matrices with the same shape as the model's own write, so a single atom can replace one native write at one position. These are the companion checkpoints for the paper WriteSAE: Sparse Autoencoders for Recurrent State (arXiv:2605.12770).

Headline result

At a single Gated DeltaNet layer-head on Qwen3.5-0.8B, substituting the WriteSAE atom for the native write yields a final token distribution closer to the unmodified model's than deleting the write does on 92.4% of evaluated positions; averaged per atom, the rate is 89.8%. A closed-form expression in the forget gate, read query, and output embedding predicts the per-firing logit change at R²=0.98. The same replacement test transfers to Mamba-2-370M at 88.1%. In generation, writing the formula's chosen direction into three consecutive cache positions at 3× the norm of the model's write makes tokens initially ranked 100–1000 by the unmodified model appear in 100% of continuations, up from 33.3%. To our knowledge this is the first cache-level steering intervention in a state-space or hybrid recurrent layer.
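The replace-versus-delete comparison can be illustrated on a toy problem. The sketch below is not the paper's evaluation code: it assumes a simplified gated recurrence S_t = g_t S_{t-1} + W_t in place of the real Gated DeltaNet update, uses KL divergence as the closeness measure (the card does not name one), and stands in for a trained atom with a scaled copy of the native write. All names (`final_dist`, `with_write`, `atom_write`) are hypothetical.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def kl(p, q):
    # KL(p || q); an assumed closeness measure, not taken from the paper.
    return float(np.sum(p * (np.log(p) - np.log(q))))

def final_dist(writes, gates, query, unembed):
    # Toy recurrence S_t = g_t * S_{t-1} + W_t, read out as softmax(U (S_T q)).
    S = np.zeros_like(writes[0])
    for g, W in zip(gates, writes):
        S = g * S + W
    return softmax(unembed @ (S @ query))

rng = np.random.default_rng(0)
T, d_v, d_k, vocab = 8, 4, 4, 16
writes = [np.outer(rng.normal(size=d_v), rng.normal(size=d_k)) for _ in range(T)]
gates = rng.uniform(0.8, 1.0, size=T)          # stand-in forget gates
query = rng.normal(size=d_k)                   # stand-in read query
unembed = 0.1 * rng.normal(size=(vocab, d_v))  # stand-in output embedding

t = 3
# Hypothetical atom: a rank-1 matrix close to the native write at position t.
atom_write = 0.9 * writes[t]

def with_write(pos, W):
    # Copy of the write sequence with position `pos` swapped for W.
    return writes[:pos] + [W] + writes[pos + 1:]

base = final_dist(writes, gates, query, unembed)
replaced = final_dist(with_write(t, atom_write), gates, query, unembed)
deleted = final_dist(with_write(t, np.zeros((d_v, d_k))), gates, query, unembed)

kl_replace, kl_delete = kl(base, replaced), kl(base, deleted)
print(f"KL replace={kl_replace:.4g}  KL delete={kl_delete:.4g}")
print("atom beats deletion:", kl_replace < kl_delete)
```

The reported rate would then correspond to the fraction of evaluated positions at which the replaced distribution is closer than the deleted one.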

Variants

| Variant | Encoder | Decoder |
|---|---|---|
| WriteSAE | $v_i^\top S w_i$ | $v_i w_i^\top$ (rank-1) |
| FlatSAE | linear on vec($S$) | flat |
| MatrixSAE | linear on vec($S$) | full-rank |
| BilinearSAE | $v_i^\top S w_i$ | bilinear |

WriteSAE is the primary artifact and supports all main-text results.
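As a rough illustration of the WriteSAE row, the following NumPy sketch applies the bilinear encoder $v_i^\top S w_i$, keeps the top-k activations, and decodes as a sum of rank-1 atoms $v_i w_i^\top$. It is a minimal sketch under assumptions, not the companion code: it ties encoder and decoder vectors, omits any bias or normalization, and selects top-k by raw value. The function name `writesae_forward` and the array layout are hypothetical.

```python
import numpy as np

def writesae_forward(S, V, W, k):
    """S: (d_v, d_k) write matrix; V: (n, d_v) and W: (n, d_k) atom factors.

    Returns the sparse activation vector and the reconstruction of S.
    """
    # Bilinear encoder: a_i = v_i^T S w_i for every atom i.
    acts = np.einsum("nv,vk,nk->n", V, S, W)
    # TopK sparsity (assumed here to select by raw value).
    keep = np.argsort(acts)[-k:]
    sparse = np.zeros_like(acts)
    sparse[keep] = acts[keep]
    # Rank-1 decoder: sum_i a_i * v_i w_i^T.
    recon = np.einsum("n,nv,nk->vk", sparse, V, W)
    return sparse, recon

# Tiny check with atoms e_i e_i^T: a write along atom 0 is recovered exactly.
V_demo = np.eye(3)
W_demo = np.eye(3)
S_demo = 2.0 * np.outer(V_demo[0], W_demo[0])
sparse_demo, recon_demo = writesae_forward(S_demo, V_demo, W_demo, k=1)
print(np.count_nonzero(sparse_demo), np.allclose(recon_demo, S_demo))
```

Because each active atom contributes one outer product, the reconstruction has rank at most k, which is what lets a single atom stand in for a single native write.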

Base models covered

  • Qwen3.5-0.8B (primary)
  • Qwen3.5-4B (scale replication)
  • Qwen3.5-27B (scale replication)
  • Cross-architecture: DeltaNet 1.3B, GLA 1.3B, Mamba-2 2.8B, RWKV-7

Quick start

from huggingface_hub import snapshot_download

ckpt_dir = snapshot_download("JackYoung27/writesae-ckpts", local_dir="ckpts")
# ckpts/manifest.json maps tags to SHA256 and metadata.
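Since the manifest records a SHA256 per tag, a download can be checked before loading. The sketch below assumes a manifest layout of `{tag: {"path": ..., "sha256": ...}}`; the actual schema of `manifest.json` is not documented here, so the key names and the `verify_manifest` helper are hypothetical and may need adapting. The demo runs on a throwaway directory rather than the real checkpoints.

```python
import hashlib
import json
import tempfile
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA256 and return the hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_manifest(ckpt_dir, manifest_name="manifest.json"):
    """Return {tag: bool}. ASSUMED schema: {tag: {"path": rel_path, "sha256": hex}}."""
    root = Path(ckpt_dir)
    manifest = json.loads((root / manifest_name).read_text())
    return {
        tag: sha256_of(root / entry["path"]) == entry["sha256"]
        for tag, entry in manifest.items()
    }

# Self-contained demo with a fake checkpoint file and a fake manifest.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "best.pt").write_bytes(b"not a real checkpoint")
    fake_manifest = {
        "good": {"path": "best.pt", "sha256": sha256_of(root / "best.pt")},
        "bad": {"path": "best.pt", "sha256": "0" * 64},
    }
    (root / "manifest.json").write_text(json.dumps(fake_manifest))
    result = verify_manifest(root)

print(result)
```

With the real download, `verify_manifest(ckpt_dir)` would be the entry point once the key names match the actual manifest.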

Load and run with the companion code:

git clone https://github.com/JackYoung27/writesae && cd writesae
pip install -e .
python -m experiments.analysis.analyze \
  --sae_checkpoint ckpts/writesae/qwen3p5-0p8b/L9_H4/best.pt \
  --data_dir states --layer 9 --head 4 --output_dir out

Training details

  • Architecture: rank-1 decoder atoms $v_i w_i^\top$, bilinear encoder.
  • Dictionary size: 16384 features (configurable).
  • Sparsity: TopK activation; BatchTopK supported.
  • Training data: OpenWebText (Skylion007/openwebtext, streaming), tokenized with the Qwen3.5 tokenizer.
  • Training compute: ~180 H100-hours single-GPU total across variants (paper App. B.3).
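The two sparsity options listed above differ in where the budget is enforced. As commonly defined, TopK keeps the k largest activations per sample, while BatchTopK keeps the k × batch-size largest across the whole batch, so individual samples can use more or fewer than k features. The NumPy sketch below illustrates that general distinction; it is not taken from the companion code, and the function names are hypothetical.

```python
import numpy as np

def topk(acts, k):
    """Keep the k largest activations in each row; zero the rest."""
    idx = np.argpartition(acts, -k, axis=1)[:, -k:]
    out = np.zeros_like(acts)
    np.put_along_axis(out, idx, np.take_along_axis(acts, idx, axis=1), axis=1)
    return out

def batch_topk(acts, k):
    """Keep the k * batch_size largest activations across the whole batch."""
    budget = k * acts.shape[0]
    flat = acts.ravel()
    idx = np.argpartition(flat, -budget)[-budget:]
    out = np.zeros_like(flat)
    out[idx] = flat[idx]
    return out.reshape(acts.shape)

acts_demo = np.array([[5.0, 4.0, 3.0, 0.0],
                      [1.0, 0.5, 0.2, 0.1]])
per_row = (topk(acts_demo, 2) != 0).sum(axis=1)
per_batch = (batch_topk(acts_demo, 2) != 0).sum(axis=1)
print(per_row, per_batch)
```

In the demo, both rows keep two features under TopK, whereas BatchTopK lets the first row keep three and the second only one while the batch-wide total stays at four.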

Intended use

Interpretability research on matrix-recurrent and linear-attention model internals: decomposing register/bundle structure, validating cross-architecture transfer, and testing causal substitution experiments at the cache write site.

Out of scope

Production model editing, safety interventions without independent validation, or claims about individual atom identity. Atoms reproduce class-level structure; the basis is SAE-run specific (paper section 6).

Limitations

  • Single primary architecture (Gated DeltaNet); Mamba-2 and GLA are confirmed as the negative class.
  • Small-model primary (0.8B); 4B and 27B replications supplement but do not replace the main evidence base.
  • Mechanism claims are class-granular, not per-atom.

License

MIT.

Citation

@article{young2026writesae,
  title  = {WriteSAE: Sparse Autoencoders for Recurrent State},
  author = {Young, Jack},
  year   = {2026},
  journal= {arXiv preprint arXiv:2605.12770},
  url    = {https://github.com/JackYoung27/writesae}
}