PGSM Text Surprisal Editor Model
This repository contains the trained model weights used by the Hugging Face Space:
https://huggingface.co/spaces/build-small-hackathon/pgsm-text-surprisal-editor
Model Summary
PGSM Text Surprisal Editor is powered by a compact non-Transformer language model based on a custom ExactState Memory / PGSM architecture.
The model scores whole-word surprisal: each word is removed in turn, and the model evaluates how predictable that word is from its left and right context.
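As a rough illustration of what a surprisal score is, the sketch below computes surprisal in bits as -log2(p) and sums it over the tokens that make up one word. Summing over sub-word tokens is an assumption based on the standard chain rule, not a description of how the Space computes its scores. The function names and the probabilities are hypothetical and do not come from the PGSM model.

```python
import math


def token_surprisal_bits(prob):
    """Surprisal of a single token in bits: -log2(p)."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(prob)


def whole_word_surprisal(token_probs):
    """Sum per-token surprisals for the tokens that form one word.

    token_probs: hypothetical probabilities a model assigns to each token
    of the word, given the surrounding context.
    """
    return sum(token_surprisal_bits(p) for p in token_probs)


# Hypothetical probabilities, chosen only to keep the arithmetic simple.
predictable = whole_word_surprisal([0.5, 0.5])    # 1 bit + 1 bit
surprising = whole_word_surprisal([0.125, 0.25])  # 3 bits + 2 bits
```

A word the model finds easy to predict gets a low score, and a word it finds unlikely gets a high score.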
Architecture
- Architecture: PGSM / ExactState Memory
- Transformer blocks: 0
- Self-attention layers: 0
- Parameters: approximately 4 million
- Vocabulary: approximately 2k tokens
- Model file: final_infer.pt
This model does not use Transformer self-attention. Context is propagated through learned state transitions rather than pairwise attention computations.
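To make the contrast concrete, here is a toy sketch of a state-based update next to a count of the pairwise scores that full self-attention would compute. This is not the PGSM / ExactState Memory update rule, which is not described here. The scalar recurrence, the fixed `decay` and `gain` constants, and the function names are all assumptions for illustration; a trained model would learn its transitions.

```python
def run_state_recurrence(inputs, decay=0.5, gain=1.0):
    """Toy scalar recurrence: state = decay * state + gain * x.

    Performs one state update per input, so work grows linearly with length.
    """
    state = 0.0
    updates = 0
    for x in inputs:
        state = decay * state + gain * x
        updates += 1
    return state, updates


def count_attention_pairs(n):
    """Number of (query, key) scores full self-attention computes for n tokens."""
    return n * n


state, updates = run_state_recurrence([1.0, 2.0, 3.0, 4.0])
pairs = count_attention_pairs(4)
```

For four inputs the recurrence performs 4 updates, while full pairwise attention would score 16 pairs; the gap widens as the sequence gets longer.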
Training
The model was fully trained by the author on approximately 19 billion tokens from FineWeb-Edu.
Training details:
- Training source: FineWeb-Edu
- Training scale: approximately 19B tokens
- Training type: full custom training by the author
- Base architecture: PGSM / ExactState Memory
- Off-the-shelf Transformer checkpoint used: none
- Final inference weights: final_infer.pt
Intended Use
This model is intended for the PGSM Text Surprisal Editor Space, where it powers whole-word surprisal heatmaps for pasted text.
The model is designed for experimentation, visualization, and language-analysis demos rather than production writing assistance or factual generation.
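One simple way to turn per-word surprisal scores into heatmap intensities is min-max scaling, sketched below. This is an assumed approach for illustration, not the Space's actual rendering logic; the function name, the example words, and the scores are hypothetical.

```python
def heat_intensities(surprisals):
    """Min-max scale surprisal values to [0, 1] for use as heatmap intensity."""
    if not surprisals:
        return []
    lo, hi = min(surprisals), max(surprisals)
    if hi == lo:
        # All words equally surprising: no contrast to show.
        return [0.0 for _ in surprisals]
    return [(s - lo) / (hi - lo) for s in surprisals]


# Hypothetical words and surprisal scores (bits), not model output.
words = ["the", "cat", "sat", "quixotically"]
scores = [1.0, 3.0, 2.0, 9.0]
intensities = heat_intensities(scores)
```

Each intensity could then drive a background color, with the least predictable word in the passage shaded most strongly.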
Limitations
- Very small model (approximately 4 million parameters) compared with mainstream LLMs
- Compact vocabulary (approximately 2k tokens)
- Designed for surprisal visualization, not general-purpose chat
- Outputs should be treated as model-analysis signals, not factual judgments
- Training and evaluation details are given here only in summary form, for hackathon review
Hackathon Context
This model supports the Hugging Face Build Small Hackathon submission:
- Track: Thousand Token Wood
- Badges: Tiny Titan, Well-Tuned, Off the Grid, Field Notes
The key goal is to demonstrate a very small, fully trained, non-Transformer language model running locally inside a Hugging Face Space.