Paper: *Parcae: Scaling Laws For Stable Looped Language Models* (arXiv:2604.12946)
A clean reimplementation of the OpenMythos architecture in the HuggingFace ecosystem.
OpenMythos is a speculative reconstruction of "Claude Mythos" based on several cutting-edge papers. This repo provides a bug-free, HF-native reimplementation of the architecture below:

```
tokens → Embedding → Prelude → [Recurrent Block × r] → Coda → LM Head → logits
```
- **Prelude**: `L_P` standard transformer blocks (run once)
- **Recurrent Block**: `L_R` transformer blocks with state injection, looped `r` times (see the sketch below)
- **Coda**: `L_C` standard transformer blocks (run once)
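A minimal sketch of this control flow, with linear injection standing in for the state-mixing step (hypothetical module layout; the actual implementation lives in `open_mythos_hf`):

```python
import torch
import torch.nn as nn

class RecurrentCore(nn.Module):
    # Linear injection (Huginn-style): concatenate the state with the prelude
    # output, project back down, then apply the shared block stack.
    def __init__(self, d_model: int):
        super().__init__()
        self.inject = nn.Linear(2 * d_model, d_model)
        self.block = nn.Sequential(
            nn.LayerNorm(d_model), nn.Linear(d_model, d_model), nn.GELU()
        )

    def forward(self, state: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        return self.block(self.inject(torch.cat([state, x], dim=-1)))

class LoopedBackbone(nn.Module):
    # Hypothetical sketch of the prelude -> [core x r] -> coda control flow.
    def __init__(self, d_model: int):
        super().__init__()
        self.prelude = nn.Linear(d_model, d_model)  # stands in for L_P blocks
        self.core = RecurrentCore(d_model)          # stands in for L_R blocks
        self.coda = nn.Linear(d_model, d_model)     # stands in for L_C blocks

    def forward(self, x: torch.Tensor, r: int) -> torch.Tensor:
        x = self.prelude(x)
        state = torch.zeros_like(x)  # recurrent state, re-injected each step
        for _ in range(r):
            state = self.core(state, x)
        return self.coda(state)
```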
| Component | Source Paper | Description |
|---|---|---|
| SandwichBlock | Huginn (arXiv:2502.05171) | Pre-norm Attn + Pre-norm MLP + Post-norm for recurrent stability |
| LTI Injection | Parcae (arXiv:2604.12946) | Stable state mixing with spectral radius guaranteed < 1 |
| Linear Injection | Huginn | Concatenate state + input, project down |
| Multi-Latent Attention | DeepSeek-V2 (arXiv:2405.04434) | Compressed KV cache (87.5% reduction) |
| Sparse MoE | DeepSeek-V3 | Top-K routed + shared always-active experts |
| Truncated BPTT | Huginn §3.3 | Only backprop through the last K recurrence steps |
| Poisson-Lognormal Sampling | Huginn §3.3 | Variable depth per training sample |
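To make the sandwich layout from the first row concrete, here is a minimal sketch of the Huginn-style block (a hypothetical rendering, not the repo's actual code):

```python
import torch
import torch.nn as nn

class SandwichBlock(nn.Module):
    """Sketch of the sandwich layout: pre-norm attention, pre-norm MLP, and a
    closing post-norm that keeps the looped state bounded across recurrence
    steps. Hypothetical; the repo's block may differ in detail."""

    def __init__(self, d_model: int, n_head: int):
        super().__init__()
        self.norm_attn = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_head, batch_first=True)
        self.norm_mlp = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )
        self.norm_out = nn.LayerNorm(d_model)  # the post-norm "bread slice"

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm_attn(x)                               # pre-norm attention
        x = x + self.attn(h, h, h, need_weights=False)[0]
        x = x + self.mlp(self.norm_mlp(x))                  # pre-norm MLP
        return self.norm_out(x)                             # post-norm for stability
```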
| Variant | Params | n_embd | Recurrence | Extensions |
|---|---|---|---|---|
| `mythos_tiny` | 12M | 256 | μ=4 | LTI only |
| `mythos_140m` | 140M | 768 | μ=8 | LTI only |
| `mythos_370m` | 370M | 1024 | μ=8 | LTI only |
| `mythos_770m` | 770M | 1280 | μ=8 | LTI only |
| `mythos_1b` | ~1B | 1536 | μ=16 | LTI + MoE |
| `mythos_3b` | ~3.5B | 2560 | μ=32 | LTI + MLA + MoE |
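As a sanity check against the table, each variant can be instantiated and its parameters counted. `mythos_140m`'s export name is an assumption inferred from the variant names; `mythos_tiny` is confirmed by the usage example below:

```python
from open_mythos_hf import OpenMythosForCausalLM, mythos_tiny, mythos_140m

# Instantiate each variant factory and count parameters to verify the
# "Params" column (mythos_140m's export name is assumed, not confirmed).
for factory in (mythos_tiny, mythos_140m):
    model = OpenMythosForCausalLM(factory())
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{factory.__name__}: {n_params / 1e6:.0f}M parameters")
```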
```python
import torch

from open_mythos_hf import OpenMythosForCausalLM, mythos_tiny

# Create a model
config = mythos_tiny()
model = OpenMythosForCausalLM(config)

# Forward pass
input_ids = torch.randint(0, 32000, (1, 128))
output = model(input_ids=input_ids, labels=input_ids)
print(f"Loss: {output.loss.item():.4f}")
print(f"Recurrence steps: {output.num_steps}")
```
Training runs through the bundled script, e.g.:

```bash
python train_mythos.py --variant 140m --max_steps 5000 --batch_size 8
```
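Per the component table above, training samples a fresh recurrence depth per batch and backpropagates only through the last K loop steps. A minimal sketch of both tricks (hyperparameter names are assumptions):

```python
import torch

def sample_depth(mu: float = 8.0, sigma: float = 0.5) -> int:
    # Poisson-lognormal depth sampling: draw a lognormal rate centered
    # near mu, then a Poisson count around it; +1 keeps r >= 1.
    rate = torch.exp(torch.randn(()) * sigma) * mu
    return int(torch.poisson(rate).item()) + 1

def looped_core(core, state, x, r: int, k: int = 8):
    # Truncated BPTT: run the first r - k recurrence steps without gradients,
    # so only the last k steps contribute to the backward pass.
    with torch.no_grad():
        for _ in range(max(r - k, 0)):
            state = core(state, x)
    for _ in range(min(k, r)):
        state = core(state, x)
    return state
```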
| Bug | Original | Fix |
|---|---|---|
| Tokenizer → non-existent model | `openai/gpt-oss-20b` | Use `gpt2` (valid HF tokenizer) |
| `torch = "2.11.0"` | Non-existent version | Compatible with PyTorch 2.x |
| `mythos_7b` missing | Referenced but not defined | Removed from README |
| `_tied_weights_keys` | Wrong format for transformers 5.6+ | Dict format |
| `load_tokenizer` / `get_vocab_size` | Exported but undefined | Removed from `__all__` |
| No tests | None | Full test suite |
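The tokenizer fix, for instance, just points at a checkpoint that actually exists on the Hub:

```python
from transformers import AutoTokenizer

# gpt2 is a valid, always-available HF tokenizer (50257-token vocabulary),
# replacing the non-existent checkpoint the original card referenced.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
input_ids = tokenizer("Hello, looped world!", return_tensors="pt").input_ids
```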