
# OpenMythos-HF: Recurrent Depth Transformer

A clean reimplementation of the OpenMythos architecture in the HuggingFace ecosystem.

## What is this?

OpenMythos is a speculative reconstruction of "Claude Mythos" based on several cutting-edge papers. This repo provides a bug-free, HF-native reimplementation that:

1. ✅ Actually runs (fixes tokenizer, dependency, and export bugs)
2. ✅ Integrates with HF Trainer, Trackio, and FSDP
3. ✅ Is fully tested (forward/backward pass, all components)
4. ✅ Trains on real data (FineWeb-Edu)

## Architecture

```
tokens → Embedding → Prelude → [Recurrent Block × r] → Coda → LM Head → logits
```

- **Prelude**: `L_P` standard transformer blocks (run once)
- **Recurrent Block**: `L_R` transformer blocks with state injection, looped `r` times
- **Coda**: `L_C` standard transformer blocks (run once)
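
The control flow is simple enough to sketch in a few lines. The following is an illustrative skeleton, not the repo's actual classes: single linear layers stand in for the transformer stacks, and the Huginn-style linear injection (concatenate state and input, project down) mixes the input back in at every step.

```python
import torch
import torch.nn as nn

class RecurrentDepthLM(nn.Module):
    """Illustrative skeleton of the prelude -> recurrent -> coda pipeline."""

    def __init__(self, vocab_size=32000, n_embd=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, n_embd)
        # Single linear layers stand in for the L_P / L_R / L_C transformer stacks.
        self.prelude = nn.Linear(n_embd, n_embd)
        self.recurrent = nn.Linear(2 * n_embd, n_embd)  # linear injection: concat, project down
        self.coda = nn.Linear(n_embd, n_embd)
        self.lm_head = nn.Linear(n_embd, vocab_size, bias=False)

    def forward(self, input_ids, num_steps=4):
        e = self.prelude(self.embed(input_ids))  # computed once, reused at every step
        s = torch.randn_like(e)                  # random initial latent state
        for _ in range(num_steps):               # the same weights are applied r times
            s = self.recurrent(torch.cat([s, e], dim=-1))  # re-inject e each step
        return self.lm_head(self.coda(s))

logits = RecurrentDepthLM()(torch.randint(0, 32000, (1, 16)))  # (1, 16, 32000)
```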

### Key Components

| Component | Source Paper | Description |
|---|---|---|
| SandwichBlock | Huginn (arXiv:2502.05171) | Pre-norm Attn + Pre-norm MLP + Post-norm for recurrent stability |
| LTI Injection | Parcae (arXiv:2604.12946) | Stable state mixing with guaranteed ρ(Ā) < 1 |
| Linear Injection | Huginn | Concatenate state + input, project down |
| Multi-Latent Attention | DeepSeek-V2 (arXiv:2405.04434) | Compressed KV cache (87.5% reduction) |
| Sparse MoE | DeepSeek-V3 | Top-K routed + shared always-active experts |
| Truncated BPTT | Huginn §3.3 | Only backprop through last K recurrence steps |
| Poisson-Lognormal Sampling | Huginn §3.3 | Variable depth per training sample |
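
The last two rows interact at training time: each sample draws its own recurrence depth, and gradients flow only through the final K steps. Here is a hedged sketch of both mechanisms; the distribution parameterization and function names are assumptions, not the repo's exact code.

```python
import torch

def sample_depth(mu=8, sigma=0.5):
    """Per-sample recurrence depth from a Poisson-lognormal mixture
    (one reading of Huginn Sec. 3.3): draw a log-normal rate, then a
    Poisson sample, shifted so the depth is at least 1."""
    rate = torch.exp(torch.randn(()) * sigma + torch.log(torch.tensor(float(mu))))
    return int(torch.poisson(rate).item()) + 1

def recur_truncated_bptt(recurrent_fn, s, e, r, k=8):
    """Run r recurrence steps, backpropagating through only the last k.
    The first r - k steps run under no_grad; detaching the state then
    starts a fresh graph for the final k steps."""
    with torch.no_grad():
        for _ in range(max(r - k, 0)):
            s = recurrent_fn(s, e)
    s = s.detach()
    for _ in range(min(k, r)):
        s = recurrent_fn(s, e)
    return s
```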

## Model Variants

| Variant | Params | n_embd | Recurrence | Extensions |
|---|---|---|---|---|
| `mythos_tiny` | 12M | 256 | μ=4 | LTI only |
| `mythos_140m` | 140M | 768 | μ=8 | LTI only |
| `mythos_370m` | 370M | 1024 | μ=8 | LTI only |
| `mythos_770m` | 770M | 1280 | μ=8 | LTI only |
| `mythos_1b` | ~1B | 1536 | μ=16 | LTI + MoE |
| `mythos_3b` | ~3.5B | 2560 | μ=32 | LTI + MLA + MoE |
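
Programmatically, each variant is just a preset over a few scaling knobs. A hypothetical sketch of how the columns above might map onto a config object; the field names here are illustrative, not the repo's actual config class:

```python
from dataclasses import dataclass

@dataclass
class MythosPreset:
    n_embd: int
    mean_recurrence: int   # the mu column: average depth sampled during training
    use_moe: bool = False  # sparse MoE experts (1b and 3b rows)
    use_mla: bool = False  # multi-latent attention (3b row only)

# Mirrors the mythos_1b row above.
mythos_1b_preset = MythosPreset(n_embd=1536, mean_recurrence=16, use_moe=True)
```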

## Quick Start

```python
import torch
from open_mythos_hf import OpenMythosForCausalLM, mythos_tiny

# Create a model
config = mythos_tiny()
model = OpenMythosForCausalLM(config)

# Forward pass
input_ids = torch.randint(0, 32000, (1, 128))
output = model(input_ids=input_ids, labels=input_ids)
print(f"Loss: {output.loss.item():.4f}")
print(f"Recurrence steps: {output.num_steps}")
```

## Training

```bash
python train_mythos.py --variant 140m --max_steps 5000 --batch_size 8
```
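
Under the hood this pulls FineWeb-Edu through the `datasets` library. A rough sketch of the data plumbing the script implies; the subset name and tokenization settings here are assumptions, not necessarily what `train_mythos.py` uses:

```python
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Stream a published FineWeb-Edu subset instead of downloading the full corpus.
dataset = load_dataset(
    "HuggingFaceFW/fineweb-edu", name="sample-10BT",
    split="train", streaming=True,
)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])
```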

## What We Fixed vs Original OpenMythos

| Bug | Original | Fix |
|---|---|---|
| Tokenizer → non-existent model | `openai/gpt-oss-20b` | Use `gpt2` (valid HF tokenizer) |
| `torch = "2.11.0"` | Non-existent version | Compatible with PyTorch 2.x |
| `mythos_7b` missing | Referenced but not defined | Removed from README |
| `_tied_weights_keys` | Wrong format for transformers 5.6+ | Dict format |
| `load_tokenizer` / `get_vocab_size` | Exported but undefined | Removed from `__all__` |
| No tests | None | Full test suite |
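
For the last row, a minimal example of the kind of check such a suite would contain (the test name and shapes are illustrative):

```python
import torch
from open_mythos_hf import OpenMythosForCausalLM, mythos_tiny

def test_forward_backward():
    # The tiny variant keeps the test fast; the loss must be finite and
    # gradients must flow back through the recurrence to the parameters.
    model = OpenMythosForCausalLM(mythos_tiny())
    input_ids = torch.randint(0, 32000, (2, 64))
    output = model(input_ids=input_ids, labels=input_ids)
    assert torch.isfinite(output.loss)
    output.loss.backward()
    assert any(p.grad is not None for p in model.parameters())
```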

## References

- Geiping et al. *Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach* (Huginn). arXiv:2502.05171
- *Parcae*. arXiv:2604.12946
- DeepSeek-AI. *DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model*. arXiv:2405.04434
- DeepSeek-AI. *DeepSeek-V3 Technical Report*. arXiv:2412.19437
