
# OpenMythos-HF: Recurrent Depth Transformer

A clean reimplementation of the OpenMythos architecture in the HuggingFace ecosystem.

## What is this?

OpenMythos is a speculative reconstruction of "Claude Mythos" based on several cutting-edge papers. This repo provides a bug-free, HF-native reimplementation that:

1. ✅ Actually runs (fixes tokenizer, dependency, and export bugs)
2. ✅ Integrates with HF Trainer, Trackio, and FSDP
3. ✅ Is fully tested (forward/backward pass, all components)
4. ✅ Trains on real data (FineWeb-Edu)

## Architecture

```
tokens → Embedding → Prelude → [Recurrent Block × r] → Coda → LM Head → logits
```

- **Prelude**: `L_P` standard transformer blocks (run once)
- **Recurrent Block**: `L_R` transformer blocks with state injection, looped `r` times
- **Coda**: `L_C` standard transformer blocks (run once)
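
The control flow is simple enough to sketch in a few lines. The following is an illustrative skeleton, not the repo's actual classes: single linear layers stand in for the transformer stacks, and the Huginn-style linear injection (concatenate state and input, project down) mixes the input back in at every step.

```python
import torch
import torch.nn as nn

class RecurrentDepthLM(nn.Module):
    """Illustrative skeleton of the prelude -> recurrent -> coda pipeline."""

    def __init__(self, vocab_size=32000, n_embd=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, n_embd)
        # Single linear layers stand in for the L_P / L_R / L_C transformer stacks.
        self.prelude = nn.Linear(n_embd, n_embd)
        self.recurrent = nn.Linear(2 * n_embd, n_embd)  # linear injection: concat, project down
        self.coda = nn.Linear(n_embd, n_embd)
        self.lm_head = nn.Linear(n_embd, vocab_size, bias=False)

    def forward(self, input_ids, num_steps=4):
        e = self.prelude(self.embed(input_ids))  # computed once, reused at every step
        s = torch.randn_like(e)                  # random initial latent state
        for _ in range(num_steps):               # the same weights are applied r times
            s = self.recurrent(torch.cat([s, e], dim=-1))  # re-inject e each step
        return self.lm_head(self.coda(s))

logits = RecurrentDepthLM()(torch.randint(0, 32000, (1, 16)))  # (1, 16, 32000)
```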

### Key Components

| Component | Source Paper | Description |
|---|---|---|
| SandwichBlock | Huginn (arXiv:2502.05171) | Pre-norm Attn + Pre-norm MLP + Post-norm for recurrent stability |
| LTI Injection | Parcae (arXiv:2604.12946) | Stable state mixing with guaranteed ρ(Ā) < 1 |
| Linear Injection | Huginn | Concatenate state + input, project down |
| Multi-Latent Attention | DeepSeek-V2 (arXiv:2405.04434) | Compressed KV cache (87.5% reduction) |
| Sparse MoE | DeepSeek-V3 | Top-K routed + shared always-active experts |
| Truncated BPTT | Huginn §3.3 | Only backprop through last K recurrence steps |
| Poisson-Lognormal Sampling | Huginn §3.3 | Variable depth per training sample |
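
The last two rows interact at training time: each sample draws its own recurrence depth, and gradients flow only through the final K steps. Here is a hedged sketch of both mechanisms; the distribution parameterization and function names are assumptions, not the repo's exact code.

```python
import torch

def sample_depth(mu=8, sigma=0.5):
    """Per-sample recurrence depth from a Poisson-lognormal mixture
    (one reading of Huginn Sec. 3.3): draw a log-normal rate, then a
    Poisson sample, shifted so the depth is at least 1."""
    rate = torch.exp(torch.randn(()) * sigma + torch.log(torch.tensor(float(mu))))
    return int(torch.poisson(rate).item()) + 1

def recur_truncated_bptt(recurrent_fn, s, e, r, k=8):
    """Run r recurrence steps, backpropagating through only the last k.
    The first r - k steps run under no_grad; detaching the state then
    starts a fresh graph for the final k steps."""
    with torch.no_grad():
        for _ in range(max(r - k, 0)):
            s = recurrent_fn(s, e)
    s = s.detach()
    for _ in range(min(k, r)):
        s = recurrent_fn(s, e)
    return s
```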

## Model Variants

| Variant | Params | n_embd | Recurrence | Extensions |
|---|---|---|---|---|
| `mythos_tiny` | 12M | 256 | μ=4 | LTI only |
| `mythos_140m` | 140M | 768 | μ=8 | LTI only |
| `mythos_370m` | 370M | 1024 | μ=8 | LTI only |
| `mythos_770m` | 770M | 1280 | μ=8 | LTI only |
| `mythos_1b` | ~1B | 1536 | μ=16 | LTI + MoE |
| `mythos_3b` | ~3.5B | 2560 | μ=32 | LTI + MLA + MoE |
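
Programmatically, each variant is just a preset over a few scaling knobs. A hypothetical sketch of how the columns above might map onto a config object; the field names here are illustrative, not the repo's actual config class:

```python
from dataclasses import dataclass

@dataclass
class MythosPreset:
    n_embd: int
    mean_recurrence: int   # the mu column: average depth sampled during training
    use_moe: bool = False  # sparse MoE experts (1b and 3b rows)
    use_mla: bool = False  # multi-latent attention (3b row only)

# Mirrors the mythos_1b row above.
mythos_1b_preset = MythosPreset(n_embd=1536, mean_recurrence=16, use_moe=True)
```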

## Quick Start

```python
import torch
from open_mythos_hf import OpenMythosForCausalLM, mythos_tiny

# Create a model
config = mythos_tiny()
model = OpenMythosForCausalLM(config)

# Forward pass
input_ids = torch.randint(0, 32000, (1, 128))
output = model(input_ids=input_ids, labels=input_ids)
print(f"Loss: {output.loss.item():.4f}")
print(f"Recurrence steps: {output.num_steps}")
```

## Training

```bash
python train_mythos.py --variant 140m --max_steps 5000 --batch_size 8
```
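
Under the hood this pulls FineWeb-Edu through the `datasets` library. A rough sketch of the data plumbing the script implies; the subset name and tokenization settings here are assumptions, not necessarily what `train_mythos.py` uses:

```python
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Stream a published FineWeb-Edu subset instead of downloading the full corpus.
dataset = load_dataset(
    "HuggingFaceFW/fineweb-edu", name="sample-10BT",
    split="train", streaming=True,
)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])
```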

## What We Fixed vs Original OpenMythos

| Bug | Original | Fix |
|---|---|---|
| Tokenizer → non-existent model | `openai/gpt-oss-20b` | Use `gpt2` (valid HF tokenizer) |
| `torch = "2.11.0"` | Non-existent version | Compatible with PyTorch 2.x |
| `mythos_7b` missing | Referenced but not defined | Removed from README |
| `_tied_weights_keys` | Wrong format for transformers 5.6+ | Dict format |
| `load_tokenizer` / `get_vocab_size` | Exported but undefined | Removed from `__all__` |
| No tests | None | Full test suite |
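
For the last row, a minimal example of the kind of check such a suite would contain (the test name and shapes are illustrative):

```python
import torch
from open_mythos_hf import OpenMythosForCausalLM, mythos_tiny

def test_forward_backward():
    # The tiny variant keeps the test fast; the loss must be finite and
    # gradients must flow back through the recurrence to the parameters.
    model = OpenMythosForCausalLM(mythos_tiny())
    input_ids = torch.randint(0, 32000, (2, 64))
    output = model(input_ids=input_ids, labels=input_ids)
    assert torch.isfinite(output.loss)
    output.loss.backward()
    assert any(p.grad is not None for p in model.parameters())
```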

## References

- Geiping et al. *Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach* (Huginn). arXiv:2502.05171
- *Parcae*. arXiv:2604.12946
- DeepSeek-AI. *DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model*. arXiv:2405.04434
- DeepSeek-AI. *DeepSeek-V3 Technical Report*. arXiv:2412.19437
