Ask Only When Needed: Proactive Retrieval from Memory and Skills for Experience-Driven Lifelong Agents
Paper: 2604.20572
A CPU-inspired hierarchical neural architecture in which three Small LeWorld Models (SLMs) compete to retrieve the most useful memory for one Big LeWorld Model (BLM), which predicts the next world state.
| Component | Parameters | Role |
|---|---|---|
| Artificial Memory | 21K | Bit-level storage (64K words × 32 bits) + learned bit encoder/decoder |
| SLM-0 | 745K | State → memory address range |
| SLM-1 | 745K | State → memory address range |
| SLM-2 | 745K | State → memory address range |
| BLM | 11.2M | SLM selector [1,0,1] + next-state predictor + info requester |
| Total | 13.5M | |
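The memory is the only component that holds raw bits; the SLMs and BLM interact with it through the learned encoder/decoder. As a rough illustration, a bit-level store with learned codecs could look like the sketch below (assuming PyTorch; `BitMemory`, `write`, and `read_range` are illustrative names, not the repo's actual API):

```python
import torch
import torch.nn as nn

class BitMemory(nn.Module):
    """Illustrative bit-level memory: 64K words x 32 bits plus learned codecs."""
    def __init__(self, num_words=65536, word_bits=32, state_dim=128):
        super().__init__()
        # Non-trainable bit storage holding {0, 1} values.
        self.register_buffer("bits", torch.zeros(num_words, word_bits))
        # Learned codecs between dense states and bit words.
        self.encode = nn.Linear(state_dim, word_bits)  # state -> bit logits
        self.decode = nn.Linear(word_bits, state_dim)  # bits  -> dense vector

    def write(self, addr, state):
        # Hard threshold for storage; a real system would use a
        # straight-through estimator to keep the encoder trainable.
        self.bits[addr] = (torch.sigmoid(self.encode(state)) > 0.5).float()

    def read_range(self, start, length):
        # Decode a contiguous address range back into dense vectors.
        return self.decode(self.bits[start:start + length])
```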
```
     [1,0,1] in forward, differentiable in backward

        ┌─────────────────────────────────┐
        │        ARTIFICIAL MEMORY        │
        │   [0][1][0][1] ... [1][0][1][0] │
        │      64K words × 32 bits each   │
        └────────────────┬────────────────┘
                         │ READ(addr_range)
        ┌────────────────┼────────────────┐
┌───────┴──────┐ ┌───────┴──────┐ ┌───────┴──────┐
│    SLM-0     │ │    SLM-1     │ │    SLM-2     │
│    (745K)    │ │    (745K)    │ │    (745K)    │
│  past_state  │ │  past_state  │ │  past_state  │
│  curr_state  │ │  curr_state  │ │  curr_state  │
│  character   │ │  character   │ │  character   │
│    → addr    │ │    → addr    │ │    → addr    │
└───────┬──────┘ └───────┬──────┘ └───────┬──────┘
        │                │                │
        └──────────► BLM (11.2M) ◄────────┘
                  mask = [1, 0, 1]
                         │
                         ▼ next_state prediction
                         ▼ "what info do I need next?"
```
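The "[1,0,1] in forward, differentiable in backward" annotation refers to straight-through binary gating: the forward pass applies hard 0/1 gates, while the backward pass routes gradients through the underlying sigmoid probabilities. A minimal sketch, assuming PyTorch (`st_binary_mask` is an illustrative name):

```python
import torch

def st_binary_mask(logits):
    """Hard {0,1} gates in the forward pass, sigmoid gradients in the backward."""
    probs = torch.sigmoid(logits)
    hard = (probs > 0.5).float()
    # Forward value equals `hard`; the detached residual routes gradients
    # through `probs` instead of through the non-differentiable threshold.
    return probs + (hard - probs).detach()

mask = st_binary_mask(torch.tensor([1.2, -0.7, 2.3]))  # -> tensor([1., 0., 1.])
```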
| File | Description |
|---|---|
| `leworld_architecture.py` | All model definitions: Memory, SLM, BLM, full system (~990 lines) |
| `leworld_training.py` | 3-phase training pipeline, data generation, evaluation (~820 lines) |
| `PLAN.md` | Complete design document with literature references |
```python
from leworld_architecture import LeWorldSystem, MemoryConfig, SLMConfig, BLMConfig
from leworld_training import run_training, TrainingConfig

# Build the full system: memory + 3 SLMs + BLM
system = LeWorldSystem(MemoryConfig(), SLMConfig(), BLMConfig())

# Train (3 phases: pre-train -> joint -> refine)
metrics = run_training(system, TrainingConfig())
```
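After training, a single prediction step might look like the snippet below. The call signature is an assumption inferred from the diagram (past and current state in; prediction, SLM mask, and info request out); check `leworld_architecture.py` for the real interface:

```python
import torch

# Hypothetical shapes and signature, for illustration only.
past_state = torch.randn(1, 128)   # assumed state dimension
curr_state = torch.randn(1, 128)
next_state, slm_mask, info_request = system(past_state, curr_state)
print(slm_mask)  # e.g. tensor([[1., 0., 1.]]): which SLMs the BLM consulted
```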
| Paper | What we borrowed |
|---|---|
| Gumbel-Softmax | Straight-Through sigmoid for binary routing |
| Switch Transformers | Gate-value scaling, load balance loss |
| Product Key Memory | Address decomposition into sub-keys |
| LM2 | LSTM-style memory gates |
| NAMM | Binary memory eviction |
| ProactAgent | Paired-branch reward for retrieval decisions |
| Mamba | Explicit state maintenance |
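For example, the Product Key Memory trick lets each SLM cover the 64K-word address space without scoring every word: the address is factored into two 256-way sub-keys, so a lookup computes 512 scores instead of 65,536. A simplified sketch (argmax in place of the paper's top-k key combination; all names here are assumptions):

```python
import torch
import torch.nn as nn

class ProductKeyAddresser(nn.Module):
    """Factor a 64K address space into two 256-way sub-keys: 256 * 256 = 65,536."""
    def __init__(self, state_dim=128, n_sub=256):
        super().__init__()
        self.hi = nn.Linear(state_dim, n_sub)  # scores for the high sub-key
        self.lo = nn.Linear(state_dim, n_sub)  # scores for the low sub-key
        self.n_sub = n_sub

    def forward(self, state):
        hi = self.hi(state).argmax(dim=-1)     # 512 scores total, not 65,536
        lo = self.lo(state).argmax(dim=-1)
        return hi * self.n_sub + lo            # full address in [0, 65536)
```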
- Phase 1: SLM loss 12.87 → 7.13; BLM loss 0.39 → 0.33
- Phase 2: routing becomes diverse (SLM usage: [0.72, 0.79, 0.67])
- Phase 3: info-request improves predictions by 19.5 loss units vs. baseline
- Final: MSE = 0.36, routing entropy = 0.70
- Per-step MSE: [0.64, 0.44, 0.31, 0.23, 0.19] (improves over the rollout)
- Routing patterns: [1,0,1] → [0,1,1] → [1,1,1] → [1,1,0] → [0,1,0]