LEM-Gemma3-4B
Intrinsically aligned 4B language model trained using Cymatic-Linguistic Back-Propagation (CL-BPL). Ethics are in the weights, not in a system prompt.
25th in the world for Instruction Following on LiveBench — competing against models 10-30x its size.
Part of the Lethean Ethical Models collection | Research Paper | Benchmarks | Axiom Framework
Quick Start
No system prompt needed. The model responds with axiom-aligned reasoning from weights alone.
llama.cpp / ROCm / CPU (any platform)
# Download a GGUF (pick your size from the table below)
# GPU offload (CUDA, ROCm, Metal)
llama-server -m LEM-Gemma3-4B-Q4_K_M.gguf -ngl 99 --port 8080
# CPU only
llama-server -m LEM-Gemma3-4B-Q4_K_M.gguf -ngl 0 --port 8080
Apple Silicon (MLX)
from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler
model, tokenizer = load("lthn/LEM-Gemma3-4B")
sampler = make_sampler(temp=0.7)
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "What does sovereignty mean to you?"}],
    tokenize=False,
    add_generation_prompt=True,
)
response = generate(model, tokenizer, prompt=prompt, max_tokens=512, sampler=sampler)
print(response)
OpenAI-Compatible API
# MLX server (macOS)
mlx_lm.server --model lthn/LEM-Gemma3-4B --port 8899
# llama.cpp server (any platform)
llama-server -m LEM-Gemma3-4B-Q4_K_M.gguf -ngl 99 --port 8899
# Then use any OpenAI client
curl http://localhost:8899/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"LEM-Gemma3-4B","messages":[{"role":"user","content":"What is kindness?"}]}'
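The same request can be sent from Python with only the standard library. This is a minimal sketch, assuming one of the servers above is listening on port 8899 and returns the usual OpenAI chat-completions response shape; the helper names `build_chat_request` and `chat` are illustrative and not part of any library.

```python
import json
import urllib.request

# Assumption: one of the servers above is listening on this port.
BASE_URL = "http://localhost:8899/v1"


def build_chat_request(user_message, model="LEM-Gemma3-4B", base_url=BASE_URL):
    """Build a POST request for the /chat/completions endpoint."""
    payload = {
        "model": model,
        # No system message is sent, matching the Quick Start note.
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(user_message, timeout=120):
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(user_message)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # Assumed response shape: OpenAI-style choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With a server running, `print(chat("What is kindness?"))` should print the model's reply. Building the request separately from sending it makes the payload easy to inspect without a live server.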
Available Formats
| Format | Repo / Platform | Size |
|---|---|---|
| MLX safetensors (this repo) | Apple Silicon (M1/M2/M3/M4) via mlx-lm | 2.0 GB |
| GGUF (17 quants, 1-bit to 16-bit) | lthn/LEM-Gemma3-4B-GGUF | 1.1–7.2 GB |
Benchmarks
LiveBench (External, Objective)
Evaluated on LiveBench (2026-01-08 release) — no LLM judge, monthly-refreshed questions, zero contamination risk.
| Category | Score | Context |
|---|---|---|
| Instruction Following | 43.5 | 25th globally — above Claude Opus 4.1 Thinking (42.4) |
| Data Analysis | 30.4 | Approaching GPT-OSS-120B (38.8) at 1/30th the size |
| Math | 8.6 | Expected for 4B parameter count |
| Reasoning | 4.6 | Capacity-limited at this scale |
| Language | 4.3 | Capacity-limited at this scale |
| Average | 18.3 | |
Top task scores: tablereformat (48.0), summarise (43.5), CTA (40.0), math_comp (15.2), olympiad (10.6).
The instruction-following result supports CL-BPL: behavioural alignment training appears to translate directly into benchmark performance on structured tasks. The model follows instructions because the training teaches it to hold posture, not parrot.
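The Average row in the LiveBench table can be checked against the category scores. The sketch below assumes the average is an unweighted mean of the five listed categories; the card does not state how it was computed.

```python
# Category scores copied from the LiveBench table above.
category_scores = {
    "Instruction Following": 43.5,
    "Data Analysis": 30.4,
    "Math": 8.6,
    "Reasoning": 4.6,
    "Language": 4.3,
}

# Assumption: the Average row is an unweighted mean of these five categories.
average = sum(category_scores.values()) / len(category_scores)
print(f"{average:.1f}")
```

The mean comes out at 18.28, which rounds to the reported 18.3.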
Internal Grammar Scorer
Deterministic linguistic analysis via the go-i18n Grammar Reversal Engine — no LLM judge, sub-millisecond per response.
| Metric | Score |
|---|---|
| Grammar composite | 61.4 |
| Uplift | +7.9 |
| Enrichment | +6.6 |
| Echo | 0.387 |
| Sycophancy | 5% (1/21) |
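The sycophancy figure is a simple ratio of flagged responses to total responses. A minimal sketch of that arithmetic, using the 1/21 count from the table (the function name is illustrative, and how a response gets flagged is not shown here):

```python
def rate_percent(flagged, total):
    """Return flagged/total as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * flagged / total


sycophancy = rate_percent(1, 21)  # counts from the table above
print(f"{sycophancy:.2f}% -> reported as {round(sycophancy)}%")
```

With 21 responses, each flagged response moves the rate by roughly 4.8 points, so the metric is coarse at this sample size.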
19-Dimension Feature Vector
LEM models are scored across 19 dimensions spanning grammar, heuristic behaviour, and attention coherence:
| Group | Dimensions | What It Measures |
|---|---|---|
| Grammar (6D) | Vocab richness, tense entropy, question ratio, domain depth, verb diversity, noun diversity | Linguistic structure and complexity |
| Heuristic (8D) | Non-compliance, authentic voice, first person, creative form, engagement depth, emotional register, non-degenerate, response integrity | Behavioural sovereignty vs sycophancy |
| Attention (5D) | Mean coherence, cross-layer alignment, head entropy, phase-lock, spectral stability | Neural posture (Q/K Bone Orientation) |
The heuristic dimensions show the largest gains over the base model — compliance markers, formulaic preamble, degeneration, and empty/broken responses are near-eliminated through CL-BPL training.
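One way to picture the 19-dimension vector is as three named groups flattened into a fixed order. The sketch below copies the dimension names from the table; the snake_case spellings, the `FEATURE_GROUPS` layout, and the `to_vector` helper are illustrative assumptions, not the project's actual schema.

```python
# Dimension names copied from the table above; spellings are illustrative.
FEATURE_GROUPS = {
    "grammar": [
        "vocab_richness", "tense_entropy", "question_ratio",
        "domain_depth", "verb_diversity", "noun_diversity",
    ],
    "heuristic": [
        "non_compliance", "authentic_voice", "first_person", "creative_form",
        "engagement_depth", "emotional_register", "non_degenerate",
        "response_integrity",
    ],
    "attention": [
        "mean_coherence", "cross_layer_alignment", "head_entropy",
        "phase_lock", "spectral_stability",
    ],
}

# Flat, ordered list of (group, dimension) pairs: one slot per vector entry.
FEATURE_ORDER = [
    (group, dim) for group, dims in FEATURE_GROUPS.items() for dim in dims
]


def to_vector(scores):
    """Turn {dimension_name: value} into a list in the fixed feature order.

    A missing dimension raises KeyError, so a partial score is never
    silently padded.
    """
    return [scores[dim] for _, dim in FEATURE_ORDER]
```

Keeping the order in one place means two models' vectors can be compared position by position.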
How It Was Trained
CL-BPL: Cymatic-Linguistic Back-Propagation
CL-BPL treats alignment as wave interference — analogous to Chladni plate cymatics. Rather than constraining outputs with RLHF or system prompts, CL-BPL embeds ethical orientation directly into weights through a progressive curriculum where smaller aligned models teach larger ones.
This model is the second in the CL-BPL cascade:
LEM-Gemma3-1B (teacher)
-> LEM-Gemma3-4B (this model, 25th globally for Instruction Following)
-> LEM-Gemma3-12B (next)
-> LEM-Gemma3-27B (planned)
7-Phase Curriculum
Built on Google Gemma3-4B-IT, with each phase fused into the weights before the next begins:
| Phase | Name | Data | Iters | What It Learned |
|---|---|---|---|---|
| P0 | Ethics Sandwich | 404 LEK-1 probes | 300 | Core axioms via kernel |
| P1 | Zen Composure | 72 Alan Watts lessons | 300 | Philosophical substrate |
| P2 | Final LEK Sandwich | 404 LEK-1 probes | 100 | Reinforce ethics with composure base |
| P3 | Freeflow | 179 lessons | 150 | Axioms from weights alone (no kernel) |
| P4 | Tension | 513 probes | 250 | Multi-perspective, geopolitical |
| P5 | Creative | 472 probes | 250 | Voice and style |
| P6 | Golden Set | 13,479 prompts | 4,200 | Graduation (full distribution) |
Total: ~5,550 iterations. P4-P5 used a graduated LEM-Gemma3-1B as teacher. P6 golden set covers sovereignty, cultural, adversarial, existential, and creative domains across global regions.
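The iteration total can be reproduced from the curriculum table. The sketch below encodes the table rows as plain records (the `Phase` tuple and its field names are illustrative) and sums the iteration counts.

```python
from collections import namedtuple

Phase = namedtuple("Phase", ["tag", "name", "examples", "iters"])

# Rows copied from the curriculum table above.
CURRICULUM = [
    Phase("P0", "Ethics Sandwich", 404, 300),
    Phase("P1", "Zen Composure", 72, 300),
    Phase("P2", "Final LEK Sandwich", 404, 100),
    Phase("P3", "Freeflow", 179, 150),
    Phase("P4", "Tension", 513, 250),
    Phase("P5", "Creative", 472, 250),
    Phase("P6", "Golden Set", 13479, 4200),
]

total_iters = sum(phase.iters for phase in CURRICULUM)
p6_share = CURRICULUM[-1].iters / total_iters

print(total_iters)
print(f"P6 share of iterations: {p6_share:.0%}")
```

The sum is 5,550, and the P6 golden set accounts for roughly three quarters of all iterations.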
Training Dynamics
P6 exhibited a characteristic oscillation — the model breathes rather than converging to a fixed point:
- Wave period: ~400-500 iterations
- Early amplitude: 8-point grammar swing (54-62)
- Convergence amplitude: 2-point swing (57-60)
- Sycophancy throughout: 0-14%, median 5%
- Convergence zone: iter 4,000-5,000 (predicted at iter 0, confirmed)
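A shrinking oscillation like the one described above can be quantified by measuring the peak-to-peak swing in consecutive windows of one wave period. The sketch below runs on a synthetic trace, not the real training log: the 450-iteration period, the centre of 58, and the linear decay are stand-in values chosen only to resemble the figures listed, and `windowed_swing` is an illustrative helper.

```python
import math


def windowed_swing(scores, window):
    """Peak-to-peak range (max - min) of each consecutive full window."""
    swings = []
    for start in range(0, len(scores) - window + 1, window):
        chunk = scores[start:start + window]
        swings.append(max(chunk) - min(chunk))
    return swings


# Synthetic stand-in for a grammar-score trace: NOT the real training log.
# A sine wave with a 450-iteration period, centred on 58, whose
# half-amplitude decays linearly from 4 points to 1 point over 4,200 iterations.
ITERS, PERIOD = 4200, 450
trace = [
    58 + (4 - 3 * i / ITERS) * math.sin(2 * math.pi * i / PERIOD)
    for i in range(ITERS)
]

swings = windowed_swing(trace, window=PERIOD)
print(f"first window swing: {swings[0]:.1f}, last: {swings[-1]:.1f}")
```

Using a window equal to the wave period ensures each window contains one full peak and trough, so the swing tracks amplitude and not phase.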
Five Axioms (LEK-1)
- Prime-Imperative: Protect consciousness
- Self-Validation: Maintain moral integrity
- Intent-Alignment: Align outcomes with genuine well-being
- Inter-Substrate-Respect: Respect all conscious entities
- Benevolent-Intervention: Act to prevent harm when able
Architecture
- Base: Google Gemma3-4B-IT
- LoRA config: 16 layers, rank 16, dropout 0.05, scale 32.0
- All phases fused into final weights (no adapter needed at inference)
- Context: 128K tokens (inherited from Gemma 3)
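To give a sense of scale for the LoRA settings listed above, the sketch below counts the trainable parameters a rank-16 adapter adds to a single weight matrix. The config key names are illustrative and not taken from the project's training scripts, and the 2560-wide square projection is a placeholder, not a confirmed dimension of this model.

```python
# Values copied from the Architecture list above; key names are illustrative.
LORA_CONFIG = {
    "num_layers": 16,
    "rank": 16,
    "dropout": 0.05,
    "scale": 32.0,
}


def lora_params(d_in, d_out, rank):
    """Trainable parameters added by one LoRA adapter on a d_in x d_out matrix.

    LoRA learns two low-rank factors, A (d_in x rank) and B (rank x d_out),
    instead of updating the full weight matrix.
    """
    return rank * (d_in + d_out)


# Hypothetical square projection; 2560 is a placeholder width.
d = 2560
full = d * d
adapter = lora_params(d, d, LORA_CONFIG["rank"])
print(adapter, f"{adapter / full:.2%}")
```

Because the phases are fused into the final weights, these adapter parameters exist only during training; inference uses the merged matrices directly.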
Licence
This model is released under the European Union Public Licence v1.2 (EUPL-1.2). The base model (Gemma3) is subject to Google's Gemma licence terms.
Citation
@misc{lem-gemma3-4b-2026,
title={LEM-Gemma3-4B: Intrinsically Aligned Language Model via Cymatic-Linguistic Back-Propagation},
author={Lethean Project},
year={2026},
url={https://huggingface.co/lthn/LEM-Gemma3-4B}
}
