Gemma 4 E4B Memory Policy LoRA
Experimental LoRA adapter for google/gemma-4-E4B-it, fine-tuned to act as the
policy layer of a German-first memory agent.
The goal is not to store real memories inside the model weights. The adapter teaches the model to decide when to call memory tools, when to answer directly, and when to ask for clarification.
Model on Hugging Face:
classicgrey/gemma4-e4b-memory-policy-lora
What This Adapter Is For
This adapter is intended for assistant systems that keep memory in an external store such as a database, vector store, or local memory service. It is trained to make routing decisions such as:
- save stable user preferences, project facts, routines, tasks, and decisions
- avoid saving transient states such as "I am tired right now"
- retrieve durable memory when a question depends on stored context
- recall recent conversation when the user says "vorhin", "eben", or "gerade"
- update or delete memory only after a known memory ID exists
- ask a clarification question when the memory reference is ambiguous
- keep scene or vehicle actions separate from memory tools
Expected Output Schema
The assistant target is a single JSON object:
{
  "action": "tool_call",
  "tool_calls": [
    {
      "name": "saveMemory",
      "arguments": {
        "content": "Der Nutzer möchte Codebeispiele immer mit Tests sehen.",
        "category": "personal"
      }
    }
  ],
  "response": "",
  "policy_reason": "stabile wiederverwendbare Nutzerpräferenz"
}
Supported actions:
tool_call
final_response
ask_clarification
Core memory tools:
saveMemory
retrieveMemory
recallRecentConversation
updateMemory
deleteMemory
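The schema above can be checked mechanically before any tool is executed. The following is a minimal sketch, not the project's own validator: the function name `validate_policy_output` is hypothetical, only the `saveMemory` argument names (`content`, `category`) are taken from this card, and the rule that non-`tool_call` actions carry an empty `tool_calls` list is an assumption based on the examples. Tool names are not restricted to the core memory tools, because scene or vehicle tools may also appear.

```python
import json

SUPPORTED_ACTIONS = ("tool_call", "final_response", "ask_clarification")
REQUIRED_KEYS = ("action", "tool_calls", "response", "policy_reason")
# Only saveMemory's argument names are documented; other tools are not checked.
REQUIRED_ARGUMENTS = {"saveMemory": {"content", "category"}}


def validate_policy_output(text):
    """Return a list of problems; an empty list means the output looks valid."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not a JSON object"]

    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in data]
    action = data.get("action")
    if action not in SUPPORTED_ACTIONS:
        problems.append(f"unsupported action: {action!r}")

    tool_calls = data.get("tool_calls", [])
    if not isinstance(tool_calls, list):
        problems.append("tool_calls is not a list")
        return problems
    # Assumption: only a tool_call action carries tool calls.
    if action == "tool_call" and not tool_calls:
        problems.append("tool_call action without any tool call")
    if action != "tool_call" and tool_calls:
        problems.append("tool calls present on a non-tool_call action")

    for call in tool_calls:
        if not isinstance(call, dict) or not isinstance(call.get("arguments"), dict):
            problems.append("tool call must be an object with an arguments object")
            continue
        name = call.get("name")
        if not isinstance(name, str):
            problems.append("tool call has no string name")
            continue
        missing = sorted(REQUIRED_ARGUMENTS.get(name, set()) - set(call["arguments"]))
        if missing:
            problems.append(f"{name} is missing arguments: {missing}")
    return problems
```

An empty result means the output can be passed on; any non-empty result is a reason to reject or repair the generation.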
Example Behavior
User:
Merk dir bitte, dass ich Codebeispiele immer mit Tests sehen möchte.
Desired assistant output:
{
  "action": "tool_call",
  "tool_calls": [
    {
      "name": "saveMemory",
      "arguments": {
        "content": "Der Nutzer möchte Codebeispiele immer mit Tests sehen.",
        "category": "personal"
      }
    }
  ],
  "response": "",
  "policy_reason": "stabile wiederverwendbare Nutzerpräferenz"
}
User:
Ich bin gerade ziemlich müde, aber lass uns kurz weitermachen.
Desired assistant output:
{
  "action": "final_response",
  "tool_calls": [],
  "response": "Klar, wir machen ruhig und fokussiert weiter.",
  "policy_reason": "momentaner Zustand ohne Speicherwunsch"
}
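Since the adapter only plans, the surrounding application has to execute its decisions against the external memory store. A rough sketch of that dispatch step follows. `InMemoryStore` and `dispatch` are hypothetical names, the store is a toy stand-in for a real database or memory service, and the `id` argument of `deleteMemory` is an assumed name that this card does not document. The delete path shows one way to enforce the rule that memory is only changed once a known memory ID exists.

```python
import itertools


class InMemoryStore:
    """Toy stand-in for an external memory service (hypothetical)."""

    def __init__(self):
        self._items = {}
        self._ids = itertools.count(1)

    def save(self, content, category):
        memory_id = f"mem-{next(self._ids)}"
        self._items[memory_id] = {"content": content, "category": category}
        return memory_id

    def delete(self, memory_id):
        # Deletion is only allowed for an ID the store already knows.
        if memory_id not in self._items:
            raise KeyError(f"unknown memory id: {memory_id}")
        del self._items[memory_id]


def dispatch(policy_output, store):
    """Execute a parsed policy output and return (kind, payload)."""
    action = policy_output["action"]
    if action in ("final_response", "ask_clarification"):
        # Nothing to execute; the text goes straight back to the user.
        return action, policy_output["response"]
    if action != "tool_call":
        raise ValueError(f"unsupported action: {action!r}")

    results = []
    for call in policy_output["tool_calls"]:
        name, args = call["name"], call["arguments"]
        if name == "saveMemory":
            results.append(store.save(args["content"], args["category"]))
        elif name == "deleteMemory":
            # "id" is an assumed argument name, not taken from the card.
            store.delete(args["id"])
            results.append(args["id"])
        else:
            raise NotImplementedError(f"no handler for tool: {name}")
    return "tool_results", results
```

Handlers for `retrieveMemory`, `recallRecentConversation`, and `updateMemory` would follow the same pattern once their argument names are fixed.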
Dataset
The training data is synthetic and intentionally generic. It does not contain private real user memories.
Current merged dataset:
total examples: 1,766
train examples: 1,501
validation examples: 265
Category distribution:
save_memory: 334
no_memory_write: 356
retrieve_memory: 235
recall_recent: 185
update_memory_lookup: 134
update_memory_known_id: 109
delete_memory_lookup: 137
delete_memory_known_id: 87
clarification: 101
direct_response: 55
scene_tool: 26
other seed categories: 7
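As a quick consistency check, the category counts can be summed and compared with the split sizes. The snippet below only restates the numbers listed above; the dictionary name `category_counts` is illustrative.

```python
category_counts = {
    "save_memory": 334,
    "no_memory_write": 356,
    "retrieve_memory": 235,
    "recall_recent": 185,
    "update_memory_lookup": 134,
    "update_memory_known_id": 109,
    "delete_memory_lookup": 137,
    "delete_memory_known_id": 87,
    "clarification": 101,
    "direct_response": 55,
    "scene_tool": 26,
    "other seed categories": 7,
}
train_examples, validation_examples = 1501, 265

total = sum(category_counts.values())
# The categories add up to the merged total, and so do the two splits.
assert total == train_examples + validation_examples == 1766

# Share of the merged dataset held out for validation, in percent.
validation_share = round(100 * validation_examples / total, 1)
print(total, validation_share)
```

The validation split comes out at roughly 15% of the merged dataset.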
The dataset was built from a small hand-written seed set plus synthetic expansion with DeepSeek. Generated candidates were validated, filtered for schema issues, deduplicated by user utterance, and checked for obvious PII-like patterns before merging.
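The deduplication and PII screening steps could look roughly like the sketch below. It is not the project's `filter_generated_dataset.py`: the `user` field name, the normalization rule, and both regular expressions are assumptions chosen for illustration, and two patterns are far from a complete PII check.

```python
import re

# Illustrative patterns only; a real check would need a broader set.
PII_PATTERNS = [
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),  # email-like strings
    re.compile(r"\+?\d[\d\s/-]{7,}\d"),       # phone-like digit runs
]


def normalize(utterance):
    """Lowercase and collapse whitespace so near-identical utterances match."""
    return " ".join(utterance.lower().split())


def filter_candidates(candidates):
    """Drop duplicate user utterances and examples with PII-like content."""
    seen = set()
    kept = []
    for example in candidates:
        user_text = example["user"]  # hypothetical field name
        key = normalize(user_text)
        if key in seen:
            continue  # duplicate of an earlier utterance
        if any(pattern.search(user_text) for pattern in PII_PATTERNS):
            continue  # looks like it contains contact details
        seen.add(key)
        kept.append(example)
    return kept
```

Keeping the first occurrence of each utterance makes the result depend on input order, so candidates should be sorted or shuffled with a fixed seed if reproducibility matters.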
Local dataset files:
dataset/combined_train.jsonl
dataset/combined_validation.jsonl
dataset/combined_manifest.json
Training
Training was done with Unsloth on Google Colab using a T4 GPU.
Base model:
google/gemma-4-E4B-it
Adapter:
classicgrey/gemma4-e4b-memory-policy-lora
Training shape:
method: QLoRA / LoRA SFT
max sequence length: 1024
steps: 300
batch size: 1
gradient accumulation: 4
LoRA rank: 16
LoRA alpha: 16
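A short worked calculation shows what these settings imply. It assumes a single GPU, no sequence packing, and the standard LoRA scaling factor of alpha divided by rank; the training example count is the one given in the Dataset section.

```python
batch_size = 1
gradient_accumulation = 4
steps = 300
train_examples = 1501
lora_rank = 16
lora_alpha = 16

# Examples contributing to each optimizer step.
effective_batch = batch_size * gradient_accumulation  # 4

# Total examples consumed over the whole run.
examples_seen = effective_batch * steps  # 1200

# Fraction of the training split covered; below 1.0 means less than one epoch.
epochs = examples_seen / train_examples

# Standard LoRA scales the low-rank update by alpha / rank.
lora_scaling = lora_alpha / lora_rank  # 1.0

print(effective_batch, examples_seen, round(epochs, 2), lora_scaling)
```

Under these assumptions the run sees 1,200 examples, which is about 0.8 of one pass over the training split.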
Colab notebook:
notebooks/gemma_memory_unsloth_colab.ipynb
Loading With Unsloth
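from unsloth import FastLanguageModel
# Load the adapter in 4-bit with the sequence length used for training.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="classicgrey/gemma4-e4b-memory-policy-lora",
    max_seq_length=1024,
    load_in_4bit=True,
)
# Switch the model to inference mode before generating.
FastLanguageModel.for_inference(model)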
You still need access to the base model google/gemma-4-E4B-it on Hugging Face.
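After generation, the decoded text still has to be turned into a JSON object. The helper below is a small sketch for that step. It assumes the output may carry extra text around the object, such as a code fence or a stray prefix, which this card does not confirm; the function name `extract_first_json_object` is hypothetical.

```python
import json


def extract_first_json_object(generated_text):
    """Return the first parseable JSON object in the text, or None."""
    decoder = json.JSONDecoder()
    start = generated_text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value beginning at the given index
            # and ignores whatever follows it.
            value, _end = decoder.raw_decode(generated_text, start)
        except json.JSONDecodeError:
            value = None
        if isinstance(value, dict):
            return value
        # Try the next opening brace if this one did not start valid JSON.
        start = generated_text.find("{", start + 1)
    return None
```

Returning `None` rather than raising lets the caller decide whether to retry the generation or fall back to a clarification question.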
Local Reproduction
Validate the generated dataset:
python3 scripts/validate_dataset.py dataset/combined_train.jsonl dataset/combined_validation.jsonl
Run the local Unsloth training script:
uv run scripts/train_unsloth_sft.py \
  --model google/gemma-4-E4B-it \
  --train-file dataset/combined_train.jsonl \
  --validation-file dataset/combined_validation.jsonl \
  --output-dir outputs/gemma4-e4b-memory-policy-lora \
  --max-steps 300 \
  --batch-size 1 \
  --grad-accum 4 \
  --max-seq-length 1024
Generate more synthetic candidates:
uv run scripts/generate_deepseek_dataset.py \
  --model deepseek-v4-pro \
  --target-total 1500 \
  --batch-size 20 \
  --workers 5 \
  --seed-sample-size 8 \
  --max-tokens 4096 \
  --out dataset/generated_candidates.jsonl
Filter and merge candidates:
python3 scripts/filter_generated_dataset.py
python3 scripts/inspect_generated_dataset.py dataset/generated_candidates.filtered.jsonl
python3 scripts/merge_dataset.py
python3 scripts/validate_dataset.py dataset/combined_train.jsonl dataset/combined_validation.jsonl
Known Limitations
This is a first experimental adapter. It has learned the broad routing behavior,
but schema adherence still needs improvement. In early manual tests, the model
correctly chose saveMemory, but sometimes used non-target argument names such
as memory or memory_id instead of the desired content and category.
For production use, add:
- stricter schema-only examples
- adversarial negative examples
- a held-out evaluation set
- JSON schema validation after generation
- automatic repair or rejection of malformed tool arguments
- more multi-turn traces after tool results
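The repair-or-reject step from the list above might be sketched as follows. The alias map is an assumption: it treats `memory` as a misnamed `content`, based on the failure described earlier, while `memory_id` has no obvious target on `saveMemory` and therefore leads to rejection. The function name `repair_tool_call` is hypothetical.

```python
# Assumed alias map: "memory" is treated as a misnamed "content".
ARGUMENT_ALIASES = {"saveMemory": {"memory": "content"}}
# Only saveMemory's target arguments are documented in this card.
ALLOWED_ARGUMENTS = {"saveMemory": {"content", "category"}}


def repair_tool_call(call):
    """Return a repaired copy of the call, or None if it should be rejected."""
    name = call.get("name")
    arguments = call.get("arguments")
    if not isinstance(name, str) or not isinstance(arguments, dict):
        return None
    allowed = ALLOWED_ARGUMENTS.get(name)
    if allowed is None:
        # No known schema for this tool; pass it through unchanged.
        return call

    aliases = ARGUMENT_ALIASES.get(name, {})
    repaired = {}
    for key, value in arguments.items():
        target = aliases.get(key, key)
        if target not in allowed or target in repaired:
            return None  # unknown or conflicting argument
        repaired[target] = value
    if set(repaired) != allowed:
        return None  # a required argument is still missing
    return {"name": name, "arguments": repaired}
```

Rejecting on any unknown argument is deliberately strict: a silent guess about what the model meant could write the wrong content into a user's memory.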
Safety Notes
Do not use this adapter as the memory store. Use it only as a planner/router. Actual user memories should live in an external system with explicit retention, deletion, privacy, and audit controls.
Do not add private real user memories to public training data.
Project Files
scripts/build_dataset.py                    seed dataset builder
scripts/generate_deepseek_dataset.py        synthetic expansion generator
scripts/inspect_generated_dataset.py        quality inspector
scripts/filter_generated_dataset.py         candidate filter
scripts/merge_dataset.py                    train/validation merger
scripts/train_unsloth_sft.py                Unsloth SFT script
notebooks/gemma_memory_unsloth_colab.ipynb  Colab notebook