Llama-Carole-v1

A fine-tuned Llama 3.1 8B Instruct chatbot designed for neurodivergent users, particularly those who experience rejection-sensitive dysphoria (RSD): the visceral spike that comes with criticism or perceived rejection.

Built with Llama.

Carole is a portfolio / educational project. She is named after the author's wife, who has a way of holding hard conversations: validate first, redirect with a question. The model was trained to mirror that pattern.

The live demo is at meetcarole.com (gated).


What this is

  • A QLoRA fine-tune of meta-llama/Llama-3.1-8B-Instruct
  • Trained on ~1,500 synthetic conversations seeded from 50 hand-written golden examples
  • Quantized to Q4_K_M GGUF (~4.5GB) for CPU inference via llama.cpp at ~26 tokens/sec on a single CPU box
  • Deployed end-to-end with a RAG layer (ChromaDB, all-MiniLM-L6-v2 embeddings, 1,732 chunks across 98 curated Wikipedia articles + reference works)
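
The production retrieval layer is ChromaDB with all-MiniLM-L6-v2 embeddings; the following is a minimal pure-Python sketch of the same retrieval step, with a toy bag-of-words vector standing in for the sentence-transformer embeddings (chunk texts here are illustrative, not from the actual corpus):

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy bag-of-words vector; the real pipeline uses all-MiniLM-L6-v2
    # sentence embeddings stored in ChromaDB.
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / ((na * nb) or 1.0)

# Stand-ins for the 1,732 indexed chunks.
chunks = [
    "Rejection-sensitive dysphoria is intense emotional pain triggered by perceived rejection.",
    "Executive function covers planning, working memory, and impulse control.",
    "Cognitive behavioral therapy works by reframing unhelpful thought patterns.",
]
index = [(c, embed(c)) for c in chunks]

def retrieve(query, k=1):
    # Rank chunks by cosine similarity to the query, return the top k.
    q = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [c for c, _ in ranked[:k]]
```

Retrieved chunks are injected into the prompt before generation, which is how the model grounds its citations.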

The defining behavior is validate, then redirect: not sycophantic softening, but a way to deliver pushback without triggering RSD. Praise that doesn't land hurts. Direct correction that skips validation hurts. The pause between the two is the point.

What this is not

  • Not therapy. Not medical advice. Not a substitute for a clinician.
  • Not a product. No waitlist, no support, no roadmap.
  • Not finished. Persona has rough edges. RAG sometimes misses context. The streaming has visible seams.

Files in this repo

  • *.gguf: Q4_K_M quantization, ready for llama.cpp and compatible runners
  • *.safetensors: merged full-precision weights for further fine-tuning or alternative quantization

Quickstart (llama.cpp)

./llama-server \
  --hf-repo datamatters24/Llama-Carole-v1 \
  --hf-file llama-8b-persona-q4.gguf \
  --host 127.0.0.1 --port 8085 \
  --threads 12 --ctx-size 4096

Then POST to http://127.0.0.1:8085/v1/chat/completions with an OpenAI-compatible payload.
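
A minimal client sketch using only the standard library, assuming the server from the quickstart above is running (the model-name field and message content are illustrative; llama.cpp serves whatever model it loaded regardless of the `model` value):

```python
import json
from urllib import request

payload = {
    "model": "llama-8b-persona-q4",  # echoed back; llama.cpp ignores it
    "messages": [
        {"role": "user", "content": "I sent a message and got no reply. I feel awful."}
    ],
    "stream": False,
    "temperature": 0.7,
}

req = request.Request(
    "http://127.0.0.1:8085/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)

# Requires the llama-server process from the quickstart:
# with request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Setting `"stream": True` instead returns server-sent events, which is what the sentence-by-sentence demo UX consumes.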

Training

Setting             Value
Base                meta-llama/Llama-3.1-8B-Instruct
Method              QLoRA (4-bit NF4 + LoRA) via TRL SFTTrainer
LoRA rank / alpha   64 / 128
Targets             q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Learning rate       2e-4, cosine schedule
Epochs              3, with load_best_model_at_end=True
Best checkpoint     epoch 2 (eval_loss = 1.40), a clean U-shape on eval
Hardware            1× A100 80GB on RunPod
Dataset             50 golden examples → ~1,500 synthetic conversations
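
The table above maps onto a peft/transformers configuration roughly like the following sketch; the rank, alpha, target modules, and NF4 quantization come from the table, while the bfloat16 compute dtype is an assumption, not a documented setting:

```python
import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig

# 4-bit NF4 base-model quantization (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # assumed, not from the table
)

# LoRA adapters on all attention and MLP projections, per the table.
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
```

Both configs are then handed to TRL's SFTTrainer along with the ~1,500-conversation dataset.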

The persona is shaped by the validate-then-redirect pattern, with explicit guidance toward source citation, ND-friendly structure (numbered lists, labeled sections), and a feedback check-in at the end of substantive responses. Banned-phrase filters caught common sycophantic patterns ("obviously", "simply", "just do X", empty "Great idea!" preambles).
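
A sketch of how such a banned-phrase screen can work; this pattern list is an illustrative subset, not the production filter set:

```python
import re

# Illustrative subset of the banned-phrase filters applied when
# screening synthetic training conversations.
BANNED = [
    r"\bobviously\b",
    r"\bsimply\b",
    r"\bjust\s+\w+",        # "just do X" minimizers
    r"^\s*great\s+idea\b",  # empty praise preambles
]

def flags(text):
    """Return every banned pattern a candidate response trips."""
    return [p for p in BANNED if re.search(p, text, re.IGNORECASE)]
```

Flagged candidates can then be dropped or regenerated before they enter the training set.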

Intended use

Educational / portfolio demonstrations of:

  • A non-sycophantic, neurodivergence-aware conversational pattern
  • End-to-end fine-tune + RAG + inference on a single CPU box (no ongoing GPU spend)
  • The deliberate cadence of sentence-by-sentence streaming for RSD-aware UX
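
The sentence-by-sentence cadence can be sketched as a generator that splits a finished response on sentence boundaries and pauses between yields; the pause value here is illustrative, not the production setting:

```python
import re
import time

def stream_sentences(text, pause=0.35):
    # Split after ., !, or ? followed by whitespace; pause between
    # sentences to produce the deliberate, RSD-aware pacing.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        yield sentence
        time.sleep(pause)

reply = ("That sounds really frustrating. You waited, and the silence hurt. "
         "What would feel like a manageable next step?")
for s in stream_sentences(reply, pause=0.0):
    print(s)
```

In the live demo the same idea is layered over token streaming, which is where the visible seams mentioned above come from.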

Out of scope

  • Crisis intervention or any clinical mental-health use
  • Medical / legal / financial advice
  • Coding / general-purpose assistance (kindly redirects)
  • Unrestricted public deployment without rate limiting and a clear "AI character, not a therapist" disclaimer

Acknowledgments

The persona draws on:

  • Marshall Rosenberg, Nonviolent Communication
  • Dale Carnegie, How to Win Friends and Influence People
  • Tony Robbins, Awaken the Giant Within

Plus targeted Wikipedia coverage of ADHD, autism, RSD, anxiety, depression, executive function, CBT/DBT, mindfulness, and attachment theory in the RAG corpus.

License

This model is a derivative work of Meta's Llama 3.1 8B Instruct and is distributed under the Llama 3.1 Community License. Use must comply with the Llama 3.1 Acceptable Use Policy.

Built with Llama.
