Llama-Carole-v1

A fine-tuned Llama 3.1 8B Instruct chatbot designed for neurodivergent users, particularly those who experience rejection-sensitive dysphoria (RSD): the visceral spike that comes with criticism or perceived rejection.

Built with Llama.

Carole is a portfolio / educational project. She is named after the author's wife, who has a way of holding hard conversations: validate first, redirect with a question. The model was trained to mirror that pattern.

The live demo is at meetcarole.com (gated).


What this is

  • A QLoRA fine-tune of meta-llama/Llama-3.1-8B-Instruct
  • Trained on ~1,500 synthetic conversations seeded from 50 hand-written golden examples
  • Quantized to Q4_K_M GGUF (~4.5GB) for CPU inference via llama.cpp at ~26 tokens/sec on a single CPU box
  • Deployed end-to-end with a RAG layer (ChromaDB, all-MiniLM-L6-v2 embeddings, 1,732 chunks across 98 curated Wikipedia articles + reference works)
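
The production retrieval layer is ChromaDB with all-MiniLM-L6-v2 embeddings; the following is a minimal pure-Python sketch of the same retrieval step, with a toy bag-of-words vector standing in for the sentence-transformer embeddings (chunk texts here are illustrative, not from the actual corpus):

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy bag-of-words vector; the real pipeline uses all-MiniLM-L6-v2
    # sentence embeddings stored in ChromaDB.
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / ((na * nb) or 1.0)

# Stand-ins for the 1,732 indexed chunks.
chunks = [
    "Rejection-sensitive dysphoria is intense emotional pain triggered by perceived rejection.",
    "Executive function covers planning, working memory, and impulse control.",
    "Cognitive behavioral therapy works by reframing unhelpful thought patterns.",
]
index = [(c, embed(c)) for c in chunks]

def retrieve(query, k=1):
    # Rank chunks by cosine similarity to the query, return the top k.
    q = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [c for c, _ in ranked[:k]]
```

Retrieved chunks are injected into the prompt before generation, which is how the model grounds its citations.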

The defining behavior is validate, then redirect: not sycophantic softening, but a way to deliver pushback without triggering RSD. Praise that doesn't land hurts. Direct correction that skips validation hurts. The pause between the two is the point.

What this is not

  • Not therapy. Not medical advice. Not a substitute for a clinician.
  • Not a product. No waitlist, no support, no roadmap.
  • Not finished. Persona has rough edges. RAG sometimes misses context. The streaming has visible seams.

Files in this repo

  • *.gguf: Q4_K_M quantization, ready for llama.cpp and compatible runners
  • *.safetensors: merged full-precision weights for further fine-tuning or alternative quantization

Quickstart (llama.cpp)

./llama-server \
  --hf-repo datamatters24/Llama-Carole-v1 \
  --hf-file llama-8b-persona-q4.gguf \
  --host 127.0.0.1 --port 8085 \
  --threads 12 --ctx-size 4096

Then POST to http://127.0.0.1:8085/v1/chat/completions with an OpenAI-compatible payload.
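
A minimal client sketch using only the standard library, assuming the server from the quickstart above is running (the model-name field and message content are illustrative; llama.cpp serves whatever model it loaded regardless of the `model` value):

```python
import json
from urllib import request

payload = {
    "model": "llama-8b-persona-q4",  # echoed back; llama.cpp ignores it
    "messages": [
        {"role": "user", "content": "I sent a message and got no reply. I feel awful."}
    ],
    "stream": False,
    "temperature": 0.7,
}

req = request.Request(
    "http://127.0.0.1:8085/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)

# Requires the llama-server process from the quickstart:
# with request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Setting `"stream": True` instead returns server-sent events, which is what the sentence-by-sentence demo UX consumes.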

Training

Setting             Value
Base                meta-llama/Llama-3.1-8B-Instruct
Method              QLoRA (4-bit NF4 + LoRA) via TRL SFTTrainer
LoRA rank / alpha   64 / 128
Targets             q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Learning rate       2e-4, cosine schedule
Epochs              3, with load_best_model_at_end=True
Best checkpoint     epoch 2 (eval_loss = 1.40), a clean U-shape on eval
Hardware            1× A100 80GB on RunPod
Dataset             50 golden examples → ~1,500 synthetic conversations
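
The table above maps onto a peft/transformers configuration roughly like the following sketch; the rank, alpha, target modules, and NF4 quantization come from the table, while the bfloat16 compute dtype is an assumption, not a documented setting:

```python
import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig

# 4-bit NF4 base-model quantization (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # assumed, not from the table
)

# LoRA adapters on all attention and MLP projections, per the table.
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
```

Both configs are then handed to TRL's SFTTrainer along with the ~1,500-conversation dataset.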

The persona is shaped by the validate-then-redirect pattern, with explicit guidance toward source citation, ND-friendly structure (numbered lists, labeled sections), and a feedback check-in at the end of substantive responses. Banned-phrase filters caught common sycophantic patterns ("obviously", "simply", "just do X", empty "Great idea!" preambles).
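
A sketch of how such a banned-phrase screen can work; this pattern list is an illustrative subset, not the production filter set:

```python
import re

# Illustrative subset of the banned-phrase filters applied when
# screening synthetic training conversations.
BANNED = [
    r"\bobviously\b",
    r"\bsimply\b",
    r"\bjust\s+\w+",        # "just do X" minimizers
    r"^\s*great\s+idea\b",  # empty praise preambles
]

def flags(text):
    """Return every banned pattern a candidate response trips."""
    return [p for p in BANNED if re.search(p, text, re.IGNORECASE)]
```

Flagged candidates can then be dropped or regenerated before they enter the training set.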

Intended use

Educational / portfolio demonstrations of:

  • A non-sycophantic, neurodivergence-aware conversational pattern
  • End-to-end fine-tune + RAG + inference on a single CPU box (no ongoing GPU spend)
  • The deliberate cadence of sentence-by-sentence streaming for RSD-aware UX
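
The sentence-by-sentence cadence can be sketched as a generator that splits a finished response on sentence boundaries and pauses between yields; the pause value here is illustrative, not the production setting:

```python
import re
import time

def stream_sentences(text, pause=0.35):
    # Split after ., !, or ? followed by whitespace; pause between
    # sentences to produce the deliberate, RSD-aware pacing.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        yield sentence
        time.sleep(pause)

reply = ("That sounds really frustrating. You waited, and the silence hurt. "
         "What would feel like a manageable next step?")
for s in stream_sentences(reply, pause=0.0):
    print(s)
```

In the live demo the same idea is layered over token streaming, which is where the visible seams mentioned above come from.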

Out of scope

  • Crisis intervention or any clinical mental-health use
  • Medical / legal / financial advice
  • Coding / general-purpose assistance (kindly redirects)
  • Unrestricted public deployment without rate limiting and a clear "AI character, not a therapist" disclaimer

Acknowledgments

The persona draws on:

  • Marshall Rosenberg, Nonviolent Communication
  • Dale Carnegie, How to Win Friends and Influence People
  • Tony Robbins, Awaken the Giant Within

Plus targeted Wikipedia coverage of ADHD, autism, RSD, anxiety, depression, executive function, CBT/DBT, mindfulness, and attachment theory in the RAG corpus.

License

This model is a derivative work of Meta's Llama 3.1 8B Instruct and is distributed under the Llama 3.1 Community License. Use must comply with the Llama 3.1 Acceptable Use Policy.

Built with Llama.
