# 📧 Customer Email Response Classifier

A fine-tune of Gemma 3 1B IT (`google/gemma-3-1b-it`) for classifying customer email responses into five categories. The model emits a structured JSON object and is optimized for low-latency deployment via vLLM.

## Model Summary

| Property | Value |
|---|---|
| Base model | `google/gemma-3-1b-it` |
| Task | Generative classification (causal LM) |
| PEFT method | QLoRA (4-bit) via Unsloth |
| Training framework | Unsloth `SFTTrainer` with completion-only masking |
| Dataset size | ~3,500 samples |
| Output format | `{"classification": "<label>"}` |
| Deployment target | vLLM (`/v1/chat/completions`) |

## Labels

The model classifies each email into exactly one of:

| Label | Description |
|---|---|
| `automated_reply` | Auto-generated out-of-office or delivery receipts |
| `interested` | Recipient shows genuine interest or engagement |
| `not_interested` | Recipient explicitly declines or opts out |
| `out_of_office` | Human OOO message (distinct from automated replies) |
| `unrelated` | Reply does not relate to the original outreach |
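
Downstream code should treat the model's reply as text and validate the label before use. A minimal parsing sketch (the helper below is illustrative, not part of the model):

```python
import json

VALID_LABELS = {"automated_reply", "interested", "not_interested", "out_of_office", "unrelated"}

def parse_label(raw: str) -> str:
    """Parse the model's JSON reply and check it against the known label set."""
    try:
        label = json.loads(raw)["classification"]
    except (json.JSONDecodeError, KeyError, TypeError):
        raise ValueError(f"unparseable model output: {raw!r}")
    if label not in VALID_LABELS:
        raise ValueError(f"unknown label: {label!r}")
    return label
```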

## Usage

### Transformers (local)

```python
import json
import torch
from transformers import pipeline

LABELS = ["automated_reply", "interested", "not_interested", "out_of_office", "unrelated"]
SYSTEM_PROMPT = (
    "You are an email-response classifier. "
    f"Classify the email into exactly one of: {', '.join(LABELS)}. "
    'Reply ONLY with a JSON object in the format: {"classification": "<label>"}. '
    "Do not add any explanation."
)

gen = pipeline(
    "text-generation",
    model="OmarioVIC/customer-email-classifier",
    device=0 if torch.cuda.is_available() else -1,
    do_sample=False,
)

def classify(email_text: str) -> str:
    messages = [{"role": "user", "content": f"{SYSTEM_PROMPT}\n\nEmail text:\n{email_text}"}]
    prompt = gen.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    # return_full_text=False keeps only the newly generated completion rather
    # than echoing the prompt back, so the output is just the model's reply.
    output = gen(prompt, max_new_tokens=20, return_full_text=False)
    text = output[0]["generated_text"]
    # Extract the JSON object defensively in case a stray end-of-turn token
    # survives decoding.
    return json.loads(text[text.find("{"): text.rfind("}") + 1])["classification"]

print(classify("Yeah, Monday works — book a 15-min call."))
# → "interested"
```

### vLLM (recommended for production)

Serve:

```bash
pip install vllm

vllm serve OmarioVIC/customer-email-classifier \
    --dtype bfloat16 \
    --max-model-len 512
```

Query:

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "OmarioVIC/customer-email-classifier",
    "messages": [{
      "role": "user",
      "content": "Classify into one of: automated_reply, interested, not_interested, out_of_office, unrelated. Reply with JSON only: {\"classification\": \"<label>\"}.\n\nEmail text:\nyeah 15 mins call? free monday"
    }],
    "max_tokens": 20,
    "temperature": 0
  }'
```
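
The same endpoint can also be queried from Python via the OpenAI-compatible client (a minimal sketch, assuming the server above is running on localhost:8000; the API key is a placeholder since vLLM ignores it unless one is configured):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # key unused locally

resp = client.chat.completions.create(
    model="OmarioVIC/customer-email-classifier",
    messages=[{
        "role": "user",
        "content": (
            "Classify into one of: automated_reply, interested, not_interested, "
            "out_of_office, unrelated. Reply with JSON only: "
            '{"classification": "<label>"}.\n\n'
            "Email text:\nyeah 15 mins call? free monday"
        ),
    }],
    max_tokens=20,
    temperature=0,
)
print(resp.choices[0].message.content)  # e.g. {"classification": "interested"}
```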

## Training Details

### Data Format

Each training example is a chat-template conversation:

```json
{
  "messages": [
    {
      "role": "user",
      "content": "<system prompt>\n\nEmail text:\n<raw email body>"
    },
    {
      "role": "assistant",
      "content": "{\"classification\": \"interested\"}"
    }
  ]
}
```

Only the assistant turn is used for loss computation (completion-only masking via `train_on_responses_only`).
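
For reference, this masking is typically wired up in Unsloth as below (a sketch, assuming an already constructed `SFTTrainer` named `trainer`; the marker strings come from the Gemma chat template):

```python
from unsloth.chat_templates import train_on_responses_only

# Mask everything except the assistant turn, so only the JSON completion
# contributes to the loss.
trainer = train_on_responses_only(
    trainer,
    instruction_part="<start_of_turn>user\n",
    response_part="<start_of_turn>model\n",
)
```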

### Hyperparameters

| Parameter | Value |
|---|---|
| Epochs | 3 |
| Batch size (per device) | 4 |
| Gradient accumulation steps | 4 |
| Learning rate | 2e-4 |
| LR scheduler | Cosine |
| Warmup steps | 50 |
| Max sequence length | 320 |
| Precision | bfloat16 (Ampere+) / float16 |
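
As a sketch, these map onto Hugging Face `TrainingArguments` roughly as follows (`output_dir` is a placeholder; the 320-token max sequence length is passed to the `SFTTrainer`/model setup rather than to the arguments object):

```python
import torch
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="outputs",                 # placeholder
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,        # effective batch size: 4 * 4 = 16
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_steps=50,
    bf16=torch.cuda.is_bf16_supported(),  # bfloat16 on Ampere+ GPUs
    fp16=not torch.cuda.is_bf16_supported(),
)
```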

### LoRA Config

| Parameter | Value |
|---|---|
| Rank (r) | 32 |
| Alpha | 32 |
| Dropout | 0.05 |
| Target modules | All linear layers |
| Gradient checkpointing | Unsloth optimised |
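
In Unsloth terms, this corresponds roughly to the following setup (a sketch; the explicit list of projection modules is the usual expansion of "all linear layers" for Gemma-style models and is an assumption here):

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="google/gemma-3-1b-it",
    max_seq_length=320,
    load_in_4bit=True,  # QLoRA: 4-bit base weights
)
model = FastLanguageModel.get_peft_model(
    model,
    r=32,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],  # assumed expansion
    use_gradient_checkpointing="unsloth",
)
```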

### Framework

Training was accelerated using Unsloth, which provides:

- 2× faster training via custom Triton kernels
- ~60% less VRAM via QLoRA 4-bit quantisation

The final model was merged to full 16-bit weights (`merged_16bit`) for straightforward vLLM deployment.
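
Continuing the Unsloth sketch above, the merge step looks roughly like:

```python
# Merge the LoRA adapter into the base model and save full 16-bit weights,
# which vLLM can load directly.
model.save_pretrained_merged(
    "customer-email-classifier",  # output directory (placeholder)
    tokenizer,
    save_method="merged_16bit",
)
```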


## Limitations

- Designed for short email replies; the 320-token max sequence length includes the prompt (see the truncation sketch below).
- Trained on a specific business outreach dataset; may not generalise to all email domains.
- Output is deterministic (`do_sample=False` / `temperature=0`), i.e. always greedy decoding.
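
For emails that may exceed this budget, a simple pre-truncation step keeps the input within the trained context (a sketch; the 200-token budget is an assumption that leaves headroom for the instruction prompt):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("OmarioVIC/customer-email-classifier")

def truncate_email(email_text: str, budget: int = 200) -> str:
    """Trim the email body to `budget` tokens before classification."""
    ids = tok(email_text, truncation=True, max_length=budget)["input_ids"]
    return tok.decode(ids, skip_special_tokens=True)
```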

## License

This model is derived from `google/gemma-3-1b-it` and is subject to the Gemma Terms of Use.
