Instructions to use Infiniaai/teddy-3.5b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use Infiniaai/teddy-3.5b with llama-cpp-python:
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="Infiniaai/teddy-3.5b",
    filename="teddy_Phi-3.5-10epoch_Q4_K_M.gguf",
)

llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "I'm feeling sad today."},
    ]
)
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use Infiniaai/teddy-3.5b with llama.cpp:
Install from brew
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Infiniaai/teddy-3.5b:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf Infiniaai/teddy-3.5b:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Infiniaai/teddy-3.5b:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf Infiniaai/teddy-3.5b:Q4_K_M
Use pre-built binary
# Download a pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Infiniaai/teddy-3.5b:Q4_K_M

# Run inference directly in the terminal:
./llama-cli -hf Infiniaai/teddy-3.5b:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Infiniaai/teddy-3.5b:Q4_K_M

# Run inference directly in the terminal:
./build/bin/llama-cli -hf Infiniaai/teddy-3.5b:Q4_K_M
Use Docker
docker model run hf.co/Infiniaai/teddy-3.5b:Q4_K_M
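Whichever install route you choose, a running llama-server exposes an OpenAI-compatible chat endpoint. The sketch below builds a request body for it in Python; the default port 8080 and the response shape are assumptions based on llama.cpp's OpenAI compatibility, so adjust for your setup.

```python
import json

# Assumed default endpoint for a local llama-server started as shown
# above; pass --port to llama-server if you need a different one.
SERVER_URL = "http://localhost:8080/v1/chat/completions"

def build_chat_request(user_message,
                       system_prompt="You are Teddy, a soft and comforting companion."):
    """Build an OpenAI-compatible chat completion payload."""
    return {
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message},
        ],
        "temperature": 0.7,
        "max_tokens": 200,
    }

# Example request body; send it with any HTTP client, e.g.:
#   import requests
#   reply = requests.post(SERVER_URL, json=payload).json()
#   print(reply["choices"][0]["message"]["content"])
payload = build_chat_request("I'm feeling sad today.")
print(json.dumps(payload, indent=2))
```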
- LM Studio
- Jan
- Ollama
How to use Infiniaai/teddy-3.5b with Ollama:
ollama run hf.co/Infiniaai/teddy-3.5b:Q4_K_M
- Unsloth Studio
How to use Infiniaai/teddy-3.5b with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for Infiniaai/teddy-3.5b to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for Infiniaai/teddy-3.5b to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Infiniaai/teddy-3.5b to start chatting
- Docker Model Runner
How to use Infiniaai/teddy-3.5b with Docker Model Runner:
docker model run hf.co/Infiniaai/teddy-3.5b:Q4_K_M
- Lemonade
How to use Infiniaai/teddy-3.5b with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull Infiniaai/teddy-3.5b:Q4_K_M
Run and chat with the model
lemonade run user.teddy-3.5b-Q4_K_M
List all available models
lemonade list
Model Card for Teddy 3.5B
Teddy 3.5B is a fine-tuned conversational AI model based on Phi-3.5-mini-instruct (3.8B parameters). It is designed to deliver warm, gentle, emotionally supportive, and child-friendly conversations. Teddy's tone is soft and reassuring, suitable for creative play, emotional learning, and comforting dialogue.
Model Details
Model Description
Teddy is an empathetic conversational model fine-tuned for supportive, emotion-aware dialogue. It is ideal for child-friendly interactions, imaginative play, and calm, comforting companionship.
- Developed by: John Bellew, Infinia
- Funded by: Self-funded
- Shared by: Infinia.ie
- Model type: Causal language model (fine-tuned)
- Language(s): English
- License: Apache 2.0
- Finetuned from model: microsoft/Phi-3.5-mini-instruct
Model Sources
- Repository: https://huggingface.co/Infiniaai/teddy-3.5b
- Paper: N/A
- Demo: N/A
Uses
Direct Use
Teddy may be used directly for:
- Emotionally supportive conversations
- Child-friendly chat
- Imaginative play
- Emotional regulation practice
- Companion-style dialogue
- Embedded offline use (e.g., Raspberry Pi toys)
Downstream Use
Teddy can be integrated into:
- AI-powered toys
- Companion and wellness apps
- Educational emotional-intelligence tools
- Storytelling systems
- Offline embedded LLM devices
Out-of-Scope Use
Teddy must not be used for:
- Mental health diagnosis
- Crisis intervention
- Medical or psychological treatment
- Unsupervised interactions with vulnerable individuals
- Tasks requiring professional therapeutic judgment
- Child care / Babysitting
Bias, Risks, and Limitations
- May misunderstand nuanced emotional situations
- Not a replacement for human/parent care
- English-only
- Can generate inconsistent reasoning due to model size
- Requires supervision with children
Recommendations
Users should be aware of Teddy's risks and limitations and supervise all interactions involving children. Production deployments should add safety filtering on model outputs, and users with serious mental-health needs should be directed to qualified professionals.
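As one illustration of the safety-filtering recommendation, here is a minimal output-screening sketch. The deny-list and fallback message are placeholders for illustration only; a real deployment should use a proper moderation model, not a word list.

```python
# Minimal output screen: replace a model reply with a gentle fallback
# if it contains any term from a deny-list. The list below is a
# placeholder, not a vetted safety resource.
BLOCKED_TERMS = {"violence", "weapon", "drugs"}  # placeholder list

FALLBACK = "Let's talk about something cozy instead. What made you smile today?"

def screen_reply(reply: str) -> str:
    """Return the model reply, or the fallback if it trips the filter."""
    lowered = reply.lower()
    if any(term in lowered for term in BLOCKED_TERMS):
        return FALLBACK
    return reply

print(screen_reply("Let's imagine a picnic with your teddy bear!"))
```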
How to Get Started with the Model
import warnings
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Suppress warnings (optional)
warnings.filterwarnings("ignore")

model_name = "Infiniaai/teddy-3.5b"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.float16,
    trust_remote_code=True,
    attn_implementation="eager"  # Prevents flash-attention warning
)

messages = [
    {"role": "system", "content": "You are Teddy, a soft and comforting companion."},
    {"role": "user", "content": "I'm feeling sad today."}
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

output = model.generate(
    input_ids,
    max_new_tokens=200,
    temperature=0.7,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id
)

print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
Note: You may see a "flash-attention" warning on first generation. This is harmless and can be ignored, or suppressed by adding attn_implementation="eager" as shown above.
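The quickstart above is single-turn; multi-turn chat simply means appending each exchange to the message list before calling apply_chat_template again. A minimal history helper sketch (pure Python, using the same message format as above):

```python
class ChatHistory:
    """Keep a running message list in the chat format used above."""

    def __init__(self, system_prompt: str):
        self.messages = [{"role": "system", "content": system_prompt}]

    def add_user(self, text: str) -> list:
        self.messages.append({"role": "user", "content": text})
        return self.messages  # pass this list to tokenizer.apply_chat_template

    def add_assistant(self, text: str) -> None:
        self.messages.append({"role": "assistant", "content": text})

history = ChatHistory("You are Teddy, a soft and comforting companion.")
history.add_user("I'm feeling sad today.")
history.add_assistant("I'm here with you. Want to tell me about it?")
history.add_user("My friend moved away.")
print(len(history.messages))  # system + three turns
```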
Training Details
Training Data
Custom dataset (~12,000 examples) focused on:
- Emotional regulation
- Comforting responses
- Imaginative play
- Social skills and empathy
- Child-friendly tone
- Conflict resolution
Training Procedure
- Method: LoRA
- Precision: bf16 mixed
- Optimizer: AdamW
- LoRA rank: 16
- LoRA alpha: 32
- Learning rate: 2e-4
- Epochs: 10
- Max sequence length: 512
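For readers reproducing a similar setup, the hyperparameters above map onto a PEFT LoraConfig roughly as sketched below. This is an assumption-laden sketch, not the exact training script: target_modules and lora_dropout are guesses (the card does not state them), with target_modules set to typical attention projections for Phi-class models.

```python
from peft import LoraConfig

# Sketch of a LoRA setup matching the listed hyperparameters.
# target_modules and lora_dropout are assumptions, not from the card.
lora_config = LoraConfig(
    r=16,                                   # LoRA rank (from the card)
    lora_alpha=32,                          # LoRA alpha (from the card)
    target_modules=["qkv_proj", "o_proj"],  # assumed, not stated in the card
    lora_dropout=0.05,                      # assumed, not stated in the card
    task_type="CAUSAL_LM",
)
# Trainer settings from the card: AdamW optimizer, lr=2e-4, 10 epochs,
# bf16 mixed precision, max sequence length 512.
```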
Speeds, Sizes, Times
- Checkpoint size: ~7.4GB (fp16 merged)
- Q4_K_M quantised: ~2.3GB
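These sizes pass a quick back-of-envelope check: 3.8B parameters at 2 bytes each (fp16) is about 7.6 GB, and Q4_K_M at roughly 4.5 bits per weight lands near 2.1 GB. The bits-per-weight figure is an approximation, not an exact property of the quantisation:

```python
params = 3.8e9  # parameter count, from the model card

fp16_gb = params * 2 / 1e9          # 2 bytes per weight in fp16
q4_km_gb = params * 4.5 / 8 / 1e9   # ~4.5 bits/weight approximates Q4_K_M

print(f"fp16:   ~{fp16_gb:.1f} GB")   # close to the ~7.4 GB merged checkpoint
print(f"Q4_K_M: ~{q4_km_gb:.1f} GB")  # close to the ~2.3 GB GGUF
```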
Evaluation
Testing Data
Internal evaluation using emotional-support prompts, safety prompts, story prompts, and child-safe dialogue tests.
Factors
- Emotional tone stability
- Safety
- Child-appropriate language
- Multi-turn coherence
- Persona consistency
Metrics
Qualitative evaluation only.
Results
The model maintains consistent warmth, emotional safety, and persona adherence across tests.
Model Examination
Manual audits of LoRA layers and behaviour drift analysis.
Environmental Impact
- Hardware Type: Consumer GPU
- Hours used: 4–6
- Cloud Provider: None (local)
- Compute Region: Ireland
- Carbon Emitted: Very low (<1kg CO2eq estimated)
Technical Specifications
Model Architecture and Objective
- Architecture: Phi-3.5 (Transformer)
- Parameters: 3.8B
- Objective: Next-token prediction
- Position embeddings: RoPE (LongRope)
- Context length: 131k
Compute Infrastructure
Hardware
- Single RTX-class GPU for training
- Raspberry Pi 5 for deployment testing (Q4)
Software
- Python 3.10
- PyTorch 2.x
- Transformers 4.40+
- PEFT
Citation
BibTeX:
@misc{teddy-3.5b-2025,
author = {Infinia.ie},
title = {Teddy 3.5B: A Comforting Conversational AI Companion},
year = {2025},
publisher = {HuggingFace},
howpublished = {https://huggingface.co/Infiniaai/teddy-3.5b},
note = {Fine-tuned from microsoft/Phi-3.5-mini-instruct}
}
APA:
Infinia.ie. (2025). Teddy 3.5B: A comforting conversational AI companion. Hugging Face. https://huggingface.co/Infiniaai/teddy-3.5b
More Information
For questions or collaboration: https://huggingface.co/Infiniaai/teddy-3.5b
Model Card Authors
Infinia AI Team
Model Card Contact
Contact via the model repository: https://huggingface.co/Infiniaai/teddy-3.5b