Instructions to use Sorihon/CYDR-24B with libraries and local apps.

Libraries

How to use Sorihon/CYDR-24B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Sorihon/CYDR-24B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForMultimodalLM

tokenizer = AutoTokenizer.from_pretrained("Sorihon/CYDR-24B")
model = AutoModelForMultimodalLM.from_pretrained("Sorihon/CYDR-24B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use Sorihon/CYDR-24B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Sorihon/CYDR-24B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Sorihon/CYDR-24B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/Sorihon/CYDR-24B

SGLang

How to use Sorihon/CYDR-24B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Sorihon/CYDR-24B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Sorihon/CYDR-24B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Sorihon/CYDR-24B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Sorihon/CYDR-24B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
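
The curl calls above can also be made from Python. The following is a minimal sketch using only the standard library, assuming one of the servers above is already running and exposes the OpenAI-compatible `/v1/chat/completions` route. The helper names `build_chat_request` and `chat` are made up for illustration and are not part of vLLM or SGLang. `max_tokens` and `frequency_penalty` are standard fields of the OpenAI chat completions API; whether they behave well with this model is an assumption that has not been tested here.

```python
import json
import urllib.request


def build_chat_request(prompt, base_url="http://localhost:8000/v1",
                       model="Sorihon/CYDR-24B", max_tokens=128,
                       frequency_penalty=0.0):
    """Build (but do not send) a POST request for /chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        # Upper bound on the reply length, in tokens.
        "max_tokens": max_tokens,
        # Values above 0 discourage tokens that were already generated.
        "frequency_penalty": frequency_penalty,
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the vLLM server running, `chat("What is the capital of France?")` should return the reply text. For the SGLang server, pass `base_url="http://localhost:30000/v1"` to match the port used above.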

Docker Model Runner
How to use Sorihon/CYDR-24B with Docker Model Runner:
```
docker model run hf.co/Sorihon/CYDR-24B
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

Quick note:

I expect there to be issues with this model; however, it will serve as a good component in an upcoming merge. I released it because I found its behaviour different from most models I have tried.

I think the worst issue is that it sometimes falls into a repetition loop, most notably when it has little to work with, or depending on how the system prompt is structured. Another issue is that it can take the prompt too literally, which leads to interesting, albeit frustrating, outcomes.

My main goal with this model was to have a model that can give short replies while being adaptable in various roleplay scenarios, and to that end I am satisfied. In my opinion, this model has good prompt following, maybe to a fault, as you will most likely have to put in more effort than normal to get the results you are looking for.
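
If you want to catch the repetition loop mentioned above on the client side, one option is a simple check on the reply text. This is only a sketch of one possible approach and not something used while making or testing this model. The function name `find_loop` and its thresholds are made up for illustration. It looks for a phrase of up to `max_ngram` words that repeats back-to-back at least `min_repeats` times.

```python
def find_loop(text, max_ngram=8, min_repeats=3):
    """Return the first phrase that repeats back-to-back at least
    min_repeats times, or None if no such phrase is found."""
    words = text.split()
    # Try short phrases first, then longer ones.
    for n in range(1, max_ngram + 1):
        # Only start positions that leave room for min_repeats copies.
        for start in range(len(words) - n * min_repeats + 1):
            gram = words[start:start + n]
            # Check that the next (min_repeats - 1) chunks match the first.
            if all(
                words[start + k * n:start + (k + 1) * n] == gram
                for k in range(1, min_repeats)
            ):
                return " ".join(gram)
    return None
```

When `find_loop(reply)` returns a phrase instead of `None`, you could regenerate the reply or give the model more to work with in the prompt. The thresholds are a guess and may need tuning, since some roleplay replies repeat words on purpose.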

Out of the box, it prefers short responses, unless of course the greeting message is a sizeable wall of text.

During my testing, I found that, because of its desire to follow the prompt, it was more than adequate for playing through quick scenarios with decent results, though I have no idea how large, complex scenarios will play out using it.

I only speak English, therefore I can't test out other languages.