ZAYA1-74B-Preview

ZAYA1-74B-Preview is a mixture-of-experts language model with 4B active and 74B total parameters. This is a reasoning base checkpoint that has not been tuned for chat and has not undergone RL post-training. ZAYA1-74B-Preview was trained end-to-end on AMD hardware.

Learn more on our blog.

Quickstart

Prerequisites

We recommend installing the following libraries in a fresh Python environment (tested with Python 3.12).

To use ZAYA1-74B-Preview, install the zaya1-pr branch of our fork of the vLLM library (the command triggers a full build of vLLM from source):

pip install "vllm @ git+https://github.com/Zyphra/vllm.git@zaya1-pr"

If you want to run the model in transformers, also install the zaya1 branch of our fork of the transformers library:

pip install "transformers @ git+https://github.com/Zyphra/transformers.git@zaya1"
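With the forked transformers branch installed, generation follows the standard transformers API. The sketch below is illustrative, not taken from the model card: the dtype and device settings are assumptions (bfloat16 mirrors the vLLM flags used later), and running it requires enough GPU memory for the 74B checkpoint.

```python
def run_zaya(prompt, max_new_tokens=128):
    """Sketch: generate from ZAYA1-74B-Preview via transformers.

    Assumes the zaya1 fork of transformers is installed and sufficient
    GPU memory is available; settings here are illustrative defaults.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("Zyphra/ZAYA1-74B-Preview")
    model = AutoModelForCausalLM.from_pretrained(
        "Zyphra/ZAYA1-74B-Preview",
        torch_dtype=torch.bfloat16,  # matches --dtype bfloat16 used for vLLM below
        device_map="auto",
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)
```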

Deployment

To start the vLLM server, run the following command:

vllm serve Zyphra/ZAYA1-74B-Preview --port 8010 \
   --mamba-cache-dtype float32 --dtype bfloat16 \
   --reasoning-parser qwen3 --enable-auto-tool-choice --tool-call-parser zaya_xml

For parallel deployment we recommend data parallelism (DP) combined with expert parallelism (EP), since tensor parallelism (TP) for CCA is not supported in the branch above. If running on 8 GPUs, add the extra flags -dp 8 -ep to run with DP=EP=8.

Once the server is up, you can query the model with curl, as in the following example:

curl http://localhost:8010/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Zyphra/ZAYA1-74B-Preview",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Hello. How is it going?"}
        ]
    }'
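The same request can be issued from Python. A minimal sketch using only the standard library; the endpoint and payload mirror the curl example above, and actually sending the request assumes the vLLM server is running on port 8010:

```python
import json
import urllib.request

# The OpenAI-compatible chat-completions payload from the curl example above.
payload = {
    "model": "Zyphra/ZAYA1-74B-Preview",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello. How is it going?"},
    ],
}
body = json.dumps(payload).encode("utf-8")

def query(url="http://localhost:8010/v1/chat/completions"):
    """Send the request; requires the vLLM server started above."""
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        # Return the assistant message from the first choice.
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```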