Instructions for using zai-org/LongWriter-glm4-9b with libraries, inference providers, notebooks, and local apps.

Libraries

How to use zai-org/LongWriter-glm4-9b with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="zai-org/LongWriter-glm4-9b", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

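# Load the tokenizer and model directly
from transformers import AutoModel, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("zai-org/LongWriter-glm4-9b", trust_remote_code=True)
model = AutoModel.from_pretrained("zai-org/LongWriter-glm4-9b", trust_remote_code=True, dtype="auto")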

Notebooks
Google Colab
Kaggle
Local Apps

vLLM

How to use zai-org/LongWriter-glm4-9b with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "zai-org/LongWriter-glm4-9b"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "zai-org/LongWriter-glm4-9b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
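You can also reproduce the curl request above from Python using only the standard library. This is a minimal sketch with three assumptions:

- The vLLM server from the previous step is listening on localhost:8000.
- The response has the standard OpenAI chat-completion shape, so the reply text is read from choices[0].message.content.
- The helper names build_chat_request, extract_reply, and chat are hypothetical.

```python
import json
import urllib.request

# Assumed defaults, taken from the curl example above.
BASE_URL = "http://localhost:8000/v1"
MODEL = "zai-org/LongWriter-glm4-9b"


def build_chat_request(base_url: str, model: str, content: str) -> urllib.request.Request:
    """Build the same POST /chat/completions request that the curl command sends."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": content}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_json: dict) -> str:
    """Return the assistant text from an OpenAI-style chat completion response."""
    return response_json["choices"][0]["message"]["content"]


def chat(content: str, base_url: str = BASE_URL, model: str = MODEL, timeout_s: float = 600.0) -> str:
    """Send the request to a running server and return the reply text."""
    request = build_chat_request(base_url, model, content)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return extract_reply(json.load(response))
```

With the server running, chat("What is the capital of France?") returns the reply text. For the SGLang server described below, pass base_url="http://localhost:30000/v1".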

Use Docker

docker model run hf.co/zai-org/LongWriter-glm4-9b

SGLang

How to use zai-org/LongWriter-glm4-9b with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "zai-org/LongWriter-glm4-9b" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "zai-org/LongWriter-glm4-9b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
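Both servers are described as OpenAI-compatible, so a streamed variant of this call can be sketched in the same way. The sketch assumes the server implements OpenAI-style streaming:

- With "stream": true in the request body, the server returns lines of the form data: {...}.
- Each line's choices[0].delta.content carries the next text fragment.
- The stream ends with data: [DONE].

The function names are hypothetical, and the default URL matches the SGLang port 30000 used above.

```python
import json
import urllib.request
from typing import Iterable, Iterator

# Assumed defaults, taken from the SGLang curl example above.
BASE_URL = "http://localhost:30000/v1"
MODEL = "zai-org/LongWriter-glm4-9b"


def iter_stream_text(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield text fragments from an OpenAI-style server-sent-event stream."""
    for raw in lines:
        line = raw.decode("utf-8").strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:  # skip chunks without text, e.g. one carrying only the role
                yield text


def stream_chat(content: str, base_url: str = BASE_URL, model: str = MODEL) -> Iterator[str]:
    """POST a streaming chat request and yield reply fragments as they arrive."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": content}],
        "stream": True,
    }
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        # The HTTP response is file-like, so iterating it yields one line at a time.
        yield from iter_stream_text(response)
```

A typical loop is: for piece in stream_chat("What is the capital of France?"): print(piece, end="", flush=True). Parsing is kept separate from network I/O so that iter_stream_text can be checked against recorded lines without a server.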

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "zai-org/LongWriter-glm4-9b" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "zai-org/LongWriter-glm4-9b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
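When the server runs in a container, the curl call may fail until the weights have been downloaded and loaded, so it can help to wait for readiness first. This sketch makes the following assumptions:

- The server answers GET /v1/models (part of the OpenAI-compatible surface) with HTTP 200 once it is ready.
- The function names are hypothetical.
- The 900-second timeout and 5-second polling interval are arbitrary illustrative values.

```python
import http.client
import time
import urllib.request
from typing import Callable


def http_ok(url: str, timeout_s: float = 5.0) -> bool:
    """Return True if GET url answers with HTTP 200, False on connection or HTTP errors."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # OSError covers URLError/HTTPError, refused or reset connections, and timeouts.
        return False


def wait_until_ready(
    url: str = "http://localhost:30000/v1/models",
    timeout_s: float = 900.0,
    interval_s: float = 5.0,
    probe: Callable[[str], bool] = http_ok,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll url until probe succeeds; return False if timeout_s elapses first."""
    deadline = clock() + timeout_s
    while True:
        if probe(url):
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Call wait_until_ready() before sending requests. For the vLLM server above, pass url="http://localhost:8000/v1/models". The probe, sleep, and clock hooks are injectable, so the polling logic can be exercised without a running server.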

Docker Model Runner
How to use zai-org/LongWriter-glm4-9b with Docker Model Runner:
```
docker model run hf.co/zai-org/LongWriter-glm4-9b
```

Fix TypeError in _pad method by adding missing padding_side field

by ayyylol - opened Oct 2, 2024

base: refs/heads/main ← from: refs/pr/9


ayyylol

Oct 2, 2024

Hi,
Thank you for this model!
I noticed that the _pad method in the ChatGLM4Tokenizer class does not accept a padding_side argument. The base tokenizer's pad method passes padding_side to _pad, so calling encode raises a TypeError.
This issue comes up when converting the model with llama.cpp's convert_hf_to_gguf.py to make quants:

  File "/home/glm4/llama.cpp/convert_hf_to_gguf.py", line 4430, in <module>
    main()
  File "/home/glm4/llama.cpp/convert_hf_to_gguf.py", line 4424, in main
    model_instance.write()
  File "/home/glm4/llama.cpp/convert_hf_to_gguf.py", line 434, in write
    self.prepare_metadata(vocab_only=False)
  File "/home/glm4/llama.cpp/convert_hf_to_gguf.py", line 427, in prepare_metadata
    self.set_vocab()
  File "/home/glm4/llama.cpp/convert_hf_to_gguf.py", line 3928, in set_vocab
    tokpre = self.get_vocab_base_pre(tokenizer)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/glm4/llama.cpp/convert_hf_to_gguf.py", line 550, in get_vocab_base_pre
    chktok = tokenizer.encode(chktxt)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/glm4/llama.cpp/venv/lib/python3.12/site-packages/transformers/tokenization_utils_base.py", line 2791, in encode
    encoded_inputs = self.encode_plus(
                     ^^^^^^^^^^^^^^^^^
  File "/home/glm4/llama.cpp/venv/lib/python3.12/site-packages/transformers/tokenization_utils_base.py", line 3210, in encode_plus
    return self._encode_plus(
           ^^^^^^^^^^^^^^^^^^
  File "/home/glm4/llama.cpp/venv/lib/python3.12/site-packages/transformers/tokenization_utils.py", line 801, in _encode_plus
    return self.prepare_for_model(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/glm4/llama.cpp/venv/lib/python3.12/site-packages/transformers/tokenization_utils_base.py", line 3706, in prepare_for_model
    encoded_inputs = self.pad(
                     ^^^^^^^^^
  File "/home/glm4/llama.cpp/venv/lib/python3.12/site-packages/transformers/tokenization_utils_base.py", line 3508, in pad
    encoded_inputs = self._pad(
                     ^^^^^^^^^^
TypeError: ChatGLM4Tokenizer._pad() got an unexpected keyword argument 'padding_side'

Thank you!
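The traceback comes down to a signature mismatch. The base class's pad() calls self._pad(..., padding_side=...), but the tokenizer's override does not declare that parameter. The merged diff is not reproduced on this page, so what follows is only a minimal pure-Python sketch of the failure and of the general shape of the fix. The stub classes are hypothetical stand-ins, not the real transformers or ChatGLM4Tokenizer code, and the padding logic is simplified to input_ids and attention_mask.

```python
from typing import Dict, List, Optional

Encoded = Dict[str, List[int]]


class BaseTokenizerStub:
    """Hypothetical stand-in for the base class: pad() forwards padding_side to _pad()."""

    padding_side = "left"
    pad_token_id = 0

    def pad(self, encoded_inputs: Encoded, max_length: Optional[int] = None,
            padding_side: Optional[str] = None) -> Encoded:
        return self._pad(encoded_inputs, max_length=max_length, padding_side=padding_side)


class OldTokenizerStub(BaseTokenizerStub):
    """Pre-fix signature: no padding_side parameter, so pad() raises TypeError."""

    def _pad(self, encoded_inputs: Encoded, max_length: Optional[int] = None) -> Encoded:
        return dict(encoded_inputs)


class FixedTokenizerStub(BaseTokenizerStub):
    """Post-fix signature: accepts padding_side and falls back to self.padding_side."""

    def _pad(self, encoded_inputs: Encoded, max_length: Optional[int] = None,
             padding_side: Optional[str] = None) -> Encoded:
        side = padding_side or self.padding_side
        ids = list(encoded_inputs["input_ids"])
        mask = [1] * len(ids)
        missing = 0 if max_length is None else max(0, max_length - len(ids))
        pad_ids = [self.pad_token_id] * missing
        pad_mask = [0] * missing
        if side == "left":
            return {"input_ids": pad_ids + ids, "attention_mask": pad_mask + mask}
        return {"input_ids": ids + pad_ids, "attention_mask": mask + pad_mask}
```

Declaring padding_side explicitly and falling back to self.padding_side when it is None keeps the override working for callers that pass the argument and for those that do not. Swallowing extra arguments with **kwargs would also silence the TypeError, but it would quietly ignore any further arguments the base class starts passing.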

Fix TypeError in _pad method by adding missing padding_side field (commit 778b5712)

ZHANGYUXUAN-zR changed pull request status to merged Oct 2, 2024
