Instructions to use uw-ssec/OLMo-7B-Instruct-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use uw-ssec/OLMo-7B-Instruct-GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="uw-ssec/OLMo-7B-Instruct-GGUF",
	filename="OLMo-7B-Instruct-Q2_K.gguf",
)

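response = llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
# The reply text is in the first choice's message
print(response["choices"][0]["message"]["content"])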

Local Apps

llama.cpp

How to use uw-ssec/OLMo-7B-Instruct-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf uw-ssec/OLMo-7B-Instruct-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf uw-ssec/OLMo-7B-Instruct-GGUF:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf uw-ssec/OLMo-7B-Instruct-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf uw-ssec/OLMo-7B-Instruct-GGUF:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf uw-ssec/OLMo-7B-Instruct-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf uw-ssec/OLMo-7B-Instruct-GGUF:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf uw-ssec/OLMo-7B-Instruct-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf uw-ssec/OLMo-7B-Instruct-GGUF:Q4_K_M

Use Docker

docker model run hf.co/uw-ssec/OLMo-7B-Instruct-GGUF:Q4_K_M

vLLM

How to use uw-ssec/OLMo-7B-Instruct-GGUF with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "uw-ssec/OLMo-7B-Instruct-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "uw-ssec/OLMo-7B-Instruct-GGUF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
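
The same request can be sent from Python. The following is a minimal sketch using only the standard library; it assumes the vLLM server started above is listening on localhost:8000, and the helper names `build_chat_request` and `ask` are illustrative, not part of vLLM.

```python
import json
import urllib.request

# Endpoint and model name are taken from the curl example above.
BASE_URL = "http://localhost:8000/v1/chat/completions"
MODEL = "uw-ssec/OLMo-7B-Instruct-GGUF"


def build_chat_request(prompt, model=MODEL, url=BASE_URL):
    """Build (but do not send) a POST request carrying one user message."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt):
    """Send the request and return the assistant's reply text.

    Needs a running server; raises urllib.error.URLError otherwise.
    """
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        body = json.load(resp)
    # OpenAI-compatible responses put the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(ask("What is the capital of France?"))` prints the model's reply.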

Use Docker

docker model run hf.co/uw-ssec/OLMo-7B-Instruct-GGUF:Q4_K_M

Ollama
How to use uw-ssec/OLMo-7B-Instruct-GGUF with Ollama:
```
ollama run hf.co/uw-ssec/OLMo-7B-Instruct-GGUF:Q4_K_M
```

Unsloth Studio

How to use uw-ssec/OLMo-7B-Instruct-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for uw-ssec/OLMo-7B-Instruct-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for uw-ssec/OLMo-7B-Instruct-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for uw-ssec/OLMo-7B-Instruct-GGUF to start chatting

Docker Model Runner
How to use uw-ssec/OLMo-7B-Instruct-GGUF with Docker Model Runner:
```
docker model run hf.co/uw-ssec/OLMo-7B-Instruct-GGUF:Q4_K_M
```

Lemonade

How to use uw-ssec/OLMo-7B-Instruct-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull uw-ssec/OLMo-7B-Instruct-GGUF:Q4_K_M

Run and chat with the model

lemonade run user.OLMo-7B-Instruct-GGUF-Q4_K_M

List all available models

lemonade list

OLMo 7B-Instruct-GGUF

For more details on OLMo-7B-Instruct, refer to Allen AI's OLMo-7B-Instruct model card.

OLMo is a series of Open Language Models designed to enable the science of language models. The OLMo base models are trained on the Dolma dataset. The Instruct version is trained on the cleaned version of the UltraFeedback dataset.

OLMo 7B Instruct is trained for better question answering. It shows the performance gain that OLMo base models can achieve with existing fine-tuning techniques.

This version of the model is converted from ssec-uw/OLMo-7B-Instruct-hf to GGUF, a binary format optimized for quick loading and saving of models, which makes it efficient for inference.

In addition to being converted to GGUF, the model has been quantized to reduce the computational and memory costs of running inference. We are currently working on adding all of the quantization types.
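
To get a feel for the memory savings, here is a back-of-the-envelope sketch. It assumes exactly 7 billion parameters and a single nominal bit width for every weight; actual GGUF files will differ, since quantization types can mix precisions and store extra scale data and metadata. The function name `approx_size_gb` is ours, for illustration only.

```python
# Rough estimate: bytes ≈ parameters × bits_per_weight / 8.
# Assumptions: exactly 7e9 parameters and one uniform bit width per weight.
N_PARAMS = 7_000_000_000


def approx_size_gb(bits_per_weight, n_params=N_PARAMS):
    """Return the approximate weight size in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


for bits in (16, 8, 4, 2):
    print(f"{bits:>2}-bit: ~{approx_size_gb(bits):.2f} GB")
```

Under these assumptions, 4-bit weights take roughly a quarter of the space of 16-bit weights.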

These files are designed for use with GGML and executors based on GGML such as llama.cpp.

Get Started

To get started with one of the GGUF files, you can use llama-cpp-python, a Python binding for llama.cpp.

Install llama-cpp-python v0.2.70 or later with pip. The following command installs a pre-built wheel with basic CPU support. For other installation methods, see the llama-cpp-python installation docs.
```
pip install "llama-cpp-python>=0.2.70" --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu
```
Download one of the GGUF files. This example uses OLMo-7B-Instruct-Q4_K_M.gguf, which downloads when you click its link.
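
The file can also be fetched from Python. The sketch below assumes the `huggingface_hub` package is installed (`pip install huggingface_hub`) and uses its `hf_hub_download` function. The helper names are hypothetical, and the filename pattern is inferred from the two filenames mentioned on this page, so other quantization names may not exist in the repository.

```python
REPO_ID = "uw-ssec/OLMo-7B-Instruct-GGUF"


def gguf_filename(quant="Q4_K_M"):
    """Build a filename following the pattern of the files named in this card.

    Assumption: other quantization types use the same naming scheme.
    """
    return f"OLMo-7B-Instruct-{quant}.gguf"


def download_gguf(quant="Q4_K_M"):
    """Download the file into the local Hugging Face cache and return its path."""
    # Imported lazily so gguf_filename() works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=REPO_ID, filename=gguf_filename(quant))
```

Calling `download_gguf()` returns a local path that can be passed as `model_path` in the next step.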

Open a Python interpreter and run the following commands. For example, we can ask it: What is a solar system?

You will need to change the model_path argument to point to where the GGUF model is saved on your system.

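from llama_cpp import Llama

# Load the quantized model from the local GGUF file.
llm = Llama(
    model_path="path/to/OLMo-7B-Instruct-Q4_K_M.gguf"
)

# Send a single-turn chat request.
result_dict = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": "What is a solar system?"
        }
    ]
)

# The reply text is in the first choice's message.
print(result_dict["choices"][0]["message"]["content"])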
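
If model_path does not point to an existing file, loading will not succeed. A small check beforehand can make that failure easier to diagnose. This is only a sketch; `resolve_model_path` is a hypothetical helper, not part of llama-cpp-python.

```python
from pathlib import Path


def resolve_model_path(path):
    """Expand '~', make the path absolute, and confirm the file exists."""
    resolved = Path(path).expanduser().resolve()
    if not resolved.is_file():
        raise FileNotFoundError(f"GGUF model not found: {resolved}")
    return str(resolved)
```

It can wrap the argument in the example above: `Llama(model_path=resolve_model_path("path/to/OLMo-7B-Instruct-Q4_K_M.gguf"))`.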

That's it! You should see the result fairly quickly. Have fun! 🤖

Contact

For errors in this model card, contact Don or Anant, {landungs, anmittal} at uw dot edu.

Acknowledgement

We would like to thank the hardworking folks at Allen AI for providing the original model.

Additionally, the work to convert and quantize the model was done by the University of Washington Scientific Software Engineering Center (SSEC), as part of the Schmidt Sciences Virtual Institute for Scientific Software (VISS).

Downloads last month: 155

GGUF

Model size: 7B params

Architecture: olmo

Hardware compatibility (quantization bit widths): 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, 8-bit
