Instructions to use microsoft/Orca-2-13b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use microsoft/Orca-2-13b with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-generation", model="microsoft/Orca-2-13b")# Load model directly from transformers import AutoTokenizer, AutoModelForCausalLM tokenizer = AutoTokenizer.from_pretrained("microsoft/Orca-2-13b") model = AutoModelForCausalLM.from_pretrained("microsoft/Orca-2-13b") - Inference
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use microsoft/Orca-2-13b with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "microsoft/Orca-2-13b" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "microsoft/Orca-2-13b", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker
docker model run hf.co/microsoft/Orca-2-13b
- SGLang
How to use microsoft/Orca-2-13b with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "microsoft/Orca-2-13b" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "microsoft/Orca-2-13b", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "microsoft/Orca-2-13b" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "microsoft/Orca-2-13b", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }' - Docker Model Runner
How to use microsoft/Orca-2-13b with Docker Model Runner:
docker model run hf.co/microsoft/Orca-2-13b
It can not even answer this question: 一公斤的棉花和一公斤的铁,哪一个更重?
It can not even answer this question: 一公斤的棉花和一公斤的铁,哪一个更重?
Which template are you using, and have you considered to ask the model in English instead?
Welp, I tested multiple way to massage the model in English, and it seems to insist density somehow matter in the question of weight, thus iron is heavier.
Model doesn't give correct answer unless I ask in very specific manner and template, but then I start asking the same question to a few other model, same old issue.
It is what it is I suppose, LLM truthfulness is always problematic when the internet (which presumably made up orca 2 dataset) can't make up their mind of this simple question to begin with, I blame dumb human.
Well, that's what happens when AI is trained with human data. it will always have some problems, even more remarquable when dealing with dilemas and commun misconceptions. However, it also depends on the models, we are talking about a 13B model here, it's a powerfull one but we cannot expect it to have greater reasonning than humans. Even the most powerfull ones like llama 70B that are capable of answering your question correctly from what I tried still lack a lot in reasonning, but we are getting there ! To be honnest, I am still surprised how well these models with 13B and even 7B params are doing, and cannot wait to see these models being even more optimised.
I got a correct answer. slightly restructured. And not in first try.
The model is from https://huggingface.co/TheBloke/Orca-2-13B-GGUF
orca-2-13b.Q6_K.gguf
Yeah, after playing around I also managed to get sometimes interesting answers, tho it feels like gambling. Well, let's hope this "gambling game" will have better and better chances of gettin us a win.
Yeah, after playing around I also managed to get sometimes interesting answers, tho it feels like gambling. Well, let's hope this "gambling game" will have better and better chances of gettin us a win.
LLMs should be deterministic unless you sample.
Yeah, after playing around I also managed to get sometimes interesting answers, tho it feels like gambling. Well, let's hope this "gambling game" will have better and better chances of gettin us a win.
LLMs should be deterministic unless you sample.
Or... unless you change the prompt like i did? You can always add some randomness to it, like having a previous conversation with it or rephrase the prompt.
I am not able to run a model properly. I am on RTX4000, and it takes a lot of time to process a single answer. Do we have any solution for that
