Instructions to use senseable/33x-coder with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use senseable/33x-coder with Transformers:
```
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="senseable/33x-coder")
```

```
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("senseable/33x-coder")
model = AutoModelForCausalLM.from_pretrained("senseable/33x-coder")
```
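Once created, the pipeline can be called directly; a minimal sketch (the prompt and generation length below are illustrative, not part of the original snippet):

```
# Generate code from a plain-text prompt; returns a list of dicts
result = pipe("Write a Python function to reverse a string.", max_new_tokens=256)
print(result[0]["generated_text"])
```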
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use senseable/33x-coder with vLLM:
Install from pip and serve the model:
```
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "senseable/33x-coder"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "senseable/33x-coder",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
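Because the server exposes an OpenAI-compatible API, it can also be called from Python with the openai client; a minimal sketch, assuming the server above is running on localhost:8000 (vLLM does not check the API key by default, so any placeholder works):

```
from openai import OpenAI

# Point the client at the local vLLM server instead of api.openai.com
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.completions.create(
    model="senseable/33x-coder",
    prompt="Once upon a time,",
    max_tokens=512,
    temperature=0.5,
)
print(completion.choices[0].text)
```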
- SGLang
How to use senseable/33x-coder with SGLang:
Install from pip and serve the model:
```
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "senseable/33x-coder" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "senseable/33x-coder",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
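The same endpoint can be called from Python with only the requests library; a minimal sketch, assuming the SGLang server above is running on localhost:30000:

```
import requests

# POST the same payload the curl example sends
response = requests.post(
    "http://localhost:30000/v1/completions",
    json={
        "model": "senseable/33x-coder",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5,
    },
)
print(response.json()["choices"][0]["text"])
```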
Use Docker images

```
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
  --model-path "senseable/33x-coder" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "senseable/33x-coder",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

- Docker Model Runner
How to use senseable/33x-coder with Docker Model Runner:
```
docker model run hf.co/senseable/33x-coder
```
---
language:
  - "en"
metrics:
  - code_eval
library_name: transformers
tags:
  - Code Generation
datasets:
  - andersonbcdefg/synthetic_retrieval_tasks
  - ise-uiuc/Magicoder-Evol-Instruct-110K
license: "apache-2.0"
---
# 33x Coding Model
33x-coder is a Llama-based model available on Hugging Face, designed to assist and augment coding tasks. It specializes in understanding and generating code, and is trained on a diverse range of programming languages and coding scenarios, making it a versatile tool for developers looking to streamline their coding process. Whether you're debugging, seeking coding advice, or generating entire scripts, 33x-coder can provide relevant, syntactically correct code snippets and programming guidance, helping to reduce development time and improve code quality.
## Importing necessary libraries from transformers
```
from transformers import AutoTokenizer, AutoModelForCausalLM
```
## Initialize the tokenizer and model
```
tokenizer = AutoTokenizer.from_pretrained("senseable/33x-coder")
model = AutoModelForCausalLM.from_pretrained("senseable/33x-coder").cuda()
```
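A 33B-parameter model is large in full precision. If GPU memory is tight, one option is half-precision loading with automatic device placement; this is a sketch, not part of the original card, and assumes recent transformers plus accelerate are installed and that the checkpoint fits across the available GPUs:

```
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "senseable/33x-coder",
    torch_dtype=torch.bfloat16,  # half precision: roughly halves memory vs. float32
    device_map="auto",           # let accelerate spread layers across available devices
)
```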
## User's request for a prime-checking function in Python
```
messages = [
    {'role': 'user', 'content': "Write a Python function to check if a number is prime."}
]
```
## Preparing the input: apply the chat template and move the tokens to the model's device
```
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
```
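To see exactly what the chat template produces before tokenization, apply_chat_template can also return plain text; a small illustrative check:

```
# Inspect the formatted prompt as text instead of token ids
prompt_text = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
print(prompt_text)
```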
## Generating responses from the model with specific parameters for text generation
```
outputs = model.generate(
    inputs,
    max_new_tokens=512,      # Maximum number of new tokens to generate
    do_sample=False,         # Greedy decoding: always pick the most likely next token
    top_k=50,                # Top-k filtering (only applies when do_sample=True)
    top_p=0.95,              # Nucleus sampling (only applies when do_sample=True)
    num_return_sequences=1,  # Number of independently generated sequences per input
    eos_token_id=32021,      # End-of-sequence token id
)
```
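Note that top_k and top_p only take effect when sampling is enabled, so the greedy call above ignores them. For more varied generations, sampling can be switched on; the parameter values below are illustrative, not recommendations from the original card:

```
outputs = model.generate(
    inputs,
    max_new_tokens=512,
    do_sample=True,      # enable sampling so temperature/top_k/top_p apply
    temperature=0.7,     # illustrative value; higher means more random output
    top_k=50,
    top_p=0.95,
    eos_token_id=32021,
)
```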
## Decoding and printing the generated response
```
start_index = len(inputs[0])
generated_output_tokens = outputs[0][start_index:]
decoded_output = tokenizer.decode(generated_output_tokens, skip_special_tokens=True)
print("Generated Code:\n", decoded_output)
```