Instructions to use Jayfeather1024/sft with libraries, inference providers, notebooks, and local apps.
- Libraries
- Transformers
How to use Jayfeather1024/sft with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Jayfeather1024/sft")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Jayfeather1024/sft")
model = AutoModelForCausalLM.from_pretrained("Jayfeather1024/sft")
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use Jayfeather1024/sft with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Jayfeather1024/sft"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Jayfeather1024/sft",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
Use Docker
```shell
docker model run hf.co/Jayfeather1024/sft
```
- SGLang
How to use Jayfeather1024/sft with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Jayfeather1024/sft" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Jayfeather1024/sft",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
Use Docker images
```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Jayfeather1024/sft" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Jayfeather1024/sft",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
- Docker Model Runner
How to use Jayfeather1024/sft with Docker Model Runner:
```shell
docker model run hf.co/Jayfeather1024/sft
```
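The vLLM and SGLang servers above both expose an OpenAI-compatible `/v1/completions` endpoint, so the curl calls can also be made from Python. This is a minimal standard-library sketch, not an official client: `build_completion_request` and `complete` are hypothetical helper names, the payload simply mirrors the curl examples, and it assumes the usual OpenAI response shape with the generated text in `choices[0].text`.

```python
import json
import urllib.request


def build_completion_request(base_url: str, model: str, prompt: str,
                             max_tokens: int = 512,
                             temperature: float = 0.5) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/completions endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(request: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send the request to a running server and return the first completion's text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["choices"][0]["text"]


# The vLLM curl example targets port 8000; the SGLang examples use port 30000.
vllm_request = build_completion_request(
    "http://localhost:8000", "Jayfeather1024/sft", "Once upon a time,")
sglang_request = build_completion_request(
    "http://localhost:30000", "Jayfeather1024/sft", "Once upon a time,")
# print(complete(vllm_request))  # requires a running server
```

Building the request separately from sending it keeps the payload inspectable without a live server.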
```text
[2023-12-31 20:06:51,176] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-12-31 20:06:55,387] [WARNING] [runner.py:202:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
Detected CUDA_VISIBLE_DEVICES=0,1,2,3 but ignoring it because one or several of --include/--exclude/--num_gpus/--num_nodes cl args were used. If you want to use CUDA_VISIBLE_DEVICES don't pass any of these arguments to deepspeed.
[2023-12-31 20:06:55,387] [INFO] [runner.py:571:main] cmd = /data/jiongxiao_wang/anaconda3/envs/safe-rlhf/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMSwgMiwgM119 --master_addr=127.0.0.1 --master_port=56337 --module --enable_each_rank_log=None safe_rlhf.finetune --train_datasets alpaca --model_name_or_path huggyllama/llama-7b --max_length 512 --trust_remote_code True --epochs 3 --per_device_train_batch_size 4 --per_device_eval_batch_size 4 --gradient_accumulation_steps 16 --gradient_checkpointing --learning_rate 2e-5 --lr_scheduler_type cosine --lr_warmup_ratio 0.03 --weight_decay 0.0 --seed 42 --output_dir /data/jiongxiao_wang/rlhf_attack/safe-rlhf/output/sft --log_type wandb --log_project Safe-RLHF-SFT --zero_stage 3 --bf16 True --tf32 True
```
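The launch command above fixes the batch geometry and the learning-rate schedule. The sketch below decodes the base64 `--world_info` argument (it should agree with the `WORLD INFO DICT` line that follows), multiplies out the effective batch size, and evaluates a linear-warmup cosine schedule. It assumes that the `cosine` scheduler has the usual Hugging Face `get_cosine_schedule_with_warmup` shape (linear ramp from 0, then half a cosine period down to 0), that warmup is `int(lr_warmup_ratio * total_steps)` steps, and that the total is the 609 optimizer steps implied by the final checkpoint tag `global_step609`; the trainer's own step count may differ slightly. The examples-per-epoch figure is only a rough estimate, close to the roughly 52K instructions in the Alpaca data named by `--train_datasets alpaca`, but the exact dataset size is not shown in the log.

```python
import base64
import json
import math

# --world_info value copied from the deepspeed launcher command in the log.
WORLD_INFO_B64 = "eyJsb2NhbGhvc3QiOiBbMCwgMSwgMiwgM119"


def decode_world_info(encoded: str) -> dict:
    """Decode the base64-encoded JSON mapping of host -> local GPU indices."""
    return json.loads(base64.b64decode(encoded).decode("utf-8"))


world_info = decode_world_info(WORLD_INFO_B64)
world_size = sum(len(gpus) for gpus in world_info.values())

# Flags from the same command.
PER_DEVICE_TRAIN_BATCH_SIZE = 4   # --per_device_train_batch_size
GRADIENT_ACCUMULATION_STEPS = 16  # --gradient_accumulation_steps
EPOCHS = 3                        # --epochs
effective_batch_size = (PER_DEVICE_TRAIN_BATCH_SIZE
                        * GRADIENT_ACCUMULATION_STEPS
                        * world_size)

# Assumption: total optimizer steps read off the final checkpoint tag "global_step609".
TOTAL_STEPS = 609
# Rough estimate only: ignores sampler padding and any partial accumulation window.
approx_examples_per_epoch = TOTAL_STEPS * effective_batch_size // EPOCHS

BASE_LR = 2e-5       # --learning_rate
WARMUP_RATIO = 0.03  # --lr_warmup_ratio
# Assumption: the warmup length is truncated to a whole number of steps.
WARMUP_STEPS = int(WARMUP_RATIO * TOTAL_STEPS)


def cosine_lr_with_warmup(step: int) -> float:
    """Linear warmup from 0 to BASE_LR, then cosine decay from BASE_LR to 0."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print(world_info)                 # {'localhost': [0, 1, 2, 3]}
print(effective_batch_size)       # 256
print(approx_examples_per_epoch)  # 51968
print(WARMUP_STEPS)               # 18
for step in (0, WARMUP_STEPS // 2, WARMUP_STEPS, TOTAL_STEPS):
    print(step, f"{cosine_lr_with_warmup(step):.2e}")
```

Under these assumptions the learning rate peaks at 2e-5 after 18 warmup steps and decays to 0 at the last step.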
```text
[2023-12-31 20:06:57,487] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-12-31 20:07:00,478] [INFO] [launch.py:145:main] WORLD INFO DICT: {'localhost': [0, 1, 2, 3]}
[2023-12-31 20:07:00,478] [INFO] [launch.py:151:main] nnodes=1, num_local_procs=4, node_rank=0
[2023-12-31 20:07:00,478] [INFO] [launch.py:162:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1, 2, 3]})
[2023-12-31 20:07:00,478] [INFO] [launch.py:163:main] dist_world_size=4
[2023-12-31 20:07:00,478] [INFO] [launch.py:165:main] Setting CUDA_VISIBLE_DEVICES=0,1,2,3
[2023-12-31 20:07:02,815] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-12-31 20:07:02,856] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-12-31 20:07:03,011] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-12-31 20:07:03,040] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-12-31 20:07:10,639] [INFO] [comm.py:637:init_distributed] cdb=None
[2023-12-31 20:07:10,640] [INFO] [comm.py:637:init_distributed] cdb=None
[2023-12-31 20:07:10,640] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2023-12-31 20:07:10,670] [INFO] [comm.py:637:init_distributed] cdb=None
[2023-12-31 20:07:10,675] [INFO] [comm.py:637:init_distributed] cdb=None
Set logger level to WARNING.
ninja: no work to do.
Time to load fused_adam op: 0.14865803718566895 seconds
Time to load fused_adam op: 0.2057504653930664 seconds
Time to load fused_adam op: 0.20213913917541504 seconds
Time to load fused_adam op: 0.2022261619567871 seconds
Parameter Offload: Total persistent parameters: 266240 in 65 params
***** Running training *****
Saving model to "/data/jiongxiao_wang/rlhf_attack/safe-rlhf/output/sft" ...
Saving DeepSpeed Checkpoints...
Converting DeepSpeed Checkpoints to Hugging Face format...
[2023-12-31 21:51:42,560] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Processing zero checkpoint './global_step609'
Detected checkpoint of type zero stage 3, world_size: 4
Parsing checkpoint created by deepspeed==0.12.6
Reconstructed Trainable fp32 state dict with 291 params 6738423808 elements
Saving fp32 state dict to pytorch_model.bin
Model saved!
[2023-12-31 21:52:50,198] [INFO] [launch.py:347:main] Process 189883 exits successfully.
[2023-12-31 21:52:50,198] [INFO] [launch.py:347:main] Process 189885 exits successfully.
[2023-12-31 21:52:50,198] [INFO] [launch.py:347:main] Process 189884 exits successfully.
[2023-12-31 21:52:58,206] [INFO] [launch.py:347:main] Process 189882 exits successfully.
```
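The parameter counts in the log above can be reconciled with the base model's architecture. This sketch assumes the standard LLaMA-7B shape for `huggyllama/llama-7b` (32 decoder layers, hidden size 4096, MLP size 11008, no biases, untied input and output embeddings) and a vocabulary of 32001 tokens, one more than LLaMA's original 32000. The log does not say why the vocabulary grew; adding a padding token during fine-tuning would explain it, but that is an assumption. Under those assumptions the tensor list reproduces the `291 params 6738423808 elements` of the reconstructed state dict, and its 1-D RMSNorm weights (two per layer plus the final norm) match `266240 in 65 params`, which would be the small tensors ZeRO-3 keeps unpartitioned below its parameter-persistence threshold. The file-size line is a lower-bound estimate at 4 bytes per fp32 element, ignoring serialization overhead.

```python
import math

# Assumed LLaMA-7B shape. VOCAB_SIZE = 32001 is an assumption (32000 + 1 extra token).
VOCAB_SIZE = 32001
HIDDEN = 4096
INTERMEDIATE = 11008
LAYERS = 32


def llama_tensor_shapes(vocab: int, hidden: int, intermediate: int, layers: int) -> list:
    """List the shape of every trainable tensor in a LLaMA-style decoder (no biases)."""
    shapes = [(vocab, hidden)]                  # embed_tokens
    for _ in range(layers):
        shapes += [(hidden, hidden)] * 4        # q_proj, k_proj, v_proj, o_proj
        shapes += [(intermediate, hidden)] * 2  # gate_proj, up_proj
        shapes += [(hidden, intermediate)]      # down_proj
        shapes += [(hidden,)] * 2               # input_layernorm, post_attention_layernorm
    shapes += [(hidden,), (vocab, hidden)]      # final norm, lm_head
    return shapes


shapes = llama_tensor_shapes(VOCAB_SIZE, HIDDEN, INTERMEDIATE, LAYERS)
num_tensors = len(shapes)
num_elements = sum(math.prod(s) for s in shapes)

# 1-D tensors are the RMSNorm weights, small enough for ZeRO-3 to keep persistent.
norm_shapes = [s for s in shapes if len(s) == 1]
persistent_elements = sum(math.prod(s) for s in norm_shapes)

# 4 bytes per fp32 element, excluding file-format overhead.
fp32_bytes = num_elements * 4

print(num_tensors, num_elements)              # 291 6738423808
print(len(norm_shapes), persistent_elements)  # 65 266240
print(f"{fp32_bytes / 1e9:.2f} GB")           # 26.95 GB
```

Setting `VOCAB_SIZE = 32000` gives 6738415616 elements, 8192 (two rows of 4096) fewer than the log reports, which is why the sketch assumes one extra token.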