| license: bigscience-openrail-m | |
| datasets: | |
| - mc4 | |
| language: | |
| - sv | |
| library_name: transformers | |
| inference: | |
| parameters: | |
| top_p: 0.9 | |
| repetition_penalty: 1.1 | |
| max_new_tokens: 75 | |
| do_sample: true | |
| widget: | |
| - text: ":nyheter:" | |
| example_title: "News text" | |
| - text: ":wiki:" | |
| example_title: "Wikipedia text" | |
| - text: ":blogg:" | |
| example_title: "Blog post" | |
| - text: ":forum:" | |
| example_title: "Forum" | |
| - text: ":anons:" | |
| example_title: "Ads" | |
| # SweCTRL-Mini | |
| SweCTRL-Mini is a large Swedish language model that can be used for inference and fine-tuning on a single consumer-grade GPU. The model is based on the CTRL architecture by Keskar, McCann, Varshney, Xiong, and Socher | |
| (2019), which means that users of the SweCTRL-Mini model can control the genre of the generated text by inserting special tokens in the generation prompts. | |
| Crucially, note that this model is: | |
| - **NOT** trained on following GPT-like instructions, | |
| - **NOT** trained for conversations, like ChatGPT, | |
| - **NOT** trained on any multi-modal data. It was trained on a single modality, text, more than 99% of which is in Swedish. | |
| **Note on using Inference API (text box to the right):** There are a number of presets that start the text with appropriate control codes to control the genre, e.g., `:wiki:` for | |
| texts from Wikipedia. You can add your own prompt after these control codes. For instance, if you want a Wikipedia article about Stockholm, you could write | |
| `:wiki: Stockholm`. Generation in the example is limited to at most 75 new tokens. Normally, generation should also stop after reaching the ending control code, | |
| which has a `$` symbol at the end, e.g., `:wiki:$` for Wikipedia texts. However, I couldn't configure that here, so please ignore all text after such tokens if they are | |
| generated. Additionally, note that there are **no** filters or other mechanisms for making the text safe from biases or for preventing the model from generating text on any topic. | |
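
The advice to ignore text after an ending control code can also be applied in code. The following is a minimal sketch, assuming the ending code shows up literally in the decoded output (e.g. `:wiki:$`); `truncate_at_end_code` is a hypothetical helper written for illustration and is not part of the model's repository.

```python
def truncate_at_end_code(text: str, start_code: str) -> str:
    """Keep only the text that precedes the ending control code, if it occurs."""
    # Per the model card, the ending code is the starting code followed by "$",
    # e.g. ":wiki:" -> ":wiki:$".
    end_code = start_code + "$"
    head, _sep, _tail = text.partition(end_code)
    # If the ending code is absent, partition() returns the whole text as `head`.
    return head.rstrip()


# Illustrative string, not real model output.
generated = ":wiki: Stockholm är Sveriges huvudstad. :wiki:$ :nyheter: ..."
print(truncate_at_end_code(generated, ":wiki:"))
# prints ":wiki: Stockholm är Sveriges huvudstad."
```

Text without an ending code is returned unchanged apart from trailing whitespace.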
| ## Model Details | |
| ### Model Description | |
| - **Developed by:** Dmytro Kalpakchi (with supervision from Johan Boye) | |
| - **Shared by:** Dmytro Kalpakchi | |
| - **Model type:** Transformer-based language model trained by predicting the next token | |
| - **Language(s) (NLP):** Swedish | |
| - **License:** BigScience Open RAIL-M | |
| - **Finetuned from model:** None, trained from scratch | |
| ### Model Sources | |
| - **Website:** https://swectrl.dev/ | |
| - **Repository:** https://github.com/dkalpakchi/SweCTRL-Mini | |
| - **Paper:** https://arxiv.org/pdf/2304.13994.pdf | |
| - **Technical note:** https://zenodo.org/record/7868205 | |
| ## Uses | |
| ### Direct Use | |
| The model should be used for generating texts of various genres in Swedish. | |
| ### Out-of-Scope Use | |
| Please refer to Appendix A of the License file for information on use restrictions. The model has a limited context window of 256 tokens, so it will most probably not work well | |
| for text summarization. Additionally, the vast majority of its training data was in Swedish, although it contains tokens in other languages as well, so tasks like | |
| Machine Translation would require further fine-tuning. | |
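
To make the 256-token limit concrete, here is a small sketch of the token budget. It assumes that the prompt and the generated continuation share the same 256-token window, which is the usual behaviour for causal language models but is not spelled out in this card; `max_new_tokens_for` is a hypothetical helper name.

```python
CONTEXT_WINDOW = 256  # context window stated in this model card


def max_new_tokens_for(prompt_tokens: int, context_window: int = CONTEXT_WINDOW) -> int:
    """Number of tokens that can still be generated after a prompt of the given length."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    # Assumption: prompt and continuation share one window.
    return max(context_window - prompt_tokens, 0)


print(max_new_tokens_for(40))   # 216
print(max_new_tokens_for(300))  # 0, the prompt alone already exceeds the window
```

The prompt length in tokens can be obtained with the model's tokenizer, for example `len(tokenizer(prompt)["input_ids"])`.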
| ## Bias, Risks, and Limitations | |
| To mitigate the inclusion of personally-identifiable data we attempted to remove sources that could contain such data to the best of our ability (see Technical note for | |
| more details on the data filtering process). However, we have still noted that the model can generate text that includes various forms of biases, which is why we strongly | |
| recommend human curation of the generated texts. Currently we have conducted no systematic investigation into either the kinds of biases included in the generated texts or how | |
| frequently they occur. The contribution of the community on this matter would be very welcome. | |
| ### Recommendations | |
| For further recommendations on the use of the model, please see the associated paper. | |
| ## How to Get Started with the Model | |
| The fastest way to start with the model is using the code below: | |
| ```py | |
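| from transformers import pipeline | |
| # Load the model as a text-generation pipeline | |
| pipe = pipeline("text-generation", model="dkalpakchi/SweCTRL-Mini") | |
| # ":nyheter:" is the control code for news text; do_sample=True is needed for top_p to take effect | |
| print(pipe(":nyheter:", max_length=256, do_sample=True, repetition_penalty=1.1, top_p=0.9)) | |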
| ``` | |
| For more advanced uses and other code examples, please see the associated GitHub repository (https://github.com/dkalpakchi/SweCTRL-Mini). | |
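
Prompts for other genres follow the same pattern: a control code, optionally followed by your own text. The sketch below builds such prompts from the control codes listed as widget presets in this card. The dictionary keys (`"news"`, `"wikipedia"`, and so on) and the `build_prompt` helper are hypothetical names chosen for illustration, and the model may support control codes beyond these five.

```python
# Control codes taken from the widget presets in this model card.
# The English keys are illustrative labels, not identifiers used by the model.
PRESET_CONTROL_CODES = {
    "news": ":nyheter:",
    "wikipedia": ":wiki:",
    "blog": ":blogg:",
    "forum": ":forum:",
    "ads": ":anons:",
}


def build_prompt(genre: str, text: str = "") -> str:
    """Return '<control code> <text>', or only the control code if text is empty."""
    code = PRESET_CONTROL_CODES[genre]  # raises KeyError for an unknown genre
    text = text.strip()
    return f"{code} {text}" if text else code


print(build_prompt("wikipedia", "Stockholm"))  # :wiki: Stockholm
print(build_prompt("news"))                    # :nyheter:
```

The returned string can be passed as the first argument to the `pipe(...)` call shown above.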
| ## Training Details | |
| ### Training Data | |
| The training data includes a *subset* of the cleaned Swedish mC4, as well as some documents from Project Runeberg. | |
| Extensive information on the training data is provided in Section 1 of the Technical note. | |
| The interface to partially mine training data is available at: https://swectrl.dev/data | |
| ### Training Procedure | |
| #### Preprocessing | |
| See Section 1 of the Technical note. | |
| #### Training Hyperparameters | |
| - **Training regime:** fp32 | |
| ## Evaluation | |
| See Sections 5.3, 6, and 7 in the associated paper, and Section 3 of the Technical note. | |
| ## Environmental Impact | |
| Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). | |
| - **Hardware Type:** 8 A100 GPUs | |
| - **Hours used:** 11907.6 GPU-hours for training and experimentation | |
| - **Provider:** BerzeLiUs supercomputer | |
| - **Carbon Emitted:** No public data on carbon efficiency, so hard to estimate | |
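
For readers who do have the missing inputs, the basic arithmetic behind such an estimate is energy used multiplied by the carbon intensity of the electricity. The sketch below is a simplification under stated assumptions: it ignores data-centre overhead, `estimate_co2eq_kg` is a hypothetical helper, and the numbers in the example call are round placeholders, not measurements for this model.

```python
def estimate_co2eq_kg(gpu_hours: float, avg_power_kw: float, kg_co2eq_per_kwh: float) -> float:
    """Rough CO2eq estimate: GPU-hours x average power per GPU x grid carbon intensity."""
    energy_kwh = gpu_hours * avg_power_kw   # energy drawn by the GPUs alone
    return energy_kwh * kg_co2eq_per_kwh    # convert energy to emissions


# Placeholder inputs chosen only to show the arithmetic:
# 1000 GPU-hours at 0.3 kW per GPU on a grid emitting 0.1 kg CO2eq per kWh.
example = estimate_co2eq_kg(gpu_hours=1000, avg_power_kw=0.3, kg_co2eq_per_kwh=0.1)
print(f"{example:.1f} kg CO2eq")
```

For this model, `gpu_hours` would be 11907.6; the average power draw and the grid's carbon intensity are the values for which the card reports no public data.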
| ## Technical Specifications | |
| See Section 3 of the associated paper. | |
| ## Citation | |
| **BibTeX:** | |
| ```bibtex | |
| @article{kalpakchi2023swectrl, | |
| title={SweCTRL-Mini: a data-transparent Transformer-based large language model for controllable text generation in Swedish}, | |
| author={Kalpakchi, Dmytro and Boye, Johan}, | |
| journal={arXiv preprint arXiv:2304.13994}, | |
| year={2023} | |
| } | |
| ``` | |
| **APA:** | |
| Kalpakchi, D., & Boye, J. (2023). SweCTRL-Mini: a data-transparent Transformer-based large language model for controllable text generation in Swedish. arXiv preprint arXiv:2304.13994. | |
| ## Model Card Authors | |
| Dmytro Kalpakchi (dmytroka@kth.se) | |
| ## Model Card Contact | |
| Dmytro Kalpakchi (dmytroka@kth.se) | |
| # References | |
| Keskar, N. S., McCann, B., Varshney, L. R., Xiong, C., & Socher, R. (2019). CTRL: A conditional transformer language model for controllable generation. arXiv preprint arXiv:1909.05858. | |