How to use phenningsson/sdhk-mlm-pretrained-full with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("fill-mask", model="phenningsson/sdhk-mlm-pretrained-full")

# Load model directly
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("phenningsson/sdhk-mlm-pretrained-full")
model = AutoModelForMaskedLM.from_pretrained("phenningsson/sdhk-mlm-pretrained-full")
```
Domain-Adapted XLM-RoBERTa-Large for Old Swedish (SDHK)
This model is a domain-adapted version of XLM-RoBERTa-Large, further pretrained with masked language modelling (MLM) on the full Old Swedish corpus of the Svenskt Diplomatariums Huvudkartotek (SDHK), the main catalogue of the Swedish Diplomatarium maintained by the Swedish National Archives (Riksarkivet). The model is intended primarily as a foundation for downstream fine-tuning on Old Swedish text and serves as the base model for phenningsson/sdhk-ner-old-swedish-v2, a named entity recognition (NER) model for Old Swedish charters.
Model Description
- Model type: Masked language model (encoder, XLM-RoBERTa architecture)
- Base model: FacebookAI/xlm-roberta-large
- Language: Old Swedish (medieval Swedish charter language)
- Adaptation method: Continued pre-training (Task-Adaptive Pre-Training, TAPT) on Old Swedish edition texts
The model keeps the XLM-RoBERTa-Large tokenizer and architecture, but its encoder weights have been adapted to the orthographic, morphological, and lexical patterns of medieval Swedish charter language, which are poorly represented in the multilingual XLM-RoBERTa pre-training data. For more information about this work, see phenningsson/sdhk-ner-old-swedish-v2 and the companion repository.
MLM hyperparameters:
- Base model: xlm-roberta-large
- Epochs: 8
- Effective batch size: 32 (per-device batch size 2 × gradient accumulation steps 16)
- Learning rate: 3e-5
- Max sequence length: 256
- Warm-up ratio: 6%
- LR schedule: cosine
- MLM masking probability: 15%
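The optimizer settings above can be made concrete with a small pure-Python sketch of the step count and learning-rate curve. It assumes a single device, a linear warm-up to the peak learning rate followed by half-cosine decay to zero (the shape produced by Transformers' `get_cosine_schedule_with_warmup` with its default half cycle), warm-up steps rounded up from the 6% ratio, and a hypothetical corpus of 10,000 sequences; the real number of SDHK training sequences is not stated here, and the exact rounding of partial batches used in training is an assumption. `optimizer_steps` and `lr_at` are hypothetical helpers, not part of any library.

```python
import math

# Values from the hyperparameter list above.
PER_DEVICE_BATCH = 2
GRAD_ACCUM = 16
EPOCHS = 8
PEAK_LR = 3e-5
WARMUP_RATIO = 0.06


def optimizer_steps(num_sequences: int) -> int:
    """Total optimizer updates for the run (hypothetical helper).

    Assumes one device and that a partial final batch still
    produces an update (ceil division).
    """
    effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM  # 2 x 16 = 32
    steps_per_epoch = math.ceil(num_sequences / effective_batch)
    return steps_per_epoch * EPOCHS


def lr_at(step: int, total_steps: int) -> float:
    """Linear warm-up to PEAK_LR, then half-cosine decay to zero."""
    warmup_steps = math.ceil(WARMUP_RATIO * total_steps)
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total = optimizer_steps(10_000)  # hypothetical corpus size
    for step in (0, 75, 151, total // 2, total):
        print(step, lr_at(step, total))
```

For the hypothetical 10,000 sequences this gives 313 updates per epoch, 2,504 updates in total, and 151 warm-up steps, after which the learning rate peaks at 3e-5 and decays towards zero by the final step.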
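The 15% masking probability controls how many tokens are selected as prediction targets. The sketch below assumes the standard BERT-style recipe implemented by Transformers' `DataCollatorForLanguageModeling`: each non-special token is selected independently with probability 0.15, and of the selected tokens 80% are replaced by the mask token, 10% by a random vocabulary id, and 10% are left unchanged. Whether this run used exactly that collator is an assumption. `mask_tokens` is a hypothetical helper; the content token ids in the demo are made up, while the special ids (`<s>` = 0, `<pad>` = 1, `</s>` = 2, `<mask>` = 250001) and vocabulary size 250,002 are taken to be those of the XLM-RoBERTa tokenizer.

```python
import random
from typing import List, Optional, Set, Tuple

IGNORE_INDEX = -100  # positions with this label are ignored by the MLM loss


def mask_tokens(
    token_ids: List[int],
    mask_token_id: int,
    vocab_size: int,
    special_ids: Set[int],
    mlm_probability: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], List[int]]:
    """Return (inputs, labels) for one sequence using 80/10/10 masking."""
    if rng is None:
        rng = random.Random()
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(token_ids)
    for i, tok in enumerate(token_ids):
        # Special tokens are never selected; others with probability mlm_probability.
        if tok in special_ids or rng.random() >= mlm_probability:
            continue
        labels[i] = tok  # the model must recover the original token here
        r = rng.random()
        if r < 0.8:
            inputs[i] = mask_token_id  # 80%: replace with <mask>
        elif r < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # 10%: random vocabulary id
        # remaining 10%: keep the original token as input
    return inputs, labels


if __name__ == "__main__":
    # Made-up content ids framed by XLM-R's <s> (0) and </s> (2).
    ids = [0] + list(range(1000, 1020)) + [2]
    inputs, labels = mask_tokens(
        ids, mask_token_id=250001, vocab_size=250002,
        special_ids={0, 1, 2}, rng=random.Random(0),
    )
    print(inputs)
    print(labels)
```

Only selected positions carry a label, so the loss is computed on roughly 15% of the tokens; keeping 10% of targets unchanged means the model cannot assume that every unmasked token is correct.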
License
This model is released under the GNU General Public License v3.0 (GPL-3.0).
Contact
For questions or issues, please open an issue on the GitHub repository or contact phenningsson@me.com.