wikimedia/wikipedia
Viewer • Updated • 61.6M • 260k • 1.22k
How to use polyglot-tagger/multilabel-language-identification with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-classification", model="polyglot-tagger/multilabel-language-identification") # Load model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained("polyglot-tagger/multilabel-language-identification")
model = AutoModelForSequenceClassification.from_pretrained("polyglot-tagger/multilabel-language-identification")Refer to polyglot-tagger/language-identification. It is trained on the same dataset as a text-classifier rather than as a token classifier.
This model is a fine-tuned version of xlm-roberta-base. It achieves the following results on the evaluation set:
The following hyperparameters were used during training:
| Training Loss | Epoch | Step | Accuracy | F1 | Validation Loss | Precision | Recall |
|---|---|---|---|---|---|---|---|
| 0.2186 | 0.2925 | 2500 | 0.8560 | 0.9651 | 0.0395 | 0.9778 | 0.9528 |
| 0.1331 | 0.5851 | 5000 | 0.0232 | 0.9803 | 0.9717 | 0.9760 | 0.9070 |
| 0.1044 | 0.8776 | 7500 | 0.0172 | 0.9828 | 0.9774 | 0.9801 | 0.9218 |
| 0.0851 | 1.1700 | 10000 | 0.0150 | 0.9844 | 0.9801 | 0.9822 | 0.9311 |
| 0.0783 | 1.4626 | 12500 | 0.0136 | 0.9859 | 0.9809 | 0.9834 | 0.9354 |
| 0.0705 | 1.7551 | 15000 | 0.0126 | 0.9861 | 0.9826 | 0.9843 | 0.9399 |
| 0.0692 | 2.0 | 17094 | 0.0123 | 0.9859 | 0.9831 | 0.9845 | 0.9412 |
Base model
FacebookAI/xlm-roberta-base