alakxender/dhivehi-ner-dataset
How to use alakxender/bert-dhivehi-ner-model with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("token-classification", model="alakxender/bert-dhivehi-ner-model")

# Load model directly
from transformers import AutoTokenizer, AutoModelForTokenClassification
tokenizer = AutoTokenizer.from_pretrained("alakxender/bert-dhivehi-ner-model")
model = AutoModelForTokenClassification.from_pretrained("alakxender/bert-dhivehi-ner-model")

This is a BERT-based Named Entity Recognition (NER) model trained specifically for the Dhivehi language. It identifies named entities in Dhivehi text and classifies them into four categories: Person (PER), Organization (ORG), Location (LOC), and Miscellaneous (MISC).
alakxender/bert-dhivehi-ner-model
alakxender/bert-fast-dhivehi-tokenizer-extended

The model can identify the following entity types:
PER: Person names
ORG: Organization names
LOC: Location names
MISC: Miscellaneous named entities

Each entity type uses the standard BIO (Beginning, Inside, Outside) tagging scheme:
B-: Marks the beginning of an entity
I-: Marks the continuation (inside) of an entity
O: Marks tokens that are not part of any entity

Here's how to use the model with the Transformers library:
from transformers import pipeline
from transformers import AutoTokenizer, AutoModelForTokenClassification
# Load model and tokenizer
model_name = "alakxender/bert-dhivehi-ner-model"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)
# Create NER pipeline
ner = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")
# Example text (in Dhivehi)
text = "ރަސްމާލެ ނަމުގައި ތަރައްގީކުރާ ކ. ފުށިދިއްގަރު ފަޅުން ބިން ހިއްކުމަށް ސްރީ ލަންކާގެ ކުންފުންޏެއް"
# Get predictions
entities = ner(text)
# Print results
for entity in entities:
    print(f"Entity: {entity['word']}")
    print(f"Type: {entity['entity_group']}")
    print(f"Confidence: {entity['score']:.4f}")
    print("---")
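To make the BIO scheme described above concrete, here is a minimal, library-free sketch of how B-/I-/O tags can be grouped into whole entities. It is conceptually similar to what an aggregation step does, but it is not the Transformers implementation. The `decode_bio` helper, the `LABELS` ordering, and the English placeholder tokens are assumptions made for illustration; the model's actual label order is defined in its own config and is not stated in this card.

```python
from typing import List, Optional, Tuple

# Entity types listed in this model card. The label list below is derived
# from them with the BIO prefixes; its ORDER is an assumption for
# illustration and may differ from the model's real config.
ENTITY_TYPES = ["PER", "ORG", "LOC", "MISC"]
LABELS = ["O"] + [f"{prefix}-{etype}" for etype in ENTITY_TYPES for prefix in ("B", "I")]


def decode_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_text, entity_type) pairs."""
    entities: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None

    def flush() -> None:
        # Close the entity that is currently open, if any.
        nonlocal current_tokens, current_type
        if current_tokens and current_type is not None:
            entities.append((" ".join(current_tokens), current_type))
        current_tokens, current_type = [], None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new entity.
            flush()
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # An I- tag continues the open entity of the same type.
            current_tokens.append(token)
        else:
            # "O", or an I- tag with no matching open entity. This sketch
            # simply drops such stray I- tokens (a design choice).
            flush()
    flush()
    return entities


# Hypothetical, hand-written example (English placeholders, not model output).
example_tokens = ["Ali", "Hassan", "visited", "Sri", "Lanka", "yesterday"]
example_tags = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC", "O"]
print(decode_bio(example_tokens, example_tags))
# [('Ali Hassan', 'PER'), ('Sri Lanka', 'LOC')]
```

Dropping an I- tag that has no matching B- tag is only one possible policy; some decoders instead treat it as the start of a new entity.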
The model was trained for 10 epochs with the following training parameters:
Final training metrics:
Base model: google-bert/bert-base-multilingual-cased