SinLlama-Singlis-Sentiment-Analysis
This model is a fine-tuned version of SinLlama-8B optimized for Sentiment Analysis on Romanized Sinhala-English code-mixed text (commonly known as Singlish).
Model Details
Model Description
This model was developed to address the linguistic challenges of social media text in Sri Lanka, where users frequently mix Sinhala and English using the Roman alphabet. It uses a decoder-only architecture, which is intended to capture sequential dependencies and semantic nuances in noisy, informal text better than traditional statistical models.
- Developed by: V.S. Abeynayake (University of Ruhuna)
- Model type: Decoder-only Large Language Model (LLM)
- Language(s) (NLP): Romanized Sinhala (Singlish) and English
- Finetuned from model: polyglots/SinLlama_v01
- Task: 3-way Sentiment Classification (Positive, Negative, Neutral)
Model Sources
- Repository: Vihanga445/sinllama-singlis-sentiment-analysis
- Thesis: ENHANCING SENTIMENT ANALYSIS FOR ROMANIZED SINHALA-ENGLISH CODE-MIXED SOCIAL MEDIA TEXT
Uses
Direct Use
The model is intended for classifying the sentiment of social media comments, product reviews, and public feedback written in Singlish or code-mixed Sinhala-English.
Out-of-Scope Use
The model is not designed for formal Sinhala literature or technical English documents. It may not perform reliably on languages other than Sinhala and English.
Bias, Risks, and Limitations
- Dataset Bias: The training data is derived from YouTube comments, reflecting a highly informal and domain-specific communication style.
- Class Imbalance: Despite oversampling, the model may still show a slight bias toward the "Positive" class due to its dominance in real-world social media behavior.
- Standardization: Since Singlish has no standardized spelling, extreme phonetic variations not seen in training may affect accuracy.
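The card does not say how the oversampling mentioned above was carried out. As a rough illustration only, random oversampling with replacement could look like the sketch below; the helper name `oversample`, the seed, and the toy rows are all hypothetical and not taken from the training pipeline.

```python
import random
from collections import Counter


def oversample(rows, seed=0):
    """Duplicate minority-class rows at random until every class matches the largest one.

    `rows` is a list of (text, label) pairs. Hypothetical helper, not from the model card.
    """
    rng = random.Random(seed)
    by_label = {}
    for text, label in rows:
        by_label.setdefault(label, []).append((text, label))
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        # Draw extra rows with replacement to close the gap to the largest class.
        balanced.extend(rng.choices(group, k=target - len(group)))
    rng.shuffle(balanced)
    return balanced


# Toy rows invented for illustration; they are not real training data.
toy = [("a", "Pos")] * 6 + [("b", "Neu")] * 2 + [("c", "Neg")] * 3
counts = Counter(label for _, label in oversample(toy))
print(counts)
```

Balancing the training set this way does not remove the bias described above, since the duplicated rows add no new vocabulary or spelling variants for the minority classes.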
How to Get Started with the Model
The following code loads the base model, attaches the fine-tuned adapter, and runs inference on a single comment. Explicit termination tokens stop generation after the label, so the model does not ramble or generate extra examples.
import torch
from unsloth import FastLanguageModel
from transformers import AutoTokenizer
from peft import PeftModel
# The Hugging Face model ID for the fine-tuned adapter
hf_model_id = "Vihanga445/sinllama-singlis-sentiment-analysis"
base_model_name = "polyglots/SinLlama_v01"
# 1. Load Tokenizer and Base Model
tokenizer = AutoTokenizer.from_pretrained(hf_model_id)
model, _ = FastLanguageModel.from_pretrained(
    model_name = base_model_name,
    max_seq_length = 2048,
    dtype = torch.bfloat16,
    load_in_4bit = True,
    resize_model_vocab = 139336,
)
# 2. Attach Adapters and Prep for Inference
model = PeftModel.from_pretrained(model, hf_model_id)
FastLanguageModel.for_inference(model)
# 3. Define the prompt and stopping criteria
prompt = """### Instruction:
Analyze the sentiment of the comment enclosed in square brackets, determine if it is positive, neutral, or negative, and return the answer as the corresponding sentiment label "Pos" or "Neu" or "Neg".
### Input:
[awulak na]
### Response:
"""
inputs = tokenizer([prompt], return_tensors = "pt").to("cuda")
# Define Llama-3 termination tokens
terminators = [tokenizer.eos_token_id, tokenizer.convert_tokens_to_ids("<|eot_id|>")]
# 4. Generate only the label
outputs = model.generate(
    **inputs,
    max_new_tokens = 64,
    eos_token_id = terminators,
    pad_token_id = tokenizer.eos_token_id,
    do_sample = False
)
# 5. Clean and Display Output
decoded = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
final_answer = decoded.split("### Response:\n")[-1].strip().split("\n")[0]
print(f"Sentiment: {final_answer}")
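To classify more than one comment, the prompt construction and output parsing above can be wrapped in small helpers. The following is a minimal sketch, not part of the released code: `PROMPT_TEMPLATE`, `LABELS`, `build_prompt`, and `parse_label` are hypothetical names, and the mapping from "Pos"/"Neu"/"Neg" to full class names is an assumption based on the task description.

```python
import re

# Same prompt text as in the example above, with the comment as a placeholder.
PROMPT_TEMPLATE = (
    "### Instruction:\n"
    "Analyze the sentiment of the comment enclosed in square brackets, "
    "determine if it is positive, neutral, or negative, and return the answer "
    'as the corresponding sentiment label "Pos" or "Neu" or "Neg".\n'
    "### Input:\n"
    "[{comment}]\n"
    "### Response:\n"
)

# Assumed mapping from the short labels in the prompt to full class names.
LABELS = {"Pos": "Positive", "Neu": "Neutral", "Neg": "Negative"}


def build_prompt(comment):
    # Collapse whitespace so a multi-line comment stays on the single "[...]" input line.
    return PROMPT_TEMPLATE.format(comment=" ".join(comment.split()))


def parse_label(decoded):
    # Keep only the first line after the last response marker, as in step 5 above.
    answer = decoded.split("### Response:\n")[-1].strip().split("\n")[0]
    match = re.match(r"(Pos|Neu|Neg)", answer)
    # Return None when the model produced something other than a known label.
    return LABELS[match.group(1)] if match else None
```

With these helpers, `build_prompt(text)` replaces the hard-coded `prompt` string and `parse_label(decoded)` replaces the manual split. Returning `None` for unrecognized output makes it easier to count and inspect malformed generations instead of silently treating them as a class.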