How to use ByteMeHarder-404/bert-imdb-ensemble with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-classification", model="ByteMeHarder-404/bert-imdb-ensemble")
```
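Called on raw text, the pipeline returns a label and confidence score directly (the output below is illustrative; the exact label names depend on the checkpoint's id2label config):

```python
# Illustrative call; label names come from the model's config
print(pipe("This movie was an absolute masterpiece!"))
# e.g. [{'label': 'LABEL_1', 'score': 0.99}]
```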
```python
# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("ByteMeHarder-404/bert-imdb-ensemble", dtype="auto")
```

Note that `AutoModel` loads only the bare encoder; for sentiment predictions, use `AutoModelForSequenceClassification` as in the inference example further below.

This is an ensemble of 3 BERT-base-uncased models fine-tuned on the IMDb dataset for binary sentiment classification (positive vs. negative reviews).
Each model was trained with a different random seed, and predictions are combined using weighted or unweighted averaging for more robust performance.
- Base model: google-bert/bert-base-uncased
- Dataset: IMDb (train/test split from Hugging Face datasets)
- Preprocessing: tokenization with the bert-base-uncased tokenizer
- Hyperparameters: 2 epochs per model, fine-tuned with the Hugging Face Trainer (see the sketch after this list)
- Seeds: [42, 123, 999]
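The training script is not included in this card; below is a minimal sketch of the per-seed fine-tuning loop, assuming the stanfordnlp/imdb dataset and otherwise default Trainer settings (the output paths, truncation length, and everything beyond the stated epochs and seeds are assumptions):

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments, set_seed)

dataset = load_dataset("stanfordnlp/imdb")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def tokenize(batch):
    # Truncate to BERT's 512-token limit; padding is handled per batch by the Trainer
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True)

for seed in [42, 123, 999]:
    set_seed(seed)  # each ensemble member gets its own random seed
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=2
    )
    args = TrainingArguments(
        output_dir=f"bert-imdb-seed-{seed}",  # hypothetical checkpoint path
        num_train_epochs=2,                   # matches the results table below
        seed=seed,
    )
    Trainer(
        model=model,
        args=args,
        train_dataset=tokenized["train"],
        eval_dataset=tokenized["test"],
        tokenizer=tokenizer,
    ).train()
```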
Across the three models, results are very consistent:

| Model (Seed) | Epochs | Val. Accuracy | Val. Macro F1 |
|---|---|---|---|
| 42 | 2 | 93.74% | 0.9374 |
| 123 | 2 | 93.84% | 0.9383 |
| 999 | 2 | 93.98% | 0.9398 |
Ensemble performance (weighted example [0.2, 0.2, 0.6]) improves stability and helps reduce variance across seeds.
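The ensembling code itself is not shown in the card; a minimal sketch of weighted probability averaging, assuming the three fine-tuned checkpoints are available locally (the checkpoint paths here are hypothetical, and the weights are the example values above):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

checkpoints = ["bert-imdb-seed-42", "bert-imdb-seed-123", "bert-imdb-seed-999"]  # hypothetical paths
weights = [0.2, 0.2, 0.6]  # weighted example from the card; use [1/3] * 3 for unweighted

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
inputs = tokenizer("This movie was an absolute masterpiece!", return_tensors="pt")

ensemble_probs = torch.zeros(1, 2)
for ckpt, w in zip(checkpoints, weights):
    model = AutoModelForSequenceClassification.from_pretrained(ckpt)
    model.eval()
    with torch.no_grad():
        logits = model(**inputs).logits
    ensemble_probs += w * torch.softmax(logits, dim=-1)

print(ensemble_probs)  # weighted average of the three members' probabilities
```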
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("ByteMeHarder-404/bert-imdb-ensemble")
model = AutoModelForSequenceClassification.from_pretrained("ByteMeHarder-404/bert-imdb-ensemble")

inputs = tokenizer("This movie was an absolute masterpiece!", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

probs = torch.nn.functional.softmax(outputs.logits, dim=-1)
print(probs)  # tensor([[0.01, 0.99]]) -> positive sentiment
```
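To turn the probabilities into a human-readable label, the config's id2label mapping can be used (names default to LABEL_0/LABEL_1 unless custom labels were set in the config):

```python
pred = probs.argmax(dim=-1).item()
print(model.config.id2label[pred])  # e.g. "LABEL_1" -> positive, unless custom names are set
```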