AMIS Commodity Classifier

This repository contains the artifacts from one training run of the AMIS commodity relevance classifier, a binary classifier that labels text chunks as RELEVANT or NOT_RELEVANT. It includes the fine-tuned Transformer model, the configured TF-IDF and sentence-embedding baselines, prediction files, and the training report.

Dataset: faodl/amis-agri-trade-pri-sec
Dataset subset: (none specified)
Text column: chunk_text
Label column: label
Transformer: FacebookAI/xlm-roberta-base
Generated at: 2026-05-18T17:47:01.228362+00:00

Dataset Summary

Split	Rows	Label 0 (NOT_RELEVANT)	Label 1 (RELEVANT)	Unique groups	Mean text length
train	4799	2363	2436	2483	695.5
validation	1009	462	547	532	698.1
test	1017	529	488	533	694.6
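
To put the accuracy figures in this report in context, the class balance of each split and the accuracy of always predicting the majority class can be derived from the counts above. The sketch below is illustrative only: the `SPLITS` dictionary and the `class_balance` helper are names introduced here and are not part of the training code.

```python
# Row and label counts copied from the Dataset Summary table.
SPLITS = {
    "train": {"rows": 4799, "label_0": 2363, "label_1": 2436},
    "validation": {"rows": 1009, "label_0": 462, "label_1": 547},
    "test": {"rows": 1017, "label_0": 529, "label_1": 488},
}


def class_balance(counts):
    """Return the positive share and the majority-class accuracy for one split."""
    rows = counts["rows"]
    # Sanity check: the two label counts must add up to the row count.
    assert counts["label_0"] + counts["label_1"] == rows
    return {
        "positive_share": counts["label_1"] / rows,
        "majority_accuracy": max(counts["label_0"], counts["label_1"]) / rows,
    }


for split, counts in SPLITS.items():
    balance = class_balance(counts)
    print(split, {name: round(value, 3) for name, value in balance.items()})
```

On the test split the majority-class accuracy is about 0.520, which is the floor any useful classifier has to clear.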

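Threshold Comparison on Test Split

Each model is reported twice: once at the default threshold of 0.500 and once at its validation-tuned threshold. ROC AUC and average precision do not depend on the threshold, so they are identical within each pair of rows.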

Model	Threshold	Accuracy	Precision	Recall	F1	ROC AUC	Average precision
logistic_tfidf	0.500	0.738	0.736	0.709	0.722	0.838	0.815
logistic_tfidf	0.396	0.744	0.674	0.904	0.772	0.838	0.815
xgboost_tfidf	0.500	0.762	0.786	0.693	0.736	0.847	0.816
xgboost_tfidf	0.305	0.752	0.685	0.895	0.776	0.847	0.816
embedding-logistic_sentence_embeddings	0.500	0.790	0.750	0.842	0.793	0.881	0.863
embedding-logistic_sentence_embeddings	0.315	0.771	0.698	0.922	0.794	0.881	0.863
embedding-svm_sentence_embeddings	0.500	0.788	0.742	0.855	0.794	0.883	0.865
embedding-svm_sentence_embeddings	0.453	0.796	0.735	0.900	0.809	0.883	0.865
embedding-lightgbm_sentence_embeddings	0.500	0.782	0.744	0.832	0.785	0.880	0.867
embedding-lightgbm_sentence_embeddings	0.148	0.759	0.685	0.922	0.786	0.880	0.867
transformer	0.500	0.837	0.786	0.906	0.842	0.919	0.913
transformer	0.383	0.837	0.771	0.939	0.847	0.919	0.913

Confusion Matrices on Test Split

Rows are true labels and columns are predicted labels.

logistic_tfidf at threshold 0.500

True / Predicted	NOT_RELEVANT	RELEVANT
NOT_RELEVANT	405	124
RELEVANT	142	346

logistic_tfidf at threshold 0.396

True / Predicted	NOT_RELEVANT	RELEVANT
NOT_RELEVANT	316	213
RELEVANT	47	441

xgboost_tfidf at threshold 0.500

True / Predicted	NOT_RELEVANT	RELEVANT
NOT_RELEVANT	437	92
RELEVANT	150	338

xgboost_tfidf at threshold 0.305

True / Predicted	NOT_RELEVANT	RELEVANT
NOT_RELEVANT	328	201
RELEVANT	51	437

embedding-logistic_sentence_embeddings at threshold 0.500

True / Predicted	NOT_RELEVANT	RELEVANT
NOT_RELEVANT	392	137
RELEVANT	77	411

embedding-logistic_sentence_embeddings at threshold 0.315

True / Predicted	NOT_RELEVANT	RELEVANT
NOT_RELEVANT	334	195
RELEVANT	38	450

embedding-svm_sentence_embeddings at threshold 0.500

True / Predicted	NOT_RELEVANT	RELEVANT
NOT_RELEVANT	384	145
RELEVANT	71	417

embedding-svm_sentence_embeddings at threshold 0.453

True / Predicted	NOT_RELEVANT	RELEVANT
NOT_RELEVANT	371	158
RELEVANT	49	439

embedding-lightgbm_sentence_embeddings at threshold 0.500

True / Predicted	NOT_RELEVANT	RELEVANT
NOT_RELEVANT	389	140
RELEVANT	82	406

embedding-lightgbm_sentence_embeddings at threshold 0.148

True / Predicted	NOT_RELEVANT	RELEVANT
NOT_RELEVANT	322	207
RELEVANT	38	450

transformer at threshold 0.500

True / Predicted	NOT_RELEVANT	RELEVANT
NOT_RELEVANT	409	120
RELEVANT	46	442

transformer at threshold 0.383

True / Predicted	NOT_RELEVANT	RELEVANT
NOT_RELEVANT	393	136
RELEVANT	30	458
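
The headline metrics in the threshold comparison can be recomputed from these matrices. As a worked check, the sketch below derives accuracy, precision, recall, and F1 from the "transformer at threshold 0.500" matrix. The helper name `metrics_from_confusion` is introduced here for illustration and is not taken from the training code.

```python
def metrics_from_confusion(tn, fp, fn, tp):
    """Compute binary metrics for the positive (RELEVANT) class from four counts."""
    total = tn + fp + fn + tp
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Counts from the "transformer at threshold 0.500" matrix:
# row NOT_RELEVANT -> (tn, fp), row RELEVANT -> (fn, tp).
transformer_default = metrics_from_confusion(tn=409, fp=120, fn=46, tp=442)
print({name: round(value, 3) for name, value in transformer_default.items()})
# {'accuracy': 0.837, 'precision': 0.786, 'recall': 0.906, 'f1': 0.842}
```

The rounded values agree with the transformer row at threshold 0.500 in the comparison table.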

Validation-Tuned Thresholds

logistic_tfidf: threshold 0.396 (validation F1 0.811); test F1 change vs 0.5: +0.050.
xgboost_tfidf: threshold 0.305 (validation F1 0.813); test F1 change vs 0.5: +0.040.
embedding-logistic_sentence_embeddings: threshold 0.315 (validation F1 0.859); test F1 change vs 0.5: +0.001.
embedding-svm_sentence_embeddings: threshold 0.453 (validation F1 0.861); test F1 change vs 0.5: +0.015.
embedding-lightgbm_sentence_embeddings: threshold 0.148 (validation F1 0.866); test F1 change vs 0.5: +0.001.
transformer: threshold 0.383 (validation F1 0.874); test F1 change vs 0.5: +0.005.
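
The report does not spell out how the thresholds were selected. One common approach, assumed here, is to sweep candidate thresholds over the validation probabilities and keep the one with the highest F1. The sketch below shows that idea in plain Python; `f1_at_threshold`, `tune_threshold`, the candidate grid, and the toy data are all hypothetical and may differ from what the training code does.

```python
def f1_at_threshold(labels, probabilities, threshold):
    """F1 for the positive class when predicting 1 iff probability >= threshold."""
    pairs = list(zip(labels, probabilities))
    tp = sum(1 for y, p in pairs if y == 1 and p >= threshold)
    fp = sum(1 for y, p in pairs if y == 0 and p >= threshold)
    fn = sum(1 for y, p in pairs if y == 1 and p < threshold)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def tune_threshold(labels, probabilities, candidates=None):
    """Return the candidate threshold with the highest F1 (first one wins on ties)."""
    if candidates is None:
        # Assumed grid: 0.001, 0.002, ..., 0.999.
        candidates = [i / 1000 for i in range(1, 1000)]
    return max(candidates, key=lambda t: f1_at_threshold(labels, probabilities, t))


# Toy validation data, invented for illustration only.
val_labels = [0, 0, 1, 1, 1, 0]
val_probabilities = [0.10, 0.35, 0.40, 0.80, 0.45, 0.60]

best = tune_threshold(val_labels, val_probabilities)
print(best, round(f1_at_threshold(val_labels, val_probabilities, best), 3))
```

Because the threshold is chosen on validation data, the gain on the test split can be small, as the per-model F1 changes listed above show.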

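Artifacts

The paths below are local to the training environment. In this repository the same artifacts are stored under transformer/ and baselines/.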

logistic_tfidf: /content/agri-trade-classifier/baselines/logistic
xgboost_tfidf: /content/agri-trade-classifier/baselines/xgboost
embedding-logistic_sentence_embeddings: /content/agri-trade-classifier/baselines/embedding-logistic
embedding-svm_sentence_embeddings: /content/agri-trade-classifier/baselines/embedding-svm
embedding-lightgbm_sentence_embeddings: /content/agri-trade-classifier/baselines/embedding-lightgbm
transformer: /content/agri-trade-classifier/transformer

Inference

Install the runtime dependencies:

pip install transformers torch huggingface_hub pandas joblib scikit-learn xgboost sentence-transformers lightgbm

Transformer

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_ID = "faodl/agri-trade-classifier"

texts = [
    "Rice export prices increased after new procurement rules were announced.",
    "The finance ministry released its monthly fuel tax bulletin.",
]

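# The tokenizer and fine-tuned weights are stored in the "transformer" subfolder of the repo.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, subfolder="transformer")
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID, subfolder="transformer")
# Use the threshold stored in the model config if present; otherwise fall back to 0.5.
threshold = float(getattr(model.config, "threshold", 0.5))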

encoded = tokenizer(
    texts,
    truncation=True,
    padding=True,
    max_length=256,
    return_tensors="pt",
)

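# Inference only, so gradient tracking is disabled.
with torch.no_grad():
    logits = model(**encoded).logits
    # Column 1 of the softmax output is the probability of the positive class.
    probabilities = torch.softmax(logits, dim=-1)[:, 1].tolist()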

for text, probability in zip(texts, probabilities):
    label = model.config.id2label[int(probability >= threshold)]
    print({"text": text, "probability_positive": probability, "label": label})

TF-IDF Baselines

Available baseline names in this run: "logistic", "xgboost".

import json
import joblib
from huggingface_hub import hf_hub_download

MODEL_ID = "faodl/agri-trade-classifier"
BASELINE = "logistic"

texts = [
    "Maize production forecasts were revised after delayed rains.",
    "The central bank published new exchange rate statistics.",
]

model_path = hf_hub_download(
    repo_id=MODEL_ID,
    repo_type="model",
    filename=f"baselines/{BASELINE}/{BASELINE}_tfidf.joblib",
)
report_path = hf_hub_download(
    repo_id=MODEL_ID,
    repo_type="model",
    filename="report.json",
)

pipeline = joblib.load(model_path)
with open(report_path, encoding="utf-8") as handle:
    report = json.load(handle)

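# Look up the validation-tuned threshold for the chosen baseline in report.json.
# next() raises StopIteration if the baseline is not listed in the report.
threshold = next(
    result["validation_best_threshold"]["threshold"]
    for result in report["results"]
    if result["model_type"] == f"{BASELINE}_tfidf"
)

# The loaded pipeline takes raw strings; column 1 is the positive-class probability.
probabilities = pipeline.predict_proba(texts)[:, 1]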
for text, probability in zip(texts, probabilities):
    label = "RELEVANT" if probability >= threshold else "NOT_RELEVANT"
    print({"text": text, "probability_positive": float(probability), "label": label})

Sentence-Embedding Baselines

Available embedding baseline names in this run: "embedding-logistic", "embedding-svm", "embedding-lightgbm".

import joblib
from huggingface_hub import hf_hub_download
from sentence_transformers import SentenceTransformer

MODEL_ID = "faodl/agri-trade-classifier"
BASELINE = "embedding-logistic"

texts = [
    "Wheat export inspections rose as demand from importers increased.",
    "The sports ministry announced a new stadium renovation plan.",
]

model_path = hf_hub_download(
    repo_id=MODEL_ID,
    repo_type="model",
    filename=f"baselines/{BASELINE}/{BASELINE}.joblib",
)
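# The artifact is a dict holding the classifier, the embedding settings, and the tuned threshold.
artifact = joblib.load(model_path)
# Encode with the same sentence-embedding model and settings recorded in the artifact.
embedding_model = SentenceTransformer(artifact["embedding_model_name"])
embeddings = embedding_model.encode(
    texts,
    batch_size=artifact.get("embedding_batch_size", 64),
    convert_to_numpy=True,
    normalize_embeddings=artifact.get("normalize_embeddings", True),
)
# Column 1 is the positive-class probability.
probabilities = artifact["classifier"].predict_proba(embeddings)[:, 1]
threshold = artifact["validation_best_threshold"]["threshold"]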

for text, probability in zip(texts, probabilities):
    label = "RELEVANT" if probability >= threshold else "NOT_RELEVANT"
    print({"text": text, "probability_positive": float(probability), "label": label})

Files

REPORT.md: Markdown report for this training run.
report.json: Machine-readable report containing metrics and thresholds.
transformer/: Fine-tuned Transformer artifacts, when Transformer training is enabled.
baselines/: TF-IDF and sentence-embedding baseline artifacts, when baseline training is enabled.
*/validation_predictions.csv and */test_predictions.csv: Split-level predictions.
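
As a small illustration of working with report.json, the sketch below collects the validation-tuned threshold of every model into one dictionary. It relies only on the keys already used in the TF-IDF inference example ("results", "model_type", and "validation_best_threshold"). The helper name `thresholds_by_model` and the inline `example_report` are hypothetical, and the real file may contain additional fields.

```python
def thresholds_by_model(report):
    """Map each model_type in the report to its validation-tuned threshold."""
    return {
        result["model_type"]: result["validation_best_threshold"]["threshold"]
        for result in report["results"]
    }


# Minimal stand-in for a parsed report.json, using threshold values from this run.
example_report = {
    "results": [
        {"model_type": "logistic_tfidf", "validation_best_threshold": {"threshold": 0.396}},
        {"model_type": "xgboost_tfidf", "validation_best_threshold": {"threshold": 0.305}},
    ]
}

thresholds = thresholds_by_model(example_report)
print(thresholds)
```

With the real file, the same helper can be applied to the dictionary returned by `json.load` after downloading report.json with `hf_hub_download`.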
