EvalGuide IELTS AES v2.5

DeBERTa-v3-base ordinal regression model for IELTS Writing Task 2 scoring across four criteria:

  • Task Response
  • Coherence and Cohesion
  • Lexical Resource
  • Grammatical Range and Accuracy

Production checkpoint (current)

Field             Value
Variant           Augmented + calibrated
Weights           ielts_v2.5_base_en_10ep.weights.h5
Calibration       ielts_v2.5_base_en_10ep_calibration.pkl
Backbone          deberta_v3_base_en
Input format      Essay body only (full_text); no question prefix
Gold harness QWK  0.7989 calibrated / 0.8505 raw (1,952-essay holdout)
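
For readers unfamiliar with the metric, QWK (quadratic weighted kappa) measures agreement between gold and predicted bands, penalizing each disagreement by the squared distance between the two bands. The sketch below is a minimal pure-Python version of the standard formula, not the EvalGuide gold harness itself. Mapping bands to integer indices in 0.5 steps over 0 to 9 (19 classes) is an assumption, as are the function names.

```python
from collections import Counter


def band_to_index(band, lowest=0.0, step=0.5):
    """Map a band score to an integer class index (assumed 0.5-band grid)."""
    return int(round((band - lowest) / step))


def quadratic_weighted_kappa(gold, pred, num_classes=19):
    """Standard QWK over integer class indices in [0, num_classes)."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and the same length")
    n = len(gold)
    observed = Counter(zip(gold, pred))  # confusion-matrix counts
    gold_hist = Counter(gold)
    pred_hist = Counter(pred)
    max_sq_dist = (num_classes - 1) ** 2

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(num_classes):
        for j in range(num_classes):
            weight = (i - j) ** 2 / max_sq_dist
            weighted_observed += weight * observed.get((i, j), 0)
            # Expected count under independence of the two marginals.
            weighted_expected += weight * gold_hist.get(i, 0) * pred_hist.get(j, 0) / n

    if weighted_expected == 0:
        # Degenerate case: every gold and predicted label is the same class.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected
```

A value of 1.0 means perfect agreement, 0.0 means agreement no better than chance, and negative values mean systematic disagreement.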

Why this checkpoint is served

  1. Calibrated serving: isotonic calibration plus bias correction improves mean-score alignment (SMD −0.07, versus +0.08 for v2.4) and lowers RMSE. For production UX this matters more than the higher raw QWK seen in the ablation.
  2. Augmented training: synonym augmentation (10% of train essays) is part of the documented v2.5 strategy and was verified active in the final run. The no-aug ablation checkpoint is preserved in the repo history (first commit).
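
The two alignment metrics named in item 1 can be computed along the following lines. This is a sketch under one common definition: SMD as the difference of means (predicted minus gold) divided by the pooled sample standard deviation. The card does not state which SMD variant the upgrade report uses, so the formula, the sign convention, and the function names are all assumptions.

```python
import math
import statistics


def standardized_mean_difference(gold, pred):
    """(mean(pred) - mean(gold)) / pooled sample SD; assumed definition."""
    pooled_sd = math.sqrt(
        (statistics.variance(gold) + statistics.variance(pred)) / 2
    )
    return (statistics.mean(pred) - statistics.mean(gold)) / pooled_sd


def rmse(gold, pred):
    """Root mean squared error between paired gold and predicted scores."""
    squared_errors = [(p - g) ** 2 for g, p in zip(gold, pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

Under this sign convention a positive SMD means the model scores higher than the gold labels on average, and a negative SMD means it scores lower.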

Files

File                                     Description
ielts_v2.5_base_en_10ep.weights.h5       Model weights (~3.5 GB)
ielts_v2.5_base_en_10ep_calibration.pkl  Isotonic calibration layer
ielts_v2.5_base_en_10ep_config.json      Training metadata and metrics
model_config.json                        Production serving config for EvalGuide backend

Download

hf download koecheup/evalguide-ielts-v2.5 --local-dir backend/model

Place artifacts under evalguide_client/backend/model/ alongside model_config.json.
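
After downloading, it can help to confirm that all four listed files actually landed in the model directory. The sketch below does that with the standard library. The helper names are hypothetical and not part of the EvalGuide backend. `download_artifacts` is an optional Python alternative to the CLI command, built on `huggingface_hub.snapshot_download`, and it assumes the `huggingface_hub` package is installed.

```python
from pathlib import Path

# File names taken from the Files table in this card.
EXPECTED_FILES = (
    "ielts_v2.5_base_en_10ep.weights.h5",
    "ielts_v2.5_base_en_10ep_calibration.pkl",
    "ielts_v2.5_base_en_10ep_config.json",
    "model_config.json",
)


def download_artifacts(model_dir="backend/model"):
    """Optional: fetch the repo with huggingface_hub instead of the CLI."""
    from huggingface_hub import snapshot_download  # imported lazily

    return snapshot_download(
        repo_id="koecheup/evalguide-ielts-v2.5", local_dir=model_dir
    )


def missing_artifacts(model_dir="backend/model"):
    """Return the expected file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_artifacts()
    print("All artifacts present." if not missing else f"Missing: {missing}")
```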

Inference notes

  • Tokenize essay content only. Do not prepend a "Question: …" prefix, because training and offline eval use essay-only input.
  • Apply the calibration artifact after the forward pass when serving the production config.
  • To roll back to v2.4, set IELTS_MODEL_NAME=ielts_v2.4_base_en_10ep.weights.h5.
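
The order of operations in these notes can be sketched as below. Everything beyond the environment variable name and the weights file names is an assumption: `forward` and `calibrate` are hypothetical callables standing in for the model's forward pass and the loaded calibration artifact, the model is assumed to return four scores in the criterion order listed at the top of this card, and v2.5 is assumed to be the default when IELTS_MODEL_NAME is unset.

```python
import os

CRITERIA = (
    "Task Response",
    "Coherence and Cohesion",
    "Lexical Resource",
    "Grammatical Range and Accuracy",
)

# Assumed default; set IELTS_MODEL_NAME to the v2.4 file name to roll back.
DEFAULT_WEIGHTS = "ielts_v2.5_base_en_10ep.weights.h5"


def resolve_weights_name(env=None):
    """Pick the weights file, honoring the IELTS_MODEL_NAME override."""
    env = os.environ if env is None else env
    return env.get("IELTS_MODEL_NAME", DEFAULT_WEIGHTS)


def score_essay(essay_text, forward, calibrate):
    """Score one essay: essay-only input, then calibration after the forward pass."""
    # The essay body is passed as-is; no "Question: ..." prefix is added.
    raw_scores = forward(essay_text)
    calibrated_scores = calibrate(raw_scores)
    return dict(zip(CRITERIA, calibrated_scores))
```

Keeping calibration as a separate step after the forward pass means the same raw model output can be served with or without the calibration artifact.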

Training summary

  • Real data: 9,760 cleaned essays (ielts_cleaned.csv)
  • Synthetic mix: 15% from 284 cleaned Task 2 essays (koecheup/ielts-synthetic)
  • Augmentation: 10% synonym replacement (780 train essays)
  • Epochs: 10, batch size 8, variance target 2.0 → 2.7
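
To illustrate the augmentation bullet, the sketch below samples a fixed fraction of training essays and applies word-level synonym replacement to each sampled essay. It is not the v2.5 training code. The toy synonym table, the function names, the seed handling, and the choice to return augmented copies (rather than overwrite the originals) are all assumptions made for illustration.

```python
import random

# Hypothetical toy synonym table; a real run would use a proper lexical resource.
TOY_SYNONYMS = {
    "important": ["significant", "crucial"],
    "big": ["large"],
    "help": ["assist"],
}


def replace_synonyms(text, synonyms, rng):
    """Swap each whitespace-separated word that has an entry in synonyms.

    Words with attached punctuation (e.g. "important,") are left unchanged.
    """
    replaced = []
    for word in text.split():
        options = synonyms.get(word.lower())
        replaced.append(rng.choice(options) if options else word)
    return " ".join(replaced)


def augment_fraction(essays, fraction=0.10, seed=0, synonyms=TOY_SYNONYMS):
    """Return (index, augmented_text) pairs for a random fraction of essays."""
    rng = random.Random(seed)
    sample_size = int(len(essays) * fraction)
    chosen = sorted(rng.sample(range(len(essays)), sample_size))
    return [(i, replace_synonyms(essays[i], synonyms, rng)) for i in chosen]
```

Fixing the seed makes the sampled subset reproducible, which is one simple way to verify afterwards that augmentation was active in a given run.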

See docs/backend/v2.5_upgrade_report.md in the EvalGuide repo for full evaluation tables.
