Allstate Claims Severity - Tabular Models (8 architectures, Gamma)

Pre-trained models for the Allstate Claims Severity Kaggle competition, covering all eight architectures from the tabular-data-modelling-pipeline on the full 188,318-row training set.

Dataset not redistributed here. The Allstate competition rules restrict redistribution to non-commercial use. To reproduce these models, download the data from Kaggle yourself (see below) - this repo ships the config file, pre-trained weights, and baseline metrics only.

Results

All 8 architectures + NNLS-stacked ensemble, default hyperparameters (no Optuna tuning), 3-seed ensembles per DL architecture, gamma family + log link, 80/20 random split (seed 42).

| Rank | Model | Test Gini | Test MAE (USD) | Test RMSE (USD) | A/E ratio | n params | Training time |
|---|---|---|---|---|---|---|---|
| 1 | CANN-GBM | 0.3473 | 1,158 | 1,839 | 1.010 | 300,519 | 14.8 min |
| - | Stacked ensemble (NNLS) | 0.3472 | 1,144 | 1,864 | 1.063 | (9 weights) | - |
| 2 | XGBoost | 0.3468 | 1,152 | 1,850 | 1.027 | 778 trees | 0.7 min |
| 3 | CatBoost | 0.3461 | 1,165 | 1,856 | 1.015 | 946 trees | 2.3 min |
| 4 | CANN | 0.3457 | 1,168 | 1,857 | 1.016 | 300,519 | 15.3 min |
| 5 | DRN | 0.3454 | 1,175 | 1,861 | 1.004 | 300,714 | 14.4 min |
| 6 | LocalGLMnet | 0.3428 | 1,197 | 1,903 | 1.008 | 193,134 | 58.4 min |
| 7 | TabM | 0.3427 | 1,393 | 2,446 | 1.553 | 1,735,956 | 52.9 min |
| 8 | FT-Transformer† | 0.0279 | 2,142 | 3,508 | 3.052 | 700,611 | 231 min |

† FT-Transformer underfit. Despite 188k training rows, the transformer failed to converge under default hyperparameters within the early-stopping window, and its predictions are off by a calibration factor of ~3 (A/E ratio 3.052). Two of its three ensemble members hit a flat local minimum near the global mean; the third descended properly but could not rescue the ensemble. We ship the weights for completeness but do not recommend using FT-Transformer predictions from this collection - retrain with Optuna tuning if you need a competitive transformer baseline.
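The failure mode described above (members stuck near a single value, plus a poor A/E ratio) can be checked with a few lines of plain Python. This is an illustrative sketch only: the toy numbers, the `is_collapsed` helper, its 0.05 threshold, the A/E convention (total actual over total predicted), and the use of a simple mean across members are all assumptions rather than details taken from the pipeline.

```python
from statistics import mean, pstdev

def ae_ratio(actual, predicted):
    """Actual-over-expected: total observed loss / total predicted loss (assumed convention)."""
    return sum(actual) / sum(predicted)

def is_collapsed(preds, rel_spread_tol=0.05):
    """Flag a member whose predictions are almost constant.

    Uses the coefficient of variation (std / mean); the 0.05 cut-off is arbitrary.
    """
    return pstdev(preds) / mean(preds) < rel_spread_tol

# Toy data: two members stuck near one value, one that tracks the target.
actual = [500.0, 1200.0, 3000.0, 8000.0]
member_preds = {
    "member0": [3100.0, 3150.0, 3180.0, 3200.0],
    "member1": [3120.0, 3140.0, 3160.0, 3190.0],
    "member2": [600.0, 1100.0, 2900.0, 7500.0],
}

collapsed = {name: is_collapsed(p) for name, p in member_preds.items()}
# Simple mean over members, row by row (assumed ensembling rule).
ensemble = [mean(row) for row in zip(*member_preds.values())]

print(collapsed)
print("ensemble spread:", max(ensemble) - min(ensemble))
print("A/E:", round(ae_ratio(actual, ensemble), 3))
```

In the toy data the two flat members compress the averaged predictions towards a constant, which is why one healthy member is not enough to restore the ensemble's ranking power.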

  • Test set: 37,664 rows (20% of 188,318)
  • Target: loss (claim severity, USD)
  • Loss: gamma deviance via reg:gamma (XGBoost); Tweedie:variance_power=1.99 as a near-gamma stand-in (CatBoost); explicit gamma NLL (DL)
  • Cap: 99.5th percentile of loss (about $15,200; ~940 rows winsorised)
  • Random seed: 42
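The model card does not say why CatBoost uses a Tweedie loss with variance power 1.99. One plausible reading is that it approximates the gamma case (variance power 2). The sketch below, using standard textbook unit deviances rather than code from any of the libraries, shows that under a log link the Tweedie gradient at p = 1.99 is the gamma gradient scaled by mu^0.01. The function names are hypothetical.

```python
import math

def gamma_unit_deviance(y, mu):
    """Gamma unit deviance: 2 * ((y - mu) / mu - log(y / mu))."""
    return 2.0 * ((y - mu) / mu - math.log(y / mu))

def tweedie_unit_deviance(y, mu, p):
    """Tweedie unit deviance, valid for p not equal to 1 or 2."""
    return 2.0 * (
        y ** (2 - p) / ((1 - p) * (2 - p))
        - y * mu ** (1 - p) / (1 - p)
        + mu ** (2 - p) / (2 - p)
    )

def gamma_grad_log_mu(y, mu):
    """Derivative of half the gamma unit deviance with respect to log(mu)."""
    return 1.0 - y / mu

def tweedie_grad_log_mu(y, mu, p):
    """Derivative of half the Tweedie unit deviance with respect to log(mu)."""
    return mu ** (2 - p) - y * mu ** (1 - p)

y, mu, p = 1000.0, 1500.0, 1.99
print("gamma deviance  :", gamma_unit_deviance(y, mu))
print("tweedie deviance:", tweedie_unit_deviance(y, mu, p))
print("gamma gradient  :", gamma_grad_log_mu(y, mu))
print("tweedie gradient:", tweedie_grad_log_mu(y, mu, p))
print("scale mu**0.01  :", mu ** 0.01)
```

Because mu^0.01 varies only slowly over the range of claim sizes, both losses push predictions in the same direction with nearly the same relative weight per row.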

For reference, the top Kaggle leaderboard scores reached MAE ~1,126 using extensive tuning + cross-validation. This pipeline's stacked ensemble, at MAE 1,144 on its own 20% hold-out, lands within 1.6% of that using only default hyperparameters. The two figures come from different test sets, so the comparison is indicative only; it speaks to the strength of the pipeline's default settings rather than to anything novel about the modelling.
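For readers who want to recompute the headline metrics on their own predictions, here is a minimal pure-Python sketch. The model card does not state which Gini definition the pipeline reports, so the Lorenz-curve Gini and its normalized variant below are assumptions, and the toy `actual` and `pred` values are made up.

```python
import math

def mae(actual, pred):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, pred)) / len(actual)

def rmse(actual, pred):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual))

def gini(actual, pred):
    """Gini from the Lorenz curve of actual losses, ordered by descending prediction."""
    n = len(actual)
    order = sorted(range(n), key=lambda i: (-pred[i], i))  # ties broken by row index
    total = sum(actual)
    running, area = 0.0, 0.0
    for i in order:
        running += actual[i]
        area += running / total
    return (area - (n + 1) / 2.0) / n

def normalized_gini(actual, pred):
    """Gini of the model ranking divided by the Gini of a perfect ranking."""
    return gini(actual, pred) / gini(actual, actual)

actual = [500.0, 1200.0, 3000.0, 8000.0]
pred = [700.0, 1000.0, 3500.0, 6000.0]
print(mae(actual, pred), rmse(actual, pred), normalized_gini(actual, pred))

# Relative MAE gap quoted above: (1144 - 1126) / 1126
print(f"{(1144 - 1126) / 1126:.1%}")
```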

How to use this collection

Step 1: Get the data from Kaggle

The Allstate competition data is not redistributed in this repo.

# Set up Kaggle API auth: https://github.com/Kaggle/kaggle-api#api-credentials
# Accept competition rules at:
#   https://www.kaggle.com/c/allstate-claims-severity/rules
# Then:
pip install kaggle
kaggle competitions download -c allstate-claims-severity
unzip allstate-claims-severity.zip
# Resulting train.csv is what these models were trained on

Or use the pipeline's downloader (handles the above):

git clone https://github.com/timothy22000/tabular_data_modelling_pipeline
cd tabular_data_modelling_pipeline
pip install -e ".[all]"
python scripts/download_data.py --dataset allstate --kaggle
# Saves to data/allstate.csv (188318 rows × 132 cols)

Step 2: Load any of the pre-trained models

CatBoost:

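from huggingface_hub import hf_hub_download
from catboost import CatBoostRegressor
import pandas as pd

# Fetch the serialized model from the Hub (cached locally after the first call)
path = hf_hub_download("t22000t/allstate-tabular-models", "catboost.cbm")
model = CatBoostRegressor()
model.load_model(path)

df = pd.read_csv("data/allstate.csv")  # downloaded in Step 1
# 116 categorical + 14 continuous columns = 130 features
features = [f"cat{i}" for i in range(1, 117)] + [f"cont{i}" for i in range(1, 15)]
# The Tweedie loss works on the log scale internally; if the values below look
# like logs rather than USD, pass prediction_type="Exponent" to predict().
preds = model.predict(df[features])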

XGBoost:

from huggingface_hub import hf_hub_download
import xgboost as xgb

path = hf_hub_download("t22000t/allstate-tabular-models", "xgboost.json")
booster = xgb.Booster()
booster.load_model(path)

XGBoost requires the exact preprocessing path used at training time, so loading the booster alone is not enough to score raw rows. The easiest way to reproduce inference is to clone the pipeline repo and run the prediction script - see the pipeline README.
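One cheap guard before scoring with a raw booster is to compare the column names it was trained on with the columns you are about to pass in. The helper below is a hypothetical sketch, not part of the pipeline. It assumes the saved model carries feature names (XGBoost exposes them as `booster.feature_names`, which can be `None`), and it checks names and order only, not how categorical values were encoded.

```python
def check_feature_alignment(expected, actual):
    """Compare training-time column names with the columns about to be scored.

    Returns (missing, unexpected, same_order).
    """
    expected, actual = list(expected), list(actual)
    actual_set, expected_set = set(actual), set(expected)
    missing = [c for c in expected if c not in actual_set]
    unexpected = [c for c in actual if c not in expected_set]
    same_order = expected == actual
    return missing, unexpected, same_order

# Illustrative names only; with the real model you would pass
# booster.feature_names (if not None) and the columns of your feature matrix.
expected = ["cat1", "cat2", "cont1"]
actual = ["cat2", "cat1", "cont1", "id"]
missing, unexpected, same_order = check_feature_alignment(expected, actual)
print(missing, unexpected, same_order)
```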

Step 3 (alternative): Re-run the full training

git clone https://github.com/timothy22000/tabular_data_modelling_pipeline
cd tabular_data_modelling_pipeline
pip install -e ".[all]"
python scripts/download_data.py --dataset allstate --kaggle

OMP_NUM_THREADS=1 python train.py \
    --config configs/example_allstate.py \
    --input data/allstate.csv \
    --skip-tuning --skip-interpretability \
    --architectures catboost xgboost cann cann_gbm ft_transformer tabm localglmnet drn

Expected wall-clock: 6-7 hours on Apple M-series CPU/MPS. FT-Transformer, the worst-performing architecture, accounts for most of it (about 4 hours). The two GBMs plus CANN, CANN-GBM, and DRN finish in under 50 min combined; LocalGLMnet and TabM each add just under an hour.

(OMP_NUM_THREADS=1 is only needed on macOS arm64 to avoid an OpenMP conflict; Linux runs are unaffected.)

Files

| File | What it is | Size |
|---|---|---|
| example_allstate.py | DatasetConfig (target=loss, gamma family, all 130 features) | |
| catboost.cbm | Trained CatBoost (Tweedie:variance_power=1.99) | ~3 MB |
| xgboost.json | Trained XGBoost Booster (reg:gamma) | 5 MB |
| cann_member{0,1,2}.pt | CANN 3-seed ensemble | ~1.5 MB each |
| cann_gbm_member{0,1,2}.pt | CANN-GBM 3-seed ensemble | ~1.5 MB each |
| ft_transformer_member{0,1,2}.pt | FT-Transformer 3-seed ensemble (underfit - see results note) | ~3 MB each |
| tabm_member{0,1,2}.pt | TabM 3-seed ensemble | ~7 MB each |
| localglmnet_member{0,1,2}.pt | LocalGLMnet 3-seed ensemble | ~1 MB each |
| drn_member{0,1,2}.pt | DRN 3-seed ensemble | ~1.5 MB each |
| evaluation_summary.csv | Per-model train/test Gini, MAE, RMSE, A/E ratio, gamma deviance | 660 B |
| ensemble_weights.json | NNLS weights over the 8 base predictions | |
| dashboard_dl_models.html | Interactive Plotly dashboard | |
| figures/fig_dl_*.png | Standalone publication figures | |
| model_summary.json | Structured run record (config, metrics, timing) | |

Total collection size: ~31 MB.
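For context on ensemble_weights.json, the sketch below shows how non-negative least squares (NNLS) stacking weights can be fit with `scipy.optimize.nnls`. It uses synthetic data and is not the pipeline's own stacking code. Which rows the pipeline fits on, whether it works on the response or log scale, and the layout of the JSON file are not documented here, so none of that is modelled.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)

# Synthetic stand-ins for three base models' predictions on a hold-out set.
base_preds = rng.gamma(shape=2.0, scale=1000.0, size=(200, 3))
# Target built as an exact blend of the first two columns, so the answer is known.
y = 0.7 * base_preds[:, 0] + 0.3 * base_preds[:, 1]

# Solve min ||base_preds @ w - y||_2 subject to w >= 0.
weights, residual_norm = nnls(base_preds, y)

# Stacked prediction is the weighted sum of the base predictions.
stacked = base_preds @ weights
print(np.round(weights, 3), residual_norm)
```

The non-negativity constraint keeps the blend interpretable: a base model can be ignored (weight 0) but never subtracted.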

Training configuration

| Setting | Value |
|---|---|
| Pipeline | tabular-data-modelling-pipeline v0.1.0 |
| Architectures | All 8 (catboost, xgboost, cann, cann_gbm, ft_transformer, tabm, localglmnet, drn) |
| Hyperparameters | Defaults - no Optuna tuning |
| DL ensemble size | 3 seeds per architecture |
| Family / link | Gamma / log |
| XGBoost objective | reg:gamma |
| CatBoost loss | Tweedie:variance_power=1.99 |
| Train/test split | Random 80/20, seed 42 |
| Cap percentile | 99.5 (about $15,200; ~940 rows winsorised) |
| Hardware | Apple M-series, MPS device for DL |
| Total wall-clock | 6h 47m (407 min) |
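The split and cap settings can be illustrated with a short NumPy sketch. It is a stand-in under stated assumptions, not the pipeline's routine. The helper name `split_and_cap` is hypothetical, the RNG differs from whatever the pipeline uses (so the indices will not match the shipped models), and the cap is computed on the full column because the card does not say whether it is taken before or after splitting.

```python
import numpy as np

def split_and_cap(loss, test_frac=0.2, cap_pct=99.5, seed=42):
    """Random train/test split plus upper winsorisation of the target."""
    loss = np.asarray(loss, dtype=float)
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(loss))
    n_test = int(round(test_frac * len(loss)))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    cap = np.percentile(loss, cap_pct)   # value at the 99.5th percentile
    capped = np.minimum(loss, cap)       # values above the cap are pulled down to it
    return train_idx, test_idx, cap, capped

# Synthetic heavy-tailed target standing in for the `loss` column.
loss = np.random.default_rng(0).gamma(2.0, 1500.0, size=10_000)
train_idx, test_idx, cap, capped = split_and_cap(loss)
print(len(train_idx), len(test_idx), round(float(cap), 1), int((loss > cap).sum()))
```

With the real 188,318 rows, `int(round(0.2 * 188318))` gives the 37,664 test rows quoted in the results section.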

Limitations

  • Default hyperparameters only. No Optuna tuning. Kaggle leaderboard winners used extensive tuning + bagging - expect ~0.02-0.03 Gini lift and another ~$10-20 MAE reduction with tuning.
  • FT-Transformer underfit - documented above. Don't use those weights directly; retrain with tuning if you need a transformer baseline.
  • No interpretability artefacts (Captum attributions, partial dependence plots) - skipped to keep wall-clock under control. Run without --skip-interpretability to compute them on a re-run.
  • All 130 features anonymised. No domain interpretability is possible directly - cat1...cat116 and cont1...cont14 carry no semantic meaning, so monotonicity constraints, base levels, and GLM factor choices were left empty.
  • Random split, not stratified. loss is heavy-tailed; a quantile-stratified split would give a more representative test set.
  • Trained on competition train.csv only (test.csv is unlabelled). Not directly comparable to the official leaderboard.
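The quantile-stratified split mentioned in the limitations could look like the sketch below: bin the target into quantile buckets, then sample the test fraction inside each bucket so the tail is represented proportionally. The function name, the choice of 10 bins, and the synthetic target are assumptions for illustration; nothing here is taken from the pipeline.

```python
import numpy as np

def quantile_stratified_split(y, test_frac=0.2, n_bins=10, seed=42):
    """Split indices so each quantile bin of y contributes test_frac of its rows."""
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    # Interior quantile edges (for 10 bins: the 10%, 20%, ..., 90% points).
    edges = np.quantile(y, np.linspace(0, 1, n_bins + 1)[1:-1])
    bins = np.digitize(y, edges)  # bin labels 0 .. n_bins - 1
    test_mask = np.zeros(len(y), dtype=bool)
    for b in range(n_bins):
        members = np.flatnonzero(bins == b)
        n_test = int(round(test_frac * len(members)))
        if n_test > 0:
            test_mask[rng.choice(members, size=n_test, replace=False)] = True
    return np.flatnonzero(~test_mask), np.flatnonzero(test_mask)

# Synthetic heavy-tailed target standing in for the `loss` column.
y = np.random.default_rng(0).gamma(2.0, 1500.0, size=1_000)
train_idx, test_idx = quantile_stratified_split(y)
print(len(train_idx), len(test_idx))
```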

Intended use

  • Baseline for actuarial / claims-severity research on the canonical Kaggle dataset.
  • Comparing your new tabular architecture against eight strong baselines on real insurance data.
  • Teaching gamma-family regression at meaningful scale (188k rows).
  • Sanity check for reimplementations of CatBoost/XGBoost/CANN-GBM/etc.

Citation

@software{tabular_data_modelling_pipeline,
  author = {Mun, Timothy},
  title  = {tabular-data-modelling-pipeline},
  url    = {https://github.com/timothy22000/tabular_data_modelling_pipeline},
  year   = {2026}
}

@misc{allstate2016,
  author       = {Allstate Insurance Company},
  title        = {Allstate Claims Severity},
  year         = {2016},
  url          = {https://www.kaggle.com/c/allstate-claims-severity},
  note         = {Kaggle Competition}
}

Please also cite the individual architecture papers - see the main repo README.

License

MIT for the model code and pipeline. The underlying Allstate dataset is distributed under Kaggle competition terms (non-commercial use only); this repository does not redistribute the raw data.

Evaluation results

  • Test Gini (CANN-GBM, best) on Allstate Claims Severity (Kaggle competition): 0.347 (self-reported)
  • Test MAE (CANN-GBM, USD) on Allstate Claims Severity (Kaggle competition): 1,158 (self-reported)