Allstate Claims Severity - Tabular Models (8 architectures, Gamma)

Pre-trained models for the Allstate Claims Severity Kaggle competition, covering all eight architectures from the tabular-data-modelling-pipeline on the full 188,318-row training set.

Dataset not redistributed here. The Allstate competition rules restrict redistribution to non-commercial use. To reproduce these models, download the data from Kaggle yourself (see below) - this repo ships the config file, pre-trained weights, and baseline metrics only.

Results

All 8 architectures + NNLS-stacked ensemble, default hyperparameters (no Optuna tuning), 3-seed ensembles per DL architecture, gamma family + log link, 80/20 random split (seed 42).

Rank	Model	Test Gini	Test MAE (USD)	Test RMSE	A/E ratio	n params	Training time
1	CANN-GBM	0.3473	1,158	1,839	1.010	300,519	14.8 min
-	Stacked ensemble (NNLS)	0.3472	1,144	1,864	1.063	(9 weights)	-
2	XGBoost	0.3468	1,152	1,850	1.027	778 trees	0.7 min
3	CatBoost	0.3461	1,165	1,856	1.015	946 trees	2.3 min
4	CANN	0.3457	1,168	1,857	1.016	300,519	15.3 min
5	DRN	0.3454	1,175	1,861	1.004	300,714	14.4 min
6	LocalGLMnet	0.3428	1,197	1,903	1.008	193,134	58.4 min
7	TabM	0.3427	1,393	2,446	1.553	1,735,956	52.9 min
8	FT-Transformer†	0.0279	2,142	3,508	3.052	700,611	231 min

† FT-Transformer underfit. Despite 188k training rows, the transformer architecture failed to converge under default hyperparameters within the early-stopping window, and its predictions are miscalibrated by a factor of ~3 (A/E ratio 3.052). Two of its three ensemble members hit a flat local minimum near the global mean; the third descended properly but couldn't recover the ensemble. We ship the weights for completeness but do not recommend using the FT-Transformer predictions from this collection - retrain with Optuna tuning if you need a competitive transformer baseline.

Test set: 37,664 rows (20% of 188,318)
Target: loss (claim severity, USD)
Loss: Gamma deviance via reg:gamma (XGBoost) / Tweedie:variance_power=1.99 (CatBoost) / explicit gamma NLL (DL)
Cap: 99.5th percentile (~$15,200; ~940 rows winsorised)
Random seed: 42
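
The cap and split settings above can be sketched in a few lines of NumPy. This is a minimal illustration, not the pipeline's own code: the helper name `cap_and_split` is hypothetical, and it assumes the cap is computed on the full `loss` column before splitting and that the split is a plain random permutation (the pipeline may do either step differently).

```python
import numpy as np

def cap_and_split(loss, cap_pct=99.5, test_frac=0.2, seed=42):
    """Winsorise the upper tail of `loss`, then draw a random train/test split.

    Assumption: cap computed on all rows before splitting (not confirmed
    by the model card).
    """
    loss = np.asarray(loss, dtype=float)
    cap = np.percentile(loss, cap_pct)    # upper cap value
    capped = np.minimum(loss, cap)        # values above the cap are clipped to it
    n_capped = int((loss > cap).sum())    # number of winsorised rows

    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(loss))      # random row order
    n_test = int(round(test_frac * len(loss)))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return capped, cap, n_capped, train_idx, test_idx
```

On 188,318 rows this yields 37,664 test rows, and a 99.5th-percentile cap clips roughly 940 values, in line with the figures listed above.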

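For reference, the top scores on the Kaggle competition leaderboard reach MAE ~1,126 using extensive tuning + cross-validation. This pipeline's stacked ensemble at MAE 1,144 lands within 1.6% of that using only default hyperparameters, although the two figures are measured on different test sets (a local 20% hold-out here versus Kaggle's unlabelled test.csv), so the comparison is indicative only. The small gap speaks to the strength of the pipeline's default settings rather than anything novel about the modelling.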
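
For readers who want to recompute the columns of the results table, here is a rough sketch of the metrics. The function names are hypothetical and the definitions are assumptions: A/E is taken as total actual loss over total predicted loss, and Gini is the normalised Gini commonly used on Kaggle (ranking by prediction, scaled by the Gini of a perfect ranking). The pipeline's exact definitions may differ.

```python
import numpy as np

def mae(actual, pred):
    """Mean absolute error, in the units of the target (USD here)."""
    return float(np.mean(np.abs(np.asarray(actual, float) - np.asarray(pred, float))))

def rmse(actual, pred):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((np.asarray(actual, float) - np.asarray(pred, float)) ** 2)))

def ae_ratio(actual, pred):
    """Actual-to-expected ratio (assumed convention: sum(actual) / sum(pred))."""
    return float(np.sum(actual) / np.sum(pred))

def gini(actual, score):
    """Unnormalised Gini: area between the Lorenz curve ordered by `score` and the diagonal."""
    actual = np.asarray(actual, dtype=float)
    order = np.argsort(-np.asarray(score, dtype=float), kind="stable")
    cum_share = np.cumsum(actual[order]) / actual.sum()
    n = len(actual)
    return float(cum_share.sum() / n - (n + 1) / (2 * n))

def normalized_gini(actual, pred):
    """Gini of the model ranking divided by the Gini of a perfect ranking."""
    return gini(actual, pred) / gini(actual, actual)
```

A perfectly ranked prediction scores 1.0 on `normalized_gini`, and a model whose predictions sum to the observed total has an A/E of 1.0 under this convention.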

How to use this collection

Step 1: Get the data from Kaggle

The Allstate competition data is not redistributed in this repo.

# Set up Kaggle API auth: https://github.com/Kaggle/kaggle-api#api-credentials
# Accept competition rules at:
#   https://www.kaggle.com/c/allstate-claims-severity/rules
# Then:
pip install kaggle
kaggle competitions download -c allstate-claims-severity
unzip allstate-claims-severity.zip
# Resulting train.csv is what these models were trained on

Or use the pipeline's downloader (handles the above):

git clone https://github.com/timothy22000/tabular_data_modelling_pipeline
cd tabular_data_modelling_pipeline
pip install -e ".[all]"
python scripts/download_data.py --dataset allstate --kaggle
# Saves to data/allstate.csv (188318 rows × 132 cols)

Step 2: Load any of the pre-trained models

CatBoost:

from huggingface_hub import hf_hub_download
from catboost import CatBoostRegressor
import pandas as pd

path = hf_hub_download("t22000t/allstate-tabular-models", "catboost.cbm")
model = CatBoostRegressor()
model.load_model(path)

df = pd.read_csv("data/allstate.csv")  # downloaded in Step 1
features = [f"cat{i}" for i in range(1, 117)] + [f"cont{i}" for i in range(1, 15)]
preds = model.predict(df[features])
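
Before calling `model.predict`, it can be worth confirming that the CSV actually exposes all 130 feature columns. The guard below is a small sketch with hypothetical helper names (`allstate_feature_names` and `missing_features` are not part of the pipeline); it only relies on the `cat1...cat116` / `cont1...cont14` naming used above.

```python
def allstate_feature_names(n_cat=116, n_cont=14):
    """Build the feature list in the same order as the snippet above."""
    cats = [f"cat{i}" for i in range(1, n_cat + 1)]
    conts = [f"cont{i}" for i in range(1, n_cont + 1)]
    return cats + conts

def missing_features(columns, features):
    """Return the expected feature names that are absent from `columns`."""
    present = set(columns)
    return [name for name in features if name not in present]
```

With the DataFrame from the CatBoost example, `missing_features(df.columns, allstate_feature_names())` should return an empty list; anything else points to a truncated or renamed download.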

XGBoost:

from huggingface_hub import hf_hub_download
import xgboost as xgb

path = hf_hub_download("t22000t/allstate-tabular-models", "xgboost.json")
booster = xgb.Booster()
booster.load_model(path)

XGBoost requires the exact preprocessing path used at training time. The easiest way to reproduce inference is to clone the pipeline repo and run the prediction script - see the pipeline README.

Step 3 (alternative): Re-run the full training

git clone https://github.com/timothy22000/tabular_data_modelling_pipeline
cd tabular_data_modelling_pipeline
pip install -e ".[all]"
python scripts/download_data.py --dataset allstate --kaggle

OMP_NUM_THREADS=1 python train.py \
    --config configs/example_allstate.py \
    --input data/allstate.csv \
    --skip-tuning --skip-interpretability \
    --architectures catboost xgboost cann cann_gbm ft_transformer tabm localglmnet drn

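Expected wall-clock: 6-7 hours on Apple M-series CPU/MPS. Most of the time is FT-Transformer (about 3.9 hours, and the worst-performing architecture). Per the results table, the two GBMs plus CANN, CANN-GBM and DRN finish in under 50 min combined, while LocalGLMnet and TabM each add a little under an hour.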

(OMP_NUM_THREADS=1 is only needed on macOS arm64 to avoid an OpenMP conflict; Linux runs are unaffected.)
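
If you drive training from your own Python script rather than the shell, the same workaround can be applied conditionally. This is a sketch only: `needs_single_omp_thread` is a hypothetical helper, and the variable has to be set before NumPy, PyTorch or the GBM libraries are imported for it to take effect.

```python
import os
import platform

def needs_single_omp_thread(system=None, machine=None):
    """True on macOS arm64, the only platform the note above applies to."""
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    return system == "Darwin" and machine == "arm64"

# Set the variable only where needed, and never override a user-supplied value.
if needs_single_omp_thread():
    os.environ.setdefault("OMP_NUM_THREADS", "1")
```

Using `setdefault` keeps any value already exported in the shell, so Linux runs and explicit overrides are left untouched.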

Files

File	What it is	Size
`example_allstate.py`	DatasetConfig (target=`loss`, gamma family, all 130 features)	-
`catboost.cbm`	Trained CatBoost (Tweedie:variance_power=1.99)	~3 MB
`xgboost.json`	Trained XGBoost Booster (`reg:gamma`)	5 MB
`cann_member{0,1,2}.pt`	CANN 3-seed ensemble	~1.5 MB each
`cann_gbm_member{0,1,2}.pt`	CANN-GBM 3-seed ensemble	~1.5 MB each
`ft_transformer_member{0,1,2}.pt`	FT-Transformer 3-seed ensemble (underfit - see results note)	~3 MB each
`tabm_member{0,1,2}.pt`	TabM 3-seed ensemble	~7 MB each
`localglmnet_member{0,1,2}.pt`	LocalGLMnet 3-seed ensemble	~1 MB each
`drn_member{0,1,2}.pt`	DRN 3-seed ensemble	~1.5 MB each
`evaluation_summary.csv`	Per-model train/test Gini, MAE, RMSE, A/E ratio, gamma deviance	660 B
`ensemble_weights.json`	NNLS weights over the 8 base predictions	-
`dashboard_dl_models.html`	Interactive Plotly dashboard	-
`figures/fig_dl_*.png`	Standalone publication figures	-
`model_summary.json`	Structured run record (config, metrics, timing)	-

Total collection size: ~55 MB (sum of the model file sizes listed above).
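
To give a sense of what `ensemble_weights.json` represents, here is a minimal sketch of non-negative least squares (NNLS) stacking using `scipy.optimize.nnls`. The helper names are hypothetical, and the layout is an assumption: one column of predictions per base model, with weights fitted against the observed `loss`. Whether the pipeline adds an intercept column, or fits on out-of-fold predictions, is not stated here.

```python
import numpy as np
from scipy.optimize import nnls

def fit_nnls_weights(base_preds, y):
    """Fit non-negative stacking weights.

    base_preds: array of shape (n_rows, n_models), one column per base model.
    y:          array of shape (n_rows,), observed target.
    """
    A = np.asarray(base_preds, dtype=float)
    b = np.asarray(y, dtype=float)
    weights, _residual_norm = nnls(A, b)   # minimises ||A w - b|| subject to w >= 0
    return weights

def stack(base_preds, weights):
    """Combine base predictions with the fitted weights."""
    return np.asarray(base_preds, dtype=float) @ weights
```

Because the weights are constrained to be non-negative, a base model that adds nothing beyond the others simply receives a weight of zero instead of a negative coefficient.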

Training configuration

Setting	Value
Pipeline	tabular-data-modelling-pipeline v0.1.0
Architectures	All 8 (catboost, xgboost, cann, cann_gbm, ft_transformer, tabm, localglmnet, drn)
Hyperparameters	Defaults - no Optuna tuning
DL ensemble size	3 seeds per architecture
Family / link	Gamma / log
XGBoost objective	`reg:gamma`
CatBoost loss	`Tweedie:variance_power=1.99`
Train/test split	Random 80/20, seed 42
Cap percentile	99.5 (~$15,200; ~940 rows winsorised)
Hardware	Apple M-series, MPS device for DL
Total wall-clock	6h 47m (407 min)
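
For reference, the gamma family with a log link listed in the table corresponds to the standard unit deviance d(y, mu) = 2 * ((y - mu) / mu - log(y / mu)) with mu = exp(eta). The sketch below is an illustration with hypothetical function names, not the pipeline's loss implementation; for a fixed dispersion, minimising this deviance in mu is equivalent to minimising the gamma negative log-likelihood.

```python
import numpy as np

def gamma_unit_deviance(y, mu):
    """Gamma unit deviance; requires y > 0 and mu > 0."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    return 2.0 * ((y - mu) / mu - np.log(y / mu))

def mean_gamma_deviance(y, eta):
    """Mean gamma deviance for linear predictor `eta` under a log link."""
    mu = np.exp(np.asarray(eta, dtype=float))   # log link: mu = exp(eta)
    return float(np.mean(gamma_unit_deviance(y, mu)))
```

The deviance is zero when the prediction equals the observation and penalises relative rather than absolute error, which is why it suits a heavy-tailed severity target.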

Limitations

Default hyperparameters only. No Optuna tuning. Kaggle leaderboard winners used extensive tuning + bagging - expect ~0.02-0.03 Gini lift and another ~$10-20 MAE reduction with tuning.
FT-Transformer underfit - documented above. Don't use those weights directly; retrain with tuning if you need a transformer baseline.
No interpretability artefacts (Captum attributions, partial dependence plots) - skipped to keep wall-clock under control. Run without --skip-interpretability to compute them on a re-run.
All 130 features anonymised. No domain interpretability is possible directly - cat1...cat116 and cont1...cont14 carry no semantic meaning, so monotonicity constraints, base levels, and GLM factor choices were left empty.
Random split, not stratified. `loss` is heavy-tailed; a quantile-stratified split would give a more representative test set.
Trained on competition train.csv only (test.csv is unlabelled). Not directly comparable to the official leaderboard.
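
The quantile-stratified split mentioned in the limitations could look roughly like the following. This is a sketch under stated assumptions, not something the pipeline provides: the function name, the choice of ten quantile bins and the per-bin sampling are all illustrative.

```python
import numpy as np

def quantile_stratified_split(y, test_frac=0.2, n_bins=10, seed=42):
    """Split row indices so each quantile bin of `y` contributes ~test_frac to the test set."""
    y = np.asarray(y, dtype=float)
    # Interior quantile edges, e.g. deciles for n_bins=10.
    edges = np.quantile(y, np.linspace(0, 1, n_bins + 1)[1:-1])
    bins = np.searchsorted(edges, y, side="right")   # bin id in 0..n_bins-1

    rng = np.random.default_rng(seed)
    test_parts = []
    for b in range(n_bins):
        members = np.flatnonzero(bins == b)
        rng.shuffle(members)                          # random order within the bin
        n_test = int(round(test_frac * len(members)))
        test_parts.append(members[:n_test])

    test_idx = np.sort(np.concatenate(test_parts))
    train_mask = np.ones(len(y), dtype=bool)
    train_mask[test_idx] = False
    return np.flatnonzero(train_mask), test_idx
```

Sampling within each bin guarantees that the largest claims are represented in the test set in proportion to their share of rows, which a single random draw does not.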

Intended use

Baseline for actuarial / claims-severity research on the canonical Kaggle dataset.
Comparing your new tabular architecture against eight strong baselines on real insurance data.
Teaching gamma-family regression at meaningful scale (188k rows).
Sanity check for reimplementations of CatBoost/XGBoost/CANN-GBM/etc.

Citation

@software{tabular_data_modelling_pipeline,
  author = {Mun, Timothy},
  title  = {tabular-data-modelling-pipeline},
  url    = {https://github.com/timothy22000/tabular_data_modelling_pipeline},
  year   = {2026}
}

@misc{allstate2016,
  author       = {Allstate Insurance Company},
  title        = {Allstate Claims Severity},
  year         = {2016},
  url          = {https://www.kaggle.com/c/allstate-claims-severity},
  note         = {Kaggle Competition}
}

Please also cite the individual architecture papers - see the main repo README.

License

MIT for the model code and pipeline. The underlying Allstate dataset is distributed under Kaggle competition terms (non-commercial use only); this repository does not redistribute the raw data.

📦 Pipeline: tabular-data-modelling-pipeline
🤖 Companion model collections (full datasets included):
- t22000t/house-prices-tabular-models - gamma, 1.5k rows
- t22000t/bike-sharing-tabular-models - poisson, 17k rows


Evaluation results

Test Gini (CANN-GBM, best) on Allstate Claims Severity (Kaggle competition): 0.347 (self-reported)
Test MAE (CANN-GBM, USD) on Allstate Claims Severity (Kaggle competition): 1,158 (self-reported)
