Allstate Claims Severity - Tabular Models (8 architectures, Gamma)
Pre-trained models for the Allstate Claims Severity Kaggle competition, covering all eight architectures from the tabular-data-modelling-pipeline, built from the competition's full 188,318-row `train.csv` (80/20 train/test split).
Dataset not redistributed here. The Allstate competition rules restrict redistribution to non-commercial use. To reproduce these models, download the data from Kaggle yourself (see below) - this repo ships the config file, pre-trained weights, and baseline metrics only.
Results
All 8 architectures + NNLS-stacked ensemble, default hyperparameters (no Optuna tuning), 3-seed ensembles per DL architecture, gamma family + log link, 80/20 random split (seed 42).
| Rank | Model | Test Gini | Test MAE (USD) | Test RMSE (USD) | A/E ratio | n params | Training time |
|---|---|---|---|---|---|---|---|
| 1 | CANN-GBM | 0.3473 | 1,158 | 1,839 | 1.010 | 300,519 | 14.8 min |
| - | Stacked ensemble (NNLS) | 0.3472 | 1,144 | 1,864 | 1.063 | (9 weights) | - |
| 2 | XGBoost | 0.3468 | 1,152 | 1,850 | 1.027 | 778 trees | 0.7 min |
| 3 | CatBoost | 0.3461 | 1,165 | 1,856 | 1.015 | 946 trees | 2.3 min |
| 4 | CANN | 0.3457 | 1,168 | 1,857 | 1.016 | 300,519 | 15.3 min |
| 5 | DRN | 0.3454 | 1,175 | 1,861 | 1.004 | 300,714 | 14.4 min |
| 6 | LocalGLMnet | 0.3428 | 1,197 | 1,903 | 1.008 | 193,134 | 58.4 min |
| 7 | TabM | 0.3427 | 1,393 | 2,446 | 1.553 | 1,735,956 | 52.9 min |
| 8 | FT-Transformer† | 0.0279 | 2,142 | 3,508 | 3.052 | 700,611 | 231 min |
† FT-Transformer underfit. Despite ~151k training rows (80% of 188,318), the transformer failed to converge under default hyperparameters within the early-stopping window. Its predictions are off by a calibration factor of ~3 (A/E ≈ 3.05, i.e. total predicted loss is roughly a third of total actual loss). Two of its three ensemble members hit a flat local minimum near the global mean; the third descended properly but couldn't rescue the ensemble average. We ship the weights for completeness but do not recommend using FT-T predictions from this collection - retrain with Optuna tuning if you need a competitive transformer baseline.
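The Gini and A/E columns measure different things: Gini only scores how well a model ranks claims, while A/E (total actual / total predicted) scores overall calibration, and MAE/RMSE penalise both. That is how TabM can rank almost as well as the GBMs (Gini 0.3427) while its A/E of 1.553 inflates its MAE. The pipeline's exact metric code is not reproduced here; the sketch below assumes the Kaggle-style normalized Gini and A/E = sum(actual) / sum(predicted), and `normalized_gini` / `severity_metrics` are hypothetical helper names.

```python
import numpy as np


def normalized_gini(actual, pred):
    """Kaggle-style normalized Gini: model Gini divided by the Gini of a perfect ranking."""
    def _gini(a, p):
        # Sort rows by prediction, highest first (stable sort keeps tie order deterministic)
        order = np.argsort(-p, kind="mergesort")
        # Cumulative share of total actual loss captured as we move down the ranking
        cum_share = np.cumsum(a[order]) / a.sum()
        n = len(a)
        # Discrete area between this Lorenz-style curve and the random-ranking diagonal
        return cum_share.sum() / n - (n + 1) / (2 * n)

    a = np.asarray(actual, dtype=float)
    p = np.asarray(pred, dtype=float)
    return _gini(a, p) / _gini(a, a)


def severity_metrics(actual, pred):
    """Ranking (Gini), error (MAE, RMSE) and calibration (A/E) for one model."""
    a = np.asarray(actual, dtype=float)
    p = np.asarray(pred, dtype=float)
    return {
        "gini": normalized_gini(a, p),
        "mae": float(np.mean(np.abs(a - p))),
        "rmse": float(np.sqrt(np.mean((a - p) ** 2))),
        # Total actual / total predicted: > 1 means the model under-predicts overall
        "ae_ratio": float(a.sum() / p.sum()),
    }


# A model that ranks perfectly but predicts a third of every loss:
y = np.array([500.0, 1200.0, 3000.0, 8000.0])
print(severity_metrics(y, y / 3))  # gini 1.0, ae_ratio ~3.0, large MAE
```

Scaling every prediction by a constant leaves the ranking, and therefore the Gini, unchanged, which is why calibration has to be checked separately via A/E.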
- Test set: 37,664 rows (20% of 188,318)
- Target: `loss` (claim severity, USD)
- Loss: Gamma deviance via `reg:gamma` (XGBoost) / `Tweedie:variance_power=1.99` (CatBoost) / explicit gamma NLL (DL)
- Cap: 99.5th percentile (≈ $15,200; ~940 rows winsorised)
- Random seed: 42
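The target treatment above can be sketched in NumPy. This is a minimal illustration, not the pipeline's code: it assumes the cap is a plain upper clip at the 99.5th percentile (the pipeline may compute the cap on the training split only), and `winsorise`, `gamma_deviance` and `tweedie_deviance` are hypothetical helper names. It also shows why `Tweedie:variance_power=1.99` works as a gamma stand-in for CatBoost (presumably because CatBoost's Tweedie loss only accepts a variance power strictly between 1 and 2): as p approaches 2, the Tweedie deviance converges to the gamma deviance.

```python
import numpy as np


def winsorise(y, pct=99.5):
    """Clip y at its pct-th percentile; return the capped copy and the cap value."""
    y = np.asarray(y, dtype=float)
    cap = np.percentile(y, pct)
    return np.minimum(y, cap), cap


def gamma_deviance(y, mu):
    """Mean gamma unit deviance: 2 * [(y - mu) / mu - log(y / mu)], for y, mu > 0."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    return float(np.mean(2.0 * ((y - mu) / mu - np.log(y / mu))))


def tweedie_deviance(y, mu, p):
    """Mean Tweedie unit deviance for 1 < p < 2 (y > 0 assumed here)."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    d = 2.0 * (
        y ** (2 - p) / ((1 - p) * (2 - p))
        - y * mu ** (1 - p) / (1 - p)
        + mu ** (2 - p) / (2 - p)
    )
    return float(np.mean(d))


y = np.array([2.0])
mu = np.array([1.0])
print(gamma_deviance(y, mu), tweedie_deviance(y, mu, p=1.99))  # close: ~0.614 vs ~0.615
```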
For reference, the top Kaggle leaderboard scores reach MAE ~1,126 using extensive tuning + cross-validation. This pipeline's stacked ensemble, at MAE 1,144, lands within 1.6% of that using only default hyperparameters, which speaks to the strength of the pipeline's default settings rather than anything novel about the modelling. The comparison is indicative only: the leaderboard scores Kaggle's unlabelled `test.csv`, whereas these numbers come from a 20% split of `train.csv`.
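The stacked row combines the base-model predictions with non-negative least squares (NNLS). A minimal sketch using `scipy.optimize.nnls`; several details are assumptions rather than documented pipeline behaviour: that DL seed members are combined by an arithmetic mean, that the weights are fitted on a held-out set, and that an intercept column is included (one possible reading of "(9 weights)" is 8 model weights plus an intercept, but that is a guess). `average_seeds`, `fit_nnls_stack` and `predict_stack` are hypothetical names.

```python
import numpy as np
from scipy.optimize import nnls


def average_seeds(member_preds):
    """Combine per-seed predictions of one DL architecture (arithmetic mean assumed)."""
    return np.mean(np.column_stack(member_preds), axis=1)


def fit_nnls_stack(base_preds, y, intercept=True):
    """Non-negative least-squares weights over base-model predictions.

    base_preds: dict mapping model name -> 1-D prediction array on a held-out set.
    Note that NNLS also constrains the optional intercept to be >= 0.
    """
    names = list(base_preds)
    X = np.column_stack([base_preds[k] for k in names])
    if intercept:
        X = np.column_stack([np.ones(len(X)), X])
        names = ["intercept"] + names
    weights, _residual_norm = nnls(X, np.asarray(y, dtype=float))
    return dict(zip(names, weights))


def predict_stack(weights, base_preds):
    """Apply fitted stacking weights to a dict of base-model predictions."""
    out = weights.get("intercept", 0.0)
    return out + sum(w * base_preds[k] for k, w in weights.items() if k != "intercept")
```

Non-negativity keeps the ensemble interpretable as a weighted blend and stops highly correlated base models (such as the GBMs and CANN-GBM) from receiving large offsetting positive and negative weights.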
How to use this collection
Step 1: Get the data from Kaggle
The Allstate competition data is not redistributed in this repo.
```bash
# Set up Kaggle API auth: https://github.com/Kaggle/kaggle-api#api-credentials
# Accept competition rules at:
# https://www.kaggle.com/c/allstate-claims-severity/rules
# Then:
pip install kaggle
kaggle competitions download -c allstate-claims-severity
unzip allstate-claims-severity.zip
# The resulting train.csv is what these models were trained on
```
Or use the pipeline's downloader, which wraps the steps above (Kaggle API credentials and accepted competition rules are still required):
```bash
git clone https://github.com/timothy22000/tabular_data_modelling_pipeline
cd tabular_data_modelling_pipeline
pip install -e ".[all]"
python scripts/download_data.py --dataset allstate --kaggle
# Saves to data/allstate.csv (188318 rows × 132 cols)
```
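Because Step 1 offers two routes (Kaggle's raw `train.csv` or the pipeline's `data/allstate.csv`), a quick check that the frame has the expected layout can save a confusing error later. A minimal sketch: the feature names match the list used in Step 2, the `id` column is inferred from the 132-column count (130 features + `id` + `loss`), and `check_allstate_frame` is a hypothetical helper.

```python
import pandas as pd

CAT_COLS = [f"cat{i}" for i in range(1, 117)]   # cat1 ... cat116
CONT_COLS = [f"cont{i}" for i in range(1, 15)]  # cont1 ... cont14


def check_allstate_frame(df: pd.DataFrame) -> list:
    """Validate the expected columns and target; return the 130 feature names."""
    expected = ["id"] + CAT_COLS + CONT_COLS + ["loss"]
    missing = [c for c in expected if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns (first 5 shown): {missing[:5]}")
    # The gamma family is only defined for strictly positive targets
    if (df["loss"] <= 0).any():
        raise ValueError("gamma family needs strictly positive loss values")
    return CAT_COLS + CONT_COLS


# Usage, assuming Step 1 wrote data/allstate.csv:
# df = pd.read_csv("data/allstate.csv")
# features = check_allstate_frame(df)
# print(df.shape)  # the download script reports 188318 rows x 132 cols
```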
Step 2: Load any of the pre-trained models
CatBoost:
```python
from huggingface_hub import hf_hub_download
from catboost import CatBoostRegressor
import pandas as pd

path = hf_hub_download("t22000t/allstate-tabular-models", "catboost.cbm")
model = CatBoostRegressor()
model.load_model(path)

# Written by scripts/download_data.py in Step 1 (or point this at Kaggle's train.csv)
df = pd.read_csv("data/allstate.csv")
features = [f"cat{i}" for i in range(1, 117)] + [f"cont{i}" for i in range(1, 15)]
preds = model.predict(df[features])
```
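The snippet above scores every row of `allstate.csv`, including the ~80% the model was trained on, so in-sample rows will look better than the held-out metrics in the Results table. To score only held-out rows you need the same split. A sketch assuming scikit-learn's `train_test_split` with `random_state=42`; the pipeline's split implementation is not documented here, so treat this as an approximation. With `test_size=0.2` on 188,318 rows it returns 37,664 test rows, matching the reported test-set size, but matching sizes alone do not prove the same rows are selected. `allstate_holdout` is a hypothetical helper.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def allstate_holdout(df: pd.DataFrame, seed: int = 42):
    """Random 80/20 split.

    ASSUMPTION: this reproduces the pipeline's split only if the pipeline also
    used sklearn's train_test_split with shuffling and this seed.
    """
    train_df, test_df = train_test_split(df, test_size=0.2, random_state=seed, shuffle=True)
    return train_df, test_df


# Hypothetical usage with the CatBoost model loaded above:
# _, test_df = allstate_holdout(df)
# test_preds = model.predict(test_df[features])
# print((test_df["loss"] - test_preds).abs().mean())  # uncapped-target MAE
```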
XGBoost:
```python
from huggingface_hub import hf_hub_download
import xgboost as xgb

path = hf_hub_download("t22000t/allstate-tabular-models", "xgboost.json")
booster = xgb.Booster()
booster.load_model(path)
```
The XGBoost booster expects its input features encoded exactly as they were at training time, so the easiest way to reproduce inference is to clone the pipeline repo and run its prediction script (see the pipeline README).
Step 3 (alternative): Re-run the full training
```bash
git clone https://github.com/timothy22000/tabular_data_modelling_pipeline
cd tabular_data_modelling_pipeline
pip install -e ".[all]"
python scripts/download_data.py --dataset allstate --kaggle
OMP_NUM_THREADS=1 python train.py \
    --config configs/example_allstate.py \
    --input data/allstate.csv \
    --skip-tuning --skip-interpretability \
    --architectures catboost xgboost cann cann_gbm ft_transformer tabm localglmnet drn
```
Expected wall-clock: 6-7 hours on an Apple M-series machine (MPS device for the DL models). Most of that is FT-Transformer (~4 hours, and the worst-performing architecture); the two GBMs plus CANN, CANN-GBM and DRN finish in under 50 minutes combined, while LocalGLMnet and TabM take roughly an hour each.

(`OMP_NUM_THREADS=1` is only needed on macOS arm64 to avoid an OpenMP conflict; Linux runs are unaffected.)
Files
| File | What it is | Size |
|---|---|---|
| `example_allstate.py` | DatasetConfig (target=loss, gamma family, all 130 features) | |
| `catboost.cbm` | Trained CatBoost (`Tweedie:variance_power=1.99`) | ~3 MB |
| `xgboost.json` | Trained XGBoost Booster (`reg:gamma`) | 5 MB |
| `cann_member{0,1,2}.pt` | CANN 3-seed ensemble | ~1.5 MB each |
| `cann_gbm_member{0,1,2}.pt` | CANN-GBM 3-seed ensemble | ~1.5 MB each |
| `ft_transformer_member{0,1,2}.pt` | FT-Transformer 3-seed ensemble (underfit - see results note) | ~3 MB each |
| `tabm_member{0,1,2}.pt` | TabM 3-seed ensemble | ~7 MB each |
| `localglmnet_member{0,1,2}.pt` | LocalGLMnet 3-seed ensemble | ~1 MB each |
| `drn_member{0,1,2}.pt` | DRN 3-seed ensemble | ~1.5 MB each |
| `evaluation_summary.csv` | Per-model train/test Gini, MAE, RMSE, A/E ratio, gamma deviance | 660 B |
| `ensemble_weights.json` | NNLS weights over the 8 base predictions | |
| `dashboard_dl_models.html` | Interactive Plotly dashboard | |
| `figures/fig_dl_*.png` | Standalone publication figures | |
| `model_summary.json` | Structured run record (config, metrics, timing) | |
Total collection size: roughly 55 MB (the sum of the approximate per-file sizes above).
Training configuration
| Setting | Value |
|---|---|
| Pipeline | tabular-data-modelling-pipeline v0.1.0 |
| Architectures | All 8 (catboost, xgboost, cann, cann_gbm, ft_transformer, tabm, localglmnet, drn) |
| Hyperparameters | Defaults - no Optuna tuning |
| DL ensemble size | 3 seeds per architecture |
| Family / link | Gamma / log |
| XGBoost objective | reg:gamma |
| CatBoost loss | Tweedie:variance_power=1.99 |
| Train/test split | Random 80/20, seed 42 |
| Cap percentile | 99.5th (≈ $15,200; ~940 rows winsorised) |
| Hardware | Apple M-series, MPS device for DL |
| Total wall-clock | 6h 47m (407 min) |
Limitations
- Default hyperparameters only. No Optuna tuning. Kaggle leaderboard winners used extensive tuning + bagging; with tuning we would expect roughly a 0.02-0.03 Gini lift and a further ~$10-20 MAE reduction.
- FT-Transformer underfit - documented above. Don't use those weights directly; retrain with tuning if you need a transformer baseline.
- No interpretability artefacts (Captum attributions, partial dependence plots); these were skipped to keep wall-clock under control. Re-run without `--skip-interpretability` to compute them.
- All 130 features are anonymised: `cat1`...`cat116` and `cont1`...`cont14` carry no semantic meaning, so no domain interpretation is possible and monotonicity constraints, base levels, and GLM factor choices were left empty.
- Random split, not stratified. `loss` is heavy-tailed; a quantile-stratified split would give a more representative test set.
- Trained on the competition's `train.csv` only (`test.csv` is unlabelled), so results are not directly comparable to the official leaderboard.
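The quantile-stratified alternative mentioned above can be sketched with `pandas.qcut` and scikit-learn's `stratify` argument: bin `loss` into quantiles, then stratify the 80/20 split on the bin labels so every part of the severity distribution, including the tail, is represented in proportion. The 10 bins are an arbitrary illustrative choice, not something the pipeline specifies, and `quantile_stratified_split` is a hypothetical helper.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def quantile_stratified_split(df, target="loss", n_bins=10, test_size=0.2, seed=42):
    """80/20 split stratified on quantile bins of a heavy-tailed target."""
    # Equal-frequency bins of the target; duplicates="drop" guards against tied edges
    bins = pd.qcut(df[target], q=n_bins, labels=False, duplicates="drop")
    return train_test_split(df, test_size=test_size, random_state=seed, stratify=bins)


# Usage on a synthetic lognormal target standing in for `loss`:
rng = np.random.default_rng(0)
demo = pd.DataFrame({"loss": rng.lognormal(mean=7.7, sigma=0.8, size=10_000)})
train_df, test_df = quantile_stratified_split(demo)
print(len(train_df), len(test_df))
```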
Intended use
- Baseline for actuarial / claims-severity research on the canonical Kaggle dataset.
- Comparing your new tabular architecture against eight baselines (seven competitive ones plus the underfit FT-Transformer) on real insurance data.
- Teaching gamma-family regression at meaningful scale (188k rows).
- Sanity check for reimplementations of CatBoost/XGBoost/CANN-GBM/etc.
Citation
```bibtex
@software{tabular_data_modelling_pipeline,
  author = {Mun, Timothy},
  title  = {tabular-data-modelling-pipeline},
  url    = {https://github.com/timothy22000/tabular_data_modelling_pipeline},
  year   = {2026}
}

@misc{allstate2016,
  author = {Allstate Insurance Company},
  title  = {Allstate Claims Severity},
  year   = {2016},
  url    = {https://www.kaggle.com/c/allstate-claims-severity},
  note   = {Kaggle Competition}
}
```
Please also cite the individual architecture papers - see the main repo README.
License
MIT for the model code and pipeline. The underlying Allstate dataset is distributed under Kaggle competition terms (non-commercial use only); this repository does not redistribute the raw data.
Related
- 📦 Pipeline: tabular-data-modelling-pipeline
- 🤖 Companion model collections (full datasets included):
  - `t22000t/house-prices-tabular-models` - gamma, 1.5k rows
  - `t22000t/bike-sharing-tabular-models` - Poisson, 17k rows
Evaluation results
- Test Gini (CANN-GBM, best) on Allstate Claims Severity (Kaggle competition): 0.347 (self-reported)
- Test MAE (CANN-GBM, USD) on Allstate Claims Severity (Kaggle competition): 1,158 (self-reported)