Tabular-to-Text Clinical Transformer (Schema-Shift Reliability Study)

Custom GPT-style Transformer trained from scratch on serialized clinical records, released as part of the study "Specificity Collapse and Calibration Drift under External Schema Shift."

⚠️ Not for clinical use. These weights are released for research reproducibility only. They were trained on a small public dataset (n=303), show measurable calibration drift and specificity collapse under distribution shift, and must not be used for diagnosis or any real medical decision.

What this is

This repo contains five cross-validation fold checkpoints of a tabular-to-text Transformer. The model takes clinical records serialized into text with a Chinese-language template and predicts the presence of heart disease.

| File | Description |
| --- | --- |
| saved_models/model_fold{0..4}.pt | PyTorch state_dict for each of the 5 CV folds |
| modeling_gpt.py | Model architecture + GPTConfig (copied from train.py) |
| tokenizer.pkl | Pickled tiktoken.Encoding (custom BPE, Chinese clinical text) |
| tokenizer_en.pkl | Pickled tiktoken.Encoding (custom BPE, English clinical text) |

The checkpoints are plain state_dicts (no embedded config), so you must build the model with the architecture in modeling_gpt.py to load them.

Model architecture

GPT-style decoder with grouped-query attention and per-layer value embeddings. Default GPTConfig:

| Param | Value |
| --- | --- |
| vocab_size | 32768 |
| n_layer | 12 |
| n_head | 6 |
| n_kv_head | 6 |
| n_embd | 768 |
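
As a quick sanity check on these defaults, the per-head dimension and the ratio of query heads to key/value heads follow directly from the listed values. The sketch below mirrors only the five listed fields in a stand-in dataclass; `ConfigSketch` is a hypothetical name used for illustration, not the repo's `GPTConfig`, which may define additional fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfigSketch:
    """Stand-in holding only the defaults listed above (not the real GPTConfig)."""
    vocab_size: int = 32768
    n_layer: int = 12
    n_head: int = 6
    n_kv_head: int = 6
    n_embd: int = 768

    @property
    def head_dim(self) -> int:
        # The embedding width is split evenly across the query heads.
        if self.n_embd % self.n_head != 0:
            raise ValueError("n_embd must be divisible by n_head")
        return self.n_embd // self.n_head

    @property
    def queries_per_kv_head(self) -> int:
        # Number of query heads that share each key/value head.
        if self.n_head % self.n_kv_head != 0:
            raise ValueError("n_head must be divisible by n_kv_head")
        return self.n_head // self.n_kv_head


cfg = ConfigSketch()
print(cfg.head_dim)             # 128
print(cfg.queries_per_kv_head)  # 1
```

A ratio of 1 means every query head has its own key/value head, so with these defaults grouped-query attention behaves like standard multi-head attention.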

How to load

import torch
from modeling_gpt import GPT, GPTConfig  # from this repo

config = GPTConfig()                       # match training-time config
model = GPT(config)
sd = torch.load("saved_models/model_fold0.pt", map_location="cpu", weights_only=True)
model.load_state_dict(sd)
model.eval()

Load the tokenizer (requires tiktoken):

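import pickle
with open("tokenizer.pkl", "rb") as f:            # only unpickle files you trust
    enc = pickle.load(f)                          # tiktoken.Encoding
# Sample input: "Age 63, chest pain type: typical angina"
ids = enc.encode_ordinary("年龄63岁,胸痛类型典型心绞痛")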

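Download the files before running the snippets above. The example below saves them to ./ckpt, so either run the loading code from inside that directory or prefix its paths accordingly: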

from huggingface_hub import snapshot_download
snapshot_download(repo_id="Zhanbingli/heart-schema-shift-transformer", local_dir="./ckpt")
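
After downloading, it can help to confirm that every file this README lists is present before building the model. The following is a minimal sketch using only the standard library; `EXPECTED_FILES` and `missing_files` are hypothetical helper names, and the relative paths are taken from the file list near the top of this README.

```python
from pathlib import Path

# Relative paths taken from the file list in this README.
EXPECTED_FILES = (
    [f"saved_models/model_fold{i}.pt" for i in range(5)]
    + ["modeling_gpt.py", "tokenizer.pkl", "tokenizer_en.pkl"]
)


def missing_files(local_dir: str) -> list[str]:
    """Return the expected relative paths that are absent under local_dir."""
    root = Path(local_dir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    # "./ckpt" matches the local_dir used in the download example.
    missing = missing_files("./ckpt")
    print("all files present" if not missing else f"missing: {missing}")
```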

Results (from the paper)

| Model / Paradigm | Internal CV (AUC) | External Validation (AUC) | External behavior |
| --- | --- | --- | --- |
| Random Forest (baseline) | 0.911 ± 0.024 | 0.891 ± 0.042 | Stable |
| Tabular-to-Text Transformer (this model) | 0.762 ± 0.070 | 0.624 ± 0.033 | Specificity drop |
| LLM 5-shot (Qwen3.5-2B) | 0.755 ± 0.030 | 0.739 | Best external neural profile |

  • Internal data: UCI Heart Disease (n=303)
  • External data: Kaggle Heart Failure Prediction (n=918), schema-aligned by setting ca=0, thal=0 for records lacking those variables.
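
The size of the external drop is easier to compare as a difference between the mean AUCs reported above. A minimal sketch using only those reported means (the ± spreads are ignored, so this is descriptive arithmetic rather than a significance test; `auc_drop` is a hypothetical helper name):

```python
# Mean AUCs copied from the results above: (internal CV, external validation).
REPORTED_AUC = {
    "Random Forest (baseline)": (0.911, 0.891),
    "Tabular-to-Text Transformer": (0.762, 0.624),
    "LLM 5-shot (Qwen3.5-2B)": (0.755, 0.739),
}


def auc_drop(internal: float, external: float) -> float:
    """Internal minus external mean AUC, rounded to three decimals."""
    return round(internal - external, 3)


drops = {name: auc_drop(*aucs) for name, aucs in REPORTED_AUC.items()}
for name, drop in drops.items():
    print(f"{name}: -{drop:.3f}")
```

On these means the Transformer loses 0.138 AUC externally, against 0.020 for the Random Forest and 0.016 for the 5-shot LLM.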

Limitations

  • Small, hypothesis-generating study; not a leaderboard claim.
  • External cohort is not a native replication of the UCI 13-feature schema.
  • Missingness is encoded differently across pipelines.
  • Trained on a Chinese serialized template; inputs are not natural language.
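
The schema alignment behind the second and third limitations can be pictured as follows. This is a minimal sketch, assuming records are plain dictionaries; `align_record` and the sample field values are hypothetical, and only the rule itself (set `ca=0` and `thal=0` when those variables are absent) comes from the description of the external data in the Results section.

```python
from typing import Any, Mapping

# Variables the external cohort lacks, with the fill values stated in this README.
FILL_DEFAULTS = {"ca": 0, "thal": 0}


def align_record(record: Mapping[str, Any]) -> dict[str, Any]:
    """Return a copy of record with ca/thal set to 0 when absent or None."""
    aligned = dict(record)
    for key, default in FILL_DEFAULTS.items():
        if aligned.get(key) is None:
            aligned[key] = default
    return aligned


# Hypothetical external-style record that has no ca/thal fields.
external = {"age": 54, "sex": 1}
print(align_record(external))  # {'age': 54, 'sex': 1, 'ca': 0, 'thal': 0}
```

In the UCI data `ca` counts major vessels and 0 is a legitimate observed value, so a filled 0 cannot be told apart from a measured 0 after alignment.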

Citation

@misc{li2026schemashift,
  title  = {Specificity Collapse and Calibration Drift under External Schema Shift},
  author = {Li, Zhanbing},
  year   = {2026},
  doi    = {10.5281/zenodo.20611423},
  url    = {https://doi.org/10.5281/zenodo.20611423}
}