surge-fm-v2: Chronos-2 fine-tuned for US grid day-ahead load forecasting

Full fine-tune of amazon/chronos-2 on about 5.5 years (2018-07 through 2023-12, see "Data splits") of hourly load data across 7 major US balancing authorities (PJM, CAISO, ERCOT, MISO, NYISO, ISO-NE, SPP), with hourly 2-m temperature (NOAA ASOS) and US-calendar features as covariates.

Results on 2025 hold-out (macro-averaged over 7 BAs)

| Model | Test MASE | 95% CI | vs seasonal-naive-24 |
|---|---|---|---|
| seasonal-naive-24 (baseline) | 1.044 | [1.019, 1.071] | n/a |
| XGBoost hourly-binned (Roy '25) | 0.901 | [0.879, 0.924] | −14% |
| N-BEATS (Pelekis '23) | 0.714 | [0.692, 0.738] | −32% |
| Chronos-Bolt zero-shot | 0.688 | [0.668, 0.708] | −34% |
| Chronos-2 zero-shot + covariates | 0.567 | [0.550, 0.586] | −46% |
| surge-fm-v2 (this repo) | 0.492 | [0.477, 0.509] | −53% |
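
For readers unfamiliar with the metric, here is a minimal sketch of how a MASE figure and the "vs seasonal-naive-24" column can be computed. It assumes the scaling term is the in-sample error of a 24-hour seasonal-naive forecast; the benchmark code in the Surge repo may define the scale differently, and the function names here are illustrative only.

```python
from statistics import fmean


def seasonal_naive_scale(train, season=24):
    """Mean absolute error of the seasonal-naive forecast on the training series."""
    diffs = [abs(train[t] - train[t - season]) for t in range(season, len(train))]
    return fmean(diffs)


def mase(actual, forecast, train, season=24):
    """Forecast MAE divided by the seasonal-naive scale (assumed definition)."""
    mae = fmean(abs(a - f) for a, f in zip(actual, forecast))
    return mae / seasonal_naive_scale(train, season)


def relative_change(model_mase, baseline_mase):
    """Relative change of a model's MASE against the baseline's MASE."""
    return model_mase / baseline_mase - 1.0


# Reproduce the last column for surge-fm-v2 from the table values.
print(f"{relative_change(0.492, 1.044):+.0%}")  # -53%
```

A value below 1 means the forecast beats the seasonal-naive scale; the last column is each model's MASE relative to the baseline's 1.044.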

PJM specifically: ~1.7 % MAPE day-ahead, matching published ISO-internal accuracy.

See github.com/tylergibbs1/surge for full methodology, benchmark code, and a Next.js playground.

Use

import torch
from chronos import BaseChronosPipeline

pipe = BaseChronosPipeline.from_pretrained(
    "Tylerbry1/surge-fm-v2",
    device_map="cuda" if torch.cuda.is_available() else "cpu",
    torch_dtype=torch.bfloat16,
)

# One-shot probabilistic forecast with covariates. See the Surge repo's
# `src/surge/api/forecaster.py` for the full feature-construction recipe.
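
As a rough illustration of what that feature construction involves, the sketch below builds the calendar covariates listed under "Training setup" and splits an hourly series into a context window (with load) and a future window (covariates only). It is not the code from `forecaster.py`: the column names, the `build_windows` helper, and the caller-supplied `holidays` set are all hypothetical, and timestamps are treated as naive datetimes with no time-zone or DST handling.

```python
import math
from datetime import datetime, timedelta

CONTEXT_HOURS = 2048  # context length listed under "Training setup"
HORIZON_HOURS = 24    # day-ahead horizon


def calendar_features(ts, holidays=frozenset()):
    """Calendar covariates for one hourly timestamp (column names are assumed)."""
    hour_angle = 2 * math.pi * ts.hour / 24
    dow_angle = 2 * math.pi * ts.weekday() / 7
    return {
        "hour_sin": math.sin(hour_angle),
        "hour_cos": math.cos(hour_angle),
        "dow_sin": math.sin(dow_angle),
        "dow_cos": math.cos(dow_angle),
        "is_weekend": int(ts.weekday() >= 5),
        "is_holiday": int(ts.date() in holidays),  # holidays: set of date objects
    }


def build_windows(start, load, temperature, holidays=frozenset()):
    """Return (context_rows, future_rows) as lists of dicts.

    `load` is the observed history starting at `start`; `temperature` must
    cover the same hours plus the next HORIZON_HOURS hours.
    """
    if len(temperature) != len(load) + HORIZON_HOURS:
        raise ValueError("temperature must cover the load history plus the horizon")
    rows = []
    for i, temp in enumerate(temperature):
        ts = start + timedelta(hours=i)
        rows.append({"timestamp": ts, "temperature_2m": temp,
                     **calendar_features(ts, holidays)})
    history = [dict(row, target=y) for row, y in zip(rows, load)]
    context = history[-CONTEXT_HOURS:]  # keep only the most recent context hours
    future = rows[len(load):]           # covariates only, no target
    return context, future


# Synthetic example: 3000 hours of flat load and temperature.
context, future = build_windows(datetime(2025, 1, 1), [100.0] * 3000, [5.0] * 3024)
print(len(context), len(future))  # 2048 24
```

The two row lists can then be converted to data frames and handed to the pipeline's covariate-aware prediction method; consult the chronos-forecasting documentation for the exact call and expected column layout.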

Training setup

  • Backbone: amazon/chronos-2 (119 M params)
  • Mode: full fine-tune, LR 5e-6 (cosine), 3 000 steps, batch 16, bf16, seed 42
  • Context: 2 048 hours · horizon: 24 hours
  • Loss: quantile regression at the full Chronos-2 quantile grid (21 levels)
  • Covariates: 2-m temperature (ASOS), hour-of-day sin/cos, day-of-week sin/cos, is_weekend, is_holiday (US federal)
  • Trained in ~5 minutes on a single NVIDIA H100 80 GB
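
To make the loss line above concrete: quantile regression minimizes the pinball loss, max(τ·(y − q), (τ − 1)·(y − q)), averaged over quantile levels τ and time steps. The sketch below is a plain-Python illustration, not the training code. The 21-level grid shown is an assumption that matches the stated count; check the base model's configuration for the actual levels.

```python
def pinball_loss(y, q_pred, tau):
    """Pinball (quantile) loss for one observation and one quantile level."""
    diff = y - q_pred
    return max(tau * diff, (tau - 1.0) * diff)


def mean_quantile_loss(actual, quantile_forecasts):
    """Average pinball loss over all quantile levels and time steps.

    `quantile_forecasts` maps each quantile level to a list of predictions
    aligned with `actual`.
    """
    total, count = 0.0, 0
    for tau, preds in quantile_forecasts.items():
        for y, q in zip(actual, preds):
            total += pinball_loss(y, q, tau)
            count += 1
    return total / count


# Assumed grid: 0.01, 0.05, 0.10, ..., 0.90, 0.95, 0.99 (21 levels).
QUANTILES = [0.01, 0.05] + [round(0.10 + 0.05 * i, 2) for i in range(17)] + [0.95, 0.99]

# Under-predicting a high quantile costs more than over-predicting it.
print(pinball_loss(10.0, 8.0, 0.9) > pinball_loss(8.0, 10.0, 0.9))  # True
```

The asymmetry is what pushes each output toward its target quantile: for τ = 0.9, an observation above the prediction costs nine times more per unit than one below it.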

Data splits

| Split | Period | Rows per BA |
|---|---|---|
| Train | 2018-07 → 2023-12 | ~47 400 |
| Val | 2024 | 8 808 |
| Test (reported above) | 2025 | 8 808 |
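
The split is purely chronological, so validation and test hours never precede training hours. A minimal sketch of that assignment follows; the `assign_split` helper is hypothetical, and the exact boundary handling (midnight cut-offs, time zone) is an assumption rather than something stated in this card.

```python
from datetime import datetime

TRAIN_START = datetime(2018, 7, 1)
TRAIN_END = datetime(2024, 1, 1)  # exclusive: training runs through 2023-12


def assign_split(ts):
    """Map an hourly timestamp to 'train', 'val', 'test', or None if out of range."""
    if TRAIN_START <= ts < TRAIN_END:
        return "train"
    if ts.year == 2024:
        return "val"
    if ts.year == 2025:
        return "test"
    return None


print(assign_split(datetime(2023, 12, 31, 23)))  # train
print(assign_split(datetime(2024, 1, 1, 0)))     # val
```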

Limitations

  • Not bankable. For research and reference use only. Do not use for regulated bidding, financial settlement, or any decision where a legally-attested forecast is required.
  • Univariate target. The model forecasts load only; covariates are consumed as inputs and are not themselves predicted.
  • Weather ideality. Evaluation used ground-truth ASOS temperature as the "future covariate", which makes the reported accuracy an upper bound. A real production pipeline driven by HRRR/GFS forecast temperature can be expected to degrade by 10–20 % in MAPE terms.
  • Not all BAs. Only the 7 largest US BAs in EIA-930. Small BAs and non-RTO regions aren't represented in training.
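
As a back-of-the-envelope reading of the weather caveat above, the snippet below applies a 10–20 % degradation to the PJM figure of ~1.7 % MAPE. It assumes the degradation is relative (a multiplier on MAPE) rather than additive percentage points, and that the range applies to PJM specifically; neither is stated in this card, so treat the result as illustrative.

```python
def degraded_mape(mape_pct, relative_degradation):
    """Scale a MAPE (in percent) by a relative degradation, e.g. 0.10 for 10 %."""
    return mape_pct * (1.0 + relative_degradation)


low = degraded_mape(1.7, 0.10)
high = degraded_mape(1.7, 0.20)
print(f"{low:.2f}% to {high:.2f}%")  # 1.87% to 2.04%
```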

License

MIT. Base model (Chronos-2) is Apache 2.0.

Cite

@software{surge_fm_v2,
  title  = {surge-fm-v2: Chronos-2 fine-tuned for US grid load forecasting},
  author = {Surge contributors},
  year   = {2026},
  url    = {https://github.com/tylergibbs1/surge}
}