surge-fm-v2: Chronos-2 fine-tuned for US grid day-ahead load forecasting
Full fine-tune of amazon/chronos-2 on about 5.5 years (2018-07 to 2023-12) of hourly load data across 7 major US balancing authorities (PJM, CAISO, ERCOT, MISO, NYISO, ISO-NE, SPP), with hourly 2-m temperature (NOAA ASOS) and US-calendar features as covariates.
Results on 2025 hold-out (macro-averaged over 7 BAs)
| Model | Test MASE | 95% CI | vs seasonal-naive-24 |
|---|---|---|---|
| seasonal-naive-24 (baseline) | 1.044 | [1.019, 1.071] | – |
| XGBoost hourly-binned (Roy '25) | 0.901 | [0.879, 0.924] | -14% |
| N-BEATS (Pelekis '23) | 0.714 | [0.692, 0.738] | -32% |
| Chronos-Bolt zero-shot | 0.688 | [0.668, 0.708] | -34% |
| Chronos-2 zero-shot + covariates | 0.567 | [0.550, 0.586] | -46% |
| surge-fm-v2 (this repo) | 0.492 | [0.477, 0.509] | -53% |
PJM specifically: ~1.7 % day-ahead MAPE, in line with published ISO-internal accuracy.
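The metrics above can be reproduced in a few lines. This is a minimal sketch, assuming MASE is scaled by the in-sample error of a 24-hour seasonal-naive forecast (the card does not state the exact scaling; see the Surge repo for the actual benchmark code). The function names are illustrative and not taken from the repo.

```python
from statistics import fmean


def seasonal_naive_scale(history, season=24):
    """In-sample MAE of the 'same hour yesterday' forecast over the history."""
    errors = [abs(history[t] - history[t - season]) for t in range(season, len(history))]
    return fmean(errors)


def mase(actual, forecast, history, season=24):
    """Forecast MAE divided by the seasonal-naive scale (assumed definition)."""
    mae = fmean(abs(a - f) for a, f in zip(actual, forecast))
    return mae / seasonal_naive_scale(history, season)


def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    return 100.0 * fmean(abs(a - f) / abs(a) for a, f in zip(actual, forecast))


def relative_change(model_mase, baseline_mase):
    """Percent change versus the baseline, as in the table's last column."""
    return 100.0 * (model_mase / baseline_mase - 1.0)


# Last table row: 0.492 against the seasonal-naive-24 baseline of 1.044.
print(round(relative_change(0.492, 1.044)))  # -53
```

Because the scale is computed on history rather than on the test window, the baseline itself need not score exactly 1.0, which is consistent with the 1.044 reported for seasonal-naive-24.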
See github.com/tylergibbs1/surge for full methodology, benchmark code, and a Next.js playground.
Use
import torch
from chronos import BaseChronosPipeline
pipe = BaseChronosPipeline.from_pretrained(
    "Tylerbry1/surge-fm-v2",
    device_map="cuda" if torch.cuda.is_available() else "cpu",
    torch_dtype=torch.bfloat16,
)
# One-shot probabilistic forecast with covariates. See the Surge repo's
# `src/surge/api/forecaster.py` for the full feature-construction recipe.
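
The snippet above only loads the checkpoint. A minimal sketch of the forecast call follows, assuming the loaded pipeline exposes the Chronos-2 `predict_df` method (as in chronos-forecasting 2.x; check your installed version) and that both frames are long-format pandas DataFrames. The helper name `forecast_day_ahead` and the column names `ba`, `timestamp`, and `load_mw` are hypothetical and not taken from the Surge repo.

```python
def forecast_day_ahead(pipe, context_df, future_df, quantile_levels=(0.1, 0.5, 0.9)):
    """Request a 24-hour probabilistic forecast from a Chronos-2 style pipeline.

    context_df: history rows with id, timestamp, target, and covariate columns.
    future_df:  the next 24 hourly rows per id, covariates only (no target).
    """
    return pipe.predict_df(
        context_df,
        future_df=future_df,
        prediction_length=24,                   # day-ahead horizon from the card
        quantile_levels=list(quantile_levels),  # any subset of the model's grid
        id_column="ba",                         # hypothetical column name
        timestamp_column="timestamp",           # hypothetical column name
        target="load_mw",                       # hypothetical column name
    )
```

Passing the pipeline in as an argument keeps the helper easy to exercise with a stub before downloading the real weights.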
Training setup
- Backbone: `amazon/chronos-2` (119 M params)
- Mode: full fine-tune, LR 5e-6 (cosine), 3 000 steps, batch 16, bf16, seed 42
- Context: 2 048 hours · horizon: 24 hours
- Loss: quantile regression at the full Chronos-2 quantile grid (21 levels)
- Covariates: 2-m temperature (ASOS), hour-of-day sin/cos, day-of-week sin/cos, `is_weekend`, `is_holiday` (US federal)
- Trained in ~5 minutes on a single NVIDIA H100 80 GB
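
The calendar covariates listed above can be derived from a timestamp alone. The following is a rough sketch using only the standard library, not the repo's actual recipe (that lives in `src/surge/api/forecaster.py`). The function name, the feature keys, and the short holiday set are assumptions for illustration; a real pipeline would need the complete US federal holiday calendar and a deliberate choice of time zone.

```python
import math
from datetime import date, datetime


def calendar_features(ts, holidays=frozenset()):
    """Cyclical hour/day encodings plus weekend and holiday flags for one timestamp."""
    hour_angle = 2.0 * math.pi * ts.hour / 24.0
    dow = ts.weekday()  # Monday=0 ... Sunday=6
    dow_angle = 2.0 * math.pi * dow / 7.0
    return {
        "hour_sin": math.sin(hour_angle),
        "hour_cos": math.cos(hour_angle),
        "dow_sin": math.sin(dow_angle),
        "dow_cos": math.cos(dow_angle),
        "is_weekend": int(dow >= 5),
        "is_holiday": int(ts.date() in holidays),
    }


# Illustrative, deliberately incomplete holiday set (assumption, not from the repo).
HOLIDAYS_2025 = frozenset({date(2025, 1, 1), date(2025, 7, 4), date(2025, 12, 25)})

feats = calendar_features(datetime(2025, 7, 4, 6), HOLIDAYS_2025)
print(feats["is_weekend"], feats["is_holiday"])  # 0 1
```

The sin/cos pairs keep hour 23 adjacent to hour 0 and Sunday adjacent to Monday, which a raw integer encoding would not.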
Data splits
| Split | Period | Rows per BA |
|---|---|---|
| Train | 2018-07 – 2023-12 | ~47 400 |
| Val | 2024 | 8 808 |
| Test (reported above) | 2025 | 8 808 |
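
The split is purely chronological, so each hourly row can be assigned by its timestamp. A small sketch, assuming naive timestamps in a single consistent time zone (the card does not say which) and a hypothetical helper name:

```python
from datetime import datetime

TRAIN_START = datetime(2018, 7, 1)
TRAIN_END = datetime(2024, 1, 1)  # exclusive upper bound


def assign_split(ts):
    """Map an hourly timestamp to 'train', 'val', 'test', or None if out of range."""
    if TRAIN_START <= ts < TRAIN_END:
        return "train"
    if ts.year == 2024:
        return "val"
    if ts.year == 2025:
        return "test"
    return None
```

Keeping validation and test strictly after the training period avoids leaking future load patterns into the fine-tune.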
Limitations
- Not bankable. For research and reference use only. Do not use for regulated bidding, financial settlement, or any decision where a legally-attested forecast is required.
- Univariate target. The model forecasts load only; covariates are consumed as inputs but are not themselves predicted.
- Weather ideality. Evaluation used ground-truth ASOS temperature as the "future covariate", which makes the reported accuracy an upper bound. A real production pipeline using HRRR/GFS forecast temperature is expected to degrade by 10–20 % in MAPE terms.
- Not all BAs. Only the 7 largest US BAs in EIA-930. Small BAs and non-RTO regions aren't represented in training.
License
MIT. Base model (Chronos-2) is Apache 2.0.
Cite
@software{surge_fm_v2,
title = {surge-fm-v2: Chronos-2 fine-tuned for US grid load forecasting},
author = {Surge contributors},
year = {2026},
url = {https://github.com/tylergibbs1/surge}
}