Maia3 ONNX (fp16 + fp32)
Browser-ready ONNX exports of the Maia3 family of transformer models for human chess move prediction. Each of the four size variants is provided in fp16 (half-precision weights, the recommended artifact for the browser) and fp32 (full precision).
These are conversions of the original PyTorch checkpoints (same weights, same predictions), repackaged so they run client-side with onnxruntime-web (wasm or WebGPU) without a Python/PyTorch backend.
For the architecture, training recipe, and evaluation, see the paper Chessformer: A Unified Architecture for Chess Modeling (ICLR 2026).
Files
| Variant | Params | Base model | fp16 | fp32 |
|---|---|---|---|---|
| 3M | 3.2M | Maia3-ablate-3M | maia3-3m-ablation.fp16.onnx (6.7 MB) | maia3-3m-ablation.fp32.onnx (12.8 MB) |
| 5M | 5.2M | Maia3-5M | maia3-5m.fp16.onnx (10.8 MB) | maia3-5m.fp32.onnx (21.0 MB) |
| 23M | 22.9M | Maia3-23M | maia3-23m.fp16.onnx (45.8 MB) | maia3-23m.fp32.onnx (91.0 MB) |
| 79M | 78.9M | Maia3-79M | maia3-79m.fp16.onnx (156.1 MB) | maia3-79m.fp32.onnx (311.7 MB) |
The fp16 files store weights in float16 (≈half the size) while keeping float32 inputs/outputs, so they are numerically robust and a drop-in replacement for the float32 graph.
I/O contract
All models share one simplified signature:
inputs:
  tokens        float32 [batch, 64, 12]   # current board only, piece-only one-hot channels
  elo_self      float32 [batch]           # rating of the side to move
  elo_oppo      float32 [batch]           # rating of the opponent
outputs:
  logits_move   float32 [batch, 4352]     # move policy over the Maia3 move vocabulary
  logits_value  float32 [batch, 3]        # [loss, draw, win] logits for the side to move
Positions are tokenized exactly as in the upstream Maia3 code (board mirrored when Black is to move; move vocabulary of 4352 with 256 promotion entries). The single 64×12 board is replicated across the model's 8-position history internally, and the ponder/timing head is dropped — matching the analysis export used by the Maia platform frontend.
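As a rough illustration of the [batch, 64, 12] layout, the sketch below packs boards into a flat Float32Array in row-major order. The helper name `encodeBoards` and the use of -1 for an empty square are hypothetical. Which channel corresponds to which piece, the square ordering, and the mirroring for Black are all defined by the upstream Maia3 tokenizer and are not reproduced here.

```javascript
// Hypothetical helper: packs boards into the flat layout expected for `tokens`.
// Each board is an array of 64 entries; an entry is a channel index 0..11
// (its meaning is defined by the upstream Maia3 tokenizer) or -1 for empty.
function encodeBoards(boards) {
  const SQUARES = 64
  const CHANNELS = 12
  const tokens = new Float32Array(boards.length * SQUARES * CHANNELS)
  boards.forEach((board, b) => {
    if (board.length !== SQUARES) {
      throw new Error(`board ${b} has ${board.length} squares, expected ${SQUARES}`)
    }
    board.forEach((channel, square) => {
      if (channel === -1) return // empty square: all 12 channels stay 0
      if (!Number.isInteger(channel) || channel < 0 || channel >= CHANNELS) {
        throw new Error(`invalid channel ${channel} at square ${square}`)
      }
      // Row-major offset for shape [batch, 64, 12]
      tokens[(b * SQUARES + square) * CHANNELS + channel] = 1
    })
  })
  return tokens
}

// Example: one board with a single piece in channel 3 on square 10
const board = new Array(64).fill(-1)
board[10] = 3
const tokens = encodeBoards([board])
console.log(tokens.length) // 768
```

The resulting array can be wrapped as `new ort.Tensor('float32', tokens, [boards.length, 64, 12])`.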
Accuracy
Maximum per-position error in the output probabilities of the fp16 ONNX vs. the original fp32 PyTorch checkpoint (the fp32 ONNX matches PyTorch to ~1e-7):
| Variant | move-prob error | value-prob error |
|---|---|---|
| 3M | 1.3e-4 | 2.8e-4 |
| 5M | 1.7e-4 | 1.2e-4 |
| 23M | 6.9e-5 | 5.4e-4 |
| 79M | 9.6e-5 | 2.7e-4 |
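One plausible way to compute a figure of this kind is sketched below. It assumes that "error" means the largest absolute difference between the softmax probabilities of two models on the same position; the exact metric behind the table is not stated here. The names `softmax` and `maxProbError` are illustrative.

```javascript
// Numerically stable softmax over a 1-D array (plain or typed) of logits.
function softmax(logits) {
  let max = -Infinity
  for (const x of logits) max = Math.max(max, x)
  const exps = Float64Array.from(logits, (x) => Math.exp(x - max))
  let sum = 0
  for (const e of exps) sum += e
  return exps.map((e) => e / sum)
}

// Largest absolute probability difference between two logit vectors.
function maxProbError(referenceLogits, candidateLogits) {
  if (referenceLogits.length !== candidateLogits.length) {
    throw new Error('logit vectors must have the same length')
  }
  const p = softmax(referenceLogits)
  const q = softmax(candidateLogits)
  let worst = 0
  for (let i = 0; i < p.length; i++) {
    worst = Math.max(worst, Math.abs(p[i] - q[i]))
  }
  return worst
}

// Example with made-up value logits (not real model output)
const err = maxProbError([0.2, -1.0, 0.7], [0.2001, -1.0, 0.6999])
console.log(err < 1e-3) // true
```

Taking the maximum of this quantity over a set of evaluation positions would give one number per variant and head.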
Usage (onnxruntime-web)
import * as ort from 'onnxruntime-web'

// Load a variant (fp16 recommended for the browser)
const session = await ort.InferenceSession.create('maia3-79m.fp16.onnx', {
  graphOptimizationLevel: 'basic', // see note below
  executionProviders: ['webgpu'], // or ['wasm']
})

// tokens: Float32Array of length batch*64*12, elos: Float32Array length batch
const feeds = {
  tokens: new ort.Tensor('float32', tokens, [batch, 64, 12]),
  elo_self: new ort.Tensor('float32', eloSelf, [batch]),
  elo_oppo: new ort.Tensor('float32', eloOppo, [batch]),
}
const { logits_move, logits_value } = await session.run(feeds)
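The session returns raw logits. A minimal post-processing sketch follows, assuming a single position (batch of 1) and using plain arrays in place of `logits_value.data` and `logits_move.data`. The function names are illustrative. Mapping an index back to a move requires the upstream Maia3 move vocabulary, and masking illegal moves before ranking is likely advisable; neither is shown.

```javascript
// Numerically stable softmax over a 1-D array (plain or typed) of logits.
function softmax(logits) {
  let max = -Infinity
  for (const x of logits) max = Math.max(max, x)
  const exps = Float64Array.from(logits, (x) => Math.exp(x - max))
  let sum = 0
  for (const e of exps) sum += e
  return exps.map((e) => e / sum)
}

// Value head: the order [loss, draw, win] follows the I/O contract.
function valueProbs(logitsValue) {
  const [loss, draw, win] = softmax(logitsValue)
  return { loss, draw, win }
}

// Policy head: the k highest-probability vocabulary indices.
function topMoves(logitsMove, k) {
  const probs = softmax(logitsMove)
  return Array.from(probs, (p, index) => ({ index, p }))
    .sort((a, b) => b.p - a.p)
    .slice(0, k)
}

// Example with made-up logits (a real policy vector has 4352 entries)
const value = valueProbs([0, 0, 0])
const best = topMoves([0.1, 2.0, -1.0, 0.5], 2)
console.log(best.map((m) => m.index)) // [ 1, 3 ]
```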
Optimization level: set graphOptimizationLevel: 'basic'. ORT's more aggressive 'extended' and 'all' levels apply CPU fusions that hit a known SimplifiedLayerNormFusion bug at the fp16 cast boundaries in this graph; 'basic' loads cleanly across ORT versions and is what the browser runtime effectively uses.
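One way to apply this setting consistently is to build the session options in a single place. The sketch below is only an assumption about how an application might do that: `buildSessionOptions` is a hypothetical helper, and it treats the presence of `navigator.gpu` as the signal that the browser exposes WebGPU. It uses only the `graphOptimizationLevel` and `executionProviders` options already shown in the usage snippet.

```javascript
// Hypothetical helper: always uses 'basic' optimization and lists wasm
// after webgpu so it can serve as a fallback (providers are given in
// priority order).
function buildSessionOptions(hasWebGPU) {
  return {
    graphOptimizationLevel: 'basic',
    executionProviders: hasWebGPU ? ['webgpu', 'wasm'] : ['wasm'],
  }
}

// Feature-detect WebGPU; evaluates to false outside WebGPU-capable browsers
const hasWebGPU = typeof navigator !== 'undefined' && 'gpu' in navigator
const options = buildSessionOptions(hasWebGPU)
// Pass `options` as the second argument to ort.InferenceSession.create(...)
```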
Conversion
Exported with PyTorch's ONNX exporter at opset 17, basic graph optimization
(constant folding + LayerNorm fusion, all default-domain ops), then weights
converted to float16 with float32 I/O preserved (keep_io_types). RMSNorm was
decomposed into opset-17 primitives prior to export.
License
AGPLv3, inherited from the upstream Maia3 checkpoints.
Citation
@inproceedings{monroe2026chessformer,
title={Chessformer: A Unified Architecture for Chess Modeling},
author={Daniel Monroe and George Eilender and Philip Chalmers and Zhenwei Tang and Ashton Anderson},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026},
url={https://openreview.net/forum?id=2ltBRzEHyd}
}