Nova Chess Engine

A style-conditioned transformer that predicts human chess moves. Given a board position, a target player rating, and two optional style parameters (classical–hypermodern preference and aggression), Nova returns a probability distribution over all legal moves calibrated to how a player of that rating and style would play.

Inference is a single forward pass of the neural network. Nova does not use Monte Carlo tree search, minimax, alpha-beta pruning, or any form of engine-style position evaluation. There is no value head and no lookahead. Move selection comes entirely from learned patterns over the training corpus — fast on CPU (~35-50 ms per position) and categorically different from search-based engines like Stockfish or Leela.

Play Nova directly at novachess.ai — Nova powers the Play and Train with Nova features on the site, where you can face Nova at any rating/style setting with built-in analysis and post-game review.

  • Developed by Nova Chess — https://novachess.ai
  • Model type: pure-policy neural network over chess moves, conditioned on position + rating + two style parameters. Single forward pass per position; no search, no value head, no game history.
  • Language(s): not applicable — input is chess positions (18-channel plane encoding), output is a distribution over 16,384 move indices
  • License: custom non-commercial (see LICENSE)

Full results and reproducibility: RESULTS.md. Source code and docs: https://github.com/novachessai/novachess-engine

Model Details

Inputs

  • positionsfloat32 tensor of shape (B, 18, 8, 8)
    • Planes 0–5: white pieces (P, N, B, R, Q, K), one-hot per square
    • Planes 6–11: black pieces (P, N, B, R, Q, K)
    • Plane 12: side to move (1s if white to move, 0s otherwise)
    • Planes 13–16: castling rights (white-kingside, white-queenside, black-kingside, black-queenside)
    • Plane 17: en-passant file indicator
  • conditioningfloat32 tensor of shape (B, 3)
    • rating_norm = (rating − 800) / (2700 − 800), clipped to [0, 1]
    • classical[0, 1] — opening preference (higher = more classical mainlines)
    • aggression[0, 1] — tactical/sacrificial tendency

Outputs

  • logitsfloat32 tensor of shape (B, 16384). Raw logits over the 16,384-index move space. Caller is responsible for masking illegal moves and applying softmax.

Move index encoding:

move_index = promotion_offset + from_square * 64 + to_square
promotion_offset:
    0      no promotion (also queen promotion)
    4096   knight promotion
    8192   bishop promotion
    12288  rook promotion

where from_square and to_square are standard 0–63 indices (a1 = 0, h8 = 63).

Architectural distinction from prior work

Nova is single-head pure-policy: the network's only output is the move distribution. There is no value head (no game-outcome prediction), no auxiliary head (no side-task supervision on captures, checks, etc.), no search at inference, no lookahead, and no use of any position evaluator before, during, or after the forward pass. Move selection comes entirely from the policy distribution the network learned by predicting actual human moves at the conditioned rating and style.

This contrasts with every published comparable model:

Model Heads at inference Search Notes
Nova (this release) 1 (policy) none single forward pass, ~35–50 ms CPU
Maia-2 (NeurIPS 2024) 3 (policy + value + auxiliary) none value head regresses W/D/L; auxiliary head predicts legal moves, captures, check delivery
Maia-3 (maia3_simplified.onnx) 2 (policy + value) none drops Maia-2's auxiliary head; retains W/D/L value head
Allie (ICLR 2025) 1 (policy) + value via search adaptive MCTS at inference policy is decoder-only over move sequences; MCTS provides per-position evaluation at runtime
Leela (LC0) 2 (policy + value) MCTS engine-strength playing model
Stockfish (NNUE) evaluation only alpha-beta not a human-move predictor

The pure-policy stance is a deliberate design choice. It keeps the model fast (one forward pass, no tree expansion), simple to deploy (just the ONNX file — no MCTS implementation, no auxiliary supervision data at training time, no value-head calibration to maintain), and forces the network to learn move quality entirely from move-selection patterns rather than offloading it to a parallel evaluator. The benchmarks in RESULTS.md show this is competitive with multi-head architectures on the move-prediction task itself.

Files in this repository

  • nova_v3b.onnx + nova_v3b.onnx.data — ONNX export with external data. Both files required at inference time and must keep their exact filenames — the .onnx file embeds a reference to the .data sidecar by name. Place in the same directory before loading.
  • nova_v3b.pt — PyTorch checkpoint (weights only) for research and fine-tuning.
  • unified_sample_600k.pkl — the 600K-position out-of-sample evaluation set used in the results reported below. Schema: {fen, actual, rating, ply, min_clock, piece_count, band, player_id, result, ...}.
  • nova_neutral_600k.pkl, nova_actual_600k.pkl, maia2_600k.pkl, maia3_600k.pkl — per-position predictions from each model on the 600K sample, used for the paired significance tests in RESULTS.md.

Uses

Direct use

  • Predict the probability distribution over legal moves that a human of a specified rating and style would play from a given position.
  • Sample moves to run as a human-like opponent or study partner.
  • Score actual human moves by P(actual_move | position, rating, style) for humanness analysis, move difficulty assessment, or anti-cheat signals.
  • Benchmark other human-move predictors against Nova on shared evaluation sets.
  • Fine-tune on specialized data (specific player corpora, specific opening systems, specific time controls) for personal or research use.

Downstream use

The model is used internally by Nova Chess to power the Play Nova and Train with Nova features at https://novachess.ai, where end users play or practice against the model at chosen rating and style settings. The same weights published in this repository are the ones served by the application.

The in-app version additionally wraps Nova's policy output with a small calibration layer used to tune playing strength across rating tiers. The primary lever is a per-tier temperature schedule. On top of that, Nova's sampled candidates pass through a probabilistic quality check: high-confidence picks (where Nova's policy concentrates significant probability mass on a single move) are played directly, while lower-confidence picks may be sent to Stockfish for a low-depth evaluation. If the evaluation falls below a tier-dependent quality threshold, the move may be replaced by re-sampling from Nova's distribution. Both the rate at which positions are evaluated and the rate at which sub-threshold moves are actually replaced vary by tier, so the bot's mistake profile matches the empirical chess.com CP-loss profile at that level — at lower tiers, more sub-optimal moves slip through (because players at that level make them); at higher tiers, far fewer do. Additional calibration components layer on top of this base flow.

Every move the in-app bot plays still originates from Nova's policy distribution. Stockfish is never used to suggest, generate, or select moves — only to evaluate moves Nova has already proposed, so that obvious blunders at higher tiers can be probabilistically caught. The model weights are never touched. The calibration layer is not part of this release; the released checkpoint is the bare policy model, exactly the surface that benchmarks and downstream research / fine-tuning should target. See the README's "In-app behavior vs the released model" section for the full distinction.

Out-of-scope use

  • Not suitable as a chess-playing engine for maximum-strength competition against search-based engines. Nova is trained on human-move prediction — it maximizes P(move | human of rating R), not P(best move) — and uses no search, no lookahead, and no position evaluation beyond the single forward pass of the policy network. For engine-quality play, a search-based evaluator such as Stockfish remains the correct choice.
  • Not intended for cheating detection as a standalone verdict. Nova probabilities can inform a cheat-detection pipeline but should not be used as sole evidence for accusations.
  • Not validated on chess variants (Chess960, King of the Hill, etc.) — trained only on standard chess.
  • Not a replacement for human coaching — move probabilities are not explanations, and the model does not produce commentary or verbal analysis.

Bias, Risks, and Limitations

  • Training-data distribution. Nova is trained on ~520M positions from Lichess rapid games played Apr–Nov 2025. The player population is self-selected (online rapid players on one platform), skews toward active users in rating bands 1100–2300, and may not represent the full distribution of human chess play. Inferences about moves at extreme ratings (particularly below 800 and above
    1. have less training-data support.
  • Style axis limitations. The classical and aggression axes capture specific operational definitions (opening move choices for classical; captures + territorial control + king pressure for aggression). They do not capture all dimensions of human chess style (combinational richness, prophylaxis, time management, etc.).
  • Rating conditioning is a scalar. Nova receives a single number for rating, not a distribution. The model has learned a continuous interpolation of playing strength, but at the high end of the rating axis the playing strength it produces may saturate below the conditioned rating.
  • No game history. Nova conditions on the current position only, not on the preceding move sequence. Two positions with identical FENs are indistinguishable to the model even if reached through very different games.
  • No check for illegal moves. The raw logits include mass on illegal move indices. Callers must apply a legal-move mask before sampling. See the README quickstart and docs/serving.md for the reference masking pattern.
  • Value / result prediction is not supported. This checkpoint is policy-only; it does not output win/draw/loss probabilities.

Training Details

Training data

Nova was trained on a large corpus of Lichess rapid games, balanced across six rating bands from 800 to 2700+. Position sampling and filtering were tuned to keep all skill levels and all game phases well-represented in training. Details of the data pipeline and cohort balancing are not published.

Training procedure

Nova is trained end-to-end with a cross-entropy objective over the 16,384-index move space. The output is the policy distribution over legal moves, with no auxiliary value head. Specific architectural dimensions and training hyperparameters are not published.

Inference cost

  • CPU (ONNX fp32): 35–50 ms per position on a modern x86 core
  • GPU (batched, H100): ~1 ms per position
  • Inference memory: approximately 500 MB RAM per worker (fp32 weights with external-data sidecar)

Evaluation

Evaluation data

A single held-out evaluation sample of 600,000 positions drawn from Lichess rapid games played in March 2026, stratified at 100,000 positions per rating band. This sample is temporally held out from Nova's training data and is shipped as unified_sample_600k.pkl on Hugging Face.

Metrics

  • hit1 — fraction of positions where the model's top prediction matches the human's actual move (top-1 accuracy)
  • hit5 — fraction of positions where the human's move is in the model's top-5 predictions
  • Mean P(actual) — mean probability mass that the model assigned to the move the human actually played
  • Mean top-5 mass — mean total probability mass assigned to the top-5 predictions

Results

On the 600,000-position sample, comparing Nova against the publicly available Maia-3 checkpoint (maia3_simplified.onnx from https://maiachess.com) and the Maia-2 rapid checkpoint:

Metric Maia-2 Maia-3 Nova (neutral style)
Top-1 hit rate 50.27 % 54.83 % 54.60 %
Top-5 hit rate 88.38 % 91.23 % 91.10 %
Mean P(actual) 38.44 % 42.10 % 42.51 %
Mean top-5 mass 89.33 % 91.96 % 92.26 %

All four Nova-vs-Maia-3 deltas are statistically significant under paired McNemar tests (for hit rates) and paired t-tests (for probability-mass metrics). Both probability-mass deltas remain significant under a player-clustered bootstrap (95% CIs reported in RESULTS.md).

Full breakdown by rating band, Maia tier (Skilled / Advanced / Master), game phase, piece count, and three filter variants (all positions; ply ≥ 10; ply ≥ 10 + clock ≥ 30 s) is in RESULTS.md.

How to use

Minimum-dependency inference example (CPU):

pip install onnxruntime python-chess numpy
import chess
import numpy as np
import onnxruntime as ort

PIECE = {"P":0,"N":1,"B":2,"R":3,"Q":4,"K":5,
         "p":6,"n":7,"b":8,"r":9,"q":10,"k":11}

def fen_to_planes(fen):
    planes = np.zeros((18, 8, 8), dtype=np.float32)
    parts = fen.split()
    board, turn, castling, ep = parts[0], parts[1], parts[2], parts[3]
    for ri, rank_str in enumerate(board.split("/")):
        rank_idx, file_idx = 7 - ri, 0
        for ch in rank_str:
            if ch.isdigit():
                file_idx += int(ch)
            else:
                planes[PIECE[ch], rank_idx, file_idx] = 1.0
                file_idx += 1
    if turn == "w":           planes[12].fill(1.0)
    if "K" in castling:       planes[13].fill(1.0)
    if "Q" in castling:       planes[14].fill(1.0)
    if "k" in castling:       planes[15].fill(1.0)
    if "q" in castling:       planes[16].fill(1.0)
    if ep != "-" and len(ep) == 2:
        planes[17, 0, ord(ep[0]) - ord("a")] = 1.0
    return planes

session = ort.InferenceSession("nova_v3b.onnx",
                               providers=["CPUExecutionProvider"])

board = chess.Board()
positions = fen_to_planes(board.fen())[np.newaxis]
# rating=1600, neutral classical + aggression
conditioning = np.array([[(1600 - 800) / (2700 - 800), 0.5, 0.5]],
                        dtype=np.float32)
logits = session.run(None, {"positions": positions,
                            "conditioning": conditioning})[0][0]

# Mask illegals + softmax
legal = np.zeros(16384, dtype=bool)
for mv in board.legal_moves:
    idx = mv.from_square * 64 + mv.to_square
    if mv.promotion == chess.KNIGHT:   idx += 4096
    elif mv.promotion == chess.BISHOP: idx += 4096 * 2
    elif mv.promotion == chess.ROOK:   idx += 4096 * 3
    legal[idx] = True
masked = np.where(legal, logits, -1e9)
probs = np.exp(masked - masked.max())
probs *= legal
probs /= probs.sum()

top = np.argsort(probs)[::-1][:5]
for i in top:
    print(f"  index {int(i):5d}  p = {probs[i]*100:.2f}%")

For production deployment notes (multi-worker setup, rate limiting, temperature schedules, observability), see docs/serving.md in the GitHub repository.

Citation

Nova Chess Engine. Nova Chess, 2026.
https://github.com/novachessai/novachess-engine
https://huggingface.co/novachess/novachess-engine

BibTeX:

@misc{novachess_2026,
  title  = {Nova Chess Engine},
  author = {Nova Chess},
  year   = {2026},
  url    = {https://github.com/novachessai/novachess-engine}
}

Acknowledgments

Nova builds on prior work in human-move prediction. The evaluation methodology (rating-band stratification, tier definitions, ply-based filters, Lichess rapid data) follows conventions established by the Maia project.

  • Maia-1 — McIlroy-Young, Sen, Kleinberg & Anderson, Aligning Superhuman AI with Human Behavior: Chess as a Model System, KDD 2020. arXiv:2006.01855
  • Maia-2 — Tang, Jiao, McIlroy-Young, Kleinberg, Sen & Anderson, Maia-2: A Unified Model for Human-AI Alignment in Chess, NeurIPS 2024. arXiv:2409.20553
  • Maia-3 — Maia project, https://maiachess.com. The specific checkpoint evaluated here is maia3_simplified.onnx published there.
  • Allie — Khoshneviszadeh, Chi, Sheller et al., Allie: Emergent Human-Like Play Through Adaptive MCTS with a Decoder-Only Transformer, ICLR 2025. arXiv:2410.03893

Contact

Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Papers for novachess/novachess-engine