# Configuration-to-Performance Scaling Law with Neural Ansatz

Paper • 2602.10300 • Published
```python
# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "OptimizerStudy/NCPL-intermediate", trust_remote_code=True, dtype="auto"
)
```

This model predicts the performance of neural network configurations using scaling laws. It is trained on the Marin and StepLaw datasets to forecast performance metrics based on model configurations.
NCPL-intermediate (Neural Configuration to Performance Scaling Law - Intermediate) is a specialized forecasting model that maps a textual description of a training configuration to a predicted performance metric.
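For context, models of this kind generalize hand-designed parametric scaling laws such as the Chinchilla form `L(N, D) = E + A/N^alpha + B/D^beta`. The sketch below is illustrative only: the constants roughly follow the published Chinchilla fit, and NCPL itself replaces this fixed formula with a learned neural ansatz.

```python
# Illustrative only: a classic parametric scaling law of the kind NCPL's
# neural ansatz generalizes. Constants approximate the published Chinchilla fit.
def chinchilla_loss(N, D, E=1.69, A=406.4, B=410.7, alpha=0.34, beta=0.28):
    """Predicted training loss from parameter count N and token count D."""
    return E + A / N**alpha + B / D**beta

loss_small = chinchilla_loss(N=1e8, D=2e9)    # ~100M params, 2B tokens
loss_large = chinchilla_loss(N=1e10, D=2e11)  # ~10B params, 200B tokens
assert loss_large < loss_small  # more compute -> lower predicted loss
```

A neural forecaster drops the fixed functional form and instead conditions on the full configuration (architecture, optimizer, schedule), not just `N` and `D`.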
The model consists of:

- **Base Model:** Qwen/Qwen3-1.7B
- **Numeric MLP:** embeds the raw numeric values appearing in the configuration text
- **Prediction Head:** maps the encoded configuration to the forecasted performance metric
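The card does not spell out how these components are wired together. As a rough, hypothetical sketch (the real `ScalingLawForecaster` in the GitHub repository may differ; a tiny Transformer stands in for Qwen/Qwen3-1.7B here), the numeric MLP lifts each raw scalar into the hidden space, those embeddings replace the token embeddings at numeric positions, and the head pools the encoder output into a scalar prediction:

```python
import torch
import torch.nn as nn

class ForecasterSketch(nn.Module):
    """Hypothetical sketch of the architecture; not the repo's actual code."""

    def __init__(self, vocab_size=1000, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        # Numeric MLP: lifts each raw scalar value into the hidden space
        self.numeric_mlp = nn.Sequential(
            nn.Linear(1, hidden), nn.GELU(), nn.Linear(hidden, hidden)
        )
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        # Prediction head: pooled hidden state -> predicted performance metric
        self.head = nn.Linear(hidden, 1)

    def forward(self, input_ids, is_number_mask, number_values_filled, attention_mask=None):
        h = self.embed(input_ids)
        num_emb = self.numeric_mlp(number_values_filled.unsqueeze(-1))
        # Use value embeddings at numeric positions, token embeddings elsewhere
        h = torch.where(is_number_mask.unsqueeze(-1), num_emb, h)
        h = self.encoder(h)
        return self.head(h.mean(dim=1)).squeeze(-1)  # one scalar per sequence

model = ForecasterSketch()
ids = torch.randint(0, 1000, (2, 8))
mask = torch.zeros(2, 8, dtype=torch.bool)
mask[:, 3] = True
vals = torch.zeros(2, 8)
vals[:, 3] = 1.7
pred = model(ids, mask, vals)
print(pred.shape)  # torch.Size([2])
```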
The model was trained on the Marin and StepLaw datasets.
The ScalingLawForecaster class can be found in the GitHub repository.
```python
import torch
from transformers import AutoTokenizer

# Get ScalingLawForecaster from: https://github.com/zhqwqwq/Configuration-to-Performance-Scaling-Law
from model import ScalingLawForecaster

# Load model
model = ScalingLawForecaster(
    base_model_name="Qwen/Qwen3-1.7B",
    init_from_pretrained=True,
    force_fp32=True,
)

# Load checkpoint
checkpoint = torch.load("pytorch_model.bin")
model.load_state_dict(checkpoint["model_state_dict"])
model.eval()

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-1.7B")

# Prepare inputs:
# input_ids: tokenized text sequence
# is_number_mask: boolean mask indicating which tokens are numeric
# number_values_filled: actual numeric values (0 for non-numeric tokens)
with torch.no_grad():
    predictions = model(
        input_ids=input_ids,
        is_number_mask=is_number_mask,
        number_values_filled=number_values_filled,
        attention_mask=attention_mask,
    )
```
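The snippet above leaves `input_ids`, `is_number_mask`, and `number_values_filled` undefined. One way to construct the latter two is sketched below; the repository's actual preprocessing may differ, and the whitespace tokenizer and number regex here are assumptions standing in for the Qwen/Qwen3-1.7B tokenizer, whose subword splits would differ:

```python
import re
import torch

def build_numeric_inputs(config_text):
    """Hypothetical preprocessing sketch: flag tokens that parse as numbers
    and record their values (0.0 at non-numeric positions, per the model card)."""
    tokens = config_text.split()  # stand-in for the real tokenizer
    is_number, values = [], []
    for tok in tokens:
        stripped = tok.strip(",;:")
        if re.fullmatch(r"[-+]?\d*\.?\d+(e[-+]?\d+)?", stripped, re.IGNORECASE):
            is_number.append(True)
            values.append(float(stripped))
        else:
            is_number.append(False)
            values.append(0.0)
    return tokens, torch.tensor(is_number), torch.tensor(values)

tokens, is_number_mask, number_values_filled = build_numeric_inputs(
    "num_layers: 24 hidden_size: 2048 lr: 3e-4"
)
print(is_number_mask.tolist())       # [False, True, False, True, False, True]
print(number_values_filled.tolist()) # [0.0, 24.0, 0.0, 2048.0, 0.0, 0.0003]
```

With the real tokenizer, the mask must be aligned to subword positions rather than whitespace tokens, which is the part this sketch glosses over.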
The model expects three key inputs: `input_ids` (the tokenized configuration text), `is_number_mask` (a boolean mask marking which tokens are numeric), and `number_values_filled` (the actual numeric values, with 0 at non-numeric positions).
This model is designed for forecasting performance metrics of neural network training configurations.
If you use this model in your research, please cite:

```bibtex
@article{ncpl2026,
  title   = {Neural Configuration to Performance Scaling Law},
  author  = {Huaqing Zhang and Kaiyue Wen and Tengyu Ma},
  journal = {arXiv preprint arXiv:2602.10300},
  year    = {2026},
  url     = {https://www.arxiv.org/abs/2602.10300}
}
```
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("feature-extraction", model="OptimizerStudy/NCPL-intermediate", trust_remote_code=True)
```