Staleguard (int8 ONNX)
A 3-class code↔doc coherence cross-encoder: given a (code premise, prose claim)
pair, it predicts one of {entailment, neutral, contradiction}. Fine-tuned from
microsoft/unixcoder-base, then exported to ONNX and dynamically quantized to
int8 (per-channel, avx512_vnni) for portable CPU inference.
GitHub: Arthur920/Staleguard · Docs: arthur920.github.io/Staleguard
- Artifact: model_quantized.onnx (~121 MB, ~4× smaller than the fp32 checkpoint)
- Labels: 0=entailment, 1=neutral, 2=contradiction
- Lead metric: held-out contradiction precision (repo-disjoint eval split). The contradiction (alert) class scores ~87.6% precision / ~89.9% recall.
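For readers who want to reproduce the lead metric on their own data, here is a minimal sketch of per-class precision and recall with contradiction (label 2) as the positive class. The helper name `precision_recall` and the toy label lists are invented for illustration; they are not taken from the repository or from the eval split.

```python
CONTRADICTION = 2  # label id from the list above

def precision_recall(y_true, y_pred, positive=CONTRADICTION):
    """Precision and recall for one class, treated as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    fn = sum(1 for t, p in pairs if p != positive and t == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy labels, made up for illustration only.
y_true = [2, 2, 2, 0, 1, 2, 0, 1]
y_pred = [2, 2, 0, 0, 2, 2, 0, 1]
precision, recall = precision_recall(y_true, y_pred)
```

Precision answers "how many raised alerts were real contradictions", which is why it is a natural lead metric for an alerting model.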
Usage
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

repo = "Arthur920/staleguard"
tok = AutoTokenizer.from_pretrained(repo)
model = ORTModelForSequenceClassification.from_pretrained(
    repo, file_name="model_quantized.onnx")

# Encode the (code premise, prose claim) pair as a single cross-encoder input.
inputs = tok("def add(a, b): return a + b",
             "The function returns the sum of a and b.",
             truncation=True, max_length=192, return_tensors="pt")
logits = model(**inputs).logits  # shape (1, 3): one score per class
# Map the highest-scoring class id to its label name.
print(model.config.id2label[int(logits.argmax(-1))])
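Because contradiction is the alert class, it may be useful to gate alerts on the contradiction probability rather than on the raw argmax. The following is a minimal sketch under stated assumptions: `should_alert` and its `threshold` default of 0.8 are hypothetical, not part of the model or repository, and the cutoff should be tuned on held-out data. It takes a flat list of three logits, for example `logits[0].tolist()` from the snippet above.

```python
import math

# Label ids as listed in this model card.
LABELS = {0: "entailment", 1: "neutral", 2: "contradiction"}

def softmax(logits):
    """Numerically stable softmax over a flat list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def should_alert(logits, threshold=0.8):
    """Return (label, contradiction_probability, alert_flag) for one pair.

    `threshold` is a hypothetical cutoff chosen for illustration.
    """
    probs = softmax(logits)
    label = LABELS[probs.index(max(probs))]
    p_contra = probs[2]
    return label, p_contra, p_contra >= threshold

# Made-up logits for illustration; real values come from the model.
label, p_contra, alert = should_alert([-1.2, 0.3, 3.1])
```

Raising the threshold trades recall for precision on the alert class, which fits a setting where false alerts are the main cost.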
Notes
Int8 dynamic quantization stores the Linear/MatMul weights as int8; activations
stay fp32 in the graph and are quantized on the fly inside those ops at
inference time. A parity check against the fp32 checkpoint showed matching
argmax labels on sample pairs. To re-quantize from the fp32 export, run
model/quantize.py.
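An argmax parity check of the kind mentioned above can be sketched as follows. This is an illustration, not the repository's own check: `argmax_agreement` is a hypothetical helper, and the logits are made up. It compares per-pair logits from the fp32 and int8 models and reports which rows disagree.

```python
def argmax(row):
    """Index of the largest value in a flat list."""
    return max(range(len(row)), key=row.__getitem__)

def argmax_agreement(fp32_logits, int8_logits):
    """Return (fraction of rows with the same predicted class, mismatching row indices)."""
    if len(fp32_logits) != len(int8_logits):
        raise ValueError("both models must score the same pairs")
    mismatches = [
        i for i, (a, b) in enumerate(zip(fp32_logits, int8_logits))
        if argmax(a) != argmax(b)
    ]
    if not fp32_logits:
        return 1.0, mismatches
    return 1.0 - len(mismatches) / len(fp32_logits), mismatches

# Made-up logits: the second row is a near-tie that flips after quantization.
fp32 = [[2.9, 0.1, -1.0], [-0.5, 0.2, 0.1], [-1.3, 0.4, 3.0]]
int8 = [[2.7, 0.2, -0.9], [-0.4, 0.1, 0.15], [-1.1, 0.5, 2.8]]
agreement, mismatches = argmax_agreement(fp32, int8)
```

Label flips, when they occur, tend to be on near-tied logits like the second row here, so inspecting the mismatching indices is usually more informative than the agreement rate alone.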