varLens β€” Pretrained Weights

Lightweight variant-effect predictor that consumes Track Delta features from NTv3-650M-post and returns a sigmoid score per SNV. ~627K parameters β€” roughly 1/1000 the size of NTv3-650M.

Pretrained on 9.7M autosomal SNVs from gnomAD with MAF-based proxy labels (MAF < 1% β†’ likely-deleterious, MAF > 5% β†’ likely-tolerated; intermediate 1–5% excluded). Final balanced set: 4,869,209 + 4,869,209 SNVs.
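The MAF-proxy labeling rule above can be sketched in a few lines. This is my paraphrase of the described thresholds, not the authors' code; the function name is hypothetical:

```python
def maf_proxy_label(maf):
    """MAF-based proxy label: 1 = likely-deleterious, 0 = likely-tolerated,
    None = intermediate-frequency SNV excluded from training."""
    if maf < 0.01:   # MAF < 1% -> likely-deleterious
        return 1
    if maf > 0.05:   # MAF > 5% -> likely-tolerated
        return 0
    return None      # 1-5%: dropped

print([maf_proxy_label(m) for m in (0.002, 0.03, 0.12)])  # [1, None, 0]
```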

Code and full documentation: https://github.com/omicsEye/varLens

Paper: Fu, D. and Rahnavard, A. *varLens: Enhancing Single Nucleotide Variation (SNV) Effect Prediction Using Language Models* (in preparation).

Files in this repo

| File | Purpose |
|---|---|
| `varLens_PreTrained_9p7M.pt` | Model weights (torch `state_dict`) |
| `varLens_PreTrained_9p7M.safetensors` | Same weights in safetensors format |
| `scaler.npz` | Per-feature mean/std (length 8064 = 63 × 128), required to normalize Track Delta inputs |
| `panel_core_v1.json` | 63-track functional panel definition (same file as in the GitHub `configs/`) |

All three runtime artifacts (the weights in either `.pt` or `.safetensors` form, `scaler.npz`, and `panel_core_v1.json`) are needed to reproduce inference.

Quick use

```shell
# 1. Get the code
pip install varlens

# 2. Load weights + scaler directly from this HF repo
python - <<'PY'
import varlens
model = varlens.load("domizzz2025/varLens")  # auto-downloads ckpt + scaler + panel
scores = model.score(my_track_delta)         # (N, 63, 128) -> (N,)
PY
```

Or via the CLI:

```shell
varlens score-features --features my_track_delta.npz --out scores.csv
```

Minimal Python loading example (no pip install)

```python
import numpy as np
import torch
from varlens.model import VarLensV3   # from the GitHub package

model = VarLensV3(in_channels=63, n_positions=128, center_k=5)
model.load_state_dict(torch.load("varLens_PreTrained_9p7M.pt",
                                 map_location="cpu", weights_only=True))
model.eval()

sc = np.load("scaler.npz")
mean, std = sc["mean"], sc["std"]

# x: (N, 63, 128) Track Delta features from NTv3-650M-post
x_norm = ((x.reshape(len(x), -1) - mean) / (std + 1e-8)).reshape(x.shape)
with torch.no_grad():
    scores = torch.sigmoid(model(torch.from_numpy(x_norm).float())).numpy()
```
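The normalization step can be exercised standalone with synthetic data: each (63, 128) Track Delta tensor is flattened to 8064 features, standardized per-feature, and reshaped back. In this sketch the mean/std are computed from the synthetic batch itself; in real use they come from `scaler.npz`:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for Track Delta features: batch of 16 SNVs
x = rng.normal(loc=2.0, scale=3.0, size=(16, 63, 128)).astype(np.float32)

flat = x.reshape(len(x), -1)               # (16, 8064)
mean, std = flat.mean(axis=0), flat.std(axis=0)
x_norm = ((flat - mean) / (std + 1e-8)).reshape(x.shape)

print(x_norm.shape)  # (16, 63, 128), each of the 8064 features now ~zero-mean
```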

Model details

  • Architecture: parallel CNN + Attention branches over a 63Γ—128 Track Delta tensor, concatenated into a 384-dim vector, then a 2-layer MLP head. Both branches use center pooling (Β±5 positions around the SNV, center_k=5).
  • Input: NTv3(alt_sequence) βˆ’ NTv3(ref_sequence) across 63 curated functional tracks Γ— 128 positions centered on the SNV. See panel_core_v1.json for the track list (DNase, ATAC, H3K4me3/27ac/4me1/36me3/27me3/9me3, CTCF, other TFs, CAGE, RNA-seq).
  • Parameters: ~627K
  • Training FLOPs: 0.16 GFLOPs/sample (vs 333.7 GFLOPs for NTv3-650M fine-tuning)
  • Training data: 9.7M autosomal gnomAD SNVs with MAF-proxy labels (4,869,209 likely-deleterious + 4,869,209 likely-tolerated)
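The efficiency claim follows directly from the quoted per-sample costs; a quick check of the arithmetic:

```python
# Per-sample training cost from the bullets above
varlens_gflops = 0.16   # varLens
ntv3_gflops = 333.7     # NTv3-650M fine-tuning

ratio = ntv3_gflops / varlens_gflops
print(f"varLens uses ~{ratio:.0f}x fewer training FLOPs per sample")  # ~2086x
```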

Evaluation

Evaluated on 5 external variant-effect benchmarks spanning gene expression regulation, clinical pathogenicity, and trait association:

| Dataset | Task | Metric |
|---|---|---|
| Causal eQTL | Fine-mapped causal vs. non-causal eQTL | AUROC |
| ClinVar Pathogenic | Pathogenic vs. Benign | AUROC |
| ClinVar Pathogenic vs. gnomAD Common Missense | Pathogenic missense vs. common missense | AUROC |
| Mendelian Traits | Causal variants for 113 monogenic traits | AUPRC |
| Complex Traits | Causal variants across 83 polygenic traits | AUPRC |

Full numbers and comparisons against 7 genome language models (DNABERT, DNABERT-2, HyenaDNA, NTv2-100/500M, NTv3-100/650M) are in the GitHub README.

Limitations

• The MAF-proxy pretraining objective is an indirect signal for pathogenicity: rare variants are enriched for deleterious effects, but rarity and pathogenicity are not the same thing. Fine-tune on labeled task data when a direct pathogenic/benign decision is needed.
  • Inputs must come from NTv3-650M-post, the specific 63-track panel, and a 128 bp window centered on the variant. Other NTv3 variants or track selections will not work with these weights.
  • The model is trained on human autosomal variants only.

Citation

```bibtex
@software{varLens2026,
  title  = {varLens: Enhancing Single Nucleotide Variation (SNV) Effect Prediction Using Language Models},
  author = {Fu, Dezhao and Rahnavard, Ali},
  year   = {2026},
  url    = {https://github.com/omicsEye/varLens}
}
```

Acknowledgments

This work was supported by the National Science Foundation under grant no. 2109688 to A.R.

Competing Interests

A.R. is the inventor of a pending patent related to the work presented.

License

MIT. See the GitHub repo for full terms.
