UMA-Inverse v1

Ligand-aware protein inverse-folding model. Given a 3D protein backbone structure (and optionally co-crystallized ligands or metals), predicts an amino acid sequence that should fold to that structure.

This is the v1 baseline reported in [PREPRINT TITLE / arXiv ID once available].

Architecture

Dense PairMixer encoder with triangle multiplication and an autoregressive decoder. Single-GPU friendly: bf16, gradient checkpointing, and at most 384 total nodes per batch. Approximately 2.17 million parameters.

The architecture deliberately uses dense pair-wise attention rather than the K-nearest-neighbor message passing common to ProteinMPNN/LigandMPNN.
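To make the dense pair-wise contrast concrete, here is a minimal NumPy sketch of an "outgoing" triangle multiplicative update on a dense pair tensor, the operation family the architecture description refers to. This is an illustrative reduction, not UMA-Inverse's actual block: gating, layer norm, and learned projections beyond the two matrices shown are omitted, and all names are assumptions.

```python
import numpy as np

def triangle_multiply_outgoing(z, a_proj, b_proj):
    """Outgoing triangle multiplicative update on a dense pair tensor.

    z: (N, N, C) pair representation; a_proj / b_proj: (C, C) channel
    projections. Gating and normalization from a full block are omitted.
    """
    a = z @ a_proj  # (N, N, C) "left" edge features
    b = z @ b_proj  # (N, N, C) "right" edge features
    # out[i, j, c] = sum_k a[i, k, c] * b[j, k, c]: every pair (i, j) mixes
    # through all intermediate nodes k — O(N^3), vs. O(N*k) for kNN graphs.
    return np.einsum('ikc,jkc->ijc', a, b)

# Toy example: 4-residue "protein", 8 pair channels, identity projections.
rng = np.random.default_rng(0)
z = rng.standard_normal((4, 4, 8))
out = triangle_multiply_outgoing(z, np.eye(8), np.eye(8))
print(out.shape)  # (4, 4, 8)
```

The cubic cost is why dense pair updates are usually paired with aggressive memory tricks (bf16, checkpointing, small node budgets), as noted above.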

Training

  • Trained on PDB assemblies as of Dec 2022, ≤3.5Å resolution, <6000 residues, clustered at 30% sequence identity (LigandMPNN's split).
  • Stage-3 epoch 11 checkpoint (training stopped when val loss plateaued).
  • Hardware: single GPU on academic compute cluster.

Benchmark performance

Evaluated on LigandMPNN's test splits. Metric: median sequence recovery over 10 autoregressive samples per PDB (temperature 0.1, random decoding order). Interface residues are defined as those whose Cα lies within 5.0 Å of any ligand atom.
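The Cα-based interface mask described above can be sketched in a few lines of NumPy. The 5.0 Å cutoff is from the benchmark setup; the array layout and function name are assumptions for illustration.

```python
import numpy as np

def interface_mask(ca_coords, ligand_coords, cutoff=5.0):
    """Mark residues whose C-alpha lies within `cutoff` Å of any ligand atom.

    ca_coords: (N, 3) residue C-alpha positions
    ligand_coords: (M, 3) ligand/metal atom positions
    returns: (N,) boolean mask
    """
    # Pairwise distances between every C-alpha and every ligand atom.
    d = np.linalg.norm(ca_coords[:, None, :] - ligand_coords[None, :, :], axis=-1)
    return (d < cutoff).any(axis=1)

# Toy example: residue 0 is 3 Å from the ligand atom, residue 1 is 7 Å away.
ca = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
lig = np.array([[3.0, 0.0, 0.0]])
print(interface_mask(ca, lig))  # [ True False]
```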

Test set                     UMA-Inverse v1                          LigandMPNN   ProteinMPNN
Metals (82 PDBs)             0.441                                   0.775        0.406
Small molecules (316 PDBs)   0.500                                   0.633        0.505
Nucleotides                  not evaluated (featurizer limitation)   0.505        0.358
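The table reports median sequence recovery. A minimal sketch of how that metric is typically computed (fraction of positions where the sampled residue matches the native, optionally restricted to interface positions); the function name and interface are assumptions, not the repo's actual evaluation code.

```python
import statistics

def sequence_recovery(native, sampled, mask=None):
    """Fraction of positions where `sampled` matches `native`.

    If `mask` is given, recovery is computed only over masked
    (e.g. interface) positions.
    """
    positions = range(len(native)) if mask is None else [
        i for i, m in enumerate(mask) if m
    ]
    matches = sum(native[i] == sampled[i] for i in positions)
    return matches / len(positions)

# Toy example: three samples, each with one mismatch out of ten positions.
native = "ACDEFGHIKL"
samples = ["ACDEFGHIKV", "ACPEFGHIKL", "ACDEFGAIKL"]
recoveries = [sequence_recovery(native, s) for s in samples]
print(statistics.median(recoveries))  # 0.9
```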

The v1 model meaningfully underperforms LigandMPNN on metals due to coarse 6-bin element featurization that conflates Zn/Fe/Mg/Ca/Cu/Mn/Ni into a single "other" category. v2 (with per-element atomic-number embedding) addresses this.
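Why coarse binning hurts on metals can be shown with a toy featurizer. The actual v1 bin assignments are not specified here; C/N/O/S/P plus a catch-all "other" is an assumed example of a 6-bin scheme, and the lookup tables below are illustrative only.

```python
# Hypothetical 6-bin scheme: common organic elements get their own bin,
# everything else (including all metals) collapses to bin 5 ("other").
COARSE_BINS = {"C": 0, "N": 1, "O": 2, "S": 3, "P": 4}

def coarse_feature(element):
    return COARSE_BINS.get(element, 5)

# v2-style alternative: a distinct index per element via atomic number,
# so a downstream embedding table can keep metal identities separate.
ATOMIC_NUMBERS = {"C": 6, "N": 7, "O": 8, "Mg": 12, "Fe": 26, "Zn": 30}

def atomic_number_feature(element):
    return ATOMIC_NUMBERS[element]

# Zn and Fe collapse to the same coarse bin but stay distinct by atomic number.
print(coarse_feature("Zn") == coarse_feature("Fe"))                # True
print(atomic_number_feature("Zn") == atomic_number_feature("Fe"))  # False
```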

Intended use

This model is a research artifact, not a production tool. It is appropriate for:

  • Reproducing the benchmarks reported in the preprint
  • Comparison against your own inverse-folding models
  • Educational and architectural study

It is not validated for actual protein design campaigns. For real design work, use LigandMPNN or ProteinMPNN.

Limitations

  • No nucleic acid support (DNA/RNA atoms are not represented in training).
  • Coarse ligand element featurization (see Architecture section).
  • Trained on a single 24 GB GPU; not benchmarked at scale.
  • No wet-lab validation.

How to use

See the GitHub repo for installation and usage: https://github.com/WSobo/UMA-Inverse

from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="wsobo/uma-inverse-v1",
    filename="model.ckpt",
)
# Then load with the InferenceSession class — see GitHub README.

Verification

The checkpoint should match model.sha256 exactly. After download:

sha256sum -c model.sha256

Citation

@article{sobolewski2026umainverse,
  title={UMA-Inverse: Dense PairMixer Architecture for Ligand-Aware
         Protein Inverse Folding},
  author={Sobolewski, W.},
  year={2026},
  journal={bioRxiv},
  doi={[FILL IN AFTER UPLOAD]}
}

License

MIT (see GitHub repo).
