# UMA-Inverse v1
Ligand-aware protein inverse-folding model. Given a 3D protein backbone structure (and optionally co-crystallized ligands or metals), predicts an amino acid sequence that should fold to that structure.
This is the v1 baseline reported in [PREPRINT TITLE / arXiv ID once available].
## Architecture

Dense PairMixer encoder with triangle multiplication, followed by an autoregressive sequence decoder. Single-GPU friendly (bf16, gradient checkpointing, max 384 total nodes per batch). Approximately 2.17 million parameters.

The architecture deliberately uses dense pairwise attention rather than the k-nearest-neighbor message passing common to ProteinMPNN/LigandMPNN.
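For readers unfamiliar with triangle multiplication, here is a minimal sketch of the outgoing-edge variant over a dense pair representation. The weight shapes and the absence of gating/normalization are simplifying assumptions for illustration, not the repo's actual PairMixer layer:

```python
import numpy as np

def triangle_multiply_outgoing(z, w_a, w_b, w_out):
    """One dense outgoing-edge triangle-multiplication update.

    z          : (N, N, c) pair representation over N nodes
    w_a, w_b,
    w_out      : (c, c) projection weights (the real layer would add
                 normalization and gating; these are illustrative)
    """
    a = z @ w_a  # (N, N, c) left projection
    b = z @ w_b  # (N, N, c) right projection
    # Edge (i, j) aggregates over every third node k, making the update
    # O(N^3) in nodes -- one reason for capping total nodes per batch.
    upd = np.einsum("ikc,jkc->ijc", a, b)
    return upd @ w_out

rng = np.random.default_rng(0)
N, c = 8, 4
z = rng.normal(size=(N, N, c))
w_a, w_b, w_out = (rng.normal(size=(c, c)) for _ in range(3))
out = triangle_multiply_outgoing(z, w_a, w_b, w_out)  # shape (8, 8, 4)
```

The cubic cost in node count is the trade-off for seeing all pairs at once, in contrast to the linear-in-neighbors cost of K-NN message passing.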
## Training
- Trained on PDB assemblies as of Dec 2022, ≤3.5Å resolution, <6000 residues, clustered at 30% sequence identity (LigandMPNN's split).
- Stage-3 epoch 11 checkpoint (training stopped when validation loss plateaued).
- Hardware: single GPU on academic compute cluster.
## Benchmark performance

Evaluated on LigandMPNN's test splits: median sequence recovery across 10 autoregressive samples per PDB at temperature 0.1, with random decoding order and a Cα-based interface mask covering residues within 5.0 Å of any ligand atom.
| Test set | UMA-Inverse v1 | LigandMPNN | ProteinMPNN |
|---|---|---|---|
| Metals (82 PDBs) | 0.441 | 0.775 | 0.406 |
| Small molecules (316 PDBs) | 0.500 | 0.633 | 0.505 |
| Nucleotides | not evaluated (featurizer limitation) | 0.505 | 0.358 |
The v1 model meaningfully underperforms LigandMPNN on metals due to the coarse 6-bin element featurization, which conflates Zn/Fe/Mg/Ca/Cu/Mn/Ni into a single "other" category. v2 (with a per-element atomic-number embedding) addresses this.
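The evaluation protocol behind the table above can be sketched in a few lines. `interface_mask` and `sequence_recovery` are illustrative helpers, not the repo's API:

```python
import numpy as np

def interface_mask(ca_xyz, ligand_xyz, cutoff=5.0):
    """Boolean mask of residues whose Cα lies within `cutoff` Å of any
    ligand atom. ca_xyz: (N, 3), ligand_xyz: (M, 3)."""
    d = np.linalg.norm(ca_xyz[:, None, :] - ligand_xyz[None, :, :], axis=-1)
    return (d < cutoff).any(axis=1)  # (N,)

def sequence_recovery(native, designed, mask=None):
    """Fraction of positions where the designed residue matches native,
    optionally restricted to interface positions."""
    match = np.asarray(list(native)) == np.asarray(list(designed))
    if mask is not None:
        match = match[mask]
    return match.mean()
```

Per the protocol above, recovery would be computed for each of the 10 samples and the median taken per PDB.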
## Intended use
This model is a research artifact, not a production tool. It is appropriate for:
- Reproducing the benchmarks reported in the preprint
- Comparison against your own inverse-folding models
- Educational and architectural study
It is not validated for actual protein design campaigns. For real design work, use LigandMPNN or ProteinMPNN.
## Limitations
- No nucleic acid support (DNA/RNA atoms are not represented in training).
- Coarse 6-bin ligand element featurization (see Benchmark performance).
- Trained on a single 24 GB GPU; not benchmarked at scale.
- No wet-lab validation.
## How to use

See the GitHub repo for installation and usage: https://github.com/WSobo/UMA-Inverse

```python
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="wsobo/uma-inverse-v1",
    filename="model.ckpt",
)
# Then load with the InferenceSession class — see GitHub README.
```
## Verification

The checkpoint should match `model.sha256` exactly. After download:

```shell
sha256sum -c model.sha256
```
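On platforms without `sha256sum`, an equivalent check can be done with Python's standard library (a sketch; the file paths are placeholders):

```python
import hashlib

def sha256_hex(path, chunk_size=1 << 20):
    """Stream the file in 1 MiB chunks so a large checkpoint
    never has to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# expected = open("model.sha256").read().split()[0]
# assert sha256_hex("model.ckpt") == expected
```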
## Citation

```bibtex
@article{sobolewski2026umainverse,
  title={UMA-Inverse: Dense PairMixer Architecture for Ligand-Aware Protein Inverse Folding},
  author={Sobolewski, W.},
  year={2026},
  journal={bioRxiv},
  doi={[FILL IN AFTER UPLOAD]}
}
```
## License

MIT (see GitHub repo).