MitoInteract: Protein-Molecule Binding Affinity Prediction for Mitochondrial Apoptosis Research

Overview

MitoInteract is a dual-encoder model that predicts the binding affinity (pKd) of a protein-molecule pair from the protein's amino-acid sequence and the molecule's SMILES string. It combines:

  • ESM-2 650M (protein encoder) for protein sequence understanding
  • ChemBERTa (molecule encoder) for SMILES-based molecular representation
  • Bidirectional cross-attention fusion layer
  • 4-layer MLP regression head
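
The fusion step listed above can be illustrated with a dependency-free sketch. This is not the model's actual layer: it assumes single-head scaled dot-product attention with no learned projections, mean pooling, and token embeddings from both encoders that have already been brought to a common width. The function names (`cross_attend`, `mean_pool`, `fuse`) are hypothetical and chosen only for this illustration.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(queries, keys_values):
    """Single-head scaled dot-product attention (assumption: no learned
    projections). Each query row becomes a weighted average of keys_values."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                  for k in keys_values]
        weights = softmax(scores)
        out.append([sum(w * kv[j] for w, kv in zip(weights, keys_values))
                    for j in range(d)])
    return out

def mean_pool(rows):
    """Average token vectors into one fixed-size vector."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def fuse(protein_tokens, molecule_tokens):
    """Bidirectional fusion: protein attends to molecule and vice versa,
    then the two pooled summaries are concatenated."""
    p2m = cross_attend(protein_tokens, molecule_tokens)
    m2p = cross_attend(molecule_tokens, protein_tokens)
    return mean_pool(p2m) + mean_pool(m2p)

# Toy 2-dimensional "embeddings": two protein tokens, one molecule token.
print(fuse([[1, 0], [0, 1]], [[1, 1]]))  # [1.0, 1.0, 0.5, 0.5]
```

In the real model, the concatenated vector would be the input to the MLP regression head; the head count, projection sizes, and pooling strategy are not stated in this card.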

Intended Use

This model is designed for mitochondrial apoptosis research, enabling researchers to:

  • Estimate ceramide binding affinity for mitochondrial membrane proteins (VDAC1, VDAC2)
  • Screen BCL-2 family protein interactions with BH3 mimetic drugs (venetoclax, navitoclax, ABT-737)
  • Explore protein-lipid interactions in the apoptosis pathway
  • Run in-silico binding experiments before wet-lab validation

Quick Start

from model import load_model, predict_binding

# Load model
model, config = load_model("full_model.pt", device="cuda")

# Predict ceramide C16 binding to VDAC1
result = predict_binding(
    model,
    protein_seq="MPPYLTFGLKAGALLPLTLPYVRAEAVTKLKLTLNAFEGASK...",  # VDAC1 (truncated here; pass the full sequence)
    smiles="CCCCCCCCCCCCCCCC(=O)N[C@@H](CO)[C@H](O)/C=C/CCCCCCCCCCCCC",  # Ceramide C16
    device="cuda"
)
print(f"Predicted pKd: {result['pKd']:.3f}")
print(f"Predicted Kd: {result['Kd_uM']:.3f} µM")
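
The two printed values describe the same quantity on different scales. By definition pKd = -log10(Kd), with Kd in mol/L, so Kd in µM is 10^(6 - pKd). A minimal sketch of the conversion, assuming `Kd_uM` is reported in micromolar as the key name suggests (the helper names below are hypothetical and not part of the repository):

```python
import math

def pkd_to_kd_um(pkd: float) -> float:
    """Convert pKd (-log10 of Kd in mol/L) to Kd in micromolar."""
    return 10.0 ** (6.0 - pkd)

def kd_um_to_pkd(kd_um: float) -> float:
    """Convert Kd in micromolar back to pKd."""
    if kd_um <= 0:
        raise ValueError("Kd must be positive")
    return 6.0 - math.log10(kd_um)

print(pkd_to_kd_um(6.0))  # 1.0, i.e. a pKd of 6 corresponds to a Kd of 1 µM
```

Each unit increase in pKd is a tenfold tighter predicted binding, so small differences in pKd translate into large differences in Kd.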

Key Apoptosis Targets

Protein        Role in Apoptosis
BCL-2          Anti-apoptotic, prevents mitochondrial outer membrane permeabilization (MOMP)
BCL-XL         Anti-apoptotic, sequesters BAX/BAK
BAX            Pro-apoptotic, forms pores in the outer membrane
BAK            Pro-apoptotic, oligomerizes in the membrane
VDAC1          Voltage-dependent anion channel, ceramide target
Cytochrome c   Released during MOMP, activates caspase cascade

Key Molecules

Molecule       Role
Ceramide C16   Lipid mediator, promotes MOMP via VDAC
Ceramide C2    Short-chain ceramide analog
Venetoclax     BCL-2 inhibitor (FDA-approved)
Navitoclax     BCL-2/BCL-XL dual inhibitor
ABT-737        BCL-2/BCL-XL/BCL-w inhibitor
Cardiolipin    Mitochondrial inner membrane lipid
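
To compare several of these molecules against one target, the predictions can be collected and sorted by pKd. The sketch below assumes a scoring callable that takes a protein sequence and a SMILES string and returns a pKd. `rank_ligands` is a hypothetical helper, not part of the repository; with the real model, the callable would wrap the `predict_binding` call from the Quick Start.

```python
from typing import Callable, Dict, List, Tuple

def rank_ligands(
    score_fn: Callable[[str, str], float],
    protein_seq: str,
    ligands: Dict[str, str],
) -> List[Tuple[str, float]]:
    """Score every (name -> SMILES) entry against one protein and
    return (name, pKd) pairs sorted from strongest to weakest binder."""
    scored = [(name, score_fn(protein_seq, smiles))
              for name, smiles in ligands.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Assumed wiring to the real model (not executed here):
#   score_fn = lambda seq, smi: predict_binding(
#       model, protein_seq=seq, smiles=smi, device="cuda")["pKd"]
#   ranking = rank_ligands(score_fn, full_protein_sequence, {"name": "SMILES"})
```

Passing the scorer as an argument keeps the ranking logic independent of the model, so it can be checked with a stand-in function before any GPU inference is run.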

Training Details

  • Dataset: jglaser/binding_affinity (1.9M protein-ligand pairs)
  • Architecture: ESM-2 650M (frozen) + ChemBERTa (frozen) + Cross-Attention + MLP
  • Training: AdamW, lr=1e-3, cosine schedule, early stopping
  • Best Validation Pearson R: -0.9107
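
For reference, Pearson R measures the linear correlation between predicted and measured pKd values on the validation set. The card does not include the evaluation code, so the following is only a plain-Python sketch of the standard formula; `pearson_r` is a hypothetical helper name.

```python
import math
from typing import Sequence

def pearson_r(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(y_true)
    if n != len(y_pred) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    if var_t == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_t * var_p)
```

The sign carries meaning: a value near +1 means predictions rise with measured pKd, while a value near -1 means they move in opposite directions.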

Citation

Based on:

  • BAPULM (arxiv:2411.04150) - frozen encoder + MLP pattern
  • SSM-DTA (arxiv:2206.09818) - CLS cross-attention fusion
  • ESM-2 (arxiv:2202.03555) - protein language model
  • ChemBERTa (arxiv:2010.09885) - molecular language model