🧬 DrDTI-Reasoner

A biomedical LLM fine-tuned for Drug–Target Interaction (DTI) prediction and reasoning.

Built on Meta Llama 3 8B (4-bit quantized) and adapted with LoRA (Low-Rank Adaptation) on structured bioactivity datasets, DrDTI-Reasoner is designed to support computational drug discovery, bioactivity prediction, and molecular reasoning research.


⚙️ Model Details

| Property | Value |
| --- | --- |
| Base Model | meta-llama/Meta-Llama-3-8B (bnb-4bit quantized) |
| Fine-tuning Method | LoRA (Parameter-Efficient Fine-Tuning) |
| Framework | Unsloth + Hugging Face Transformers |
| Quantization | 4-bit (bitsandbytes) |
| Domain | Drug Discovery / Computational Biology |
| Task | Drug–Target Interaction (DTI) reasoning |
| License | Apache 2.0 |

🔬 Capabilities

DrDTI-Reasoner is trained to handle the following tasks:

  • DTI Classification: Predict whether a drug interacts with a given protein target
  • Bioactivity Estimation: Classify compounds as Active or Inactive
  • Potency Approximation: Estimate pXC50 values where applicable
  • Molecular Interpretation: Parse and reason over SMILES molecular representations
  • Target Analysis: Interpret protein targets from UniProt IDs or gene names
  • Assay-Aware Reasoning: Incorporate assay metadata (mechanism, technology, mode) into predictions
  • Biological Explanation: Generate short, human-readable justifications for predictions
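
For readers less familiar with the potency scale: pXC50 is conventionally the negative base-10 logarithm of the XC50 (for example an IC50 or EC50) expressed in mol/L. The sketch below converts between the two scales. The function names are illustrative and are not part of the model or its tooling.

```python
import math


def pxc50_from_molar(xc50_molar: float) -> float:
    """Convert a concentration in mol/L to pXC50 = -log10(XC50)."""
    if xc50_molar <= 0:
        raise ValueError("XC50 must be a positive concentration in mol/L")
    return -math.log10(xc50_molar)


def molar_from_pxc50(pxc50: float) -> float:
    """Invert the transform: XC50 in mol/L = 10 ** (-pXC50)."""
    return 10.0 ** (-pxc50)


def nanomolar_from_pxc50(pxc50: float) -> float:
    """Same inversion, reported in nmol/L for readability."""
    return molar_from_pxc50(pxc50) * 1e9
```

On this scale a pXC50 of 6 corresponds to 1 µM and a pXC50 of 9 to 1 nM, so higher values mean higher potency.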

🧪 Input Format

The model expects structured biomedical input in the following format:

Drug (SMILES):       <SMILES string>
Target protein:      <UniProt ID or protein name>
Assay mechanism:     <optional>
Assay technology:    <optional>

Note: Input formatting quality directly affects prediction accuracy. Always validate SMILES strings and use standard UniProt identifiers when possible.

Example Input

Drug (SMILES):       CC1=CC=C(C=C1)S(=O)(=O)N
Target protein:      P00533 (EGFR)
Assay mechanism:     Inhibition
Assay technology:    Biochemical
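
A small helper can keep prompts consistent with the layout above. This is a minimal sketch, not the author's preprocessing code: `build_prompt` and its parameter names are hypothetical, the 21-character label column simply mirrors the example, and dropping an optional line when no value is given is an assumption. The SMILES check only rejects empty or whitespace-containing strings; real validation would need a cheminformatics toolkit.

```python
from typing import Optional

LABEL_WIDTH = 21  # matches the column alignment used in the example input


def build_prompt(
    smiles: str,
    target: str,
    assay_mechanism: Optional[str] = None,
    assay_technology: Optional[str] = None,
) -> str:
    """Format one drug-target query in the documented field layout."""
    smiles = smiles.strip()
    if not smiles or any(ch.isspace() for ch in smiles):
        raise ValueError("SMILES must be a non-empty string without whitespace")

    fields = [("Drug (SMILES):", smiles), ("Target protein:", target.strip())]
    # Assumption: optional fields are omitted entirely when not provided.
    if assay_mechanism:
        fields.append(("Assay mechanism:", assay_mechanism.strip()))
    if assay_technology:
        fields.append(("Assay technology:", assay_technology.strip()))

    return "\n".join(f"{label:<{LABEL_WIDTH}}{value}" for label, value in fields)


prompt = build_prompt(
    "CC1=CC=C(C=C1)S(=O)(=O)N", "P00533 (EGFR)", "Inhibition", "Biochemical"
)
```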

📤 Output Format

The model returns structured predictions:

Active:  true | false
pXC50:   <numeric value> | null
Reason:  <short biological explanation of the predicted interaction>

Example Output

Active:  true
pXC50:   7.4
Reason:  The compound's sulfonamide group forms key hydrogen bonds with the EGFR
         ATP-binding pocket, suggesting moderate inhibitory activity consistent
         with the predicted pXC50.
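
Downstream code usually needs these fields as typed values rather than text. The parser below is a hedged sketch that assumes the model emits exactly the three labels shown above, with the reason possibly wrapped over several lines. `DTIPrediction` and `parse_prediction` are hypothetical names, and anything that does not parse cleanly is returned as `None` instead of raising.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class DTIPrediction:
    active: Optional[bool]   # None if the field is missing or malformed
    pxc50: Optional[float]   # None for "null" or unparsable values
    reason: str


_FIELD = re.compile(r"^(Active|pXC50|Reason):\s*(.*)$")


def parse_prediction(text: str) -> DTIPrediction:
    """Parse the 'Active / pXC50 / Reason' block into a dataclass."""
    fields = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = _FIELD.match(line)
        if match:
            current = match.group(1)
            fields[current] = match.group(2).strip()
        elif current == "Reason":
            # Continuation line of a wrapped explanation.
            fields["Reason"] += " " + line

    active = {"true": True, "false": False}.get(fields.get("Active", "").lower())
    try:
        pxc50 = float(fields.get("pXC50", ""))
    except ValueError:
        pxc50 = None
    return DTIPrediction(active, pxc50, fields.get("Reason", "").strip())
```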

📊 Training Data

DrDTI-Reasoner was trained on curated drug–target interaction datasets containing:

  • Molecular structures in SMILES format
  • Protein target identifiers (UniProt IDs and gene names)
  • Bioassay metadata (mechanism, technology, and mode)
  • Binary activity labels (active / inactive)
  • Potency values (pXC50) where available

Key sources include ChEMBL and BindingDB bioactivity databases.


📈 Evaluation Results

Evaluated on a held-out test split from the ChEMBL bioactivity dataset:

| Metric | Value |
| --- | --- |
| Accuracy | 0.84 |
| F1 Score | 0.82 |
| ROC-AUC | 0.89 |

⚠️ These are preliminary research metrics. Results may vary across target classes, assay types, and chemical scaffolds that are not well represented in the training data. Scaffold-split benchmarking is planned for a future release.
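
For reference, the three reported metrics can be computed from binary labels and predicted scores as sketched below. This is a dependency-free illustration on made-up toy data, not the evaluation script behind the numbers above. The 0.5 decision threshold and all function names are assumptions.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive (active) class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true, scores):
    """Probability that a random active is scored above a random inactive."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one active and one inactive")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: 1 = active, 0 = inactive; scores are hypothetical model confidences.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
y_pred = [int(s >= 0.5) for s in scores]  # assumed threshold of 0.5
```

Accuracy and F1 depend on the chosen threshold, whereas ROC-AUC depends only on how the scores rank actives against inactives.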


⚠️ Limitations

Please read carefully before use:

  • Research only: This is not a clinical or regulatory-grade system
  • Probabilistic outputs: Predictions are model-generated estimates, not experimentally verified results
  • Statistical SMILES understanding: Molecular interpretation is learned from data, not from physical simulation or quantum chemistry
  • Format-sensitive: Prediction quality degrades with poorly formatted or non-standard inputs
  • Not medically validated: This model has not been assessed for safety or efficacy in any clinical context

🚀 Intended Use

DrDTI-Reasoner is designed for:

✅ Computational drug discovery research
✅ Bioactivity prediction experiments
✅ Machine learning benchmarking in cheminformatics
✅ Educational use in bioinformatics and AI
✅ Early-stage hit identification and prioritization workflows

Not intended for:

❌ Clinical diagnosis or treatment decisions
❌ Regulatory or pharmaceutical approval processes
❌ Any application directly impacting human health


🗺️ Roadmap

  • Improved molecular encoding: SMILES-aware tokenization and embedding strategies
  • Protein sequence integration: Embedding protein context via ESM models
  • Multi-task learning: Joint classification (active/inactive) and regression (pXC50) heads
  • Scaffold-based evaluation: Benchmarking on scaffold-split datasets to assess generalization
  • Calibrated confidence scores: Better-calibrated uncertainty estimates for pXC50 predictions
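
To make the scaffold-based evaluation item concrete: the idea is that no scaffold appears in both the training and test sets, so test performance reflects generalization to unseen chemotypes. The sketch below is a generic random group-wise split, not the planned benchmark. It assumes each record already carries a precomputed `scaffold` string (typically a Bemis-Murcko scaffold derived with a cheminformatics toolkit); `group_split` and the record layout are hypothetical.

```python
import random
from collections import defaultdict


def group_split(records, key, test_fraction=0.2, seed=0):
    """Split records so that all items sharing a group key land on one side."""
    groups = defaultdict(list)
    for rec in records:
        groups[key(rec)].append(rec)

    keys = sorted(groups)                 # fixed order before shuffling
    random.Random(seed).shuffle(keys)     # reproducible for a given seed

    target = test_fraction * len(records)
    train, test = [], []
    for k in keys:
        # Fill the test set with whole groups until it reaches the target size.
        (test if len(test) < target else train).extend(groups[k])
    return train, test


# Hypothetical records with precomputed scaffold strings.
records = [
    {"smiles": "Cc1ccccc1", "scaffold": "c1ccccc1"},
    {"smiles": "Oc1ccccc1", "scaffold": "c1ccccc1"},
    {"smiles": "Cc1ccncc1", "scaffold": "c1ccncc1"},
    {"smiles": "CC1CCCCC1", "scaffold": "C1CCCCC1"},
    {"smiles": "OC1CCCCC1", "scaffold": "C1CCCCC1"},
    {"smiles": "Cc1ccc2ccccc2c1", "scaffold": "c1ccc2ccccc2c1"},
]
train, test = group_split(records, key=lambda r: r["scaffold"], test_fraction=0.25)
```

Because whole groups are assigned at once, the test set can end up somewhat larger than the requested fraction.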

📦 Base Model & Framework

| Component | Details |
| --- | --- |
| Base Model | meta-llama/Meta-Llama-3-8B (4-bit quantized) |
| PEFT Method | LoRA via Unsloth |
| Inference Framework | Hugging Face Transformers |
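
One way to load a LoRA adapter like this for inference is sketched below, using the `peft` and `transformers` libraries. Several things here are assumptions rather than documented facts: the repository ID `Cyanex/DrDTI-Reasoner` is taken from the model page, the adapter configuration is assumed to point at a 4-bit base checkpoint that `bitsandbytes` can load on the available GPU, and the tokenizer is assumed to be stored alongside the adapter. The function names and generation settings are illustrative.

```python
def load_drdti(repo_id: str = "Cyanex/DrDTI-Reasoner"):
    """Load the base model plus LoRA adapter and a tokenizer (assumed layout)."""
    # Imported lazily so that defining this function does not need the GPU stack.
    from peft import AutoPeftModelForCausalLM
    from transformers import AutoTokenizer

    # AutoPeftModelForCausalLM reads the adapter config, loads the base model
    # it references, and attaches the adapter weights.
    model = AutoPeftModelForCausalLM.from_pretrained(repo_id, device_map="auto")
    # Assumption: tokenizer files are published in the same repository.
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    return model, tokenizer


def predict(model, tokenizer, prompt: str, max_new_tokens: int = 128) -> str:
    """Greedy-decode a completion and return only the newly generated text."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=False
    )
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Greedy decoding is used here so that repeated calls with the same prompt give the same structured output; whether sampling works better for this model is not stated in the card.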

Disclaimer: DrDTI-Reasoner is a research tool. All predictions should be interpreted in the context of existing literature and validated experimentally before any downstream use.
