gLM-150M

Minimal Hugging Face port of the 150M-parameter variant of gLM2, a mixed-modality genomic language model that encodes a genomic scaffold using both amino-acid and DNA tokens. The model was pretrained with masked language modeling on the OMG dataset.

Architecture

| Parameter | Value |
|---|---|
| Layers | 30 |
| Attention heads | 10 |
| Embedding dimension | 640 |
| FFN hidden dimension | 1792 (SwiGLU, multiple_of=256) |
| Vocabulary size | 37 |
| Positional encoding | RoPE (base=10000, non-interleaved) |
| Normalization | RMSNorm |
| Architecture | Pre-LN Transformer with SwiGLU FFN |
| Max sequence length | 4096 |
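
The FFN width in the table follows from the embedding dimension. The sketch below assumes the LLaMA-style SwiGLU sizing rule (take 4 * dim, scale by 2/3, round up to a multiple of multiple_of), which reproduces 1792. The rough parameter count further assumes bias-free linear layers and leaves out the MLM head, so read it as an estimate, not the checkpoint's exact size.

```python
def swiglu_hidden_dim(dim: int, multiple_of: int = 256) -> int:
    """LLaMA-style SwiGLU sizing (assumed): 4*dim, scaled by 2/3, rounded up."""
    hidden = int(2 * (4 * dim) / 3)
    return multiple_of * ((hidden + multiple_of - 1) // multiple_of)


def approx_param_count(dim: int = 640, layers: int = 30, vocab: int = 37,
                       multiple_of: int = 256) -> int:
    """Rough count assuming bias-free Q/K/V/O and SwiGLU projections (assumption)."""
    hidden = swiglu_hidden_dim(dim, multiple_of)
    attn = 4 * dim * dim            # Q, K, V and output projections
    ffn = 3 * dim * hidden          # gate, up and down projections
    norms = 2 * dim                 # two RMSNorm weight vectors per block
    blocks = layers * (attn + ffn + norms)
    embedding = vocab * dim
    final_norm = dim
    return blocks + embedding + final_norm   # MLM head excluded


print(swiglu_hidden_dim(640))    # 1792
print(approx_param_count())      # 152433920
```

Under these assumptions the backbone comes to roughly 152M parameters, consistent with the 150M label.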

Vocabulary (37 tokens): the special tokens <cls>, <pad>, <eos> and <unk>; the 25 IUPAC amino-acid letters (L A G V S E R T I D P K Q N F Y M H W C X B U Z O, uppercase); the 4 DNA nucleotides (a t c g, lowercase); the strand markers <+> and <->; and <mask> and <sep>. Amino-acid and nucleotide tokens are distinguished by case: uppercase = amino acid, lowercase = nucleotide.
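
For reference, the vocabulary can be written out as a plain Python list. The ordering here simply follows the listing above and is an assumption; use tokenizer.get_vocab() for the actual token-to-ID mapping. The hypothetical modality helper shows how case alone separates the two sequence types.

```python
SPECIAL = ["<cls>", "<pad>", "<eos>", "<unk>"]
AMINO_ACIDS = list("LAGVSERTIDPKQNFYMHWCXBUZO")   # 25 uppercase letters
NUCLEOTIDES = list("atcg")                          # 4 lowercase letters
STRAND = ["<+>", "<->"]
OTHER = ["<mask>", "<sep>"]

# Order is illustrative only (assumed); the real IDs come from the tokenizer.
VOCAB = SPECIAL + AMINO_ACIDS + NUCLEOTIDES + STRAND + OTHER


def modality(token: str) -> str:
    """Classify a single token by the case convention described above."""
    if token in AMINO_ACIDS:
        return "amino_acid"
    if token in NUCLEOTIDES:
        return "nucleotide"
    return "special"


print(len(VOCAB))                                     # 37
print(modality("A"), modality("a"), modality("<+>"))  # amino_acid nucleotide special
```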

Pretraining

  • Objective: Masked language modeling (30% mask rate)
  • Data: OMG dataset (open metagenomic corpus, semantically deduplicated)
  • Pretraining tokens: 315B (bfloat16, context length 4096)
  • Source checkpoint: tattabio/gLM2_150M
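
A minimal sketch of the 30% masking step. It assumes every selected position is replaced by <mask> (this card does not say whether BERT's 80/10/10 replacement scheme was used) and that special tokens such as <cls> and the strand markers are never masked. The token IDs in the usage lines are hypothetical; take the real values from the tokenizer.

```python
from typing import Optional, Sequence

import torch


def mask_for_mlm(input_ids: torch.Tensor, mask_id: int, special_ids: Sequence[int],
                 mask_rate: float = 0.30,
                 generator: Optional[torch.Generator] = None):
    """Return (masked_ids, labels); labels are -100 wherever no loss is computed."""
    # Special tokens (<cls>, <pad>, strand markers, ...) are never masked (assumption).
    special = torch.tensor(list(special_ids), device=input_ids.device)
    candidates = ~torch.isin(input_ids, special)
    draws = torch.rand(input_ids.shape, generator=generator, device=input_ids.device)
    selected = (draws < mask_rate) & candidates
    # -100 is the default ignore_index of torch.nn.CrossEntropyLoss.
    labels = torch.where(selected, input_ids, torch.full_like(input_ids, -100))
    masked_ids = torch.where(selected, torch.full_like(input_ids, mask_id), input_ids)
    return masked_ids, labels


# Hypothetical IDs (0-3 specials, 33-34 strand markers, 35 <mask>, 36 <sep>).
g = torch.Generator().manual_seed(0)
ids = torch.randint(0, 37, (2, 512), generator=g)
masked_ids, labels = mask_for_mlm(ids, mask_id=35,
                                  special_ids=[0, 1, 2, 3, 33, 34, 35, 36],
                                  generator=g)
```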

Parity Verification

All 31 representation levels (the embedding output plus the outputs of all 30 transformer blocks) were verified to be bit-exact (max abs diff = 0.00) against the original tattabio/gLM2_150M model, with this port running attn_implementation="sdpa". The two added backends were checked against SDPA separately: eager agrees within fp32 kernel drift (atol = 1e-3), and flash_attention_2 in bf16 reaches a cosine similarity >= 0.999. Verified on GPU with PyTorch 2.7 / CUDA 12.
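
The two agreement criteria can be expressed as a small comparison over hidden_states tuples, for example from two runs on the same unpadded input with output_hidden_states=True under different attn_implementation values. This is an illustrative sketch, not the script behind the verification above; compare_hidden_states and passes are hypothetical helpers, and the thresholds only mirror the numbers quoted in this section.

```python
import torch
import torch.nn.functional as F


def compare_hidden_states(ref, test):
    """Per-level (level, max_abs_diff, worst_token_cosine) for two hidden_states tuples.

    Each tuple holds (batch, seq_len, dim) tensors; both are upcast to fp32 first.
    """
    results = []
    for level, (a, b) in enumerate(zip(ref, test)):
        a32, b32 = a.float(), b.float()
        max_abs = (a32 - b32).abs().max().item()
        worst_cos = F.cosine_similarity(a32, b32, dim=-1).min().item()
        results.append((level, max_abs, worst_cos))
    return results


def passes(results, atol=None, min_cos=None):
    """True if every level meets the given tolerance(s)."""
    return all(
        (atol is None or max_abs <= atol) and (min_cos is None or cos >= min_cos)
        for _, max_abs, cos in results
    )
```

For instance, passes(compare_hidden_states(sdpa_out.hidden_states, eager_out.hidden_states), atol=1e-3) would express the fp32 eager check, and min_cos=0.999 the bf16 flash_attention_2 check (sdpa_out and eager_out being hypothetical model outputs).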

Related Models

See the full gLM2 collection.

| Model | Parameters | Notes |
|---|---|---|
| gLM-150M | 150M | This model |
| gLM-650M | 650M | Larger variant |

Usage

Embedding generation

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("Taykhoom/gLM-150M", trust_remote_code=True)
model = AutoModel.from_pretrained("Taykhoom/gLM-150M", trust_remote_code=True)
model.eval()

# Canonical gLM2 input: amino acids (uppercase) + DNA (lowercase) + strand markers.
sequence = (
    "<+>MALTKVEKRNRIKRRVRGKISGTQASPRLSVYKSNK"
    "<+>aatttaaggaa"
    "<->MLGIDNIERVKPGGLELVDRLVAVNRVTKVTKGGRAFGFSAIVVVGNED"
)
enc = tokenizer([sequence], return_tensors="pt")

# One forward pass without gradients; also request the intermediate layers.
with torch.no_grad():
    out = model(**enc, output_hidden_states=True)

cls_emb   = out.last_hidden_state[:, 0, :]   # (batch, 640), CLS token
token_emb = out.last_hidden_state             # (batch, seq_len, 640)

# Intermediate layers: hidden_states[0] is the embedding output and
# hidden_states[i] is the output of block i (31 entries in total).
layer15_emb = out.hidden_states[15]           # after block 15
```
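
When a scaffold comes from an annotation rather than a hand-written string, the canonical input can be assembled element by element. The helper below is hypothetical (it is not part of the tokenizer); it assumes each element is a (strand, kind, sequence) triple, with kind either "protein" or "dna", and applies the case convention described above.

```python
def build_scaffold(elements):
    """Concatenate (strand, kind, sequence) elements into a gLM2-style string.

    strand is "+" or "-"; kind is "protein" (uppercased) or "dna" (lowercased).
    """
    parts = []
    for strand, kind, seq in elements:
        if strand not in ("+", "-"):
            raise ValueError(f"unknown strand: {strand!r}")
        if kind == "protein":
            body = seq.upper()
        elif kind == "dna":
            body = seq.lower()
        else:
            raise ValueError(f"unknown element kind: {kind!r}")
        parts.append(f"<{strand}>{body}")
    return "".join(parts)


# Rebuilds the example sequence used above.
scaffold = build_scaffold([
    ("+", "protein", "MALTKVEKRNRIKRRVRGKISGTQASPRLSVYKSNK"),
    ("+", "dna", "AATTTAAGGAA"),
    ("-", "protein", "MLGIDNIERVKPGGLELVDRLVAVNRVTKVTKGGRAFGFSAIVVVGNED"),
])
```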

The tokenizer also accepts plain DNA strings without a strand marker and prepares them automatically: it lowercases them, replaces U/u with t, and prepends <+>. The three sequences below therefore produce identical token sequences:

```python
tokenizer(["ATCGATCG", "atcgatcg", "AUCGAUCG"], return_tensors="pt")
```
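
The preprocessing rule just described can be mirrored in plain Python, which is handy for checking inputs before tokenization. prepare_dna is a hypothetical re-implementation of the stated rule, not the tokenizer's own code. How the real tokenizer decides that a string is plain DNA is not specified here, so this sketch simply assumes the input contains only A/C/G/T/U in either case.

```python
def prepare_dna(seq: str) -> str:
    """Lowercase, convert U/u to t, and prepend the <+> strand marker."""
    if not seq or set(seq.upper()) - set("ACGTU"):
        raise ValueError("expected a plain DNA/RNA string (A, C, G, T, U only)")
    return "<+>" + seq.lower().replace("u", "t")


prepared = [prepare_dna(s) for s in ["ATCGATCG", "atcgatcg", "AUCGAUCG"]]
print(prepared)  # ['<+>atcgatcg', '<+>atcgatcg', '<+>atcgatcg']
```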

MLM logits

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("Taykhoom/gLM-150M", trust_remote_code=True)
model = AutoModelForMaskedLM.from_pretrained("Taykhoom/gLM-150M", trust_remote_code=True)
model.eval()

enc = tokenizer(["<+>MA<mask>K"], return_tensors="pt")
with torch.no_grad():
    logits = model(**enc).logits   # (1, seq_len, 37)
```
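
To read off the model's guess at each <mask> position, take the argmax of the logits at those positions. The sketch is written against plain tensors so it runs without downloading the model; with the real objects you would pass enc.input_ids, tokenizer.mask_token_id (assuming the tokenizer defines a mask token) and tokenizer.convert_ids_to_tokens. The toy vocabulary and IDs in the usage lines are made up.

```python
import torch


def predict_masked(logits: torch.Tensor, input_ids: torch.Tensor, mask_id: int,
                   id_to_token) -> list:
    """Return the highest-scoring token for every masked position (row-major order)."""
    is_mask = input_ids == mask_id               # (batch, seq_len) bool
    best_ids = logits[is_mask].argmax(dim=-1)    # (num_masked,)
    return id_to_token(best_ids.tolist())


# Toy example: a 5-token vocabulary and one sequence with a mask at position 2.
toy_vocab = ["<cls>", "M", "A", "K", "<mask>"]
toy_ids = torch.tensor([[0, 1, 4, 3]])
toy_logits = torch.zeros(1, 4, len(toy_vocab))
toy_logits[0, 2, 2] = 5.0                        # make "A" the clear winner
print(predict_masked(toy_logits, toy_ids, mask_id=4,
                     id_to_token=lambda ids: [toy_vocab[i] for i in ids]))  # ['A']
```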

Faster attention backends

```python
import torch
from transformers import AutoModel

# SDPA (PyTorch 2.0+, default upstream backend): recommended for fp32
model = AutoModel.from_pretrained("Taykhoom/gLM-150M", trust_remote_code=True,
                                  attn_implementation="sdpa")

# Flash Attention 2 (requires the flash-attn package, a CUDA GPU and fp16/bf16):
# fastest on long sequences
model = AutoModel.from_pretrained("Taykhoom/gLM-150M", trust_remote_code=True,
                                  attn_implementation="flash_attention_2",
                                  dtype=torch.bfloat16).to("cuda")
```
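
If the same script has to run on machines with and without flash-attn, the backend can be chosen at load time. This is a sketch under simple assumptions: flash_attention_2 is used only when a CUDA device is available and the flash_attn package is importable; otherwise the model is loaded with SDPA in fp32. pick_attn_backend is a hypothetical helper.

```python
import importlib.util

import torch


def pick_attn_backend():
    """Choose (attn_implementation, dtype) based on what this machine supports."""
    has_cuda = torch.cuda.is_available()
    has_flash = importlib.util.find_spec("flash_attn") is not None
    if has_cuda and has_flash:
        return "flash_attention_2", torch.bfloat16  # FA2 needs fp16/bf16 on GPU
    return "sdpa", torch.float32


attn_impl, dtype = pick_attn_backend()
# model = AutoModel.from_pretrained("Taykhoom/gLM-150M", trust_remote_code=True,
#                                   attn_implementation=attn_impl, dtype=dtype)
```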

Fine-tuning

Fine-tuning follows standard Hugging Face conventions. For sequence-level tasks, pool the token embeddings over non-padding positions, or use the CLS token embedding, as input to a prediction head.
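
A minimal sketch of both pooling options, assuming the batch comes from the tokenizer with an attention_mask (1 = real token, 0 = padding) and that a masked mean is the pooling of choice. MeanPoolHead, num_labels and the single linear layer are illustrative, not part of this port.

```python
import torch
import torch.nn as nn


def masked_mean(hidden: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average (batch, seq_len, dim) states over positions where attention_mask == 1."""
    mask = attention_mask.unsqueeze(-1).to(hidden.dtype)   # (batch, seq_len, 1)
    summed = (hidden * mask).sum(dim=1)                     # (batch, dim)
    counts = mask.sum(dim=1).clamp(min=1.0)                 # avoid division by zero
    return summed / counts


class MeanPoolHead(nn.Module):
    """Hypothetical sequence-level head on top of gLM-150M token embeddings."""

    def __init__(self, hidden_dim: int = 640, num_labels: int = 2):
        super().__init__()
        self.classifier = nn.Linear(hidden_dim, num_labels)

    def forward(self, last_hidden_state, attention_mask, use_cls: bool = False):
        if use_cls:
            pooled = last_hidden_state[:, 0, :]               # CLS token
        else:
            pooled = masked_mean(last_hidden_state, attention_mask)
        return self.classifier(pooled)
```

With the objects from the embedding example, a call would look like head(out.last_hidden_state, enc["attention_mask"]).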

Implementation Notes

The original gLM2 implementation supports only one attention backend, PyTorch SDPA. This port adds eager and flash_attention_2 as separate implementations, selectable via attn_implementation. Because SDPA and flash_attention_2 do not return attention weights, the model falls back to eager automatically when output_attentions=True is requested.

The eager kernel computes the QK^T product and the softmax in fp32 even when the model is loaded in bf16, matching the numerical behaviour of SDPA and flash_attention_2 under mixed precision.
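
The upcasting described above looks roughly like the following. This is a sketch of the general technique (scores and softmax in fp32, probabilities cast back to the value dtype), not the port's actual kernel; it assumes (batch, heads, seq_len, head_dim) inputs with RoPE already applied and an optional additive mask. The usage lines compare it with torch.nn.functional.scaled_dot_product_attention in fp32.

```python
import math

import torch
import torch.nn.functional as F


def eager_attention_fp32(q, k, v, additive_mask=None):
    """Scaled dot-product attention with QK^T and softmax computed in fp32."""
    scale = 1.0 / math.sqrt(q.size(-1))
    scores = torch.matmul(q.float(), k.float().transpose(-2, -1)) * scale
    if additive_mask is not None:
        scores = scores + additive_mask.float()        # e.g. -inf on padded keys
    probs = torch.softmax(scores, dim=-1).to(v.dtype)  # back to bf16 (or fp32)
    return torch.matmul(probs, v)


torch.manual_seed(0)
q, k, v = (torch.randn(1, 10, 16, 64) for _ in range(3))   # 10 heads x 64 dims
out = eager_attention_fp32(q, k, v)
ref = F.scaled_dot_product_attention(q, k, v)
print((out - ref).abs().max())   # small: both are fp32 attention
```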

Citation

```bibtex
@article{cornman2024_glm2,
  title   = {The {OMG} dataset: An Open MetaGenomic corpus for mixed-modality genomic language modeling},
  author  = {Cornman, Andre and West-Roberts, Jacob and Camargo, Antonio Pedro and Roux, Simon and Beracochea, Martin and Mirdita, Milot and Ovchinnikov, Sergey and Hwang, Yunha},
  journal = {bioRxiv},
  year    = {2024},
  doi     = {10.1101/2024.08.14.607850}
}
```

Credits

Original model and code by Cornman et al. (Tatta Bio); source code on GitHub and original weights at tattabio/gLM2_150M on the Hugging Face Hub. The Hugging Face conversion code was written primarily by Claude Code and reviewed manually by Taykhoom Dalal.

License

Apache 2.0, following the original repository.
