MediMind-411M

MediMind-411M is a custom medical language model trained from scratch for biomedical and clinical text generation.

This model was trained and uploaded by Koyeliya Ghosh under the Hugging Face account koyelog.

Overview

MediMind-411M is a 411M-parameter transformer-based language model designed to generate medical-style text.
It was trained on a large medical text collection and uses a custom byte-level BPE tokenizer.

Training Summary

  • Model name: MediMind-411M
  • Parameters: approximately 411.1M
  • Training hardware: 2× NVIDIA T4 GPUs (Kaggle)
  • Total texts loaded: 171,047
  • Training samples tokenized: 50,000
  • Total batches: 12,500
  • Final average loss: 4.9253
  • Total runtime: about 5536.5 seconds (~92 minutes)

Architecture

This model uses a decoder-only transformer architecture with:

  • Embedding dimension: 1024
  • Layers: 24
  • Attention heads: 16
  • KV heads: 8
  • RoPE positional encoding
  • RMSNorm
  • SwiGLU-style feed-forward layers
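
As a rough sanity check on the parameter count implied by these dimensions, the standard per-layer formulas for grouped-query attention and a SwiGLU feed-forward block can be evaluated in a few lines. The vocabulary size and feed-forward width are not stated in this card, so the values below are placeholders, not the model's actual configuration; with them the estimate lands in the same ballpark as 411M.

```python
# Rough parameter-count estimate for a decoder-only transformer with
# grouped-query attention (16 query heads, 8 KV heads) and a SwiGLU FFN.
# NOTE: vocab_size and ffn_dim are PLACEHOLDERS -- the card does not state them.

def estimate_params(d_model=1024, n_layers=24, n_heads=16, n_kv_heads=8,
                    vocab_size=50_000, ffn_dim=4096, tied_embeddings=True):
    head_dim = d_model // n_heads                      # 64
    kv_dim = n_kv_heads * head_dim                     # 512 (grouped-query KV)
    attn = (d_model * d_model        # query projection
            + 2 * d_model * kv_dim   # key and value projections (fewer KV heads)
            + d_model * d_model)     # output projection
    ffn = 3 * d_model * ffn_dim      # SwiGLU uses gate, up, and down matrices
    norms = 2 * d_model              # two RMSNorm weight vectors per layer
    embed = vocab_size * d_model * (1 if tied_embeddings else 2)
    return n_layers * (attn + ffn + norms) + embed + d_model  # + final norm

print(f"{estimate_params():,}")  # ~429M with these placeholder values
```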

Files in this Repository

  • medimind_final.pt — final trained model weights
  • checkpoint_latest.pt — latest training checkpoint
  • vocab.json — tokenizer vocabulary
  • merges.txt — tokenizer merges

Testing

The model was tested in a Kaggle notebook by:

  1. Downloading the model files from this Hugging Face repository
  2. Loading the tokenizer using vocab.json and merges.txt
  3. Rebuilding the training architecture in PyTorch
  4. Loading medimind_final.pt
  5. Generating outputs from medical prompts
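
Step 5 above amounts to an autoregressive decoding loop. A minimal greedy version with a simple end-of-sequence stopping rule can be sketched as follows; `next_token_logits` is a stand-in for the real model's forward pass, not code from this repository.

```python
# Sketch of the generation loop in step 5. `next_token_logits` is a
# STAND-IN for the real model's forward pass over the current token ids.

def greedy_generate(prompt_ids, next_token_logits, eos_id, max_new_tokens=32):
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = next_token_logits(ids)            # one logit per vocab entry
        next_id = max(range(len(logits)), key=logits.__getitem__)
        if next_id == eos_id:                      # simple stopping rule
            break
        ids.append(next_id)
    return ids

# Toy "model": prefers token 1, then switches to EOS (id 3) once len >= 6.
def toy_logits(ids):
    return [0.0, 1.0, 0.5, 2.0] if len(ids) >= 6 else [0.0, 1.0, 0.5, -1.0]

print(greedy_generate([7, 8], toy_logits, eos_id=3))  # [7, 8, 1, 1, 1, 1]
```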

Example test prompts

  • Patient presents with fever and cough. Diagnosis:
  • Symptoms of diabetes include
  • Treatment for hypertension includes

Observed behavior

The model successfully generates medical-style text and terminology.
Outputs show that the model has learned domain vocabulary and sentence patterns, but generations can still be noisy, mixed-topic, or clinically unreliable.

Limitations

  • This is an early-stage base language model, not an instruction-tuned chatbot.
  • It may produce incorrect, incomplete, or hallucinated medical statements.
  • It should not be used for real medical diagnosis, treatment, or decision-making.
  • Output quality can vary depending on prompt style and decoding settings.
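
The decoding-settings point can be made concrete: temperature rescales the logits before sampling, and top-k restricts sampling to the k most likely tokens. A small stdlib-only sketch of these two knobs (illustrative, not the card's actual inference code):

```python
import math
import random

# Temperature + top-k sampling over a list of logits (stdlib only).
# Illustrative sketch, not MediMind's actual decoding code.

def sample_top_k(logits, k=2, temperature=1.0, rng=random):
    scaled = [x / temperature for x in logits]         # temperature rescaling
    top = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:k]
    m = max(scaled[i] for i in top)
    weights = [math.exp(scaled[i] - m) for i in top]   # stable softmax over top-k
    return rng.choices(top, weights=weights, k=1)[0]

logits = [2.0, 1.0, 0.5, -1.0]
print(sample_top_k(logits, k=2, temperature=0.7))
```

Lower temperature sharpens the distribution (outputs become more repetitive but more coherent); smaller k cuts off low-probability tokens that often cause the mixed-topic drift noted above.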

Intended Use

This model is intended for:

  • learning and experimentation
  • research practice
  • testing custom LLM training pipelines
  • educational exploration of medical text generation

This model is not intended for direct clinical deployment or patient-facing use.

Example Usage

from huggingface_hub import hf_hub_download
from tokenizers import ByteLevelBPETokenizer
import torch

# Download the trained weights and tokenizer files from this repository.
model_path = hf_hub_download(repo_id="koyelog/MediMind-411M", filename="medimind_final.pt")
vocab_path = hf_hub_download(repo_id="koyelog/MediMind-411M", filename="vocab.json")
merges_path = hf_hub_download(repo_id="koyelog/MediMind-411M", filename="merges.txt")

# Load the byte-level BPE tokenizer from its vocabulary and merges files.
tokenizer = ByteLevelBPETokenizer(vocab_path, merges_path)

# medimind_final.pt contains raw weights only: rebuild the model class in
# PyTorch to match the architecture described above, then load the weights
# before running generation.
state_dict = torch.load(model_path, map_location="cpu")

Future Work

Planned improvements:

  • cleaner inference pipeline
  • better decoding and stopping rules
  • further training epochs
  • instruction tuning on medical QA data
  • model card improvements and benchmark evaluation

Author

Created by Koyeliya Ghosh
Hugging Face: koyelog

Disclaimer

This model is for research and educational purposes only.
It must not be used as a substitute for licensed medical advice or professional healthcare judgment.
