MediMind-411M

MediMind-411M is a custom medical language model trained from scratch for biomedical and clinical text generation.

This model was trained and uploaded by Koyeliya Ghosh under the Hugging Face account koyelog.

Overview

MediMind-411M is a 411M-parameter transformer-based language model designed to generate medical-style text.
It was trained on a large medical text collection and uses a custom byte-level BPE tokenizer.

Training Summary

  • Model name: MediMind-411M
  • Parameters: approximately 411.1M
  • Training hardware: 2× NVIDIA T4 GPUs (Kaggle)
  • Total texts loaded: 171,047
  • Training samples tokenized: 50,000
  • Total batches: 12,500
  • Final average loss: 4.9253
  • Total runtime: about 5536.5 seconds (~92 minutes)

Architecture

This model uses a decoder-only transformer architecture with:

  • Embedding dimension: 1024
  • Layers: 24
  • Attention heads: 16
  • KV heads: 8
  • RoPE positional encoding
  • RMSNorm
  • SwiGLU-style feed-forward layers
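
As a rough sanity check on the parameter count implied by these dimensions, the standard per-layer formulas for grouped-query attention and a SwiGLU feed-forward block can be evaluated in a few lines. The vocabulary size and feed-forward width are not stated in this card, so the values below are placeholders, not the model's actual configuration; with them the estimate lands in the same ballpark as 411M.

```python
# Rough parameter-count estimate for a decoder-only transformer with
# grouped-query attention (16 query heads, 8 KV heads) and a SwiGLU FFN.
# NOTE: vocab_size and ffn_dim are PLACEHOLDERS -- the card does not state them.

def estimate_params(d_model=1024, n_layers=24, n_heads=16, n_kv_heads=8,
                    vocab_size=50_000, ffn_dim=4096, tied_embeddings=True):
    head_dim = d_model // n_heads                      # 64
    kv_dim = n_kv_heads * head_dim                     # 512 (grouped-query KV)
    attn = (d_model * d_model        # query projection
            + 2 * d_model * kv_dim   # key and value projections (fewer KV heads)
            + d_model * d_model)     # output projection
    ffn = 3 * d_model * ffn_dim      # SwiGLU uses gate, up, and down matrices
    norms = 2 * d_model              # two RMSNorm weight vectors per layer
    embed = vocab_size * d_model * (1 if tied_embeddings else 2)
    return n_layers * (attn + ffn + norms) + embed + d_model  # + final norm

print(f"{estimate_params():,}")  # ~429M with these placeholder values
```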

Files in this Repository

  • medimind_final.pt — final trained model weights
  • checkpoint_latest.pt — latest training checkpoint
  • vocab.json — tokenizer vocabulary
  • merges.txt — tokenizer merges

Testing

The model was tested in a Kaggle notebook by:

  1. Downloading the model files from this Hugging Face repository
  2. Loading the tokenizer using vocab.json and merges.txt
  3. Rebuilding the training architecture in PyTorch
  4. Loading medimind_final.pt
  5. Generating outputs from medical prompts
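
Step 5 above amounts to an autoregressive decoding loop. A minimal greedy version with a simple end-of-sequence stopping rule can be sketched as follows; `next_token_logits` is a stand-in for the real model's forward pass, not code from this repository.

```python
# Sketch of the generation loop in step 5. `next_token_logits` is a
# STAND-IN for the real model's forward pass over the current token ids.

def greedy_generate(prompt_ids, next_token_logits, eos_id, max_new_tokens=32):
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = next_token_logits(ids)            # one logit per vocab entry
        next_id = max(range(len(logits)), key=logits.__getitem__)
        if next_id == eos_id:                      # simple stopping rule
            break
        ids.append(next_id)
    return ids

# Toy "model": prefers token 1, then switches to EOS (id 3) once len >= 6.
def toy_logits(ids):
    return [0.0, 1.0, 0.5, 2.0] if len(ids) >= 6 else [0.0, 1.0, 0.5, -1.0]

print(greedy_generate([7, 8], toy_logits, eos_id=3))  # [7, 8, 1, 1, 1, 1]
```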

Example test prompts

  • Patient presents with fever and cough. Diagnosis:
  • Symptoms of diabetes include
  • Treatment for hypertension includes

Observed behavior

The model successfully generates medical-style text and terminology.
Outputs show that the model has learned domain vocabulary and sentence patterns, but generations can still be noisy, mixed-topic, or clinically unreliable.

Limitations

  • This is an early-stage base language model, not an instruction-tuned chatbot.
  • It may produce incorrect, incomplete, or hallucinated medical statements.
  • It should not be used for real medical diagnosis, treatment, or decision-making.
  • Output quality can vary depending on prompt style and decoding settings.
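
The decoding-settings point can be made concrete: temperature rescales the logits before sampling, and top-k restricts sampling to the k most likely tokens. A small stdlib-only sketch of these two knobs (illustrative, not the card's actual inference code):

```python
import math
import random

# Temperature + top-k sampling over a list of logits (stdlib only).
# Illustrative sketch, not MediMind's actual decoding code.

def sample_top_k(logits, k=2, temperature=1.0, rng=random):
    scaled = [x / temperature for x in logits]         # temperature rescaling
    top = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:k]
    m = max(scaled[i] for i in top)
    weights = [math.exp(scaled[i] - m) for i in top]   # stable softmax over top-k
    return rng.choices(top, weights=weights, k=1)[0]

logits = [2.0, 1.0, 0.5, -1.0]
print(sample_top_k(logits, k=2, temperature=0.7))
```

Lower temperature sharpens the distribution (outputs become more repetitive but more coherent); smaller k cuts off low-probability tokens that often cause the mixed-topic drift noted above.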

Intended Use

This model is intended for:

  • learning and experimentation
  • research practice
  • testing custom LLM training pipelines
  • educational exploration of medical text generation

This model is not intended for direct clinical deployment or patient-facing use.

Example Usage

from huggingface_hub import hf_hub_download
from tokenizers import ByteLevelBPETokenizer
import torch

# Download the trained weights and tokenizer files from this repository.
model_path = hf_hub_download(repo_id="koyelog/MediMind-411M", filename="medimind_final.pt")
vocab_path = hf_hub_download(repo_id="koyelog/MediMind-411M", filename="vocab.json")
merges_path = hf_hub_download(repo_id="koyelog/MediMind-411M", filename="merges.txt")

# Load the byte-level BPE tokenizer from its vocabulary and merges files.
tokenizer = ByteLevelBPETokenizer(vocab_path, merges_path)

# medimind_final.pt contains raw weights only: rebuild the model class in
# PyTorch to match the architecture described above, then load the weights
# before running generation.
state_dict = torch.load(model_path, map_location="cpu")

Future Work

Planned improvements:

  • cleaner inference pipeline
  • better decoding and stopping rules
  • further training epochs
  • instruction tuning on medical QA data
  • model card improvements and benchmark evaluation

Author

Created by Koyeliya Ghosh
Hugging Face: koyelog

Disclaimer

This model is for research and educational purposes only.
It must not be used as a substitute for licensed medical advice or professional healthcare judgment.
