# MediMind-411M
MediMind-411M is a custom medical language model trained from scratch for biomedical and clinical text generation.
This model was trained and uploaded by Koyeliya Ghosh under the Hugging Face account koyelog.
## Overview
MediMind-411M is a 411M-parameter transformer-based language model designed to generate medical-style text.
It was trained on a large medical text collection and uses a custom tokenizer.
## Training Summary
- Model name: MediMind-411M
- Parameters: approximately 411.1M
- Training device: Kaggle GPU T4 x2
- Total texts loaded: 171,047
- Training samples tokenized: 50,000
- Total batches: 12,500
- Final average loss: 4.9253
- Total runtime: about 5536.5 seconds (~92 minutes)
## Architecture
This model uses a decoder-only transformer architecture with:
- Embedding dimension: 1024
- Layers: 24
- Attention heads: 16
- KV heads: 8
- RoPE positional encoding
- RMSNorm
- SwiGLU-style feed-forward layers
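As a rough sanity check, the dimensions listed above can be plugged into a back-of-the-envelope parameter count. The vocabulary size and feed-forward width below are assumptions (they are not stated in this card), so the total is only indicative, not the exact 411.1M figure:

```python
# Back-of-the-envelope parameter count for the listed architecture.
# vocab_size and ffn_dim are ASSUMED values -- this card does not state them.
d_model = 1024
n_layers = 24
n_heads = 16
n_kv_heads = 8
head_dim = d_model // n_heads          # 64
vocab_size = 50_000                    # assumption
ffn_dim = 4096                         # assumption (SwiGLU hidden width)

# Grouped-query attention: full-width Q and output, narrower K/V projections.
attn = (d_model * n_heads * head_dim           # Q projection
        + 2 * d_model * n_kv_heads * head_dim  # K and V projections
        + n_heads * head_dim * d_model)        # output projection

# SwiGLU feed-forward uses three weight matrices (gate, up, down).
ffn = 3 * d_model * ffn_dim

embed = vocab_size * d_model           # tied input/output embedding assumed

total = n_layers * (attn + ffn) + embed
print(f"approximate parameters: {total/1e6:.1f}M")  # → approximate parameters: 428.7M
```

The result lands in the right ballpark of the reported 411M; the gap reflects the guessed vocabulary and feed-forward sizes plus omitted norm and bias terms.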
## Files in this Repository
- `medimind_final.pt` — final trained model weights
- `checkpoint_latest.pt` — latest training checkpoint
- `vocab.json` — tokenizer vocabulary
- `merges.txt` — tokenizer merges
## Testing
The model was tested locally in a Kaggle notebook by:
- Downloading the model files from this Hugging Face repository
- Loading the tokenizer from `vocab.json` and `merges.txt`
- Rebuilding the training architecture in PyTorch
- Loading `medimind_final.pt`
- Generating outputs from medical prompts
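The tokenizer step relies on the standard byte-level BPE file pair: `vocab.json` maps tokens to ids, and `merges.txt` lists merge rules in priority order. A stdlib-only toy sketch of how merge rules drive tokenization (the vocabulary and merge rules here are invented for illustration, and real implementations pick the best-ranked adjacent pair at each step rather than sweeping rules in order):

```python
# Toy BPE merge loop: repeatedly applies merge rules until none fire,
# the same mechanism ByteLevelBPETokenizer drives from merges.txt.
# These merges are invented for illustration, not from the actual files.
merges = [("f", "e"), ("fe", "ver"), ("v", "e"), ("ve", "r")]

def bpe(word: str) -> list[str]:
    tokens = list(word)                        # start from single characters
    changed = True
    while changed:                             # repeat until no rule applies
        changed = False
        for left, right in merges:             # lower index = higher priority
            i = 0
            while i < len(tokens) - 1:
                if tokens[i] == left and tokens[i + 1] == right:
                    tokens[i:i + 2] = [left + right]   # merge the pair in place
                    changed = True
                else:
                    i += 1
    return tokens

print(bpe("fever"))  # → ['fever']
```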
### Example test prompts
- `Patient presents with fever and cough. Diagnosis:`
- `Symptoms of diabetes include`
- `Treatment for hypertension includes`
### Observed behavior
The model successfully generates medical-style text and terminology.
Outputs show that the model has learned domain vocabulary and sentence patterns, but generations can still be noisy, mixed-topic, or clinically unreliable.
## Limitations
- This is an early-stage base language model, not an instruction-tuned chatbot.
- It may produce incorrect, incomplete, or hallucinated medical statements.
- It should not be used for real medical diagnosis, treatment, or decision-making.
- Output quality can vary depending on prompt style and decoding settings.
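The last point can be made concrete: given the same logits, temperature and top-k reshape the sampling distribution and therefore the output. A stdlib-only sketch (the logits below are invented, not actual model outputs):

```python
import math
import random

def sample(logits: dict[str, float], temperature: float = 1.0, top_k: int = 0) -> str:
    """Sample one token: temperature flattens/sharpens, top_k truncates."""
    items = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)
    if top_k > 0:
        items = items[:top_k]                    # keep only the k most likely tokens
    scaled = [v / temperature for _, v in items]
    m = max(scaled)
    weights = [math.exp(v - m) for v in scaled]  # softmax, numerically stable
    return random.choices([t for t, _ in items], weights=weights)[0]

logits = {"cough": 2.0, "fever": 1.5, "banana": -3.0}
print(sample(logits, temperature=0.1))  # near-greedy: almost always "cough"
print(sample(logits, temperature=2.0))  # flatter distribution: more variety
print(sample(logits, top_k=2))          # "banana" can never be sampled
```

Low temperature or small top-k makes generations more repetitive but more on-topic; high temperature increases variety and the noisy, mixed-topic behavior noted above.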
## Intended Use
This model is intended for:
- learning and experimentation
- research practice
- testing custom LLM training pipelines
- educational exploration of medical text generation
This model is not intended for direct clinical deployment or patient-facing use.
## Example Usage
```python
from huggingface_hub import hf_hub_download
from tokenizers import ByteLevelBPETokenizer
import torch

# Download the weights and tokenizer files from this repository
model_path = hf_hub_download(repo_id="koyelog/MediMind-411M", filename="medimind_final.pt")
vocab_path = hf_hub_download(repo_id="koyelog/MediMind-411M", filename="vocab.json")
merges_path = hf_hub_download(repo_id="koyelog/MediMind-411M", filename="merges.txt")

# Load the byte-level BPE tokenizer trained alongside the model
tokenizer = ByteLevelBPETokenizer(vocab_path, merges_path)

# Rebuild the training architecture in PyTorch, load the weights with
# torch.load(model_path, map_location="cpu"), then run generation.
```
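Once the architecture is rebuilt and the weights are loaded, generation is a standard autoregressive loop. The sketch below uses a stub next-token function in place of the real forward pass, which depends on the original training code and is not shown here:

```python
def stub_next_token(ids: list[int]) -> int:
    """Stand-in for the model's argmax over logits; a real forward pass goes here."""
    return (ids[-1] + 1) % 100            # toy rule, NOT the trained model

def generate(prompt_ids: list[int], max_new_tokens: int = 5, eos_id: int = -1) -> list[int]:
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):       # greedy decoding: one token per step
        nxt = stub_next_token(ids)
        if nxt == eos_id:                 # stop early on end-of-sequence
            break
        ids.append(nxt)
    return ids

print(generate([10, 11]))  # → [10, 11, 12, 13, 14, 15, 16]
```

With the real model, `stub_next_token` would encode the prompt with the tokenizer, run the rebuilt network, and pick (or sample) the next token id from the logits.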
## Future Work
Planned next improvements:
- cleaner inference pipeline
- better decoding and stopping rules
- further training epochs
- instruction tuning on medical QA data
- model card improvements and benchmark evaluation
## Author
Created by Koyeliya Ghosh
Hugging Face: `koyelog`
## Disclaimer
This model is for research and educational purposes only.
It must not be used as a substitute for licensed medical advice or professional healthcare judgment.