# MicroLens · Microscopy Vision-Language Model

Gemma 4 E2B fine-tuned on 122,399 microscopy VQA pairs across 145+ genera — diatoms, freshwater and marine zooplankton, fungal spores, and fish larvae. Three open versions, all Apache 2.0, all running 100 % offline.
- Base model: `unsloth/gemma-4-E2B-it` (4.65 B params)
- Source code: SergheiBrinza/microlens
- Android APK: GitHub Releases
- Browser demo: HuggingFace Space
- Ollama Hub: `brinzaengineeringai/microlens-v{1,2,3}`
- Submission for: Kaggle Gemma 4 Good Hackathon 2026 — Health & Sciences track
- License: Apache 2.0 (code, weights, datasets — see LICENSE_AUDIT.md)
## Model versions in this repository
| Version | Style | When to use |
|---|---|---|
| v1 | Naive baseline | Historical reference, training-progression demo |
| v2 | Production brief — single line | Pipelines, automated databases, low-latency |
| v3 | Production rich — four sections (genus + morphology + habitat + ID cues) | Citizen science, classroom, judge-facing UIs |
Each version ships as GGUF Q4_K_M (3.4 GB) + BF16 mmproj (940 MB) for llama.cpp / Ollama / mtmd:

```
gguf/
├── gemma-4-e2b-it.Q4_K_M.gguf        # v1 model (3.4 GB)
├── gemma-4-e2b-it.BF16-mmproj.gguf   # v1 mmproj (940 MB)
├── v2/
│   ├── model.gguf                    # v2 model (3.4 GB)
│   └── mmproj.gguf                   # v2 mmproj (940 MB)
└── v3/
    ├── model.gguf                    # v3 model (3.4 GB)
    └── mmproj.gguf                   # v3 mmproj (940 MB)
```
Plus the LoRA adapter, merged FP16 model, tokenizer, and processor for downstream training.
## Run it

```bash
# easiest — Ollama
ollama run brinzaengineeringai/microlens-v3

# stock llama.cpp
llama-server \
  -m gguf/v3/model.gguf \
  --mmproj gguf/v3/mmproj.gguf \
  -ngl 99
```
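llama-server exposes an OpenAI-compatible HTTP endpoint, so once the server above is running the model can also be queried from code. A minimal standard-library sketch, assuming the default port 8080 and the usual OpenAI-style multimodal message format with a base64 data URI; the image path and question are placeholders, not files from this repo:

```python
import base64
import json
import urllib.request

SERVER = "http://localhost:8080/v1/chat/completions"  # llama-server default port

def build_request(image_path: str, question: str) -> dict:
    """Build an OpenAI-style multimodal chat payload with a base64 data URI."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("ascii")
    return {
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
                {"type": "text", "text": question},
            ],
        }],
        "max_tokens": 512,
    }

def ask(image_path: str, question: str) -> str:
    """POST the payload to the local llama-server and return the model's reply."""
    payload = json.dumps(build_request(image_path, question)).encode()
    req = urllib.request.Request(
        SERVER, data=payload, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Usage would be e.g. `ask("specimen.jpg", "What genus is this diatom?")` against a running server.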
## Training summary
| Metric | v2 | v3 |
|---|---|---|
| Base model | unsloth/gemma-4-E2B-it (4.65 B) | resumed from v2 checkpoint-18351 |
| Trainable params | LoRA r=32, α=64 | same adapter, 1 extra epoch |
| Train samples | 99,215 | 99,215 (rich-format augmented from KB) |
| Validation samples (held-out) | 12,331 | 220 stratified |
| Epochs | 3 | 1 |
| Hardware | RTX 3090 Ti | RTX 3090 Ti |
| Training time | 37 h | 10 h 16 m |
| Final eval loss | ≈ 0.21 | 0.0213 |
| Framework | Unsloth FastVisionModel · 4-bit QLoRA | same |
## Evaluation
Stratified evaluation on held-out val.jsonl (n = 220 across 5 categories):
| Metric | v2 | v3 |
|---|---|---|
| Category accuracy | 97 – 100 % | 97 – 100 % |
| Genus accuracy (145+ genera, random ≈ 0.7 %) | ~ 45 % | ~ 45 % (no drift) |
| Rich-format compliance | n/a | 100 % |
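The per-category figures above can be reproduced from a predictions file with a short script. A minimal sketch, assuming a JSONL layout with `category`, `genus`, and `predicted_genus` fields per record; these field names are illustrative, not the repo's actual schema:

```python
import json
from collections import defaultdict

def stratified_accuracy(jsonl_path: str) -> dict:
    """Per-category genus accuracy over a held-out predictions JSONL file."""
    correct = defaultdict(int)
    total = defaultdict(int)
    with open(jsonl_path) as f:
        for line in f:
            rec = json.loads(line)
            cat = rec["category"]
            total[cat] += 1
            if rec["predicted_genus"].lower() == rec["genus"].lower():
                correct[cat] += 1
    return {cat: correct[cat] / total[cat] for cat in total}
```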
## Datasets
122,399 image–question–answer triples from public, licence-clean microscopy collections:
| Category | Source | Samples |
|---|---|---|
| Diatoms | UDE Diatoms in the Wild 2024, DiAtlas | 97 k |
| Freshwater zooplankton | ZooLake (EAWAG) | 12 k |
| Marine zooplankton | V1-Zooplankton | 10 k |
| Fungal spores | TGFC fungal | 5 k |
| Fish larvae | ZooLake | 0.2 k |
| Negative-class (no specimen) | synthetic | 1.5 k |
145+ genera with long-tail handled by category-level fallback. Top-30 genera have hand-curated knowledge-base entries (morphology + habitat + identification cues from AlgaeBase, WoRMS, ITIS, Round 1990, Krammer-Lange-Bertalot 1986–1991). Full per-source attribution: docs/LICENSE_AUDIT.md. No verbatim copyrighted text was used; KB phrasing is original paraphrase.
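The category-level fallback described above amounts to a two-tier lookup: a hand-curated entry when the genus is in the knowledge base, a templated category description otherwise. A minimal sketch; the dictionary contents below are illustrative placeholders, not the repo's actual KB entries:

```python
# Hand-curated entries for well-represented genera (illustrative subset).
GENUS_KB = {
    "Navicula": "Boat-shaped raphid diatom; benthic, freshwater and marine.",
    "Daphnia": "Water flea with a transparent carapace and large second antennae.",
}

# Templated fallbacks at the category level for long-tail genera.
CATEGORY_KB = {
    "diatom": "A diatom: unicellular alga enclosed in an ornamented silica frustule.",
    "zooplankton": "A zooplankter: small drifting animal of fresh or marine waters.",
}

def describe(genus: str, category: str) -> str:
    """Return a genus-specific KB entry when curated, else the category fallback."""
    return GENUS_KB.get(
        genus, CATEGORY_KB.get(category, "No specimen description available."))
```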
## Limitations
- Long-tail genera (~100 of 145+ have <100 training samples each) get category-generic morphology rather than genus-specific descriptions. The 30 most-common genera have hand-curated entries.
- Pseudo-class `Fish` has no species-level annotation in the training data; v3 falls back to a category-level templated description.
- Mobile cold-start on sub-$100 phones takes ~3-4 minutes (model load + 292-token prefill on 6 CPU cores). Subsequent inferences in the same session are much faster. Use desktop or browser for real-time work.
- English-first scientific output. Translation through the vanilla Gemma 4 base covers 140+ languages but may simplify rare technical terms.
## ⚠️ Legal — research and educational artefact
Research model · Apache 2.0 · Not a medical device · Not a certified instrument · Use at your own risk.
This model is NOT a medical device, in-vitro diagnostic (IVD), clinical decision-support tool, or regulatory-compliant water-quality measurement instrument (no ISO 17025, EPA, EU Water Framework Directive, or equivalent certification). It is not a substitute for a trained taxonomist or accredited laboratory analysis, and not a calibrated, validated, or peer-reviewed analytical method.
The model's output is a statistical pattern match against the training data distribution, rendered through learned scientific phrasing — not a physical or analytical measurement, not a peer-reviewed identification. The model can be confidently wrong, particularly on:
- specimens not represented in training (145+ genera ≠ all microscopic life);
- damaged, atypical, or out-of-focus images;
- subjects from kingdoms or phyla outside the training categories.
No warranty. This software is provided AS IS, without warranty of any kind, express or implied. In no event shall the author or contributors be liable for any claim, damages, or other liability arising from, out of, or in connection with the model. You assume all risk when downloading, deploying, modifying, or using this model. Always have qualified personnel verify any result that informs a regulatory, environmental, clinical, or health-related decision.
EU AI Act positioning (Regulation EU 2024/1689): Per Article 50 you are interacting with an AI system; outputs are AI-generated content. Per Article 2(6) the artefact is published under the research exemption. It is not a high-risk AI system within the meaning of Annex III, not a safety component of an Annex I product, and does not engage in any practice prohibited by Article 5. Gemma 4 (the upstream GPAI model) is provided by Google DeepMind; obligations under Chapter V rest with the GPAI provider, not the downstream researcher.
Full legal documents:
- TERMS.md — Terms of use · AS-IS · liability cap
- PRIVACY.md — Privacy notice · GDPR positioning · no telemetry
- AI_ACT.md — EU AI Act positioning statement
- MODEL_CARD.md — Full model card
By downloading or using these weights you acknowledge that you have read and accept the Terms of Use and Privacy Notice.
◆ Serghei Brinza · Vienna · Austria · 2026 ◆