MicroLens · Microscopy Vision-Language Model

Gemma 4 E2B fine-tuned on 122,399 microscopy VQA pairs across 145+ genera — diatoms, freshwater and marine zooplankton, fungal spores, fish larvae. Three open versions, all Apache 2.0, all running 100 % offline.


Model versions in this repository

| Version | Style | When to use |
|---------|-------|-------------|
| v1 | Naive baseline | Historical reference, training-progression demo |
| v2 | Production brief (single line) | Pipelines, automated databases, low-latency |
| v3 | Production rich (four sections: genus + morphology + habitat + ID cues) | Citizen science, classroom, judge-facing UIs |

Each version ships as GGUF Q4_K_M (3.4 GB) + BF16 mmproj (940 MB) for llama.cpp / Ollama / mtmd:

gguf/
├── gemma-4-e2b-it.Q4_K_M.gguf       # v1 model (3.4 GB)
├── gemma-4-e2b-it.BF16-mmproj.gguf  # v1 mmproj (940 MB)
├── v2/
│   ├── model.gguf                   # v2 model (3.4 GB)
│   └── mmproj.gguf                  # v2 mmproj (940 MB)
└── v3/
    ├── model.gguf                   # v3 model (3.4 GB)
    └── mmproj.gguf                  # v3 mmproj (940 MB)

Plus the LoRA adapter, merged FP16 model, tokenizer, and processor for downstream training.

Run it

# easiest — Ollama
ollama run brinzaengineeringai/microlens-v3

# stock llama.cpp
llama-server \
    -m gguf/v3/model.gguf \
    --mmproj gguf/v3/mmproj.gguf \
    -ngl 99
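Once `llama-server` is up, it exposes an OpenAI-compatible `/v1/chat/completions` endpoint that accepts images as base64 data URIs. A minimal sketch of building such a VQA request, assuming the default port 8080 (the payload shape follows the OpenAI chat format; adapt paths and the question to your setup):

```python
import base64
import json
from pathlib import Path

SERVER = "http://localhost:8080/v1/chat/completions"  # llama-server default port

def build_vqa_request(image_path: str, question: str) -> dict:
    """Build an OpenAI-style chat payload with the image inlined as a data URI."""
    b64 = base64.b64encode(Path(image_path).read_bytes()).decode("ascii")
    return {
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
                {"type": "text", "text": question},
            ],
        }],
        "max_tokens": 512,
        "temperature": 0.2,  # low temperature keeps identifications stable
    }

# To send it (server from the snippet above must be running):
#   import urllib.request
#   payload = build_vqa_request("diatom.jpg", "What genus is this?")
#   req = urllib.request.Request(SERVER, json.dumps(payload).encode(),
#                                {"Content-Type": "application/json"})
#   reply = json.loads(urllib.request.urlopen(req).read())
#   print(reply["choices"][0]["message"]["content"])
```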

Training summary

| Metric | v2 | v3 |
|--------|----|----|
| Base model | unsloth/gemma-4-E2B-it (4.65 B) | resumed from v2 checkpoint-18351 |
| Trainable params | LoRA r=32, α=64 | same adapter, 1 extra epoch |
| Train samples | 99,215 | 99,215 (rich-format augmented from KB) |
| Validation samples (held-out) | 12,331 | 220 stratified |
| Epochs | 3 | 1 |
| Hardware | RTX 3090 Ti | RTX 3090 Ti |
| Training time | 37 h | 10 h 16 m |
| Final eval loss | ≈ 0.21 | 0.0213 |
| Framework | Unsloth FastVisionModel · 4-bit QLoRA | same |
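The adapter hyperparameters from the table can be collected into a plain config for reference. The key names below mirror common peft/Unsloth conventions but are assumptions, not the exact training script; note that with r=32 and α=64 the standard LoRA scaling factor α/r comes out to 2:

```python
# Hyperparameters from the training-summary table, as a plain dict.
# Key names are assumptions (peft/Unsloth-style); map onto your own trainer.
LORA_CONFIG = {
    "r": 32,               # adapter rank
    "lora_alpha": 64,      # scaling numerator; effective scale = alpha / r
    "load_in_4bit": True,  # QLoRA: frozen base weights quantized to 4-bit
}

SCHEDULE = {
    "v2": {"epochs": 3, "init": "unsloth/gemma-4-E2B-it"},
    "v3": {"epochs": 1, "init": "v2 checkpoint-18351"},  # same adapter, resumed
}

def lora_scale(cfg: dict) -> float:
    """Effective scaling applied to the LoRA update: alpha / r."""
    return cfg["lora_alpha"] / cfg["r"]
```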

Evaluation

Stratified evaluation on held-out val.jsonl (n = 220 across 5 categories):

| Metric | v2 | v3 |
|--------|----|----|
| Category accuracy | 97–100 % | 97–100 % |
| Genus accuracy (145+ genera, random ≈ 0.7 %) | ~45 % | ~45 % (no drift) |
| Rich-format compliance | n/a | 100 % |
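Per-category accuracy over a stratified validation set reduces to grouping predictions by category and scoring each group separately. A minimal sketch (the triple layout is an assumption about `val.jsonl`, not its actual schema):

```python
from collections import defaultdict

def per_category_accuracy(rows):
    """rows: iterable of (category, gold_label, predicted_label) triples.

    Returns {category: fraction correct}, scoring labels case-insensitively.
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for category, gold, pred in rows:
        totals[category] += 1
        hits[category] += int(gold.strip().lower() == pred.strip().lower())
    return {c: hits[c] / totals[c] for c in totals}
```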

Datasets

122,399 image–question–answer triples from public, licence-clean microscopy collections:

| Category | Source | Samples |
|----------|--------|---------|
| Diatoms | UDE Diatoms in the Wild 2024, DiAtlas | 97 k |
| Freshwater zooplankton | ZooLake (EAWAG) | 12 k |
| Marine zooplankton | V1-Zooplankton | 10 k |
| Fungal spores | TGFC fungal | 5 k |
| Fish larvae | ZooLake | 0.2 k |
| Negative-class (no specimen) | synthetic | 1.5 k |

145+ genera, with the long tail handled by category-level fallback. The top-30 genera have hand-curated knowledge-base entries (morphology + habitat + identification cues from AlgaeBase, WoRMS, ITIS, Round 1990, Krammer & Lange-Bertalot 1986–1991). Full per-source attribution: docs/LICENSE_AUDIT.md. No verbatim copyrighted text was used; KB phrasing is original paraphrase.
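The curated-entry-or-category-fallback logic described above can be sketched as a two-level lookup. Everything here is hypothetical illustration: the entry texts are placeholders, not the model's actual knowledge-base content:

```python
# Hypothetical sketch of the long-tail fallback: genera with a hand-curated
# KB entry get genus-specific text; everything else drops back to a
# category-level template. Placeholder strings only.
KB = {  # top-30 genera carry curated entries
    "Navicula": "Genus-specific entry: boat-shaped biraphid diatom ...",
}
CATEGORY_TEMPLATES = {  # long-tail genera fall back to their category
    "Diatoms": "Category-generic entry: unicellular alga with a siliceous frustule ...",
}

def describe(genus: str, category: str) -> str:
    """Return the curated genus entry if one exists, else the category template."""
    return KB.get(genus) or CATEGORY_TEMPLATES.get(category, "No description available.")
```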

Limitations

  • Long-tail genera (~100 of 145+ have <100 training samples each) get category-generic morphology rather than genus-specific descriptions. The 30 most-common genera have hand-curated entries.
  • Pseudo-class Fish has no species-level annotation in the training data; v3 falls back to a category-level templated description.
  • Mobile cold-start on sub-$100 phones takes ~3-4 minutes (model load + 292-token prefill on 6 CPU cores). Subsequent inferences in the same session are much faster. Use desktop or browser for real-time work.
  • English-first scientific output. Translation through the vanilla Gemma 4 base covers 140+ languages but may simplify rare technical terms.

⚠️ Legal — research and educational artefact

Research model · Apache 2.0 · Not a medical device · Not a certified instrument · Use at your own risk.

This model is NOT a medical device, in-vitro diagnostic (IVD), clinical decision-support tool, or regulatory-compliant water-quality measurement instrument (no ISO 17025, EPA, EU Water Framework Directive, or equivalent certification). It is not a substitute for a trained taxonomist or accredited laboratory analysis, and not a calibrated, validated, or peer-reviewed analytical method.

The model's output is a statistical pattern match against the training data distribution, rendered through learned scientific phrasing — not a physical or analytical measurement, not a peer-reviewed identification. The model can be confidently wrong, particularly on:

  • specimens not represented in training (145+ genera ≠ all microscopic life);
  • damaged, atypical, or out-of-focus images;
  • subjects from kingdoms or phyla outside the training categories.

No warranty. This software is provided AS IS, without warranty of any kind, express or implied. In no event shall the author or contributors be liable for any claim, damages, or other liability arising from, out of, or in connection with the model. You assume all risk when downloading, deploying, modifying, or using this model. Always have qualified personnel verify any result that informs a regulatory, environmental, clinical, or health-related decision.

EU AI Act positioning (Regulation EU 2024/1689): Per Article 50 you are interacting with an AI system; outputs are AI-generated content. Per Article 2(6) the artefact is published under the research exemption. It is not a high-risk AI system within the meaning of Annex III, not a safety component of an Annex I product, and does not engage in any practice prohibited by Article 5. Gemma 4 (the upstream GPAI model) is provided by Google DeepMind; obligations under Chapter V rest with the GPAI provider, not the downstream researcher.


By downloading or using these weights you acknowledge that you have read and accept the Terms of Use and Privacy Notice.


◆ Serghei Brinza · Vienna · Austria · 2026 ◆
