I built AI eyes for my blind father, and it runs on tiny models

Community Article
Published June 15, 2026

My dad has been blind since I was nine. An illness took his sight, and I grew up watching him work around it.

The things that get in his way aren't dramatic. They're small, and they happen every day. How much money is in his hand. Whether the box he's holding is his cough syrup or something else. What a bill actually says, and when it's due.

So for Hugging Face's Build Small Hackathon, I built him Iris. He holds up his phone and asks a question out loud: "How much is this note?" "What color is this shirt?" "What's on the table?" Iris looks through the camera and speaks the answer back to him.

This is a short write-up of how I built it, and what I learned choosing models for a real person instead of a benchmark.

The whole thing fits under 3 billion parameters

The hackathon rule is that everything has to run on small models, under 32B parameters. Iris is far below that, around 2.5B in total:

  • Qwen3-VL-2B reads the scene and the question together (the camera plus the words).
  • Faster Whisper turns my dad's voice into text.
  • Piper speaks the answer back, in a natural voice.

Speech recognition and speech synthesis both run on the CPU, so the GPU only handles vision. The frontend is custom, built on gr.Server, and the whole screen is the button. No menus, nothing small to find.
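To make the flow concrete, here is a minimal sketch of the three stages chained together: voice in, picture plus question to the vision model, voice out. It is not the code Iris runs. The class name `AssistantPipeline` and the three `fake_*` functions are hypothetical stand-ins I made up so the example runs with nothing installed; in the real app those slots would be filled by Faster Whisper, Qwen3-VL-2B and Piper.

```python
from dataclasses import dataclass
from typing import Callable

# Each stage is a plain callable, so a real model can replace a stub later.
Transcriber = Callable[[bytes], str]        # audio bytes -> question text
VisionModel = Callable[[bytes, str], str]   # (image bytes, question) -> answer text
Synthesizer = Callable[[str], bytes]        # answer text -> audio bytes


@dataclass
class AssistantPipeline:
    """Hypothetical wrapper: speech-to-text, then vision, then text-to-speech."""
    transcribe: Transcriber
    look: VisionModel
    speak: Synthesizer

    def answer(self, image: bytes, audio: bytes) -> tuple[str, bytes]:
        question = self.transcribe(audio).strip()
        if not question:
            # Nothing was heard: skip the vision model and ask the user to repeat.
            text = "Sorry, I did not catch that. Please ask again."
        else:
            text = self.look(image, question)
        # The answer is always spoken, because the user cannot read the screen.
        return text, self.speak(text)


# Stand-in stages (assumptions for illustration, not real model calls).
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_look(image: bytes, question: str) -> str:
    return f"[{len(image)} image bytes] You asked: {question}"


def fake_speak(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = AssistantPipeline(fake_transcribe, fake_look, fake_speak)
text, audio_out = pipeline.answer(b"\x00" * 4, b"How much is this note?")
print(text)
```

Keeping each stage behind a simple callable is one way to swap a model without touching the rest, which is handy when you are still comparing candidates.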

I picked the models on my dad's tasks, not a leaderboard

This is the part I care about most. I didn't choose the vision model from a chart. I sat down with the things my dad actually does and tested them.

The clearest example: reading a medicine label, in Portuguese. One popular small vision model answered in English and got the dose wrong. That is not a small bug when it's medicine. Qwen3-VL read it correctly, in Portuguese, so Qwen got the job.

The rest followed the same rule. Faster Whisper because it's quick and accurate at hearing him. Piper because it sounds natural and runs locally. Every piece was chosen against a real task.
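If you want to try the same approach, a task-based check can be very small. The sketch below is an assumption about how one might structure it, not my actual test script: `TaskCase`, `passes` and `score` are hypothetical names, the two "models" are hard-coded stubs, and the label text and file names are invented placeholders. The idea is that each case lists phrases the answer must contain, written in the language the user needs, so an English reply or a wrong dose both fail.

```python
from dataclasses import dataclass
from typing import Callable

# A candidate model is anything that maps (image path, question) to an answer.
Model = Callable[[str, str], str]


@dataclass(frozen=True)
class TaskCase:
    name: str
    image_path: str                 # placeholder path, no file is opened here
    question: str
    must_contain: tuple[str, ...]   # phrases required in the answer


def passes(answer: str, case: TaskCase) -> bool:
    """True only if every required phrase appears (case-insensitive)."""
    lowered = answer.casefold()
    return all(phrase.casefold() in lowered for phrase in case.must_contain)


def score(model: Model, cases: list[TaskCase]) -> dict[str, bool]:
    """Run one model over every task and record pass/fail per task."""
    return {c.name: passes(model(c.image_path, c.question), c) for c in cases}


# Invented example tasks. Real ones would come from the user's daily life.
CASES = [
    TaskCase("medicine label dose", "label.jpg", "Qual é a dose?",
             ("10 ml", "a cada 8 horas")),
    TaskCase("banknote value", "note.jpg", "Quanto vale esta nota?", ("20",)),
]


# Stub "models" with canned answers, standing in for real vision models.
def model_a(image_path: str, question: str) -> str:
    # Answers in English and misreads the dose.
    return {"label.jpg": "Take 5 ml every 8 hours.",
            "note.jpg": "This is a 20 note."}[image_path]


def model_b(image_path: str, question: str) -> str:
    return {"label.jpg": "Tomar 10 ml a cada 8 horas.",
            "note.jpg": "É uma nota de 20."}[image_path]


results = {name: score(model, CASES)
           for name, model in [("model_a", model_a), ("model_b", model_b)]}
for name, outcome in results.items():
    print(name, outcome)
```

A per-task pass/fail table like this is deliberately blunt. For something like a medicine label, I would rather see one hard failure than an average score that hides it.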

I built the whole thing with Claude Code, and I published the build trace openly, so anyone can see the steps and the dead ends: https://huggingface.co/datasets/build-small-hackathon/iris-agent-trace

What "build small" actually buys you

Small isn't a compromise here. It's the point.

Today Iris runs in a phone's browser, with the vision model served from a free GPU (Hugging Face ZeroGPU). But the stack is small enough that the real destination is running it entirely on the phone itself: private, instant, no connection needed. For assistive tech, that matters. Your eyes shouldn't depend on a signal.

I'm not there yet, and I won't pretend I am. But under 3 billion parameters is a real path to it.

The part I didn't plan for

I filmed my dad using it. At one point he stopped and said that, for someone who can't see, it felt like being able to see again. I'll be honest, that one got me. You can watch it here: https://youtu.be/h4AJOWuDCVc

Iris is for my father. But it's also for anyone living with sight loss, anyone who can't see what's right in front of them.

It's free and open, and it runs in a phone's browser: https://huggingface.co/spaces/build-small-hackathon/iris

One honest note: Iris describes and reads. It is not a mobility aid. Small models can't judge distance reliably, and I'd rather be clear about that than oversell it.

Built by Marcus Ramalho with Claude Code, for the Build Small Hackathon (Backyard AI track).
