GigScan
Launch a web UI for the GigScan tool
Recently, I took part in Hugging Face’s Build Small Hackathon, a challenge built around using smaller models to make something practical, usable, and real
For me, that problem came from live music. I’m always seeing gig posters around town: on venue walls, street poles, record-store windows, and Instagram stories. They are useful, beautiful, chaotic objects, but they make it into my diary far less than they should.
That’s what lead to my idea for the hackathon: GigScan. A simple app that lets a user take a photo of a gig poster and turn it into a calendar invite.
This meant leveraging small vision model’s text detection and inference speed, while shifting the focus from demonstrating an impressive frontier model to building a useful and reliable app.
This post is about what it took to get there. It is a build log, but not just a technical one. It is about teaching a small vision model to behave reliably, discovering that a trained model is not the same thing as a deployed product, and learning that the best AI tools are often the ones that get out of the way. Mostly, though, it is about the fun of building something small, making it work, and putting it in someone’s hands.
You can check out GigScan here: https://huggingface.co/spaces/build-small-hackathon/gigscan_via_modal
The thing I wanted to build was simple: take a photo of a gig poster and turn it into a calendar invite.
That was the whole product.
Not “describe this image”. Not “extract every bit of text”. Not “make a clever AI demo”. I wanted something I could imagine using on my phone while walking past a poster outside a venue. See a gig, take a photo, save the date, move on.
That constraint made the rest of the build clearer. The minimum useful output was not a transcription. It was a calendar event: name, venue, date, time if it existed, and a short description. If the model could return those fields as JSON, the app could turn them into an iCal file or a Google Calendar link.
This also made GigScan more interesting than straight OCR. Gig posters are messy on purpose: strange layouts, stylised fonts, overlapping text, and weird visual hierarchy. The task was not simply “read the text”. It was “understand this poster well enough to make a useful calendar entry”. The first MVP came together quickly. I used MiniCPM-V 4.6 (a tiny 1B parameter open weight vision model from OpenBMB), wired it into a small Gradio app, and connected the model output to calendar generation. Within a short time I had the basic loop working: poster in, event out.
When I showed the early version to friends, they understood it immediately. More importantly, they said they would use it. That became the standard: not impressive in the abstract, but useful enough that someone would actually put it on their phone.
The base model proved the idea could work. Fine-tuning was about making it reliable. Prompting got me surprisingly far. With a detailed enough prompt, the model could usually pull out the right fields and return something close to the structure I needed. But “usually” is a dangerous word when the next step is creating a calendar event.
The failures were exactly the sort of failures that would make the app feel untrustworthy. Dates came back in inconsistent formats. Years were sometimes guessed. Fields drifted. Sometimes the model would look at something music-adjacent — a record cover, a band photo, a generic music image — and confidently invent an event. A missing time is forgivable. A fake gig in someone’s calendar is not.
So I fine-tuned the model around behaviour rather than benchmarks. I wanted it to return the same fields, in the same format, and to know when not to return anything at all. The key field was is_live_music_poster, which acted like a gate for the rest of the app. If the image was not a poster, the calendar pipeline should stop.
The dataset was small and specific: a few hundred poster images, plus negative examples, labelled into the exact schema the app expected. It was not perfect, but it was enough signal for the task.
That was one of the biggest lessons of the week. I did not need to teach the model what a poster was from scratch. It already had a lot of visual understanding baked in. I was teaching it how it should behave.
On the held-out test set, the fine-tuned model became much more reliable where the app needed it: valid JSON, required fields, date formats, time formats, and poster detection all improved. The point was not that the model had become magically perfect. The point was that it had become predictable enough to build around.
Fine-tuning gave me a model. Deployment taught me that a model is not a product. I wanted to get the fine-tuned model into llama.cpp because the hackathon was about small models, efficient deployment, and learning what it takes to run this stuff outside a hosted API. MiniCPM-V 4.6 already had a GGUF path, so at first this sounded straightforward.
It was not straightforward.
The merged fine-tuned model did not convert the way I expected. The useful artifact turned out to be the LoRA adapter, which could be applied on top of the GGUF base model. Then, when I tried wiring the app through llama-cpp-python, the model appeared to run but could not actually see. The prompt was being sent and the model was responding, but valid posters were rejected or treated as blank.
That was painful because it looked like a model problem, but it was really a runtime problem. The fix was to move closer to upstream llama.cpp, compile a known-good server build, and serve the model through llama-server.
Then came the architecture question. I had the model working on a T4, but that did not mean the app architecture worked. A persistent GPU was too expensive for a small utility that might sit idle most of the time. A CPU Space could not reliably load and run the model. ZeroGPU was worth trying, but the dynamic allocation model was a poor fit for something that wanted to load a llama.cpp server and keep it warm.
The final architecture split the problem in two. Hugging Face Spaces hosted the Gradio app: the interface, the calendar flow, the trace capture, and the user experience. Modal handled inference: running the compiled llama.cpp server and returning structured output.
It was not the neatest version of the dream, but it was the version that fit the product. The first request after the backend had gone quiet could take longer, so the app warns the user. After that, performance returns to normal.
That felt like the right compromise: be honest about the tradeoff, and ship the thing that works.
Once the model path worked, the question became: how fast can someone get from poster to calendar?
GigScan is not an app people should spend time inside. Success means the opposite. The less time someone spends between seeing a poster and saving the event, the better the app is doing its job.
That is why inference runs automatically after upload. Once someone has selected a poster image, their intent is obvious. A separate “Scan” button is just another tap. The final flow became: choose calendar type, upload or take a photo, wait for the result, save the event.
The same principle shaped partial failure. If the time or venue is missing, the app can still generate something useful and tell the user they can edit the calendar invite later. But if the image is not a poster, or the model cannot understand the core event details, the app should stop.
That became the trust contract: show what was found, flag what was missing, and avoid fake certainty.
The visual design followed the same logic. I did not want GigScan to look like a default AI demo or an overdesigned startup website. It needed to feel like a music tool: simple, clear, a little poster-adjacent, and easy to use on a phone.
This was where working with an LLM was genuinely useful. I come from data analysis and dashboard design, not web development. I could take screenshots of the Gradio app, describe what felt wrong, point to poster art and colour palettes I liked, and get back CSS changes I could test immediately. Around the third design pass, it started to click. The logo worked. The buttons were obvious. The flow made sense. It had enough personality to feel intentional, but not so much that it got in the way.
That was when it started feeling like a product.
The thing I am proudest of is simple: GigScan works.
A week ago it was an idea. Now it is deployed. I can share it with friends, bring it to gigs, and see whether it helps people get more shows into their calendars. That was always the point: not to build an abstract AI workflow, but to make something small and practical that fits into a real community.
This project also changed how I think about small models. They are not just weaker versions of big models. Used well, they are practical materials for narrow tools.
MiniCPM-V did not need to be the best general-purpose vision model in the world.
It needed to be good enough at one bounded task, and tinkerable enough that I could shape its behaviour. Combined with open tooling, fine-tuning, quantization, and cheap-enough deployment, that starts to feel powerful.
AI assistance sped up the coding, debugging, CSS, documentation, and deployment work. But the product sense still had to come from me: the music context, the taste, the decision to optimise for fewer taps, the judgement about when to stop chasing a badge and ship a better architecture.
The hackathon reminded me that it is fun to build things. Not because someone says your job is at risk if you do not learn AI, and not because every tool needs to become a company. Sometimes it is enough to learn something new, build something small, and contribute it back.
Build something useful. Make it work. Put it in someone’s hands.
Launch a web UI for the GigScan tool