Puck
A mischievous desktop fairy that comments on your work
A Build Small entry — Thousand Token Wood (Creative). Play with it: the Space · Watch: 2-min demo
Puck is a small, mischievous creature that lives on your desktop. He roams, peeks at one little patch of whatever you're doing, and murmurs a single in-character line about it — then drifts on. He's not an assistant and not a notifier. He's company: marginally useful, reliably charming. Here are the notes from building him — mostly the places where the obvious approach was wrong and a small model (or a tiny trick) turned out to be the right one.
The first version alerted on everything — build failures, mentions, mail, a finished agent run. It was useful and completely charmless, and it inundated you. A round of research into notification UX (the calm-technology and "ambient/peripheral display" literature) said the quiet part out loud: the value of a companion isn't in its alerts, it's in its presence. So the alert engine went dormant and Puck was rebuilt around one loop: roam → peek → quip. ~95% ambient, near-zero interruptions. He's "marginally useful" on purpose.
The instinct is a pipeline: a vision model describes the patch → a text model writes a quip in character. Two calls, two models, more latency, more drift. Instead a single 12B VLM — Holotron-12B, H Company's computer-use model post-trained from NVIDIA's Nemotron-Nano-12B-VL — gets a system prompt that is Puck, and reacts to the image directly. The model that sees is the model that speaks — and the quip is grounded in pixels, not a lossy description.
Puck should know whether you're in Claude Code, Codex, opencode, or pi. The hackathon-shaped answer is a CLIP fingerprinter: embed labeled screenshots, match by cosine. I built it… and it was fragile. On dark terminal screens with small text, every embedding clusters around 0.85–0.95 cosine — the margin between "this is Claude" and "this is Codex" was ~0.05, and a blank dark patch would confidently match something.
CLIP-ViT-B/32 at 224² simply can't read the text that distinguishes these tools. So I stopped
asking it to. macOS ships a perfectly good on-device OCR (the Vision framework); a tiny Swift
binary reads the prompt/status line and a keyword map nails the tool deterministically — 10/10
across all five CLIs, in ~0.25s. The discriminator was never the look; it was the words
(gpt-5.5 xhigh, GLM-5.1, OpenCode, pi v0.78). And it's region-local — it reads the
patch under the sprite, unlike a window title, which lies under tabbed terminals and browsers.
Each peek should carry an emotion that drives Puck's gesture, color, and voice. Two surprises:
[amused] <line> got a lovely line
and no tag, every time. Format-following is where small models are weakest.The fix: classify the emotion from the OCR'd screen text (where the real sentiment lives — ALL-CAPS, swearing, green checkmarks, a wall of tracebacks), as a separate one-word call. A single word is the one format a small model nails.
Active camouflage: cloak Puck into the desktop. The first cut blurred and dimmed his body (frosted glass) — which looked cool and made the content behind him unreadable. The predator/thermoptic look is the opposite: the background shows through sharp, with just a shimmer + a refractive rim so you can tell a cloaked thing is there. The trick was dropping the blur entirely and using non-blurring filters (brightness/contrast/hue) for the shimmer.
Nothing here is over 32B and it fits on a laptop: a 12B VLM for eyes-and-voice-of-the-fairy, an 88M CLIP fingerprinter, on-device OCR, and an 82M Kokoro neural voice running in the browser. Small models, composed, doing something that's mostly just… delightful.
The notification engine in §1 was a bad starting point — a bot that sprays you with every event hasn't earned the right to interrupt you. But "ambient companion" isn't the ceiling; it's the foundation you build trust on.
The future I see for Puck is an assistant that earns its usefulness. It watches (as it already does), but over time it learns what actually matters to you — then it does two things you'd otherwise do yourself: it bubbles up the few things worth your attention, and it handles the trivial ones (clicking, typing, submitting) so you can stay in flow. Not a louder notifier; a quieter one that's almost always right, plus a pair of hands for the busywork.
The engine for that is the part that looks like whimsy today: sleep. Every day's peeks land in the memory garden; at night Puck blooms them into a smaller, sharper sense of your world — and that distilled signal is exactly the dataset to fine-tune a model that knows your priorities, not a generic one's. Computer-use to act, learned relevance to decide, nightly fine-tuning to improve. He starts marginally useful on purpose, so that by the time he's useful for real, you already trust him.
Built with Hugging Face · Modal · Holotron-12B (post-trained from NVIDIA Nemotron). Try Puck: the Space.
A mischievous desktop fairy that comments on your work
More from this author