TurboSkillSlug: What If Your Debug Session Grew a Shell?

Community Article

Published June 9, 2026

Anubhav

legendarydragontamer

build-small-hackathon

The Spark

I wanted to answer a question that felt too specific to have a product for it: what would it look like if something watched you code for an hour and then showed you the shape of what happened?

Not a summary. Not a code review. A shape. Something visual that traced the dead ends, the breakthroughs, the moments you went quiet.

I built TurboSkillSlug for the Build Small hackathon.

What It Does

You upload a recording of yourself narrating a build session. The slug sits quietly through it, then gives you three things:

A recap in its own voice: five short observations about what it witnessed, each grounded in something you actually said
A SKILL.md: a structured record of what you tried, why each thing failed, what finally worked, and the gotchas
A shell: a one-of-a-kind SVG whose every visual element maps to something that happened

The shell is the part people stop on. It is not a random illustration. It is a nautilus spiral whose size comes from your session length, whose dark knots are your dead ends, whose iridescent jewels are your gotchas, and whose color gradient runs from your starting mood to your ending mood. A frustrated session that ends in relief looks completely different from a curious exploration that ends in delight.

Every shell is unique because every session is unique.
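The mapping described above can be sketched as a small function from session features to drawing parameters. This is a minimal illustration, not the repo's code: the `SessionFeatures` fields, the mood names, the hue values, and the log scaling for size are all assumptions.

```python
import math
from dataclasses import dataclass

# Hypothetical mood-to-hue table (degrees on the HSL wheel); the real palette may differ.
MOOD_HUES = {"frustrated": 0, "curious": 200, "relieved": 150, "delighted": 50}


@dataclass
class SessionFeatures:
    duration_s: float  # measured session length in seconds
    dead_ends: int     # drawn as dark knots
    gotchas: int       # drawn as iridescent jewels
    start_mood: str
    end_mood: str


def mood_color(mood: str) -> str:
    """Turn a mood label into an HSL color string; unknown moods get a neutral blue."""
    hue = MOOD_HUES.get(mood, 220)
    return f"hsl({hue}, 60%, 55%)"


def shell_params(f: SessionFeatures) -> dict:
    """Derive drawing parameters deterministically from the session."""
    # Assumed scaling: longer sessions wind more turns, growing with log(minutes).
    turns = 1.5 + math.log1p(f.duration_s / 60.0)
    return {
        "turns": round(turns, 2),
        "knots": f.dead_ends,
        "jewels": f.gotchas,
        "gradient": (mood_color(f.start_mood), mood_color(f.end_mood)),
    }
```

Because nothing here is random, the same session always yields the same parameters, which is what keeps each visual element traceable to an event.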

The Stack

Two models, both small:

Model	Size	Job
openai/whisper-large-v3-turbo	809M	Transcription
Qwen/Qwen2.5-7B-Instruct	7B	Extraction + slug voice

Total: 7.8B parameters. Both run via the HF Inference API. The shell is pure procedural SVG generated in Python with no model involved: logarithmic spirals, HSL color harmonies derived from sentiment, nacre texture via feTurbulence filters, bezier-smoothed curves. No diffusion, no generation. Math that looks like art.
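To make the geometry concrete, here is a minimal sketch of a logarithmic spiral, r = a·e^(bθ), written out as an SVG path. The constants `a` and `b`, the function names, and the stroke color are illustrative assumptions, and the sketch joins points with straight segments rather than the bezier smoothing and feTurbulence texture the real shell uses.

```python
import math


def spiral_path(turns: float, a: float = 2.0, b: float = 0.18,
                steps_per_turn: int = 48, cx: float = 0.0, cy: float = 0.0) -> str:
    """Return SVG path data for a logarithmic spiral r = a * exp(b * theta)."""
    n = max(2, int(turns * steps_per_turn))
    cmds = []
    for i in range(n + 1):
        theta = 2 * math.pi * turns * i / n
        r = a * math.exp(b * theta)
        x = cx + r * math.cos(theta)
        y = cy + r * math.sin(theta)
        # First point moves the pen, every later point draws a line segment.
        cmds.append(f"{'M' if i == 0 else 'L'}{x:.2f},{y:.2f}")
    return " ".join(cmds)


def shell_svg(turns: float, stroke: str = "hsl(200, 60%, 55%)") -> str:
    """Wrap the spiral in a standalone SVG document sized to fit it."""
    a, b = 2.0, 0.18
    r_max = a * math.exp(b * 2 * math.pi * turns) + 2  # outermost radius plus padding
    d = spiral_path(turns, a=a, b=b)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'viewBox="{-r_max:.1f} {-r_max:.1f} {2 * r_max:.1f} {2 * r_max:.1f}">'
        f'<path d="{d}" fill="none" stroke="{stroke}" stroke-width="1.5"/></svg>'
    )
```

Writing `shell_svg(3.0)` to a `.svg` file gives a three-turn spiral; the number of turns is the knob a session-length mapping would drive.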

Technical Choices and Why

Procedural SVG over generated images. I considered using a small image model for the shell. I rejected it because the whole point is that the shell is deterministically derived from your session. With a diffusion model, the link between a visual element and the event behind it ("this knot is your dead end") would break. The visual meaning has to be traceable.

Whisper via direct HTTP, not the InferenceClient. The huggingface_hub InferenceClient's provider routing system could not handle audio content types correctly in the current version. After four rounds of debugging (provider not found, content type None, BytesIO rejected, file not found), I bypassed it entirely with a raw httpx POST. Sometimes the simplest tool is the one you build yourself.
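A raw POST of this kind might look like the sketch below. The endpoint URL, the `{"text": ...}` response shape, and the function names are assumptions rather than details taken from the repo, so check the current Hugging Face documentation before relying on them.

```python
from pathlib import Path

# Assumed endpoint; the correct URL for your account may differ.
WHISPER_URL = "https://api-inference.huggingface.co/models/openai/whisper-large-v3-turbo"


def build_whisper_request(audio: bytes, token: str,
                          content_type: str = "audio/wav") -> dict:
    """Assemble URL, headers, and body for a raw-bytes transcription request."""
    return {
        "url": WHISPER_URL,
        "headers": {
            "Authorization": f"Bearer {token}",
            # Set the content type explicitly instead of leaving it to a client to infer.
            "Content-Type": content_type,
        },
        "content": audio,
    }


def transcribe(path: str, token: str, timeout: float = 120.0) -> str:
    """Send the audio file and return the transcript text."""
    import httpx  # imported lazily so the request builder works without it installed

    req = build_whisper_request(Path(path).read_bytes(), token)
    resp = httpx.post(req["url"], headers=req["headers"],
                      content=req["content"], timeout=timeout)
    resp.raise_for_status()
    return resp.json()["text"]  # assumed response shape: {"text": "..."}
```

Sending the file as raw bytes with an explicit `Content-Type` header sidesteps any client-side guessing about the audio format.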

Qwen for extraction AND the slug voice in one call. A single structured JSON response carries the session features (for the shell), the SKILL.md (for the developer), and the slug's five observations (for the soul). One model call, three outputs. Cheaper, faster, and the model sees the full context for all three.
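Splitting that single response into its three parts could be as simple as the following sketch. The key names (`features`, `skill_md`, `observations`) are hypothetical; the real schema lives in the repo's prompt.

```python
import json

# Hypothetical top-level keys for the three outputs.
REQUIRED_KEYS = ("features", "skill_md", "observations")


def split_response(raw: str) -> tuple[dict, str, list[str]]:
    """Parse one model response into (features, SKILL.md text, slug observations)."""
    data = json.loads(raw)
    missing = [k for k in REQUIRED_KEYS if k not in data]
    if missing:
        raise ValueError(f"model response missing keys: {missing}")
    observations = [str(o).strip() for o in data["observations"]]
    if len(observations) != 5:
        # The recap is defined as five observations, so anything else is a bad response.
        raise ValueError(f"expected 5 observations, got {len(observations)}")
    return data["features"], data["skill_md"], observations
```

Validating up front means a malformed response fails loudly in one place instead of surfacing later as a broken shell or an empty recap.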

Audio duration from the file, not the model. Early tests showed Qwen guessing session durations from transcript content ("thirty minutes of debugging" in a 90-second recording). Now the app measures the actual WAV duration and overrides the guess. The shell reflects what the slug actually heard.
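Measuring the duration needs only the standard library. In this sketch the `duration_s` key and both function names are assumptions; the technique is simply frames divided by frame rate.

```python
import wave


def wav_duration_seconds(source) -> float:
    """Return the length of a WAV file (path or binary file object) in seconds."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def apply_measured_duration(features: dict, source) -> dict:
    """Replace the model's guessed duration with the measured one."""
    fixed = dict(features)  # copy so the caller's dict is left untouched
    fixed["duration_s"] = wav_duration_seconds(source)
    return fixed
```

Applied after extraction, this means whatever duration the model inferred from the transcript never reaches the shell generator.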

What Surprised Me

The slug voice is the hardest part and it is not an engineering problem. It is a writing problem. I handwrote 50 seed utterances to define the tone: earnest, specific, never cute. Lines like "You tried the same thing three times. The third time you changed one small word, and it listened." Getting a 7B model to generate in that register without copying the examples verbatim is an ongoing fight. A fine-tuned small model would solve it; prompting a general-purpose model is a workaround.
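One cheap guard against verbatim copying is to reject any generated line that shares a long run of words with a seed utterance. This is a hypothetical check, not something the repo is known to contain, and the six-word threshold is an arbitrary assumption.

```python
import re


def normalize(line: str) -> list[str]:
    """Lowercase, strip punctuation, and split into words."""
    return re.sub(r"[^a-z0-9 ]+", "", line.lower()).split()


def copies_seed(candidate: str, seeds: list[str], n: int = 6) -> bool:
    """True if the candidate shares any run of n consecutive words with a seed line."""
    cand = normalize(candidate)
    cand_grams = {tuple(cand[i:i + n]) for i in range(len(cand) - n + 1)}
    for seed in seeds:
        words = normalize(seed)
        if any(tuple(words[i:i + n]) in cand_grams
               for i in range(len(words) - n + 1)):
            return True
    return False
```

A line that fails the check can be regenerated or dropped. The check catches copying only; it says nothing about whether the register is right.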

The shell quality depends almost entirely on extraction quality. Two sessions that produce the same sentiment arc and similar feature counts generate similar-looking shells. Making the extraction honest (a quick, easy session should end "joyful," not "resolved"; a grinding marathon should end "exhausted," not "resolved") is what makes the shells visually distinct.

Built With Codex

The entire codebase was built using OpenAI Codex as the primary coding agent. Every commit in the repo is Codex-attributed. Three things still needed human judgment: the slug's voice, the shell's visual tuning, and the grounding constraints that keep the slug honest.

Full code: github.com/AnubhavBharadwaaj/turbo-skill-slug

What Comes Next

A fine-tuned SlugVoice model on Modal, trained on hand-crafted (transcript, slug line) pairs so the voice is a real small model, not a prompt wrapper
TTS so the slug speaks its recap aloud in a slow, gentle cadence
A shell gallery where builders can share their shells
Input from agent session logs (Codex traces, Claude Code sessions) instead of audio, so the slug watches your AI collaboration, not just your typing