ZeroWasteKitchen: five small models, one banana curry, and what 4B parameters are actually for

Community Article Published June 14, 2026

Field notes from the Build Small Hackathon — Backyard AI track

🤗 Try it: build-small-hackathon/zerowastekitchen

The problem — mine, and probably yours too

I run a vegetarian South Indian kitchen. Every week something loses the race — coriander wilts in the fridge, paneer expires, the dal packet at the back of the cupboard quietly crosses its date, curry leaves go crisp in the wrong way.

This is not just my kitchen. Modern life almost guarantees food waste. We buy in bulk from warehouse stores because per unit it is cheaper. Then the week gets busy, nobody plans meals around what is expiring, and the discount at the till becomes a write-off in the bin. Not because we don't care — because we don't remember. The receipt knows everything that came into the house — fridge, freezer, cupboard, fruit bowl. Nothing in the kitchen tells you when each thing needs to leave.

My motto for this build: it is easier to save money than to earn it. Every rescued bunch of coriander is income I don't have to go out and make — and in a world where fewer people farm and food security is no longer something to take for granted, wasting less of what's already grown matters more than ever.

So the idea: photograph the receipt, let small models do the remembering. Track what came in, estimate when it expires, and when something is about to turn, a Telugu grandmother tells you what to cook with it. Out loud. With love and a little guilt, the way grandmothers do.

The stack: five small models, one T4 each

The hackathon rule is small models only. I went much smaller than the limit — nothing in this app is above 4.6B parameters:

| Job | Model | Size |
| --- | --- | --- |
| Reading the receipt | openbmb/MiniCPM-V-4.6 | 4.6B |
| Estimating shelf life | nvidia/Nemotron-Mini-4B-Instruct | 4B |
| Character dialogue | CohereLabs/tiny-aya-fire | small |
| Writing the recipe | nvidia/Nemotron-Mini-4B-Instruct | 4B |
| Speaking it aloud | openbmb/VoxCPM2 | 2B |

Everything runs on single Modal T4 GPUs. The Gradio frontend lives on a CPU Space and calls the GPU functions by name. SQLite in the Space's persistent storage holds the pantry.

Tracking what you actually used

The pantry tab shows everything from the receipt — fresh produce, dairy, dry goods, the lot — with traffic lights: 🔴 cook today, 🟡 use this week, 🟢 fine, sorted by expiry.

Figure: My pantry, sorted by expiry.

When you cook something, you mark the item as used. In SQLite it is just a used flag, so the item drops out of the pantry view and out of future recipes, but the history stays. Scan receipt, cook, mark used, repeat. The loop is the whole product.
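The pantry loop is small enough to sketch with the standard library. The version below is an illustration, not the app's actual schema: the table layout, the column names, the helper names, and the one-day and seven-day traffic-light cut-offs are all assumptions.

```python
import sqlite3
from datetime import date, timedelta

# Hypothetical schema: expiry is stored as an ISO date string, so sorting the
# text column also sorts by date. used = 0 means the item is still in the pantry.
SCHEMA = """
CREATE TABLE IF NOT EXISTS pantry (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    expires TEXT NOT NULL,
    used    INTEGER NOT NULL DEFAULT 0
)
"""

def traffic_light(expires: str, today: date) -> str:
    """Map days remaining to a colour. Both cut-offs are assumed values."""
    days_left = (date.fromisoformat(expires) - today).days
    if days_left <= 1:
        return "red"      # cook today
    if days_left <= 7:
        return "yellow"   # use this week
    return "green"        # fine

def active_pantry(conn: sqlite3.Connection, today: date) -> list:
    """Unused items only, soonest expiry first."""
    rows = conn.execute(
        "SELECT name, expires FROM pantry WHERE used = 0 ORDER BY expires"
    ).fetchall()
    return [(name, traffic_light(expires, today)) for name, expires in rows]

def mark_used(conn: sqlite3.Connection, name: str) -> None:
    """Flip the flag instead of deleting, so the history stays."""
    conn.execute("UPDATE pantry SET used = 1 WHERE name = ?", (name,))
    conn.commit()

today = date(2026, 6, 14)
conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO pantry (name, expires) VALUES (?, ?)",
    [("paneer", str(today + timedelta(days=2))),
     ("coriander", str(today + timedelta(days=1))),
     ("dal", str(today + timedelta(days=90)))],
)
mark_used(conn, "coriander")
print(active_pantry(conn, today))
```

After coriander is marked as used it disappears from the view but its row is still in the table, which is the behaviour the used flag is there to provide.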

Receipt OCR with MiniCPM-V worked almost immediately and barely needed touching after. That is worth saying up front, because it is the real lesson of the project: where a small model does the narrow thing it was built for, it just works. The pain is wherever you ask a small model to behave like a big one.

The banana-apple-cabbage curry

Recipe generation is the heart of the app, and it is where the 4B model humbled me.

First version did the obvious thing: here is the pantry, here are the expiring items, write a recipe. The model used everything. Fourteen ingredients in one dish. Turkish pears boiled with french beans. A "biryani" with no rice. At one point it proposed a curry of cabbage, banana and apples. Technically it used my expiring items. Practically, nobody is eating that.

Here is an actual early output:

RECIPE NAME: Vegetable Biryani with Chilly and Saffron
INGREDIENTS USED: POSO FRENCH BEANS, TURKISH PEARS, AUBERGINE KENYAN,
KERALALA DELIGHTS, AVOCADO LARGE, CHILI CHILLY, CURRIS LEAVES PACKET,
TURIA (RIDGE GOURD), GULAB JEERA, CARROTS, GREEN B/YEVE.
STEPS:
1. Rinse POSO FRENCH BEANS and TURKISH PEARS.
2. In a large pot, bring water to a boil. Add POSO FRENCH BEANS and
   TURKISH PEARS. Cook until tender.
...

Boiled pears and beans. A biryani with no rice. Eleven ingredients.

I tried the standard moves. Lower temperature — still chaos, just confident chaos. Stricter prompt rules ("use ONLY listed ingredients", "never put fruit in savoury dishes") — the model read the rules and listed every item anyway, fruit included. A 4B model knows what a good curry is. It just cannot resist a long list when you show it a long list.

What actually fixed it

Three changes. None of them was "use a bigger model".

1. Two-pass generation, with Python holding the rails. I split one hard task into two easy ones. Pass 1: pick 4–6 items that genuinely belong in one dish. Then plain Python validates the selection — no prompt-following needed there. Pass 2 generates the recipe and only ever sees the chosen items. It physically cannot over-stuff a dish it was never shown.

# Pass 1: model picks a small subset
selection = generate(select_prompt)          # "choose 4-6 items..."

# Python validates - fruits never survive into a savoury dish,
# even if the model picks them
FRUIT_WORDS = ["pear", "apple", "banana", "grape", "mango"]  # ...and the rest of the list
chosen = [p for p in pantry
          if matches(selection, p)
          and not any(f in p.lower() for f in FRUIT_WORDS)][:6]

# Pass 2: model only sees the chosen items
recipe = generate(recipe_prompt(chosen))
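The validation step between the two passes can be exercised without a model at all. The sketch below fills in a hypothetical `matches` helper (a plain case-insensitive substring check, which may not be how the app does it) and feeds it a made-up pass-1 reply; the pantry items and the reply text are invented for illustration.

```python
# Illustrative subset of fruit words, not the app's full list.
FRUIT_WORDS = ["pear", "apple", "banana", "grape", "mango"]

def matches(selection: str, item: str) -> bool:
    """Hypothetical matcher: the pantry item's name appears in the model's reply."""
    return item.lower() in selection.lower()

def validate(selection: str, pantry: list[str], limit: int = 6) -> list[str]:
    """Keep only real pantry items the model picked, drop fruit, cap the count."""
    chosen = [
        p for p in pantry
        if matches(selection, p)
        and not any(f in p.lower() for f in FRUIT_WORDS)
    ]
    return chosen[:limit]

pantry = ["French Beans", "Turkish Pears", "Paneer", "Curry Leaves", "Banana", "Dhaniya"]

# Stand-in for a pass-1 reply: it includes a fruit and an item that is not in the pantry.
fake_selection = "I would cook with: paneer, french beans, banana, curry leaves, saffron"

print(validate(fake_selection, pantry))
```

Because the filter iterates over the pantry rather than over the model's reply, anything the model invents (saffron here) can never reach pass 2, and the banana is removed no matter how confidently it was chosen.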

2. One few-shot example. A single complete, well-formed recipe in the prompt, in the exact output format. Small models copy examples far more reliably than they follow abstract rules. This one addition did more than every rule I wrote.
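As a rough sketch of what a one-shot prompt can look like: the example recipe, the system text, and the `recipe_messages` helper below are all invented for illustration and are not the prompt the app ships. Only the three-part output format (RECIPE NAME, INGREDIENTS USED, STEPS) comes from the outputs shown in this article.

```python
# A single complete example in the exact output format. The recipe itself is
# a made-up placeholder, not the one used in the app.
FEW_SHOT_EXAMPLE = """\
RECIPE NAME: Beans Poriyal
INGREDIENTS USED: French Beans, Curry Leaves, Grated Coconut
STEPS:
1. Heat oil in a pan, add mustard seeds and curry leaves, and let them crackle.
2. Add chopped french beans with salt, cover, and cook until tender.
3. Stir in grated coconut and serve hot."""

def recipe_messages(chosen: list[str]) -> list[dict]:
    """Build chat messages: one worked example, then the real request."""
    system = (
        "You write short vegetarian South Indian recipes. "
        "Answer in exactly the format of the example."
    )
    user = (
        "Example of the expected output:\n\n"
        f"{FEW_SHOT_EXAMPLE}\n\n"
        "Now write one recipe using only these ingredients: "
        + ", ".join(chosen)
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]

messages = recipe_messages(["Paneer", "Dhaniya", "Curry Leaves"])
print(messages[1]["content"])
```

The returned list has the role/content shape that a tokenizer's chat template expects, so the example travels with the request instead of being restated as rules.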

3. The embarrassing one: use the model's actual chat template. I had been hand-writing <|system|>-style markers, a format this model never saw in training. Switching to the tokenizer's own template gave an immediate quality jump.

# Before (wrong): hand-made format the model was never trained on
prompt = f"<|system|>\n{system}\n<|user|>\n{user}\n<|assistant|>\n"

# After (correct): the model's own template
inputs = tokenizer.apply_chat_template(
    [{"role": "system", "content": system},
     {"role": "user", "content": user}],
    add_generation_prompt=True,
    return_tensors="pt", return_dict=True,
)

If your small model looks dumber than its benchmarks, check your template before blaming the weights.

After all three, same pantry:

RECIPE NAME: Dhaniya Paneer Uttapam
INGREDIENTS USED: Dhaniya, India Wala Paneer, Curry Leaves, French Beans
STEPS:
1. Heat oil in a pan, add mustard seeds and curry leaves, and let them crackle.
2. Add chopped green chillies and french beans, and saute for five minutes.
3. Add paneer cubes with salt and turmeric, and stir gently.
4. Add dhaniya leaves and cook until they wilt.
5. Serve hot with rice or roti.

Four ingredients. Proper South Indian tempering order — mustard seeds and curry leaves first, as is right. A dish a real person would cook. (Yes, the name says uttapam and the steps make a stir-fry. 4B models keep you humble.)

Figure: Ammamma's recipe and narration.

Why a grandmother?

My Ammamma (grandmother) was special. She gave so much, endlessly, and yet when something needed correcting, she never scolded. She would tell me gently, always with a proper reason, so the lesson stayed with me long after. That is exactly the tone I wanted this app to have: not an alarm that nags you about expiry dates, but a warm voice that nudges you kindly and explains why. An Ammamma doesn't shame you for the wilting coriander. She just shows you what to make with it. Building her into the app was my way of keeping a little of that gentleness in my kitchen.

The voice that changed halfway through

I wanted Ammamma to sound like an Ammamma, and I had no voice actor. VoxCPM2's Voice Design solved the casting: describe the voice in plain text — "a very elderly Indian grandmother, soft aged voice full of warmth, slow gentle storytelling pace" — and the model creates it. I generate each character's voice once, cache the sample on a Modal volume, and clone from it after that.

Except my first cloning attempt used the weakest mode, reference audio alone, and a full recipe is narrated in chunks, so each chunk was free to drift. Ammamma greeted me in one voice and read step 3 in another. The fix was in the documentation the whole time: Hi-Fi cloning passes the reference audio and its exact transcript together, which anchors every chunk to the same voice. Since I generated the reference myself, I had a perfect transcript by definition.

wav = model.generate(
    text=chunk,
    prompt_wav_path=ref_path,    # cached reference voice
    prompt_text=ref_text,        # its exact transcript
    reference_wav_path=ref_path, # same clip again = Hi-Fi mode
)
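The article does not say how the narration is split into chunks, so the following is only one plausible shape for that loop: greedy sentence packing under a character budget, with the same reference clip and transcript passed to every call. `split_into_chunks`, `narrate`, the 200-character default, and the `fake_generate` stand-in are all hypothetical; in the app, `synthesize` would be the `model.generate` call shown above.

```python
import re
from typing import Callable, List

def split_into_chunks(text: str, max_chars: int = 200) -> List[str]:
    """Pack whole sentences into chunks of at most max_chars (assumed budget)."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks

def narrate(text: str, synthesize: Callable, ref_path: str, ref_text: str,
            max_chars: int = 200) -> list:
    """Synthesize every chunk against the same reference audio and transcript."""
    return [
        synthesize(
            text=chunk,
            prompt_wav_path=ref_path,
            prompt_text=ref_text,
            reference_wav_path=ref_path,
        )
        for chunk in split_into_chunks(text, max_chars)
    ]

# Stand-in for the TTS model so the loop can run without a GPU.
calls = []
def fake_generate(**kwargs):
    calls.append(kwargs)
    return f"<audio for {len(kwargs['text'])} chars>"

recipe = ("Ayyo, kanna. Heat oil in a pan. Add mustard seeds and curry leaves. "
          "Serve hot with rice.")
audio = narrate(recipe, fake_generate, "ammamma_ref.wav",
                "transcript of the reference clip", max_chars=40)
print(len(audio), "chunks, all anchored to", calls[0]["prompt_wav_path"])
```

Keeping the reference arguments out of the per-chunk logic makes it hard to accidentally narrate one chunk without the anchor, which is the failure that produced two different voices.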

Now she is consistent from "Ayyo, kanna" to "serve hot with rice".

Sharp edges, briefly

For anyone walking the same road: unpinned dependencies will betray you eventually. My image floated to a torch/transformers combination where Dynamo tried to compile MiniCPM-V's einops calls and crashed. TORCHDYNAMO_DISABLE=1, set inside the function before torch imports, was the cure, applied four times before I found every model it affected. Modal's @app.cls with @modal.enter() is the difference between loading a 4B model once per container and once per request; find that pattern early. And bake big checkpoints into your Modal image at build time. Watching a 1.9GB TTS model download at 40 kiB/s during a live test is character-building. I don't recommend it.

So what are small models actually for?

This is the question the hackathon really asks, and building this answered it for me.

Nemotron-Mini-4B was built for low-latency constrained tasks — NPC dialogue, RAG answers, function calling. MiniCPM-V was built for vision on the edge. VoxCPM2 does one thing: speech. Each one, pointed at its narrow target, worked from day one. The only long struggle was the one place I asked a small model to do open-ended composition — a big-model task. And the fix was not scale. It was decomposition: one hard task became two easy ones, with deterministic code holding the rails between them.

Small models are not lesser large models. They are components. The engineering is not in the prompt — it is in the architecture around the prompt.

Future work: a LoRA fine-tune of the recipe model on the open RecipeNLG dataset (~2M recipes, formatted as pantry-list → recipe pairs) is the obvious next step beyond scaffolding at 4B. Scoped honestly out of a two-week hackathon — it belongs in this paragraph, not in my deadline.
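For a sense of what "pantry-list → recipe pairs" could mean in practice, here is a minimal formatting sketch. It is speculative: the record field names (`title`, `ner`, `directions`) are an assumption about how a RecipeNLG-style record is laid out and should be checked against the actual dataset, and the prompt wording and the `to_training_pair` helper are invented. No fine-tune was run for this article.

```python
def to_training_pair(record: dict) -> dict:
    """Turn one recipe record into a prompt/completion pair in the app's output format.

    Assumed fields: 'title' (str), 'ner' (list of ingredient names),
    'directions' (list of step strings).
    """
    pantry = ", ".join(record["ner"])
    steps = "\n".join(
        f"{i}. {step}" for i, step in enumerate(record["directions"], start=1)
    )
    prompt = f"Pantry: {pantry}\nWrite one recipe using only these items."
    completion = (
        f"RECIPE NAME: {record['title']}\n"
        f"INGREDIENTS USED: {pantry}\n"
        f"STEPS:\n{steps}"
    )
    return {"prompt": prompt, "completion": completion}

# Made-up record for illustration only.
sample = {
    "title": "Paneer Stir-Fry",
    "ner": ["paneer", "french beans", "curry leaves"],
    "directions": ["Heat oil and crackle the curry leaves.",
                   "Add beans and paneer, then cook for five minutes."],
}
pair = to_training_pair(sample)
print(pair["prompt"])
print(pair["completion"])
```

Emitting the completion in the same RECIPE NAME / INGREDIENTS USED / STEPS layout as the app would let a fine-tuned model drop into pass 2 without changing the parsing around it.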

Meanwhile there is paneer in my fridge with two days left, and an Ammamma who knows exactly what to do about it. Easier to save than to earn — one rescued recipe at a time. 🌿


Try it: build-small-hackathon/zerowastekitchen — scan a receipt and say hello to Ammamma.
