TinCan Speech Commands Model

A compact English speech-command recognition model for the TinCan app.

This model recognizes 47 short command classes and is designed for small-footprint command recognition where cloud ASR is unnecessary or undesirable. The exported ONNX artifact is under 400 KB, making it practical for local-first applications, prototypes, and edge deployments.

The vocabulary combines 12 custom words with 35 words from the Google Speech Commands v2 dataset.

Highlights

47-class English command recognizer
ONNX export for portable inference
Small model artifact: model.onnx is approximately 378 KB
Based on NVIDIA NeMo's MatchboxNet command-recognition model family

Base Model

This model uses NVIDIA NeMo's commandrecognition_en_matchboxnet3x2x64_v2 MatchboxNet command-recognition architecture.

Base model reference: commandrecognition_en_matchboxnet3x2x64_v2

Metrics

These self-reported metrics describe the currently exported model.onnx artifact, measured on the TinCan Speech Commands validation set.

Metric	Value
Validation loss	0.1493
Validation micro top-1 accuracy	95.28%
Validation macro accuracy	94.61%
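
The difference between the micro and macro figures can be illustrated with a small sketch. It assumes that "macro accuracy" means the unweighted mean of per-class accuracy, which this card does not state explicitly; the function name and the toy data are hypothetical and are not taken from the training code.

```python
from collections import defaultdict


def micro_and_macro_accuracy(y_true, y_pred):
    """Return (micro, macro) accuracy for two parallel lists of labels.

    Assumption: macro accuracy is the unweighted mean of per-class accuracy.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    correct = defaultdict(int)  # correct predictions per true class
    total = defaultdict(int)    # number of samples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    # Micro: every sample counts equally.
    micro = sum(correct.values()) / len(y_true)
    # Macro: every class counts equally, regardless of how many samples it has.
    macro = sum(correct[c] / total[c] for c in total) / len(total)
    return micro, macro


# Toy data: three "yes" samples predicted correctly, one "no" sample missed.
y_true = ["yes", "yes", "yes", "no"]
y_pred = ["yes", "yes", "yes", "up"]
micro, macro = micro_and_macro_accuracy(y_true, y_pred)
print(micro, macro)
```

Under that assumption, a macro value below the micro value (as reported here) would suggest that some classes with fewer validation samples are recognized less reliably than the larger ones.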

Supported Commands

Custom TinCan commands:

astra, bali, boston, capri, delhi, dublin, frisco, monaco, oslo, paris, seatown, tokyo

Google Speech Commands labels:

yes, no, up, down, left, right, on, off, stop, go, zero, one, two, three, four, five, six, seven, eight, nine, bed, bird, cat, dog, happy, house, marvin, sheila, tree, wow, backward, forward, follow, learn, visual

Inference Notes

The model outputs logits over the 47 labels listed in labels.json. Use the output index to look up the predicted command label.
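
A minimal sketch of that lookup step is shown below. It assumes labels.json is a JSON array of label strings ordered by output index, which should be checked against the actual file; the helper names are hypothetical. Running the ONNX model to obtain the logits is not shown, because the expected input features are not described in this card.

```python
import json
import math


def load_labels(path):
    # Assumption: the file is a JSON array of strings, ordered by output index.
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def softmax(logits):
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits, labels):
    """Map one vector of logits to (predicted_label, probability)."""
    if len(logits) != len(labels):
        raise ValueError("expected one logit per label")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Toy example with three labels instead of the full 47.
toy_labels = ["yes", "no", "stop"]
label, prob = decode([0.1, 2.0, -1.0], toy_labels)
print(label, round(prob, 3))
```

With the real model, `labels` would come from `load_labels("labels.json")` and `logits` would be the 47-element output vector for a single audio clip.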

Training Provenance

Field	Value
Model name	`commandrecognition_en_matchboxnet3x2x64_v2`
Export format	ONNX
Epochs	10
Batch size	32

Limitations

This is a closed-vocabulary command recognizer, not a general speech-to-text model.
The model is intended for English short-command recognition.
Validation metrics may not fully predict performance with every microphone, speaker, accent, room, or noise condition.
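
Because the recognizer is closed-vocabulary, taking the highest-scoring output always yields one of the 47 labels, even for speech outside the vocabulary. One possible mitigation is to ignore low-confidence predictions. The sketch below is an illustration only: the function name and the 0.8 threshold are hypothetical, and a suitable threshold would need to be tuned on real recordings.

```python
import math


def predict_or_reject(logits, labels, min_confidence=0.8):
    """Return the top label, or None if its softmax probability is too low.

    The default threshold is an arbitrary placeholder, not a tuned value.
    """
    if len(logits) != len(labels):
        raise ValueError("expected one logit per label")
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    best = max(range(len(exps)), key=exps.__getitem__)
    confidence = exps[best] / total
    return labels[best] if confidence >= min_confidence else None


# Toy example with three labels instead of the full 47.
toy_labels = ["yes", "no", "stop"]
accepted = predict_or_reject([0.1, 2.0, -1.0], toy_labels)  # one clear winner
rejected = predict_or_reject([0.5, 0.6, 0.4], toy_labels)   # nearly uniform scores
print(accepted, rejected)
```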

