TinCan Speech Commands Model

A compact English speech-command recognition model for the TinCan app.

This model recognizes 47 short command classes and is designed for small-footprint command recognition where cloud ASR is unnecessary or undesirable. The exported ONNX artifact is under 400 KB, making it practical for local-first applications, prototypes, and edge deployments.

The 47 classes consist of:

  • 12 custom TinCan words
  • 35 words from the Google Speech Commands dataset v2

Highlights

  • 47-class English command recognizer
  • ONNX export for portable inference
  • Small model artifact: model.onnx is approximately 378 KB
  • Based on NVIDIA NeMo's MatchboxNet command-recognition model family

Base Model

This model is built on NVIDIA NeMo's commandrecognition_en_matchboxnet3x2x64_v2, a command-recognition model from the MatchboxNet architecture family.

Base model reference: commandrecognition_en_matchboxnet3x2x64_v2

Metrics

These metrics describe the currently exported model.onnx artifact.

Metric                             Value
Validation loss                    0.1493
Validation micro top-1 accuracy    95.28%
Validation macro accuracy          94.61%
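Micro and macro accuracy can diverge when classes have different numbers of validation clips. The sketch below shows the distinction. It assumes "macro accuracy" means the unweighted mean of per-class accuracy (recall), which is the common definition; the card does not spell this out. The function names and the four-clip example are hypothetical toy data, not TinCan validation results.

```python
from collections import defaultdict

def micro_top1_accuracy(y_true, y_pred):
    """Fraction of all clips whose top-1 prediction matches the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need equal-length, non-empty label lists")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def macro_accuracy(y_true, y_pred):
    """Unweighted mean of per-class accuracy (recall): every class counts equally,
    no matter how many clips it has. Assumed definition, see lead-in."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need equal-length, non-empty label lists")
    totals = defaultdict(int)  # clips per true class
    hits = defaultdict(int)    # correctly predicted clips per true class
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    per_class = [hits[c] / totals[c] for c in totals]
    return sum(per_class) / len(per_class)

# Toy data: "yes" has 3 clips (all correct), "tokyo" has 1 (misclassified).
y_true = ["yes", "yes", "yes", "tokyo"]
y_pred = ["yes", "yes", "yes", "oslo"]
print(micro_top1_accuracy(y_true, y_pred))  # 0.75
print(macro_accuracy(y_true, y_pred))       # 0.5
```

In this toy case, a single miss on a rare class barely moves the micro figure but halves the macro figure. That is why a macro score slightly below the micro score suggests that some less frequent commands are recognized somewhat less reliably.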

Supported Commands

Custom TinCan commands:

astra, bali, boston, capri, delhi, dublin, frisco, monaco, oslo, paris, seatown, tokyo

Google Speech Commands labels:

yes, no, up, down, left, right, on, off, stop, go, zero, one, two, three, four, five, six, seven, eight, nine, bed, bird, cat, dog, happy, house, marvin, sheila, tree, wow, backward, forward, follow, learn, visual
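As a quick sanity check on the vocabulary, the two lists above add up to the 47 classes. The sketch below encodes them in the order this card lists them. That is an assumption for illustration only: this order is not necessarily the output order in labels.json, so it must not be used to decode model outputs. The `command_source` helper is hypothetical.

```python
# Vocabulary as listed in this model card. The order follows the card,
# NOT necessarily the output-index order in labels.json.
CUSTOM_COMMANDS = [
    "astra", "bali", "boston", "capri", "delhi", "dublin",
    "frisco", "monaco", "oslo", "paris", "seatown", "tokyo",
]
GOOGLE_SPEECH_COMMANDS = [
    "yes", "no", "up", "down", "left", "right", "on", "off", "stop", "go",
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
    "bed", "bird", "cat", "dog", "happy", "house", "marvin", "sheila", "tree", "wow",
    "backward", "forward", "follow", "learn", "visual",
]

def command_source(label):
    """Hypothetical helper: report which vocabulary a predicted label belongs to."""
    if label in CUSTOM_COMMANDS:
        return "custom"
    if label in GOOGLE_SPEECH_COMMANDS:
        return "google"
    raise KeyError(f"{label!r} is not one of the 47 supported commands")

# 12 custom + 35 Google Speech Commands labels = 47 distinct classes.
assert len(CUSTOM_COMMANDS) == 12
assert len(GOOGLE_SPEECH_COMMANDS) == 35
assert len(set(CUSTOM_COMMANDS) | set(GOOGLE_SPEECH_COMMANDS)) == 47
```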

Inference Notes

The model outputs logits over the 47 labels listed in labels.json. Take the index of the largest logit (argmax) and look it up in labels.json to get the predicted command label.
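The lookup described above can be sketched in a few lines of standard-library Python. This sketch covers only the post-processing step. It assumes you have already obtained one clip's 47 logits, for example from onnxruntime's `InferenceSession.run` (take row 0 if the output has a batch dimension). The card does not document the model's input features, so that step is left out. The sketch also assumes labels.json is either a JSON array of strings in output-index order or an object keyed by stringified indices; `load_labels` and `decode_logits` are hypothetical names.

```python
import json
import math

def load_labels(path):
    """Read labels.json. Assumed format: ["yes", "no", ...] in output-index order,
    or {"0": "yes", "1": "no", ...}."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if isinstance(data, dict):
        data = [data[str(i)] for i in range(len(data))]
    return list(data)

def decode_logits(logits, labels):
    """Return (label, probability) for the highest-scoring class.

    The argmax alone picks the label; a numerically stable softmax is applied
    only to turn the winning logit into a readable 0..1 score.
    """
    if len(logits) != len(labels):
        raise ValueError(f"got {len(logits)} logits for {len(labels)} labels")
    best = max(range(len(logits)), key=lambda i: logits[i])
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract max to avoid overflow
    return labels[best], exps[best] / sum(exps)
```

Usage with a toy three-label list (a real call would pass the 47 labels from `load_labels("labels.json")`):

```python
label, prob = decode_logits([0.5, 2.0, -1.0], ["yes", "no", "stop"])
print(label)  # no
```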

Training Provenance

Field            Value
Model name       commandrecognition_en_matchboxnet3x2x64_v2
Export format    ONNX
Epochs           10
Batch size       32

Limitations

  • This is a closed-vocabulary command recognizer, not a general speech-to-text model.
  • The model is intended for English short-command recognition.
  • Validation metrics may not fully predict performance with every microphone, speaker, accent, room, or noise condition.

Dataset used to train HashNuke/tincan-wakewords

Evaluation results

Self-reported on the TinCan Speech Commands validation set:

  • Validation loss: 0.149
  • Validation micro top-1 accuracy: 95.28%
  • Validation macro accuracy: 94.61%