How to use Reza2kn/Shenava-Rizeh-Pizeh-v1.0 with NeMo:
```python
import nemo.collections.asr as nemo_asr

# Load the pretrained checkpoint from the Hugging Face Hub
asr_model = nemo_asr.models.ASRModel.from_pretrained("Reza2kn/Shenava-Rizeh-Pizeh-v1.0")

# Transcribe a list of audio file paths
transcriptions = asr_model.transcribe(["file.wav"])
```
Shenava — Rizeh Pizeh v1.0 (6.9M) · Persian streaming ASR
Rizeh Pizeh (ریزه‌پیزه, “teeny‑tiny”) is the smallest member of the Shenava‑1 family, at 6.9M parameters. It is distilled down the cascade and built to run in real time on a 2015 Cortex‑A7 (RTF ≈ 0.91 in fp32 with tract). This repo holds the fp32 NeMo source; other formats are listed below.
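As a reading aid for the RTF figure: real-time factor is conventionally the time spent decoding divided by the duration of the audio decoded, so any value below 1.0 keeps up with live speech. The sketch below uses hypothetical timings (9.1 s of compute for 10 s of audio) chosen only to reproduce the quoted value; they are not measurements from this card.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent decoding / duration of the audio that was decoded."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# Hypothetical timings, picked to match the RTF of about 0.91 quoted above.
rtf = real_time_factor(processing_seconds=9.1, audio_seconds=10.0)

# Below 1.0 means decoding finishes faster than the audio plays.
keeps_up = rtf < 1.0
print(f"RTF = {rtf:.2f}, real-time: {keeps_up}")
```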
The Shenava‑1 family
A knowledge‑distillation cascade of on‑device Persian ASR models: one teacher distilled down to a 6.9M student. The family members:
- Reza2kn/Shenava-Koochik-v1.0: Koochik v1.0 (114M) · teacher / flagship, on-device WER record
- Reza2kn/Shenava-Rizeh-v1.0: Rizeh v1.0 (32M) · mid-tier student
- Reza2kn/Shenava-Rizeh-Pizeh-v1.0: Rizeh Pizeh v1.0 (6.9M) · tiniest, real-time on a 2015 Cortex-A7 ◀ this model (or its parent)
Benchmark — fair WER/CER
Scores are computed after inverse text normalization (ITN) and Persian‑digit normalization (the double‑benchmark convention), with decoding at att_context_size=[70,13].
| Member | golden‑6669 WER | golden‑6669 CER | FLEURS‑fa WER | FLEURS‑fa CER |
|---|---|---|---|---|
| Koochik v1.0 (114M) | 7.49% | 2.30% | 10.64% | 3.79% |
| Rizeh v1.0 (32M) | 12.11% | 3.94% | 14.45% | 5.10% |
| Rizeh Pizeh v1.0 (6.9M) | 24.55% | 8.89% | 26.95% | 10.22% |
Koochik v1.0 ranks #2 on the public double‑benchmark, behind only cloud Gemini. That makes it the best on‑device Persian ASR: its WER is less than half that of a 1.5B Whisper‑Persian, at 1/13 the size.
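For readers who want to score their own transcripts in a comparable way, here is a minimal sketch of WER and CER with a digit normalizer applied to both reference and hypothesis. It is not the benchmark's actual scoring code: the digit mapping, the choice to count spaces in CER, and all function names are assumptions, and ITN itself (spoken-form numbers to digits) is not implemented here.

```python
# Map Persian (U+06F0..U+06F9) and Arabic-Indic (U+0660..U+0669) digits to ASCII.
_DIGITS = str.maketrans("۰۱۲۳۴۵۶۷۸۹٠١٢٣٤٥٦٧٨٩", "01234567890123456789")


def normalize_digits(text: str) -> str:
    """Fold all digit variants to ASCII so that equal numbers compare equal."""
    return text.translate(_DIGITS)


def edit_distance(ref: list, hyp: list) -> int:
    """Levenshtein distance (substitutions, insertions, deletions) over tokens."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edits divided by reference word count."""
    ref = normalize_digits(reference).split()
    hyp = normalize_digits(hypothesis).split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hyp) / len(ref)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate; spaces count as characters (an assumption)."""
    ref = list(normalize_digits(reference))
    hyp = list(normalize_digits(hypothesis))
    if not ref:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref, hyp) / len(ref)


# Digits written in different scripts are treated as identical after normalization.
print(wer("۱۲ نفر", "12 نفر"))
```

Normalizing both sides before scoring keeps purely orthographic differences, such as Persian versus ASCII digits, from being counted as recognition errors.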
Deployment formats (own repos)
- Shenava-Rizeh-Pizeh-v1.0-ONNX-fp16
- Shenava-Rizeh-Pizeh-v1.0-CoreML-fp16
- Shenava-Rizeh-Pizeh-v1.0-tract: fp32 cache‑aware streaming (Cortex‑A7)
Architecture: 6.9M parameters, d_model 144, 12 layers, ×8 subsampling, and multi‑latency attention contexts (att_context_size) of [[70,13],[70,6],[70,1],[70,0]].
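To relate the multi‑latency settings to wall-clock time, the sketch below converts each [left, right] context into milliseconds. It assumes the usual NeMo cache-aware convention that both values count encoder frames, and a 10 ms feature stride before subsampling; the stride is not stated in this card, so treat the resulting numbers as illustrative.

```python
SUBSAMPLING = 8       # from the card: x8 subsampling
FEATURE_HOP_MS = 10   # assumption: 10 ms feature stride, not stated in the card
FRAME_MS = SUBSAMPLING * FEATURE_HOP_MS  # duration covered by one encoder frame

ATT_CONTEXT_SIZES = [[70, 13], [70, 6], [70, 1], [70, 0]]


def context_ms(att_context_size):
    """Return (history_ms, look_ahead_ms) for a [left, right] attention context."""
    left, right = att_context_size
    return left * FRAME_MS, right * FRAME_MS


for ctx in ATT_CONTEXT_SIZES:
    history, look_ahead = context_ms(ctx)
    print(f"{ctx}: history {history} ms, look-ahead {look_ahead} ms")
```

Under these assumptions the four settings correspond to look-aheads of 1040 ms, 480 ms, 80 ms, and 0 ms, each with 5600 ms of history. A smaller right context lowers latency, typically at some cost in accuracy.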
Tokenizer: ve_tok_v4 (SentencePiece BPE, 1024 tokens plus blank; digit‑, punctuation‑, and «»‑aware). Numbers are transcribed in spoken form; apply ITN at display time to render them as digits. Part of VisualEars / Shenava.
Model tree for Reza2kn/Shenava-Rizeh-Pizeh-v1.0
Base model: nvidia/stt_fa_fastconformer_hybrid_large