Conformer-Transducer (RNN-T) Belarusian (MLX)
Code: molind/mlx-conformer
NVIDIA's Conformer-Transducer Large model for Belarusian speech recognition, packaged for MLX inference on Apple Silicon.
Original model: nvidia/stt_be_conformer_transducer_large
Results
| Dataset | WER | Speed |
|---|---|---|
| CommonVoice 24.0 test (500 samples) | 6.29% | 7.6 samples/s |
To our knowledge, this is the best open-source Belarusian ASR result on CommonVoice 24.0. Note that it was measured on a 500-sample subset of the test split.
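For readers who want to check a WER figure on their own transcripts, here is a minimal sketch of corpus-level word error rate: total word-level edit distance divided by total reference words. The text normalization used for the number above is not documented here, so the lowercasing and whitespace tokenization below are assumptions, and the function names are illustrative only.

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def corpus_wer(references, hypotheses):
    """Total word errors divided by total reference words."""
    errors = 0
    words = 0
    for ref, hyp in zip(references, hypotheses):
        # Assumed normalization: lowercase + whitespace split.
        ref_tokens = ref.lower().split()
        hyp_tokens = hyp.lower().split()
        errors += edit_distance(ref_tokens, hyp_tokens)
        words += len(ref_tokens)
    if words == 0:
        raise ValueError("references contain no words")
    return errors / words
```

Pooling errors over the whole corpus, rather than averaging per-utterance rates, keeps short utterances from dominating the score. A different normalization (punctuation stripping, for example) will change the result.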
Usage
pip install mlx numpy pyyaml sentencepiece torch
git clone https://github.com/molind/mlx-conformer
cd mlx-conformer
python mlx_conformer.py \
--download nvidia/stt_be_conformer_transducer_large \
--output models
python mlx_conformer.py \
--model models/stt_be_conformer_transducer_large \
--type transducer \
--audio test.mp3
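To transcribe several files, one option is to wrap the command above in a small Python loop. This is only a sketch. It uses just the flags shown above (`--model`, `--type`, `--audio`), it must be run from inside the `mlx-conformer` checkout, and it assumes the script prints its transcription to standard output, which is not confirmed here. The helper names `build_command` and `transcribe_folder` are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def build_command(audio_path, model_dir="models/stt_be_conformer_transducer_large"):
    """Assemble the same command line as in the usage section for one file."""
    return [
        sys.executable, "mlx_conformer.py",
        "--model", str(model_dir),
        "--type", "transducer",
        "--audio", str(audio_path),
    ]


def transcribe_folder(folder, pattern="*.mp3"):
    """Run the script once per matching file and collect its raw stdout.

    Assumption: the transcription is written to stdout.
    """
    outputs = {}
    for audio_path in sorted(Path(folder).glob(pattern)):
        result = subprocess.run(build_command(audio_path),
                                capture_output=True, text=True, check=True)
        outputs[audio_path.name] = result.stdout.strip()
    return outputs
```

Each call starts a new process and reloads the model, so this is convenient rather than fast. For large batches, importing the model code directly would avoid the repeated load.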
Architecture
- 17 Conformer encoder layers, d_model=512, 8 heads
- LSTM prediction network (1 layer, 640 hidden)
- Joint network (640 hidden, ReLU)
- 1024 BPE vocabulary + blank
- ~120M parameters
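The following sketch shows how the three components listed above interact during greedy transducer decoding, and records the stated hyperparameters in one place. It is not taken from `mlx-conformer`: the decoding strategy that code uses is not stated here, `predict` and `joint` are hypothetical callables standing in for the LSTM prediction network and the joint network, and placing blank at the last index (1024) is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransducerConfig:
    encoder_layers: int = 17
    d_model: int = 512
    attention_heads: int = 8
    pred_hidden: int = 640
    joint_hidden: int = 640
    vocab_size: int = 1024  # BPE tokens, blank not included

    @property
    def head_dim(self):
        return self.d_model // self.attention_heads  # 512 / 8 = 64

    @property
    def num_outputs(self):
        return self.vocab_size + 1  # BPE tokens plus blank

    @property
    def blank_id(self):
        return self.vocab_size  # assumption: blank is the last index


def greedy_decode(encoder_frames, predict, joint, blank_id, max_symbols=10):
    """Greedy transducer decoding over a sequence of encoder frames.

    predict(last_token, state) -> (pred_out, new_state)
    joint(enc_frame, pred_out) -> list of scores, one per output symbol
    """
    tokens = []
    pred_out, state = predict(None, None)  # start-of-sequence step
    for frame in encoder_frames:
        # Emit symbols for this frame until blank wins or the cap is hit.
        for _ in range(max_symbols):
            scores = joint(frame, pred_out)
            best = max(range(len(scores)), key=scores.__getitem__)
            if best == blank_id:
                break  # blank: advance to the next encoder frame
            tokens.append(best)
            # A non-blank token updates the prediction network only.
            pred_out, state = predict(best, state)
    return tokens
```

The `max_symbols` cap guards against a frame that never predicts blank. The prediction network advances only on non-blank tokens, which is what lets a transducer emit zero, one, or several tokens per encoder frame.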
License
Original model by NVIDIA, licensed under CC-BY-4.0.