Yuriy Perezhohin
PRO
yuriyvnv
21 followers · 24 following
https://scholar.google.com/citations?user=I5uzFtwAAAAJ&hl=en
yuriyvnv
yuriyperezhohin
AI & ML interests
Automatic Speech Recognition, Embeddings, Code Generation, Synthetic Data Generation and Filtering
Recent Activity
Posted an update 2 days ago
Four Qwen3-ASR (0.6B and 1.7B) Fine-Tunes for Portuguese and Dutch

Both the 1.7B and 0.6B variants of Alibaba's Qwen3-ASR, fine-tuned for European Portuguese and Dutch and bundled in a single collection.

Collection: https://huggingface.co/collections/yuriyvnv/qwen-asr-for-portuguese-and-dutch-17b-and-06b

Headline numbers on the Common Voice 22 test set, zero-shot baseline → fine-tuned:

🇵🇹 Qwen3-ASR-1.7B-PT: 12.91% → 8.50% WER (-34%)
🇵🇹 Qwen3-ASR-0.6B-PT: 18.26% → 11.85% WER (-35%)
🇳🇱 Qwen3-ASR-1.7B-NL: 6.68% → 5.28% WER (-21%)
🇳🇱 Qwen3-ASR-0.6B-NL: 12.46% → 8.31% WER (-33%)

The 0.6B variants are the more interesting half of the release. They give up only about three WER points compared to the 1.7B models at roughly a third of the parameters, which matters for edge hardware, CPU inference, or anywhere inference cost has to stay low. The Dutch 0.6B in particular lands at 8.3% WER on CV22, competitive with much larger systems.

The Dutch 1.7B started from a strong 6.7% zero-shot WER, so the absolute gain is smaller: Qwen already handles Dutch well, and the fine-tune mostly sharpens it on Common Voice's casing and punctuation conventions.

Training stuck close to Qwen's official SFT recipe (learning rate 2e-5, linear schedule, 2% warmup, bf16, gradient checkpointing, on a single H100). The data is the differentiator: Common Voice 22 train + validation, augmented with synthetic OpenAI-TTS speech and filtered by the WAVe multimodal embedding model, which scores clips at the word level and drops the ones that don't align well with their transcripts.

The full pipeline (synthetic data generation, WAVe filtering, training scripts, evaluation protocol) is open-source: github.com/yuriyvnv/TTS-Augmented-ASR

@hf-audio

#asr #speech #parakeet #nvidia #nemo #multilingual #fine-tuning #commonvoice
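The relative reductions quoted in parentheses can be reproduced from the absolute WER figures. The snippet below is a minimal sketch for checking that arithmetic; the dictionary layout and the function names `relative_wer_change` and `summarize` are illustrative assumptions, not part of the released pipeline.

```python
# WER figures (in percent) exactly as quoted in the post:
# model name -> (zero-shot baseline, fine-tuned).
RESULTS = {
    "Qwen3-ASR-1.7B-PT": (12.91, 8.50),
    "Qwen3-ASR-0.6B-PT": (18.26, 11.85),
    "Qwen3-ASR-1.7B-NL": (6.68, 5.28),
    "Qwen3-ASR-0.6B-NL": (12.46, 8.31),
}


def relative_wer_change(baseline: float, finetuned: float) -> float:
    """Relative WER change in percent; a negative value is an improvement."""
    return (finetuned - baseline) / baseline * 100.0


def summarize(results: dict) -> dict:
    """Round each model's relative change to the nearest whole percent."""
    return {
        name: round(relative_wer_change(baseline, finetuned))
        for name, (baseline, finetuned) in results.items()
    }


if __name__ == "__main__":
    for name, change in summarize(RESULTS).items():
        print(f"{name}: {change}%")
```

Note that these are relative changes: the Dutch 1.7B model drops 1.40 absolute WER points, which is 21% of its 6.68% baseline.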
Updated a model 2 days ago
yuriyvnv/Qwen3-ASR-1.7B-NL
Updated a collection 2 days ago
Qwen-ASR for Portuguese & Dutch, 1.7B & 0.6B
yuriyvnv's datasets (6)
yuriyvnv/synthetic_asr_et_sl • Viewer • Updated Feb 17 • 80.7k • 99
yuriyvnv/synthetic_transcript_nl • Viewer • Updated Nov 24, 2025 • 34.9k • 86
yuriyvnv/capes_synthetic_audio_filtered • Viewer • Updated Jul 20, 2025 • 72.6k • 66
yuriyvnv/synthetic_transcript_pt • Viewer • Updated Jul 19, 2025 • 253k • 93
yuriyvnv/triage_synthetic_classification • Viewer • Updated Jun 27, 2025 • 100 • 18
yuriyvnv/triage_transcriptions • Viewer • Updated Jun 25, 2025 • 87 • 26