
SoundSense AI - EEND-OLA Lite Diarizer

This is a custom diarization model trained from scratch for SoundSense AI.

Base model / source:

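Architecture inspired by EEND-OLA (End-to-End Neural Diarization with Online Attractors), https://arxiv.org/abs/2006.02616. Note that the arXiv entry at that ID is titled "Online End-to-End Neural Diarization with Speaker-Tracing Buffer".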
Custom implementation: 4-block Transformer encoder followed by two LSTM heads (OD and PSE, where PSE is Power Set Encoding)

Training data:

Simulated diarization mixtures generated from LibriSpeech train-clean-360 (921 unique speakers; 3,000 training samples)
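
The card does not describe how the mixtures were simulated. As a rough illustration only, the sketch below overlays one utterance per speaker at a random offset and derives per-frame activity labels. The function name `simulate_mixture`, the random-offset strategy, and the 512-sample frame (32 ms at LibriSpeech's 16 kHz sample rate) are assumptions for this example, not the author's pipeline.

```python
import random

# Assumption: 512 samples per frame, i.e. 32 ms at a 16 kHz sample rate.
FRAME_LEN = 512

def simulate_mixture(utterances, total_len, rng):
    """Overlay one non-empty utterance per speaker at a random offset.

    utterances: list of lists of float samples, one list per speaker.
    Returns (mixture, labels), where labels[t][k] is 1 when speaker k
    has at least one sample inside frame t.
    """
    mixture = [0.0] * total_len
    n_frames = total_len // FRAME_LEN
    labels = [[0] * len(utterances) for _ in range(n_frames)]
    for k, utt in enumerate(utterances):
        utt = utt[:total_len]                      # clip overly long utterances
        start = rng.randint(0, total_len - len(utt))
        for i, sample in enumerate(utt):
            mixture[start + i] += sample           # additive mixing
        first = start // FRAME_LEN
        last = min((start + len(utt) - 1) // FRAME_LEN, n_frames - 1)
        for t in range(first, last + 1):
            labels[t][k] = 1
    return mixture, labels
```

For instance, `simulate_mixture([[0.1] * 1024, [0.2] * 2048], 4096, random.Random(0))` returns a 4096-sample mixture and 8 frames of two-speaker labels. A realistic simulator would also vary gain, add noise or reverberation, and insert silences between turns.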

Use:

Part of the SoundSense AI hackathon submission (Stage 3: Diarization).
Determines who is speaking in each 32 ms audio frame using Power Set Encoding: 8 classes, one for every subset of up to 3 speakers (2^3 = 8, including the empty subset for silence).
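
To make the Power Set Encoding concrete, here is a minimal sketch that maps per-speaker activity flags to one of the 8 classes and back, then converts per-frame class indices into timed segments. The bit ordering (speaker k contributes bit k) and the helper names are assumptions for illustration; the model's actual class ordering is not documented in this card.

```python
from itertools import groupby

NUM_SPEAKERS = 3     # from the card: up to 3 simultaneous speakers
FRAME_SEC = 0.032    # from the card: 32 ms per frame

def pse_encode(active):
    """Map 0/1 flags such as (1, 0, 1) to a class index in 0..7.

    Assumed convention: speaker k contributes bit k.
    """
    return sum(int(flag) << k for k, flag in enumerate(active))

def pse_decode(label):
    """Inverse of pse_encode: class index -> tuple of 0/1 flags."""
    return tuple((label >> k) & 1 for k in range(NUM_SPEAKERS))

def labels_to_segments(labels):
    """Turn per-frame class indices into (speaker, start_sec, end_sec)."""
    segments = []
    for spk in range(NUM_SPEAKERS):
        frame = 0
        # Group consecutive frames with the same activity for this speaker.
        for is_active, run in groupby(pse_decode(l)[spk] for l in labels):
            length = len(list(run))
            if is_active:
                segments.append(
                    (spk, frame * FRAME_SEC, (frame + length) * FRAME_SEC)
                )
            frame += length
    return segments
```

Under this convention, class 0 is silence, classes 1, 2, and 4 are single speakers, and class 7 means all three speakers overlap. Note that the speaker indices are local to one recording; they do not identify people across recordings.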

Limitations:

Built for prototype/demo use.
Trained on simulated mixtures, not real conversational/overlapping speech (e.g. CALLHOME, AMI). Real-world diarization error rate (DER) not yet benchmarked.
PSE accuracy: 46.9% (vs 12.5% random chance); OD accuracy: 77.5% (vs 50% random chance).
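
The chance levels quoted above follow from uniform guessing over 8 PSE classes and over 2 OD classes (the binary OD output is inferred from the 50% figure). A small sketch of that arithmetic, with hypothetical helper names:

```python
def chance_accuracy(num_classes):
    """Accuracy of uniform random guessing over num_classes labels."""
    return 1.0 / num_classes

def lift_over_chance(accuracy, num_classes):
    """How many times better than uniform guessing an accuracy is."""
    return accuracy / chance_accuracy(num_classes)

pse_lift = lift_over_chance(0.469, 8)  # about 3.75x chance
od_lift = lift_over_chance(0.775, 2)   # about 1.55x chance
```

Uniform chance is a weak baseline when classes are imbalanced: if one class (for example, a single active speaker) dominates the frames, always predicting that class could score well above 12.5%.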
Performance should be verified on the target environment before deployment.


Paper for guranshsaran/soundsense-eend-ola-lite

Online End-to-End Neural Diarization with Speaker-Tracing Buffer

Paper • 2006.02616 • Published Mar 7, 2021