SoundSense AI - EEND-OLA Lite Diarizer

This is a custom diarization model trained from scratch for SoundSense AI.

Base model / source:

  • Architecture inspired by EEND-OLA (End-to-End Neural Diarization with Online Attractors), https://arxiv.org/abs/2006.02616
  • Custom implementation: a 4-block Transformer encoder feeding two LSTM heads, OD and Power Set Encoding (PSE)
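The architecture line above can be made concrete with a minimal PyTorch sketch. Only "four Transformer blocks feeding two LSTM heads (OD and PSE)" comes from this card. Everything else is an assumption: the 80-dimensional input features, model width, attention head count, LSTM size, and the class name `EENDOLALiteSketch` are hypothetical. The OD head is assumed to emit one binary logit per frame, which is consistent with the 50% chance level quoted for OD. This is not the trained model's code.

```python
import torch
from torch import nn


class EENDOLALiteSketch(nn.Module):
    """Hypothetical sketch: shared Transformer encoder + two LSTM heads.

    All sizes are illustrative assumptions, not the trained model's values.
    """

    def __init__(self, n_feats=80, d_model=256, n_heads=4, n_blocks=4,
                 lstm_hidden=128, n_pse_classes=8):
        super().__init__()
        # Project per-frame acoustic features to the encoder width.
        self.input_proj = nn.Linear(n_feats, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, dim_feedforward=4 * d_model,
            batch_first=True)
        # Stack of 4 encoder blocks, matching the block count in the card.
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_blocks)
        # Two parallel LSTM heads that both read the shared encoder output.
        self.od_lstm = nn.LSTM(d_model, lstm_hidden, batch_first=True)
        self.pse_lstm = nn.LSTM(d_model, lstm_hidden, batch_first=True)
        # OD head: one logit per frame (binary output is an assumption).
        self.od_out = nn.Linear(lstm_hidden, 1)
        # PSE head: 8 logits per frame, one per power-set class.
        self.pse_out = nn.Linear(lstm_hidden, n_pse_classes)

    def forward(self, feats):
        # feats: (batch, frames, n_feats)
        h = self.encoder(self.input_proj(feats))
        od_h, _ = self.od_lstm(h)
        pse_h, _ = self.pse_lstm(h)
        od_logits = self.od_out(od_h).squeeze(-1)  # (batch, frames)
        pse_logits = self.pse_out(pse_h)           # (batch, frames, 8)
        return od_logits, pse_logits


model = EENDOLALiteSketch()
od, pse = model(torch.randn(2, 100, 80))
print(od.shape, pse.shape)
```

Because both heads read the same encoder output, OD and PSE predictions come from shared frame representations. At inference, `pse_logits.argmax(-1)` gives one class index per frame.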

Training data:

  • Simulated diarization mixtures generated from LibriSpeech train-clean-360 (921 unique speakers, 3000 training samples)
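The card does not describe how the mixtures were simulated. As one hedged illustration, the sketch below lays out each speaker's utterances on a shared timeline with random pauses so that speakers can overlap. It then converts the resulting segments into per-frame activity labels at the 32 ms frame resolution the model uses. The pause distribution, the function names `place_utterances` and `activity_matrix`, and the millisecond segment format are hypothetical. Mixing the actual LibriSpeech waveforms is omitted.

```python
import random

FRAME_MS = 32  # frame resolution stated in the model card


def place_utterances(durations_ms, max_pause_ms=2000, rng=random):
    """Hypothetical layout: each speaker starts after a random pause and speaks
    their utterances in order with random pauses between them. All speakers
    share one timeline, so their segments can overlap."""
    segments = []
    for spk, durs in enumerate(durations_ms):
        t = rng.randint(0, max_pause_ms)
        for d in durs:
            segments.append((spk, t, t + d))
            t += d + rng.randint(0, max_pause_ms)
    return segments


def activity_matrix(segments, n_speakers, frame_ms=FRAME_MS):
    """Convert (speaker, start_ms, end_ms) segments into per-frame 0/1 labels."""
    total_ms = max(end for _, _, end in segments)
    n_frames = -(-total_ms // frame_ms)  # ceiling division
    labels = [[0] * n_speakers for _ in range(n_frames)]
    for spk, start, end in segments:
        # A frame counts as active if the segment overlaps any part of it.
        for t in range(start // frame_ms, -(-end // frame_ms)):
            labels[t][spk] = 1
    return labels


rng = random.Random(0)
segs = place_utterances([[1500, 900], [1200], [800, 700]], rng=rng)
labels = activity_matrix(segs, n_speakers=3)
overlap_frames = sum(1 for frame in labels if sum(frame) > 1)
print(len(labels), overlap_frames)
```

Working in integer milliseconds keeps frame boundaries exact and avoids floating-point rounding at segment edges.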

Use:

  • Part of the SoundSense AI hackathon submission (Stage 3: Diarization).
  • Determines which speakers are active in each 32 ms audio frame using Power Set Encoding (PSE): 8 classes, one for each subset of up to 3 speakers, including silence.
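Power Set Encoding turns multi-label speaker activity into a single-label classification problem. With 3 speakers there are 2^3 = 8 subsets, so each frame gets exactly one class. The mapping below is a minimal sketch. The class ordering (silence first, then single speakers, pairs, and all three together) is an assumption and may not match the trained model's label order.

```python
from itertools import combinations

N_SPEAKERS = 3
# Every subset of {0, 1, 2}: empty set, singles, pairs, then all three.
POWER_SET = [frozenset(c) for k in range(N_SPEAKERS + 1)
             for c in combinations(range(N_SPEAKERS), k)]
CLASS_OF = {subset: idx for idx, subset in enumerate(POWER_SET)}


def encode(active):
    """Map a per-frame activity vector such as [1, 0, 1] to one PSE class index."""
    return CLASS_OF[frozenset(i for i, a in enumerate(active) if a)]


def decode(class_idx):
    """Map a PSE class index back to a per-speaker 0/1 activity vector."""
    speakers = POWER_SET[class_idx]
    return [1 if i in speakers else 0 for i in range(N_SPEAKERS)]


print(len(POWER_SET), encode([1, 0, 1]), decode(encode([1, 0, 1])))
```

Applied frame by frame, the argmax of a PSE head can be decoded back into per-speaker activity, which is the multi-label form that diarization scoring expects.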

Limitations:

  • Built for prototype/demo use.
  • Trained on simulated mixtures rather than real conversational speech with natural overlap (e.g., CALLHOME, AMI). The real-world diarization error rate (DER) has not yet been benchmarked.
  • PSE accuracy: 46.9% (vs. 12.5% chance for 8 classes); OD accuracy: 77.5% (vs. 50% chance for a binary decision).
  • Performance should be verified on the target environment before deployment.
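Since DER has not been benchmarked, a simplified frame-level DER can serve as a first check when verifying performance on target data. This sketch assumes the reference and hypothesis are equal-length lists of per-frame 0/1 speaker vectors (for example, decoded from PSE classes) and that the reference contains some speech. Per frame, missed speech plus false alarm plus speaker confusion equals max(N_ref, N_hyp) minus the number of correctly matched speakers. The hypothetical `frame_der` function searches all speaker permutations for the best mapping and uses no forgiveness collar, so it does not replace standard scoring tools such as NIST md-eval or pyannote.metrics.

```python
from itertools import permutations


def frame_der(ref, hyp):
    """Simplified frame-level DER (no collar, overlap included).

    ref, hyp: equal-length lists of per-frame 0/1 vectors, one entry per speaker.
    Hypothesis speakers are mapped to reference speakers by the permutation
    that yields the fewest errors.
    """
    n_spk = len(ref[0])
    total_ref = sum(sum(r) for r in ref)  # reference speaker-frames
    best = None
    for perm in permutations(range(n_spk)):
        errors = 0
        for r, h in zip(ref, hyp):
            # Reference speaker i is matched to hypothesis speaker perm[i].
            h_mapped = [h[perm[i]] for i in range(n_spk)]
            n_ref, n_hyp = sum(r), sum(h_mapped)
            n_correct = sum(a * b for a, b in zip(r, h_mapped))
            # miss + false alarm + confusion for this frame
            errors += max(n_ref, n_hyp) - n_correct
        best = errors if best is None else min(best, errors)
    return best / total_ref


ref = [[1, 0, 0], [1, 1, 0], [0, 1, 0], [0, 0, 0]]
hyp = [[0, 1, 0], [1, 1, 0], [1, 0, 0], [0, 0, 0]]  # same turns, labels swapped
print(frame_der(ref, hyp))
```

The permutation search matters because diarization labels are arbitrary: a system that swaps speaker IDs consistently should not be penalized. Exhaustive search is cheap for 3 speakers (6 permutations). For larger speaker counts, a Hungarian assignment is the usual replacement.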

Repository: guranshsaran/soundsense-eend-ola-lite