arxiv:2110.02011

Sound Event Detection Transformer: An Event-based End-to-End Model for Sound Event Detection

Published on Nov 12, 2021

Authors:

Abstract

Sound Event Detection Transformer (SEDT) is introduced as the first event-based and end-to-end model for sound event detection, utilizing a one-dimensional Detection Transformer architecture with audio query branches and one-to-many matching strategies.

AI-generated summary

Sound event detection (SED) has gained increasing attention with its wide application in surveillance, video indexing, etc. Existing models in SED mainly generate frame-level prediction, converting it into a sequence multi-label classification problem. A critical issue with the frame-based model is that it pursues the best frame-level prediction rather than the best event-level prediction. Besides, it needs post-processing and cannot be trained in an end-to-end way. This paper firstly presents the one-dimensional Detection Transformer (1D-DETR), inspired by Detection Transformer for image object detection. Furthermore, given the characteristics of SED, the audio query branch and a one-to-many matching strategy for fine-tuning the model are added to 1D-DETR to form Sound Event Detection Transformer (SEDT). To our knowledge, SEDT is the first event-based and end-to-end SED model. Experiments are conducted on the URBAN-SED dataset and the DCASE2019 Task4 dataset, and both show that SEDT can achieve competitive performance.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2110.02011 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2110.02011 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2110.02011 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.