arXiv:2605.03929

PHALAR: Phasors for Learned Musical Audio Representations

Published on May 6

Abstract

PHALAR is a contrastive framework for stem retrieval that combines a learned spectral pooling layer with a complex-valued head to enforce pitch and phase equivariance, reaching higher retrieval accuracy than prior work with fewer parameters and faster training.

AI-generated summary

Stem retrieval, the task of matching missing stems to a given audio submix, is a key challenge currently limited by models that discard temporal information. We introduce PHALAR, a contrastive framework that achieves a relative accuracy increase of up to approximately 70% over the state of the art while requiring less than 50% of the parameters and delivering a 7× training speedup. By using a Learned Spectral Pooling layer and a complex-valued head, PHALAR enforces pitch- and phase-equivariant biases. PHALAR establishes a new retrieval state of the art across MoisesDB, Slakh, and ChocoChorales, and correlates significantly better with human coherence judgments than semantic baselines. Finally, zero-shot beat tracking and linear chord probing confirm that PHALAR captures robust musical structure beyond the retrieval task.
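
To make the summary's ingredients concrete, below is a minimal sketch of a learned spectral pooling layer, a complex-valued projection head, and a contrastive (InfoNCE) retrieval loss. Everything here (module names, the softmax-weighted pooling, the unit-modulus normalization) is an illustrative assumption, not the paper's actual implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnedSpectralPooling(nn.Module):
    # Hypothetical stand-in for the paper's Learned Spectral Pooling:
    # each output band is a softmax-weighted average over all frequency
    # bins, so pitch shifts move energy smoothly across bands.
    def __init__(self, n_bins: int, n_bands: int):
        super().__init__()
        self.weights = nn.Parameter(0.01 * torch.randn(n_bands, n_bins))

    def forward(self, spec):
        # spec: (batch, n_bins, time) -> (batch, n_bands, time)
        w = F.softmax(self.weights, dim=-1)
        return torch.einsum("bft,kf->bkt", spec, w)

class ComplexHead(nn.Module):
    # Illustrative complex-valued head: embeddings live on the complex
    # unit sphere and similarity is the real part of a Hermitian inner
    # product, so relative phase carries information. The paper's exact
    # head may differ.
    def __init__(self, d_in: int, d_out: int):
        super().__init__()
        self.proj = nn.Linear(d_in, 2 * d_out)

    def forward(self, x):
        re, im = self.proj(x).chunk(2, dim=-1)
        z = torch.complex(re, im)
        return z / (z.abs() + 1e-8)  # unit-modulus embedding

def info_nce(submix_emb, stem_emb, temperature=0.07):
    # Standard contrastive loss: the matched (submix, stem) pair is the
    # positive; all other stems in the batch act as negatives.
    sim = (submix_emb @ stem_emb.conj().t()).real / temperature
    labels = torch.arange(sim.size(0), device=sim.device)
    return F.cross_entropy(sim, labels)

In this sketch, a training step would encode the submix and each candidate stem through LearnedSpectralPooling, some backbone, and ComplexHead, then minimize info_nce over the batch; retrieval at test time ranks candidate stems by the same complex similarity.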
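The linear chord probing mentioned at the end is a standard evaluation recipe: freeze the encoder, extract frame-level embeddings, and fit a linear classifier on chord labels. A minimal sketch with placeholder random data stands in below; the real probe would use PHALAR embeddings and annotated chords.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Placeholder data: in the real probe these would be frame-level
# embeddings from the frozen encoder plus annotated chord labels.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(2000, 128))   # (n_frames, dim)
chords = rng.integers(0, 24, size=2000)     # e.g. 12 major + 12 minor

X_tr, X_te, y_tr, y_te = train_test_split(
    embeddings, chords, test_size=0.2, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(f"linear chord-probe accuracy: {probe.score(X_te, y_te):.3f}")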
