# APEX: Large-Scale Multi-Task Aesthetic-Informed Popularity Prediction for AI-Generated Music
APEX is the first large-scale multi-task learning framework for jointly predicting popularity and aesthetic quality of AI-generated music from audio alone. It is trained on over 211k AI-generated songs (~10k hours of audio) from Suno and Udio, leveraging MERT-v1-95M audio embeddings.
## What does APEX predict?
Given any audio file, APEX predicts 7 scores:
### Popularity

| Score | Range | Description |
|---|---|---|
| `score_streams` | 0–100 | Predicted streaming engagement score |
| `score_likes` | 0–100 | Predicted likes engagement score |
### Aesthetic Quality (from SongEval)

| Score | Range | Description |
|---|---|---|
| `coherence` | 1–5 | Structural and harmonic coherence |
| `musicality` | 1–5 | Overall musical quality |
| `memorability` | 1–5 | How memorable the song is |
| `clarity` | 1–5 | Clarity of production and mix |
| `naturalness` | 1–5 | Naturalness of the generated audio |
## Architecture
## Usage
### Installation

```bash
pip install torch transformers soundfile torchaudio
```
### Inference

```python
from transformers import AutoModel

model = AutoModel.from_pretrained("amaai-lab/apex", trust_remote_code=True)
results = model.predict("my_song.mp3", save_json="results.json")

print(results["score_streams"])  # popularity score, 0-100
print(results["score_likes"])    # popularity score, 0-100
print(results["coherence"])      # aesthetic score, 1-5