# APEX: Large-Scale Multi-Task Aesthetic-Informed Popularity Prediction for AI-Generated Music
APEX is the first large-scale multi-task learning framework for jointly predicting popularity and aesthetic quality of AI-generated music from audio alone. It is trained on over 211k AI-generated songs (~10k hours of audio) from Suno and Udio, leveraging MERT-v1-95M audio embeddings.
## What does APEX predict?
Given any audio file, APEX predicts 7 scores:
### Popularity

| Score | Range | Description |
|---|---|---|
| `score_streams` | 0–100 | Predicted streaming engagement score |
| `score_likes` | 0–100 | Predicted likes engagement score |
### Aesthetic Quality (from SongEval)

| Score | Range | Description |
|---|---|---|
| `coherence` | 1–5 | Structural and harmonic coherence |
| `musicality` | 1–5 | Overall musical quality |
| `memorability` | 1–5 | How memorable the song is |
| `clarity` | 1–5 | Clarity of production and mix |
| `naturalness` | 1–5 | Naturalness of the generated audio |
## Architecture
## Usage
### Installation

```bash
pip install torch transformers soundfile torchaudio
```
### Inference

```python
from transformers import AutoModel

model = AutoModel.from_pretrained("amaai-lab/apex", trust_remote_code=True)
results = model.predict("my_song.mp3", save_json="results.json")

print(results["score_streams"])  # popularity score, 0-100
print(results["score_likes"])    # popularity score, 0-100
print(results["coherence"])      # aesthetic score, 1-5