How to use espnet/fastspeech2_conformer with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-to-audio", model="espnet/fastspeech2_conformer")
# Load model directly
from transformers import AutoTokenizer, AutoModelForTextToSpectrogram

tokenizer = AutoTokenizer.from_pretrained("espnet/fastspeech2_conformer")
model = AutoModelForTextToSpectrogram.from_pretrained("espnet/fastspeech2_conformer")
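Loading the model directly yields a mel spectrogram rather than audio, so a vocoder is still needed to get a waveform. The following is a minimal sketch of one way to go from text to a WAV file. Several things here are assumptions not stated above: the vocoder checkpoint name `espnet/fastspeech2_conformer_hifigan`, the 22050 Hz sample rate, and the `"spectrogram"` output key are taken from the Transformers documentation for FastSpeech2Conformer as best recalled, and the tokenizer is expected to need the `g2p_en` package installed. The helper names `write_wav` and `synthesize` are hypothetical.

```python
import struct
import wave


def write_wav(path, samples, sample_rate=22050):
    """Write floats in [-1, 1] as 16-bit mono PCM (hypothetical helper)."""
    # Clip out-of-range values so they fit in a signed 16-bit integer.
    clipped = [max(-1.0, min(1.0, float(s))) for s in samples]
    frames = struct.pack(f"<{len(clipped)}h", *(int(s * 32767) for s in clipped))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)  # 22050 Hz is an assumption
        wav.writeframes(frames)


def synthesize(
    text,
    model_id="espnet/fastspeech2_conformer",
    vocoder_id="espnet/fastspeech2_conformer_hifigan",  # assumed checkpoint name
):
    """Return the synthesized waveform as a list of floats (hypothetical helper)."""
    # Imported lazily so write_wav stays usable without torch/transformers.
    import torch
    from transformers import (
        AutoModelForTextToSpectrogram,
        AutoTokenizer,
        FastSpeech2ConformerHifiGan,
    )

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForTextToSpectrogram.from_pretrained(model_id)
    vocoder = FastSpeech2ConformerHifiGan.from_pretrained(vocoder_id)

    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # The acoustic model predicts a mel spectrogram from the token IDs.
        spectrogram = model(inputs["input_ids"], return_dict=True)["spectrogram"]
        # The HiFi-GAN vocoder converts the spectrogram into a waveform.
        waveform = vocoder(spectrogram)
    return waveform.squeeze().tolist()
```

Calling `write_wav("speech.wav", synthesize("Hello, my dog is cute."))` would download both checkpoints on first use and save the result. The WAV writer uses only the standard library to keep the sketch free of extra audio dependencies; a library such as `soundfile` would work equally well.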