Severe generation repetition when serving Step-Audio-R1.1 with vLLM
#8
by Boxp - opened
Request code:
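from openai import OpenAI
import base64

# Client pointed at the locally served OpenAI-compatible endpoint.
client = OpenAI(base_url="http://localhost:9999/v1", api_key="EMPTY")

# Read the audio file and base64-encode it for the request body.
wav_path = "/path/to/audio.wav"
with open(wav_path, "rb") as f:
    b64 = base64.b64encode(f.read()).decode("utf-8")

resp = client.chat.completions.create(
    model="Step-Audio-R1.1",
    messages=[{
        "role": "user",
        "content": [
            # Prompt, in English: "Please transcribe and summarize this audio:"
            {"type": "text", "text": "请转写并总结这段音频:"},
            {"type": "input_audio", "input_audio": {"data": b64, "format": "wav"}}
        ]
    }],
    stream=True
)

# Print the streamed text; skip chunks that carry no content (it can be None).
for chunk in resp:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)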
The model output looks like this:
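.....
**Summary:**
The speaker indicates that they understand the situation, agree with the current plan or decision, and are ready to end the conversation or action.
**Transcription:**
"Then I understand. OK, let's leave it at that."
**Summary:**
The speaker indicates that they understand the situation, agree with the current plan or decision, and are ready to end the conversation or action.
**Transcription:**
"Then I understand. OK, let's leave it at that."
**Summary:**
The speaker indicates that they understand the situation, agree with the current plan or decision, and are ready to end the conversation or action.
**Transcription:**
"Then I understand. OK, let's leave it at that."
**Summary:**
The speaker indicates that they understand the situation, agree with the current plan or decision, and are ready to end the conversation or action.
**Transcription:**
"Then I understand. OK, let's leave it at that."
**Summary:**
The speaker indicates that they understand the situation, agree with the current plan or decision, and are ready to end the conversation or action.
......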
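
To make the looping easier to catch and cut off on the client side, here is a minimal sketch of a guard that stops consuming the stream once the accumulated text ends in the same block repeated several times. The helper names `has_repeating_tail` and `consume` and the thresholds (`min_block=20`, `min_repeats=3`) are my own assumptions for illustration; they are not part of vLLM or the OpenAI client, and this only works around the symptom rather than fixing the cause.

```python
def has_repeating_tail(text: str, min_block: int = 20, min_repeats: int = 3) -> bool:
    """Return True if `text` ends with a block of at least `min_block`
    characters repeated at least `min_repeats` times back to back."""
    n = len(text)
    # Try every block length that could fit `min_repeats` times in the text.
    for block_len in range(min_block, n // min_repeats + 1):
        block = text[n - block_len:]
        if text.endswith(block * min_repeats):
            return True
    return False


def consume(pieces, min_block: int = 20, min_repeats: int = 3) -> str:
    """Accumulate streamed text pieces, stopping early once a loop is detected.

    `pieces` is any iterable of strings (or None), e.g. the `delta.content`
    values pulled out of the streamed chunks.
    """
    text = ""
    for piece in pieces:
        if not piece:  # skip empty or None deltas
            continue
        text += piece
        if has_repeating_tail(text, min_block, min_repeats):
            break  # hypothetical policy: give up as soon as the tail loops
    return text


if __name__ == "__main__":
    # Simulated stream: a short preamble followed by the same block over and over.
    unit = "**Summary:** speaker agrees.\n**Transcription:** \"ok, fine.\"\n"
    simulated = ["intro\n"] + [unit] * 10
    out = consume(simulated)
    print(out.count("**Summary:**"), "summary blocks kept before stopping")
```

With the request code above, the pieces would come from something like `(c.choices[0].delta.content for c in resp if c.choices)`. The scan is quadratic in the text length, so for long outputs it may be worth checking only the last few thousand characters.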