EmoNet-Voice: A Fine-Grained, Expert-Verified Benchmark for Speech Emotion Detection Paper • 2506.09827 • Published Jun 11 • 18
Building and better understanding vision-language models: insights and future directions Paper • 2408.12637 • Published Aug 22, 2024 • 132
Step-Audio-AQAA: a Fully End-to-End Expressive Large Audio Language Model Paper • 2506.08967 • Published Jun 10 • 2
Article From Chunks to Blocks: Accelerating Uploads and Downloads on the Hub By jsulz and 3 others • Feb 12 • 72
Step-Audio Collection Step-Audio model family, including Audio-Tokenizer, Audio-Chat and TTS • 4 items • Updated 22 days ago • 32
Article Open-source DeepResearch – Freeing our search agents By m-ric and 4 others • Feb 4 • 1.28k