add ASR playground link

YaoyaoChang
2026-01-21 10:26:17 -08:00
parent f7c6d2dec9
commit 616a167275
2 changed files with 5 additions and 1 deletions
+4
@@ -1,5 +1,7 @@
# VibeVoice-ASR
[![Hugging Face](https://img.shields.io/badge/HuggingFace-Collection-orange?logo=huggingface)](https://huggingface.co/microsoft/VibeVoice-ASR)
[![Live Playground](https://img.shields.io/badge/Live-Playground-green?logo=gradio)](https://aka.ms/vibevoice-asr)
**VibeVoice-ASR** is the latest addition to the **VibeVoice** family. While the original VibeVoice / VibeVoice-Realtime focused on expressive TTS, **VibeVoice-ASR** focuses on understanding long-form speech with high precision and rich metadata.
It is a unified speech-to-text model designed to handle **1-hour long-form audio** in a single pass, generating structured transcriptions containing **Who (Speaker), When (Timestamps), and What (Content)**, with support for **User-Customized Context**.
@@ -15,6 +17,8 @@ It is a unified speech-to-text model designed to handle **1-hour long-form audio
- **📝 Rich Transcription (Who, When, What)**:
The model performs ASR, Diarization, and Timestamping simultaneously. The output is a structured sequence indicating *who* said *what* at *which time*.
[Try it here.](https://aka.ms/vibevoice-asr)
## 🏗️ Model Architecture
<p align="center">