update ASR README: multilingual
@@ -3,7 +3,7 @@
[](https://huggingface.co/microsoft/VibeVoice-ASR)
[](https://aka.ms/vibevoice-asr)

**VibeVoice-ASR** is a unified speech-to-text model designed to handle **60-minute long-form audio** in a single pass, generating structured transcriptions containing **Who (Speaker), When (Timestamps), and What (Content)**, with support for **Customized Hotwords**.
**VibeVoice-ASR** is a unified speech-to-text model designed to handle **60-minute long-form audio** in a single pass, generating structured transcriptions containing **Who (Speaker), When (Timestamps), and What (Content)**, with support for **Customized Hotwords** and over **50 languages**.

**Model:** [VibeVoice-ASR-7B](https://huggingface.co/microsoft/VibeVoice-ASR)<br>
**Demo:** [VibeVoice-ASR-Demo](https://aka.ms/vibevoice-asr)<br>
@@ -22,6 +22,9 @@

- **📝 Rich Transcription (Who, When, What)**:
The model jointly performs ASR, diarization, and timestamping, producing a structured output that indicates *who* said *what* and *when*.

- **🌍 Multilingual & Code-Switching Support**:
It supports over 50 languages, requires no explicit language setting, and natively handles code-switching within and across utterances. Language distribution can be found [here](#language-distribution)

## 🏗️ Model Architecture