Language support

2026-01-24 05:10:47 +00:00
parent c0c2af984e
commit a3e99daedd
3 changed files with 39 additions and 1 deletions
@@ -21,7 +21,9 @@

 <h3>📰 News</h3>

-<strong>2026-01-21: 📣 We open-sourced <a href="docs/vibevoice-asr.md"><strong>VibeVoice-ASR</strong></a>, a unified speech-to-text model designed to handle 60-minute long-form audio in a single pass, generating structured transcriptions containing Who (Speaker), When (Timestamps), and What (Content), with support for User-Customized Context. Try it in [Playground](https://aka.ms/vibevoice-asr)</strong>.
+<strong>2026-01-21: 📣 We open-sourced <a href="docs/vibevoice-asr.md"><strong>VibeVoice-ASR</strong></a>, a unified speech-to-text model designed to handle 60-minute long-form audio in a single pass, generating structured transcriptions containing Who (Speaker), When (Timestamps), and What (Content), with support for User-Customized Context. Try it in [Playground](https://aka.ms/vibevoice-asr)</strong>. 
+- ⭐️ VibeVoice-ASR is natively multilingual — see the [supported languages](docs/vibevoice-asr.md#language-distribution) for details.
+- 🔥 The VibeVoice-ASR [finetuning code](finetuning-asr/README.md) is now available!

 2025-12-16: 📣 We added experimental speakers to <a href="docs/vibevoice-realtime-0.5b.md"><strong>VibeVoice‑Realtime‑0.5B</strong></a> for exploration, including multilingual voices in nine languages (DE, FR, IT, JP, KR, NL, PL, PT, ES) and 11 distinct English style voices. [Try it](docs/vibevoice-realtime-0.5b.md#optional-more-experimental-voices). More speaker types will be added over time.

@@ -79,6 +79,34 @@ python demo/vibevoice_asr_gradio_demo.py --model_path microsoft/VibeVoice-ASR --
 python demo/vibevoice_asr_inference_from_file.py --model_path microsoft/VibeVoice-ASR --audio_files [add an audio path here] 
 ```

+### Results
+
+#### Multilingual
+| Dataset       | Language   | DER  | cpWER | tcpWER | WER   |
+|---------------|------------|------|-------|--------|-------|
+| MLC-Challenge | English    | 4.28 | 11.48 | 13.02  | 7.99  |
+| MLC-Challenge | French     | 3.80 | 18.80 | 19.64  | 15.21 |
+| MLC-Challenge | German     | 1.04 | 17.10 | 17.26  | 16.30 |
+| MLC-Challenge | Italian    | 2.08 | 15.76 | 15.91  | 13.91 |
+| MLC-Challenge | Japanese   | 0.82 | 15.33 | 15.41  | 14.69 |
+| MLC-Challenge | Korean     | 4.52 | 15.35 | 16.07  | 9.65  |
+| MLC-Challenge | Portuguese | 7.98 | 29.91 | 31.65  | 21.54 |
+| MLC-Challenge | Russian    | 0.90 | 12.94 | 12.98  | 12.40 |
+| MLC-Challenge | Spanish    | 2.67 | 10.51 | 11.71  | 8.04  |
+| MLC-Challenge | Thai       | 4.09 | 14.91 | 15.57  | 13.61 |
+| MLC-Challenge | Vietnamese | 0.16 | 14.57 | 14.57  | 14.43 |
+
+---
+
+| Dataset       | Language | DER   | cpWER | tcpWER | WER   |
+|---------------|----------|-------|-------|--------|-------|
+| AISHELL-4     | Chinese  | 6.77  | 24.99 | 25.35  | 21.40 |
+| AMI-IHM       | English  | 11.92 | 20.41 | 20.82  | 18.81 |
+| AMI-SDM       | English  | 13.43 | 28.82 | 29.80  | 24.65 |
+| AliMeeting    | Chinese  | 10.92 | 29.33 | 29.51  | 27.40 |
+| MLC-Challenge | Average  | 3.42  | 14.81 | 15.66  | 12.07 |
+
+
 ## Finetuning
 LoRA (Low-Rank Adaptation) fine-tuning is supported. See [Finetuning](../finetuning-asr/README.md) for a detailed guide.

@@ -86,3 +114,11 @@ LoRA (Low-Rank Adaptation) fine-tuning is supported. See [Finetuning](../finetun
 ## 📄 License

 This project is licensed under the [MIT License](../LICENSE).
+
+
+## Language Distribution
+<p align="center">
+  <img src="../Figures/language_distribution_horizontal.png" alt="Language Distribution" width="80%">
+</p>
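
For readers unfamiliar with the WER column in the Results tables, the following is a minimal sketch of the standard word error rate: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. It is not taken from the VibeVoice code base. The function name `word_error_rate`, the whitespace tokenization, and the scaling to a percentage are all assumptions made for illustration; the source does not state how its scores were tokenized, normalized, or scaled, and languages such as Chinese, Japanese, and Thai are usually scored on characters or segmented words instead. The speaker-aware variants (cpWER, tcpWER) and DER are not covered here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the word error rate as a percentage (illustrative helper).

    Assumption: words are separated by whitespace and compared exactly,
    with no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution out of four reference words.
    print(word_error_rate("a b c d", "a x c d"))  # 25.0
```

Because the denominator is the reference length, a hypothesis with many inserted words can score above 100 under this definition.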
+
+