diff --git a/README.md b/README.md
index d946d1f..1adacce 100644
--- a/README.md
+++ b/README.md
@@ -70,14 +70,15 @@ For more information, demos, and examples, please visit our [Project Page](https
 - **📝 Rich Transcription (Who, When, What)**: The model jointly performs ASR, diarization, and timestamping, producing a structured output that indicates *who* said *what* and *when*.
+[📖 Documentation](docs/vibevoice-asr.md) | [🤗 Hugging Face](https://huggingface.co/microsoft/VibeVoice-ASR) | [🎮 Playground](https://aka.ms/vibevoice-asr)
+
+

DER
cpWER
tcpWER

-[📖 Documentation](docs/vibevoice-asr.md) | [🤗 Hugging Face](https://huggingface.co/microsoft/VibeVoice-ASR) | [🎮 Playground](https://aka.ms/vibevoice-asr)
-
@@ -102,12 +103,13 @@ https://github.com/user-attachments/assets/acde5602-dc17-4314-9e3b-c630bc84aefa
 Supports English, Chinese and other languages.
+[📖 Documentation](docs/vibevoice-tts.md) | [🤗 Hugging Face](https://huggingface.co/microsoft/VibeVoice-1.5B) | [📊 Paper](https://arxiv.org/pdf/2508.19205)
+
+
VibeVoice Results
-[📖 Documentation](docs/vibevoice-tts.md) | [🤗 Hugging Face](https://huggingface.co/microsoft/VibeVoice-1.5B) | [📊 Paper](https://arxiv.org/pdf/2508.19205)
-
 **English**