diff --git a/Figures/DER.jpg b/Figures/DER.jpg
new file mode 100644
index 0000000..6d87843
Binary files /dev/null and b/Figures/DER.jpg differ
diff --git a/Figures/cpWER.jpg b/Figures/cpWER.jpg
new file mode 100644
index 0000000..d5ef00b
Binary files /dev/null and b/Figures/cpWER.jpg differ
diff --git a/Figures/tcpWER.jpg b/Figures/tcpWER.jpg
new file mode 100644
index 0000000..b0dcb01
Binary files /dev/null and b/Figures/tcpWER.jpg differ
diff --git a/README.md b/README.md
index 41a98de..d01d56b 100644
--- a/README.md
+++ b/README.md
@@ -20,9 +20,6 @@

📰 News

-New
-Realtime TTS
-
 2026-01-21: 📣 We open-sourced VibeVoice-ASR, a unified speech-to-text model designed to handle 60-minute long-form audio in a single pass, generating structured transcriptions containing Who (Speaker), When (Timestamps), and What (Content), with support for User-Customized Context.
 2025-12-16: 📣 We added more experimental speakers for exploration, including multilingual voices and 11 distinct English style voices. [Try it](docs/vibevoice-realtime-0.5b.md#optional-more-experimental-voices). More speaker types will be added over time.
diff --git a/docs/vibevoice-asr.md b/docs/vibevoice-asr.md
index e96214a..e1318f2 100644
--- a/docs/vibevoice-asr.md
+++ b/docs/vibevoice-asr.md
@@ -21,6 +21,13 @@ It is a unified speech-to-text model designed to handle **1-hour long-form audio
 VibeVoice ASR Architecture

+## Evaluation
+

+ DER
+ cpWER
+ tcpWER
+

+
 ## Installation
 We recommend to use NVIDIA Deep Learning Container to manage the CUDA environment.