98 Commits

Author SHA1 Message Date
Zhiliang Peng 3c976491d4 Update README with new TTS report and ICLR oral acceptance
Updated TTS report link and added conference acceptance note.
2026-03-31 12:24:50 +08:00
Jianwei Yu c766f12e23 docs: add Vibing download links
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-29 03:28:59 +00:00
Jianwei Yu 8f133837dc docs: add Vibing demo video to news section
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-29 02:33:10 +00:00
Jianwei Yu 0857b6d59f docs: fix news bold formatting
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-29 01:48:20 +00:00
Jianwei Yu c8371b6bb6 docs: add Vibing voice input adoption news
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-29 01:43:59 +00:00
Jianwei Yu b691f99191 docs: add Trendshift #1 trending badge to README
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-28 17:07:00 +00:00
Jianwei Yu 5cd81bb497 fix: restore sequential encoder (batch encoder causes OOM)
Batching the encoder across multiple requests caused GPU OOM when the
vLLM scheduler sent many audio items at once. The encoder intermediates
(~700MB per 69s of audio) compete with the KV cache for GPU memory.

Sequential encoding is stable and proven correct. The encoder
(267ms per request) is not the primary throughput bottleneck when
the encoder cache is enabled (the default).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-27 18:48:06 +00:00
Jianwei Yu cd945395d4 feat: set nginx workers to 2×dp for optimal HTTP throughput
Nginx worker_processes now defaults to 2×N (where N is the number of DP
replicas) instead of 'auto'. This ensures enough HTTP handler processes
to fully saturate all GPU backends under heavy concurrent load.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-27 09:16:05 +00:00
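
The worker sizing described in commit cd945395d4 can be illustrated with a small config generator. This is a minimal sketch, not the launcher's actual code: the function name `render_nginx_conf`, the upstream name `asr_backends`, the port numbers, and the `worker_connections` value are all assumptions made for illustration. Only the 2×N rule for `worker_processes` comes from the commit message.

```python
def render_nginx_conf(dp: int, base_port: int = 8001, listen_port: int = 8000) -> str:
    """Render a minimal nginx.conf for `dp` backend replicas.

    Ports and names are hypothetical; only the 2 x dp worker rule
    is taken from the commit message.
    """
    if dp < 1:
        raise ValueError("dp must be >= 1")

    workers = 2 * dp  # 2×N worker processes instead of 'auto'

    # One upstream entry per replica, on consecutive local ports (assumed layout).
    backends = "\n".join(
        f"        server 127.0.0.1:{base_port + i};" for i in range(dp)
    )

    return f"""worker_processes {workers};
events {{
    worker_connections 4096;
}}
http {{
    upstream asr_backends {{
{backends}
    }}
    server {{
        listen {listen_port};
        client_max_body_size 0;
        location / {{
            proxy_pass http://asr_backends;
        }}
    }}
}}
"""


if __name__ == "__main__":
    print(render_nginx_conf(dp=4))
```

Setting `client_max_body_size 0` disables nginx's request body size check, which is one plausible way to accept large base64 audio payloads; whether the project does this is not stated in the log.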
Jianwei Yu e6b65abb9b fix: auto-tune per-worker env vars in DP mode
Pass VIBEVOICE_FFMPEG_MAX_CONCURRENCY and VLLM_MEDIA_LOADING_THREAD_COUNT
to each worker subprocess so they inherit the correct settings regardless
of how the container is launched (--skip-deps or not).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-27 07:57:49 +00:00
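
The idea in commit e6b65abb9b, passing the two environment variables explicitly to every worker subprocess, might look roughly like the sketch below. The variable names come from the commit message; the helper name `build_worker_env` and the tuning rule (CPU count divided evenly across replicas) are assumptions, since the log does not say how the values are chosen.

```python
import os


def build_worker_env(base_env, dp, cpu_count=None):
    """Return a copy of `base_env` with per-worker settings filled in.

    The even CPU split below is a hypothetical tuning rule, not the
    project's documented behaviour.
    """
    cpus = cpu_count or os.cpu_count() or 1
    per_worker = max(1, cpus // dp)  # never drop below one thread

    env = dict(base_env)  # copy so the caller's mapping is untouched
    env["VIBEVOICE_FFMPEG_MAX_CONCURRENCY"] = str(per_worker)
    env["VLLM_MEDIA_LOADING_THREAD_COUNT"] = str(per_worker)
    return env


if __name__ == "__main__":
    env = build_worker_env(os.environ, dp=4)
    print(env["VIBEVOICE_FFMPEG_MAX_CONCURRENCY"])
```

The returned mapping can be handed to `subprocess.Popen(..., env=env)` so that each worker sees the same values no matter how the parent process was started.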
Jianwei Yu 3817f74d46 feat: nginx-based data parallel for optimal ASR throughput
When --dp N is specified (N > 1), the launcher now starts N independent
vLLM processes behind an nginx reverse proxy instead of using vLLM's
built-in DP coordinator. This avoids the single-process HTTP bottleneck
when handling large base64 audio payloads, achieving near-linear scaling
(7.2x with 8 GPUs at 4096 concurrent requests).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-27 07:43:32 +00:00
JianweiYu 9634518ca4 Add data parallel (DP) support to vLLM server launcher
- Add --dp/--data-parallel-size flag for running independent model replicas
  across multiple GPUs with automatic load balancing behind a single port
- Add --tp/--tensor-parallel-size flag (previously hardcoded to 1)
- Update docs/vibevoice-vllm-asr.md with multi-GPU deployment guide
  covering DP, TP, and hybrid (DP × TP) configurations

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-24 11:53:31 +00:00
JianweiYu 09ca114fa3 Add Gradio ASR demo with video support and demo audio/video files
- Add gradio_asr_demo_api_video.py: Gradio web UI supporting audio/video upload,
  streaming output, hotwords, and Cloudflare tunnel
- Add demo/asr_demo/: demo audio and video files for the Gradio interface

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-22 06:11:51 +00:00
Zhiliang Peng 4c419978c9 Merge pull request #255 from sd983527/main
Add news about VibeVoice ASR Transformers integration
2026-03-06 14:08:47 +08:00
Yan Xia 7e73beec97 Add news about VibeVoice ASR Transformers integration
- Added announcement that VibeVoice ASR is now part of Transformers v5.3.0 release
- Linked to the official Hugging Face Transformers release page
- Positioned as the latest news item with today's date

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-06 13:32:21 +08:00
Li Dong 7ef9dbe300 Merge pull request #247 from Damon-Salvetore/fix/vllm-version-compat
fix: vllm-version-stable
2026-02-28 11:12:24 +08:00
Damon-Salvetore 165e17e5ed fix: vllm-version-stable 2026-02-25 07:30:43 +00:00
Jianwei Yu 1807b858d4 Merge pull request #236 from Damon-Salvetore/main
fix backend
2026-02-10 00:07:05 +08:00
YingboHAO a4add8e52f fix backend 2026-02-08 09:58:19 +00:00
Jianwei Yu ce3d40c78f Merge pull request #233 from Damon-Salvetore/main
Add hot words support
2026-02-07 12:32:03 +08:00
YingboHAO 0508c3e86f fix 2026-02-06 14:38:16 +00:00
YingboHAO 7761242bf3 fix 2026-02-06 05:52:48 +00:00
YingboHAO bb54f78d0e feat: add hotwords support for vLLM ASR 2026-02-04 10:33:20 +00:00
YaoyaoChang 0aa8cb4c64 fix default speaker 2026-02-03 00:35:04 -08:00
YaoyaoChang e43c1e2cdb streaming use transformers==4.51.3 2026-02-03 00:30:52 -08:00
Jianwei Yu e16491d65e Merge pull request #228 from Damon-Salvetore/vllm-1
[Fix] Resolve occasional infinite loops during vLLM inference
2026-02-03 10:38:40 +08:00
YingboHAO e26f1c263f 1 2026-02-02 13:50:27 +00:00
YingboHAO 0055161273 Add test_api_auto_recover.py and test audio files 2026-02-02 13:49:01 +00:00
Zhiliang Peng b2aee8015c Delete docs/VibeVoice-ASR-Report.pdf 2026-01-28 19:33:37 +08:00
YaoyaoChang 2ee94fab1d update ASR architecture figure 2026-01-27 05:11:35 -08:00
YaoyaoChang 3140709188 update README 2026-01-27 21:06:31 +08:00
YaoyaoChang c435ae05d5 update README
Added a section on LoRA fine-tuning to the ASR documentation.
2026-01-27 21:01:40 +08:00
YaoyaoChang 0e1a0d39fd update README 2026-01-27 20:59:25 +08:00
YaoyaoChang 142a00112e update ASR README: multilingual 2026-01-27 20:58:10 +08:00
YaoyaoChang 4648c50ea0 update ASR Technical Report link to arXiv 2026-01-27 12:58:06 +08:00
MLSDCherryPick cbbdb69474 add VibeVoice-ASR technical report arXiv link 2026-01-27 02:45:16 +00:00
YaoyaoChang a69e77c036 1. unify env for TTS and ASR; 2. avoid transformers 5.0.0 temporarily 2026-01-26 03:29:02 -08:00
YaoyaoChang a00f431e14 TTS: support latest transformers (4.57.6) 2026-01-26 03:28:10 -08:00
Jianwei Yu c4ee4fe716 Merge pull request #213 from Damon-Salvetore/vllm-1
Replace install_deps.sh with start_server.py one-click deployment
2026-01-26 16:49:38 +08:00
YingboHAO 1eb04f53a2 Replace install_deps.sh with start_server.py one-click deployment 2026-01-26 07:34:54 +00:00
ikeshav26 d11d756b61 fix: issues in error handling 2026-01-26 14:18:34 +08:00
YaoyaoChang 0926f242ce add CONTRIBUTING.md 2026-01-25 22:07:40 -08:00
DDXDB 1c5dbc4190 Add XPU sdpa Support 2026-01-26 14:00:31 +08:00
ThanhNguyxn 523713e806 fix(demo): add MPS and CPU support for ASR inference demo
- Add MPS device choice and auto-detect MPS availability
- Change default attention implementation to 'auto' with smart fallback
- Auto-detect flash_attention_2 availability on CUDA, fallback to sdpa
- Use sdpa for MPS and CPU devices (flash_attention_2 not supported)
- Use float32 dtype for MPS/CPU devices for better compatibility

Fixes #206
2026-01-26 13:56:11 +08:00
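
The fallback rules listed in commit 523713e806 can be expressed as a small pure function. This is a sketch under stated assumptions: the function names are hypothetical, flash attention is detected by checking whether a `flash_attn` package is importable, and `bfloat16` as the CUDA dtype is an assumption, since the commit only specifies `float32` for MPS and CPU.

```python
import importlib.util


def flash_attention_available() -> bool:
    """True if a `flash_attn` package can be found (assumed detection method)."""
    return importlib.util.find_spec("flash_attn") is not None


def resolve_attention(device, requested="auto", has_flash=None):
    """Return (attn_implementation, dtype_name) for a device string."""
    if has_flash is None:
        has_flash = flash_attention_available()

    if device in ("mps", "cpu"):
        # flash_attention_2 is not supported here, so fall back to sdpa;
        # float32 is used for better compatibility.
        impl = "sdpa" if requested in ("auto", "flash_attention_2") else requested
        return impl, "float32"

    if requested == "auto":
        impl = "flash_attention_2" if has_flash else "sdpa"
    else:
        impl = requested  # an explicit choice on CUDA is honoured as given
    return impl, "bfloat16"  # CUDA dtype is an assumption, not from the commit


if __name__ == "__main__":
    print(resolve_attention("cpu"))
```

Keeping the decision separate from device detection makes it easy to test without a GPU: the caller passes the device string and, optionally, whether flash attention is present.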
ThanhNguyxn 5cf026569e fix: handle torch.dtype serialization in config classes
Fixes #199 - Object of type dtype is not JSON serializable

When loading models with torch_dtype as a torch.dtype object (e.g.,
torch.bfloat16), transformers would fail to serialize the config to
JSON for logging purposes, raising TypeError.

This fix:
- Adds _convert_dtype_to_string() helper function to convert torch.dtype
  objects to their string representation (e.g., 'bfloat16')
- Overrides to_dict() method in VibeVoiceConfig, VibeVoiceASRConfig,
  and VibeVoiceStreamingConfig to apply this conversion

The fix is backward compatible - string dtype values and None values
continue to work as expected.
2026-01-26 13:45:55 +08:00
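
The approach described in commit 5cf026569e, converting a dtype object to its string name inside `to_dict()`, can be sketched without PyTorch installed. Everything here is illustrative: `FakeDtype` stands in for `torch.dtype` (whose string form is, for example, `torch.bfloat16`), `SketchConfig` stands in for the real config classes, and the helper body is a guess at the behaviour the message describes rather than the committed code.

```python
import json


class FakeDtype:
    """Stand-in for torch.dtype so the sketch runs without PyTorch."""

    def __init__(self, name):
        self._name = name

    def __repr__(self):
        return f"torch.{self._name}"


def convert_dtype_to_string(value):
    """Map a dtype-like object to a short name such as 'bfloat16'.

    Strings and None pass through unchanged, matching the
    backward-compatibility note in the commit message.
    """
    if value is None or isinstance(value, str):
        return value
    return str(value).split(".")[-1]


class SketchConfig:
    """Hypothetical config class; the real ones override to_dict() similarly."""

    def __init__(self, torch_dtype=None, hidden_size=64):
        self.torch_dtype = torch_dtype
        self.hidden_size = hidden_size

    def to_dict(self):
        out = dict(vars(self))
        out["torch_dtype"] = convert_dtype_to_string(out["torch_dtype"])
        return out


if __name__ == "__main__":
    cfg = SketchConfig(torch_dtype=FakeDtype("bfloat16"))
    print(json.dumps(cfg.to_dict()))
```

Without the conversion, `json.dumps(vars(cfg))` raises `TypeError` because the dtype object is not JSON serializable, which is the failure reported in issue #199.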
YaoyaoChang e67b15f47d update 2026-01-25 21:41:42 -08:00
MLSDCherryPick d9068541cf 1 2026-01-25 16:11:02 +00:00
YaoyaoChang c28e23f80c update language distribution figure 2026-01-25 00:15:11 -08:00
MLSDCherryPick 81bf8baa89 1 2026-01-25 05:14:39 +00:00
MLSDCherryPick e4036e46f4 1 2026-01-24 08:28:05 +00:00
Jianwei Yu 3c50e50d18 Merge pull request #203 from Damon-Salvetore/vibevoice-vllm
Add vLLM plugin support for high-performance ASR serving
2026-01-24 16:17:10 +08:00