91 Commits

Author SHA1 Message Date
Jianwei Yu cd945395d4 feat: set nginx workers to 2×dp for optimal HTTP throughput
Nginx worker_processes now defaults to 2×N (where N is the number of DP
replicas) instead of 'auto'. This ensures enough HTTP handler processes
to fully saturate all GPU backends under heavy concurrent load.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-27 09:16:05 +00:00
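
The 2×N rule in the commit above can be sketched roughly as follows. The function names `nginx_worker_processes` and `render_worker_directive` are hypothetical and are not taken from the launcher; only the `worker_processes` directive and the 2×N default come from the commit message.

```python
def nginx_worker_processes(dp_replicas: int, multiplier: int = 2) -> int:
    """Return the nginx worker count for a given number of DP replicas.

    The multiplier of 2 mirrors the "2xN" default named in the commit message;
    the function itself is an illustrative assumption.
    """
    if dp_replicas < 1:
        raise ValueError("dp_replicas must be at least 1")
    return multiplier * dp_replicas


def render_worker_directive(dp_replicas: int) -> str:
    """Render the nginx directive that would replace 'worker_processes auto;'."""
    return f"worker_processes {nginx_worker_processes(dp_replicas)};"


if __name__ == "__main__":
    # With 8 DP replicas this prints a directive requesting 16 workers.
    print(render_worker_directive(8))
```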
Jianwei Yu e6b65abb9b fix: auto-tune per-worker env vars in DP mode
Pass VIBEVOICE_FFMPEG_MAX_CONCURRENCY and VLLM_MEDIA_LOADING_THREAD_COUNT
to each worker subprocess so they inherit the correct settings regardless
of how the container is launched (--skip-deps or not).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-27 07:57:49 +00:00
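
A minimal sketch of passing per-worker settings through the subprocess environment, as the commit above describes. The two variable names come from the commit message; the helper names and the "share CPUs evenly across workers" heuristic are assumptions, since the commit does not say how the values are tuned.

```python
import os
from typing import Dict, Mapping


def per_worker_share(total_cpus: int, num_workers: int) -> int:
    """Split a CPU budget evenly across workers, never dropping below 1.

    This heuristic is an assumption made for illustration only.
    """
    return max(1, total_cpus // num_workers)


def build_worker_env(
    base_env: Mapping[str, str],
    ffmpeg_max_concurrency: int,
    media_loading_threads: int,
) -> Dict[str, str]:
    """Copy the parent environment and set the two per-worker variables."""
    env = dict(base_env)
    env["VIBEVOICE_FFMPEG_MAX_CONCURRENCY"] = str(ffmpeg_max_concurrency)
    env["VLLM_MEDIA_LOADING_THREAD_COUNT"] = str(media_loading_threads)
    return env


if __name__ == "__main__":
    workers = 8
    share = per_worker_share(os.cpu_count() or 1, workers)
    env = build_worker_env(os.environ, share, share)
    print(env["VIBEVOICE_FFMPEG_MAX_CONCURRENCY"])
```

The resulting dictionary would typically be handed to each worker via `subprocess.Popen(cmd, env=env)`, so the settings apply however the parent process was launched.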
Jianwei Yu 3817f74d46 feat: nginx-based data parallel for optimal ASR throughput
When --dp N is specified (N > 1), the launcher now starts N independent
vLLM processes behind an nginx reverse proxy instead of using vLLM's
built-in DP coordinator. This avoids the single-process HTTP bottleneck
when handling large base64 audio payloads, achieving near-linear scaling
(7.2x with 8 GPUs at 4096 concurrent requests).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-27 07:43:32 +00:00
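
The layout described in the commit above (N independent backends behind one nginx reverse proxy) might look something like the generated config below. The port numbers, the upstream name `vllm_backends`, the `least_conn` balancing choice, and the function name are all assumptions for illustration; the commit message does not specify them.

```python
def render_nginx_conf(
    num_replicas: int,
    listen_port: int = 8000,        # assumed public port
    backend_base_port: int = 8001,  # assumed first backend port
) -> str:
    """Render a minimal nginx config that proxies to N local backends."""
    if num_replicas < 1:
        raise ValueError("num_replicas must be at least 1")
    servers = "\n".join(
        f"        server 127.0.0.1:{backend_base_port + i};"
        for i in range(num_replicas)
    )
    return (
        "events {}\n"
        "http {\n"
        "    upstream vllm_backends {\n"
        "        least_conn;\n"  # send each request to the least busy backend
        f"{servers}\n"
        "    }\n"
        "    server {\n"
        f"        listen {listen_port};\n"
        "        client_max_body_size 0;\n"  # no size cap for large audio payloads
        "        location / {\n"
        "            proxy_pass http://vllm_backends;\n"
        "        }\n"
        "    }\n"
        "}\n"
    )


if __name__ == "__main__":
    print(render_nginx_conf(8))
```

For reference, the reported 7.2x speedup on 8 GPUs corresponds to a scaling efficiency of 7.2 / 8 = 0.9, or 90%.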
JianweiYu 9634518ca4 Add data parallel (DP) support to vLLM server launcher
- Add --dp/--data-parallel-size flag for running independent model replicas
  across multiple GPUs with automatic load balancing behind a single port
- Add --tp/--tensor-parallel-size flag (previously hardcoded to 1)
- Update docs/vibevoice-vllm-asr.md with multi-GPU deployment guide
  covering DP, TP, and hybrid (DP × TP) configurations

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-24 11:53:31 +00:00
JianweiYu 09ca114fa3 Add Gradio ASR demo with video support and demo audio/video files
- Add gradio_asr_demo_api_video.py: Gradio web UI supporting audio/video upload,
  streaming output, hotwords, and Cloudflare tunnel
- Add demo/asr_demo/: demo audio and video files for the Gradio interface

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-22 06:11:51 +00:00
Zhiliang Peng 4c419978c9 Merge pull request #255 from sd983527/main
Add news about VibeVoice ASR Transformers integration
2026-03-06 14:08:47 +08:00
Yan Xia 7e73beec97 Add news about VibeVoice ASR Transformers integration
- Added announcement that VibeVoice ASR is now part of Transformers v5.3.0 release
- Linked to the official Hugging Face Transformers release page
- Positioned as the latest news item with today's date

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-06 13:32:21 +08:00
Li Dong 7ef9dbe300 Merge pull request #247 from Damon-Salvetore/fix/vllm-version-compat
fix: vllm-version-stable
2026-02-28 11:12:24 +08:00
Damon-Salvetore 165e17e5ed fix: vllm-version-stable 2026-02-25 07:30:43 +00:00
Jianwei Yu 1807b858d4 Merge pull request #236 from Damon-Salvetore/main
fix backend
2026-02-10 00:07:05 +08:00
YingboHAO a4add8e52f fix backend 2026-02-08 09:58:19 +00:00
Jianwei Yu ce3d40c78f Merge pull request #233 from Damon-Salvetore/main
Add hot words support
2026-02-07 12:32:03 +08:00
YingboHAO 0508c3e86f fix 2026-02-06 14:38:16 +00:00
YingboHAO 7761242bf3 fix 2026-02-06 05:52:48 +00:00
YingboHAO bb54f78d0e feat: add hotwords support for vLLM ASR 2026-02-04 10:33:20 +00:00
YaoyaoChang 0aa8cb4c64 fix default speaker 2026-02-03 00:35:04 -08:00
YaoyaoChang e43c1e2cdb streaming use transformers==4.51.3 2026-02-03 00:30:52 -08:00
Jianwei Yu e16491d65e Merge pull request #228 from Damon-Salvetore/vllm-1
[Fix] Resolve occasional infinite loops during vLLM inference
2026-02-03 10:38:40 +08:00
YingboHAO e26f1c263f 1 2026-02-02 13:50:27 +00:00
YingboHAO 0055161273 Add test_api_auto_recover.py and test audio files 2026-02-02 13:49:01 +00:00
Zhiliang Peng b2aee8015c Delete docs/VibeVoice-ASR-Report.pdf 2026-01-28 19:33:37 +08:00
YaoyaoChang 2ee94fab1d update ASR architecture figure 2026-01-27 05:11:35 -08:00
YaoyaoChang 3140709188 update README 2026-01-27 21:06:31 +08:00
YaoyaoChang c435ae05d5 update README
Added a section on LoRA fine-tuning to the ASR documentation.
2026-01-27 21:01:40 +08:00
YaoyaoChang 0e1a0d39fd update README 2026-01-27 20:59:25 +08:00
YaoyaoChang 142a00112e update ASR README: multilingual 2026-01-27 20:58:10 +08:00
YaoyaoChang 4648c50ea0 update ASR Technical Report link to Arxiv 2026-01-27 12:58:06 +08:00
MLSDCherryPick cbbdb69474 add VibeVoice-ASR technical report arXiv link 2026-01-27 02:45:16 +00:00
YaoyaoChang a69e77c036 1. unify env for TTS and ASR; 2. avoid transformers 5.0.0 temporarily 2026-01-26 03:29:02 -08:00
YaoyaoChang a00f431e14 tts support latest transformers(4.57.6) 2026-01-26 03:28:10 -08:00
Jianwei Yu c4ee4fe716 Merge pull request #213 from Damon-Salvetore/vllm-1
Replace install_deps.sh with start_server.py one-click deployment
2026-01-26 16:49:38 +08:00
YingboHAO 1eb04f53a2 Replace install_deps.sh with start_server.py one-click deployment 2026-01-26 07:34:54 +00:00
ikeshav26 d11d756b61 fix: issues in error handling 2026-01-26 14:18:34 +08:00
YaoyaoChang 0926f242ce add CONTRIBUTING.md 2026-01-25 22:07:40 -08:00
DDXDB 1c5dbc4190 Add XPU sdpa Support 2026-01-26 14:00:31 +08:00
ThanhNguyxn 523713e806 fix(demo): add MPS and CPU support for ASR inference demo
- Add MPS device choice and auto-detect MPS availability
- Change default attention implementation to 'auto' with smart fallback
- Auto-detect flash_attention_2 availability on CUDA, fallback to sdpa
- Use sdpa for MPS and CPU devices (flash_attention_2 not supported)
- Use float32 dtype for MPS/CPU devices for better compatibility

Fixes #206
2026-01-26 13:56:11 +08:00
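
The fallback rules listed in the commit above can be sketched as a small pure function. The function names and the `bfloat16` default for CUDA are assumptions (the commit only states float32 for MPS/CPU); the availability check uses the package name `flash_attn`, under which FlashAttention is normally installed.

```python
import importlib.util
from typing import Tuple


def flash_attn_installed() -> bool:
    """Check whether the flash_attn package can be imported."""
    return importlib.util.find_spec("flash_attn") is not None


def choose_attention_and_dtype(
    device: str,
    flash_attn_available: bool,
    cuda_dtype: str = "bfloat16",  # assumed default, not stated in the commit
) -> Tuple[str, str]:
    """Return (attn_implementation, dtype_name) for the given device."""
    if device == "cuda":
        attn = "flash_attention_2" if flash_attn_available else "sdpa"
        return attn, cuda_dtype
    if device in ("mps", "cpu"):
        # flash_attention_2 is not supported here, so use sdpa and float32.
        return "sdpa", "float32"
    raise ValueError(f"unsupported device: {device!r}")


if __name__ == "__main__":
    print(choose_attention_and_dtype("cpu", flash_attn_installed()))
```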
ThanhNguyxn 5cf026569e fix: handle torch.dtype serialization in config classes
Fixes #199 - Object of type dtype is not JSON serializable

When loading models with torch_dtype as a torch.dtype object (e.g.,
torch.bfloat16), transformers would fail to serialize the config to
JSON for logging purposes, raising TypeError.

This fix:
- Adds _convert_dtype_to_string() helper function to convert torch.dtype
  objects to their string representation (e.g., 'bfloat16')
- Overrides to_dict() method in VibeVoiceConfig, VibeVoiceASRConfig,
  and VibeVoiceStreamingConfig to apply this conversion

The fix is backward compatible - string dtype values and None values
continue to work as expected.
2026-01-26 13:45:55 +08:00
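
The idea behind the fix above can be illustrated without importing torch. This is a sketch under assumptions, not the repository's `_convert_dtype_to_string()`: it relies only on the fact that `str(torch.bfloat16)` is `'torch.bfloat16'`, and `sanitize_config_dict` is a hypothetical stand-in for what an overridden `to_dict()` might do.

```python
import json
from typing import Any, Dict


def convert_dtype_to_string(value: Any) -> Any:
    """Turn a dtype-like object into a plain string such as 'bfloat16'.

    Strings and None pass through unchanged, which keeps existing configs
    working. Any other value is stringified and a leading 'torch.' removed.
    """
    if value is None or isinstance(value, str):
        return value
    text = str(value)
    return text.split(".", 1)[1] if text.startswith("torch.") else text


def sanitize_config_dict(config: Dict[str, Any], key: str = "torch_dtype") -> Dict[str, Any]:
    """Return a copy of the dict with the dtype entry made JSON-safe.

    Nested dicts (for example sub-configs) are handled recursively.
    """
    cleaned: Dict[str, Any] = {}
    for name, value in config.items():
        if isinstance(value, dict):
            cleaned[name] = sanitize_config_dict(value, key)
        elif name == key:
            cleaned[name] = convert_dtype_to_string(value)
        else:
            cleaned[name] = value
    return cleaned


if __name__ == "__main__":
    class FakeDtype:
        """Stand-in for torch.bfloat16 so the sketch runs without torch."""

        def __str__(self) -> str:
            return "torch.bfloat16"

    raw = {"torch_dtype": FakeDtype(), "decoder": {"torch_dtype": "float16"}}
    print(json.dumps(sanitize_config_dict(raw)))
```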
YaoyaoChang e67b15f47d update 2026-01-25 21:41:42 -08:00
MLSDCherryPick d9068541cf 1 2026-01-25 16:11:02 +00:00
YaoyaoChang c28e23f80c update language distribution figure 2026-01-25 00:15:11 -08:00
MLSDCherryPick 81bf8baa89 1 2026-01-25 05:14:39 +00:00
MLSDCherryPick e4036e46f4 1 2026-01-24 08:28:05 +00:00
Jianwei Yu 3c50e50d18 Merge pull request #203 from Damon-Salvetore/vibevoice-vllm
Add vLLM plugin support for high-performance ASR serving
2026-01-24 16:17:10 +08:00
MLSDCherryPick 71356b87dd Language support 2026-01-24 05:17:26 +00:00
MLSDCherryPick 7d12252de3 Language support 2026-01-24 05:11:34 +00:00
MLSDCherryPick a3e99daedd Language support 2026-01-24 05:10:47 +00:00
YingboHAO 04f8bc40b0 Update test_api.py 2026-01-23 17:47:31 +00:00
YingboHAO 4df5b0582f Add vLLM plugin support for high-performance ASR serving 2026-01-23 17:32:24 +00:00
YaoyaoChang c0c2af984e update README for finetuning-asr 2026-01-22 06:20:11 -08:00
Zhiliang Peng 05e1a022e5 Update FT README
Clarified the purpose of the toy dataset in the README.
2026-01-22 21:49:47 +08:00