VibeVoice

Author	SHA1	Message	Date
Jianwei Yu	cd945395d4	feat: set nginx workers to 2×dp for optimal HTTP throughput Nginx worker_processes now defaults to 2×N (where N is the number of DP replicas) instead of 'auto'. This ensures enough HTTP handler processes to fully saturate all GPU backends under heavy concurrent load. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-27 09:16:05 +00:00
Jianwei Yu	3817f74d46	feat: nginx-based data parallel for optimal ASR throughput When --dp N is specified (N > 1), the launcher now starts N independent vLLM processes behind an nginx reverse proxy instead of using vLLM's built-in DP coordinator. This avoids the single-process HTTP bottleneck when handling large base64 audio payloads, achieving near-linear scaling (7.2x with 8 GPUs at 4096 concurrent requests). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-27 07:43:32 +00:00
JianweiYu	9634518ca4	Add data parallel (DP) support to vLLM server launcher - Add --dp/--data-parallel-size flag for running independent model replicas across multiple GPUs with automatic load balancing behind a single port - Add --tp/--tensor-parallel-size flag (previously hardcoded to 1) - Update docs/vibevoice-vllm-asr.md with multi-GPU deployment guide covering DP, TP, and hybrid (DP × TP) configurations Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-24 11:53:31 +00:00
Damon-Salvetore	165e17e5ed	fix: vllm-version-stable	2026-02-25 07:30:43 +00:00
YingboHAO	bb54f78d0e	feat: add hotwords support for vLLM ASR	2026-02-04 10:33:20 +00:00
YingboHAO	e26f1c263f	1	2026-02-02 13:50:27 +00:00
YingboHAO	1eb04f53a2	Replace install_deps.sh with start_server.py one-click deployment	2026-01-26 07:34:54 +00:00
YingboHAO	4df5b0582f	Add vLLM plugin support for high-performance ASR serving	2026-01-23 17:32:24 +00:00

8 Commits