8 Commits

Author SHA1 Message Date
Jianwei Yu cd945395d4 feat: set nginx workers to 2×dp for optimal HTTP throughput
Nginx worker_processes now defaults to 2×N (where N is the number of DP
replicas) instead of 'auto'. This ensures enough HTTP handler processes
to fully saturate all GPU backends under heavy concurrent load.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-27 09:16:05 +00:00
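
The 2×N sizing rule described in this commit can be sketched as below. This is a minimal illustration, not the launcher's actual code: the function names `nginx_worker_processes` and `worker_directive` are hypothetical, and only the 2×N rule and the `worker_processes` directive come from the commit message.

```python
def nginx_worker_processes(dp_replicas: int) -> int:
    """Return the nginx worker count for N DP replicas, using the 2 x N rule."""
    if dp_replicas < 1:
        raise ValueError("dp_replicas must be >= 1")
    return 2 * dp_replicas


def worker_directive(dp_replicas: int) -> str:
    """Render the nginx directive line for the computed worker count."""
    return f"worker_processes {nginx_worker_processes(dp_replicas)};"


if __name__ == "__main__":
    # Example: 8 DP replicas yield 16 nginx workers.
    print(worker_directive(8))
```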
Jianwei Yu 3817f74d46 feat: nginx-based data parallel for optimal ASR throughput
When --dp N is specified (N > 1), the launcher now starts N independent
vLLM processes behind an nginx reverse proxy instead of using vLLM's
built-in DP coordinator. This avoids the single-process HTTP bottleneck
when handling large base64 audio payloads, achieving near-linear scaling
(7.2x with 8 GPUs at 4096 concurrent requests).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-27 07:43:32 +00:00
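
A rough sketch of the layout this commit describes, with N independent backends behind one nginx listener, might look like the following. Everything specific here is an assumption for illustration: the function name `render_nginx_conf`, the upstream name `vllm_backends`, the listen port 8000, and the backend ports starting at 8001 are hypothetical and not taken from the repository. nginx's default round-robin balancing is used.

```python
def render_nginx_conf(dp: int, listen_port: int = 8000, base_port: int = 8001) -> str:
    """Render a minimal nginx config that proxies one port to `dp` local backends.

    Ports and names are illustrative assumptions, not the launcher's real values.
    """
    if dp < 1:
        raise ValueError("dp must be >= 1")
    # One upstream entry per independent backend process.
    servers = "\n".join(
        f"        server 127.0.0.1:{base_port + i};" for i in range(dp)
    )
    return f"""events {{}}
http {{
    upstream vllm_backends {{
{servers}
    }}
    server {{
        listen {listen_port};
        location / {{
            proxy_pass http://vllm_backends;
        }}
    }}
}}
"""


if __name__ == "__main__":
    print(render_nginx_conf(2))
```

Because each backend is a separate process with its own HTTP server, request parsing is spread across processes instead of being funneled through a single one, which is the bottleneck the commit message refers to.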
JianweiYu 9634518ca4 Add data parallel (DP) support to vLLM server launcher
- Add --dp/--data-parallel-size flag for running independent model replicas
  across multiple GPUs with automatic load balancing behind a single port
- Add --tp/--tensor-parallel-size flag (previously hardcoded to 1)
- Update docs/vibevoice-vllm-asr.md with multi-GPU deployment guide
  covering DP, TP, and hybrid (DP × TP) configurations

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-24 11:53:31 +00:00
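
The two flags added in this commit could be declared with `argparse` roughly as follows. This is a sketch under assumptions: the helper names `build_parser` and `gpus_required` are hypothetical, and the default of 1 for `--dp` is assumed (the commit only states that tensor parallelism was previously hardcoded to 1). In a hybrid DP × TP setup, each of the `dp` replicas spans `tp` GPUs.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing the short and long forms of both flags."""
    parser = argparse.ArgumentParser(description="Launcher flag sketch")
    parser.add_argument(
        "--dp", "--data-parallel-size", dest="dp", type=int, default=1,
        help="number of independent model replicas (default is an assumption)",
    )
    parser.add_argument(
        "--tp", "--tensor-parallel-size", dest="tp", type=int, default=1,
        help="number of GPUs each replica is sharded across",
    )
    return parser


def gpus_required(dp: int, tp: int) -> int:
    """Total GPUs used by a hybrid configuration: dp replicas, tp GPUs each."""
    return dp * tp


if __name__ == "__main__":
    args = build_parser().parse_args(["--dp", "4", "--tp", "2"])
    print(gpus_required(args.dp, args.tp))
```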
Damon-Salvetore 165e17e5ed fix: vllm-version-stable 2026-02-25 07:30:43 +00:00
YingboHAO bb54f78d0e feat: add hotwords support for vLLM ASR 2026-02-04 10:33:20 +00:00
YingboHAO e26f1c263f 1 2026-02-02 13:50:27 +00:00
YingboHAO 1eb04f53a2 Replace install_deps.sh with start_server.py one-click deployment 2026-01-26 07:34:54 +00:00
YingboHAO 4df5b0582f Add vLLM plugin support for high-performance ASR serving 2026-01-23 17:32:24 +00:00