feat: set nginx workers to 2×dp for optimal HTTP throughput

Nginx worker_processes now defaults to 2×N (where N is the number of DP
replicas) instead of 'auto'. This ensures enough HTTP handler processes
to fully saturate all GPU backends under heavy concurrent load.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This commit is contained in:
Jianwei Yu
2026-03-27 09:16:05 +00:00
parent e6b65abb9b
commit cd945395d4
2 changed files with 12 additions and 5 deletions
+2 -2
View File
@@ -47,9 +47,9 @@ The launcher supports two types of GPU parallelism via `--tp` and `--dp` flags:
### Data Parallel (Recommended for scaling throughput)
Run 4 independent replicas on 4 GPUs with automatic load balancing behind a single port.
Run N independent replicas on N GPUs with automatic load balancing behind a single port.
When `--dp N` is specified (N > 1), the launcher automatically starts N independent vLLM
processes behind an nginx reverse proxy for optimal throughput:
processes behind an nginx reverse proxy (2×N workers) for optimal throughput:
```bash
docker run -d --gpus '"device=0,1,2,3"' --name vibevoice-vllm \