feat: set nginx workers to 2×dp for optimal HTTP throughput
Nginx `worker_processes` now defaults to 2×N (where N is the number of DP replicas) instead of `auto`, which sizes the worker pool by CPU core count. The goal is to have enough HTTP handler processes to keep all GPU backends saturated under heavy concurrent load.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
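The worker-count rule above fits in a small pure function. This is a minimal sketch, not the launcher's code. The name `nginx_worker_count` and the `override` parameter are hypothetical, and rejecting `dp < 1` is an added assumption; the diff itself simply doubles the number of backend ports.

```python
def nginx_worker_count(dp: int, override: int = 0) -> int:
    """Return an nginx worker_processes value for `dp` data-parallel replicas.

    Hypothetical helper: a non-positive `override` means "use the default",
    which is 2 workers per replica (the rule described in this commit).
    """
    if dp < 1:
        # Assumption: at least one backend replica must exist.
        raise ValueError("dp must be >= 1")
    if override > 0:
        # An explicit positive value wins over the 2×N default.
        return override
    return 2 * dp


if __name__ == "__main__":
    for dp in (1, 2, 4, 8):
        print(f"--dp {dp}: worker_processes {nginx_worker_count(dp)};")
```

Tying the count to the replica count, rather than to the host's cores, keeps the proxy's process count proportional to the number of backends it fronts.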
@@ -47,9 +47,9 @@ The launcher supports two types of GPU parallelism via `--tp` and `--dp` flags:
 ### Data Parallel (Recommended for scaling throughput)
 
-Run 4 independent replicas on 4 GPUs with automatic load balancing behind a single port.
+Run N independent replicas on N GPUs with automatic load balancing behind a single port.
 
 When `--dp N` is specified (N > 1), the launcher automatically starts N independent vLLM
-processes behind an nginx reverse proxy for optimal throughput:
+processes behind an nginx reverse proxy (2×N workers) for optimal throughput:
 
 ```bash
 docker run -d --gpus '"device=0,1,2,3"' --name vibevoice-vllm \
@@ -146,11 +146,18 @@ def _install_nginx() -> None:
     )
 
 
-def _write_nginx_config(frontend_port: int, backend_ports: list[int]) -> str:
-    """Write nginx config for round-robin load balancing."""
+def _write_nginx_config(frontend_port: int, backend_ports: list[int],
+                        num_workers: int = 0) -> str:
+    """Write nginx config for round-robin load balancing.
+
+    Args:
+        num_workers: Number of nginx worker processes. 0 = auto (2 × num backends).
+    """
+    if num_workers <= 0:
+        num_workers = len(backend_ports) * 2
     backends = "\n".join(f" server 127.0.0.1:{p};" for p in backend_ports)
     config = textwrap.dedent(f"""\
-        worker_processes auto;
+        worker_processes {num_workers};
         worker_rlimit_nofile 65536;
         error_log /dev/stderr warn;
         pid /tmp/nginx.pid;
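For illustration, here is a self-contained sketch of a config generator that uses the new default. Only the four top-level directives (`worker_processes`, `worker_rlimit_nofile`, `error_log`, `pid`) come from the diff. The function name `render_nginx_config`, the upstream name `vllm_backends`, the `events`/`http`/`server` blocks, and `worker_connections 4096` are assumptions. nginx balances an `upstream` group round-robin by default, so no balancing directive is needed.

```python
def render_nginx_config(frontend_port: int, backend_ports: list[int],
                        num_workers: int = 0) -> str:
    """Render a minimal round-robin nginx config (hypothetical sketch).

    num_workers <= 0 falls back to 2 workers per backend, matching the
    default introduced by this commit.
    """
    if not backend_ports:
        raise ValueError("at least one backend port is required")
    if num_workers <= 0:
        num_workers = len(backend_ports) * 2
    lines = [
        # Top-level directives as they appear in the diff.
        f"worker_processes {num_workers};",
        "worker_rlimit_nofile 65536;",
        "error_log /dev/stderr warn;",
        "pid /tmp/nginx.pid;",
        "",
        # Assumed events block; 4096 is an illustrative value.
        "events {",
        "    worker_connections 4096;",
        "}",
        "",
        "http {",
        # Assumed upstream name; round-robin is nginx's default policy.
        "    upstream vllm_backends {",
        *(f"        server 127.0.0.1:{p};" for p in backend_ports),
        "    }",
        "",
        "    server {",
        f"        listen {frontend_port};",
        "        location / {",
        "            proxy_pass http://vllm_backends;",
        "        }",
        "    }",
        "}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_nginx_config(8000, [8001, 8002]))
```

With two backends, `render_nginx_config(8000, [8001, 8002])` starts with `worker_processes 4;`. Building the text from a list of lines, rather than using `textwrap.dedent` on an f-string, avoids interpolating a multi-line value into a string whose common indentation is then stripped.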