3817f74d46
When --dp N is specified (N > 1), the launcher now starts N independent vLLM processes behind an nginx reverse proxy instead of using vLLM's built-in DP coordinator. This avoids the single-process HTTP bottleneck when handling large base64 audio payloads, achieving near-linear scaling (7.2x with 8 GPUs at 4096 concurrent requests). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>