Batching the encoder across multiple requests caused GPU OOM when the vLLM
scheduler sent many audio items at once. The encoder intermediates
(~700MB per 69s of audio) compete with the KV cache for GPU memory.
Sequential encoding is stable and proven correct. The encoder
(267ms per request) is not the primary throughput bottleneck when
the encoder cache is enabled (the default).
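
A rough sketch of the memory argument above, not the actual encoder code.
It assumes intermediates scale linearly with the number of items encoded
at once; encode_sequentially, peak_intermediate_mb and the encode_one
callback are hypothetical names used only for illustration.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")

# Figure quoted in the commit message: ~700MB of intermediates per 69s clip.
PER_ITEM_MB = 700


def encode_sequentially(items: Iterable[T], encode_one: Callable[[T], R]) -> List[R]:
    """Encode one item at a time so only one item's intermediates are live."""
    outputs: List[R] = []
    for item in items:
        outputs.append(encode_one(item))
    return outputs


def peak_intermediate_mb(n_items: int, batched: bool, per_item_mb: int = PER_ITEM_MB) -> int:
    """Estimate peak encoder intermediates (assumes linear scaling per item)."""
    if n_items <= 0:
        return 0
    return n_items * per_item_mb if batched else per_item_mb
```

Under that assumption, a batch of 8 such clips would need about 5600MB of
intermediates at once, while the sequential path stays near 700MB.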
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Nginx worker_processes now defaults to 2×N (where N is the number of DP
replicas) instead of 'auto'. This ensures enough HTTP handler processes
to fully saturate all GPU backends under heavy concurrent load.
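
A minimal sketch of the 2×N rule, assuming the launcher renders the
directive as a string. worker_processes is the real nginx directive;
the function name is hypothetical.

```python
def render_worker_processes(dp: int) -> str:
    """Return the nginx worker_processes directive for dp replicas (2 x N)."""
    if dp < 1:
        raise ValueError("dp must be at least 1")
    return f"worker_processes {2 * dp};"
```

For example, render_worker_processes(8) yields "worker_processes 16;".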
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pass VIBEVOICE_FFMPEG_MAX_CONCURRENCY and VLLM_MEDIA_LOADING_THREAD_COUNT
to each worker subprocess so that every worker inherits the correct settings
regardless of how the container is launched (with or without --skip-deps).
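
One way this forwarding could look, as a sketch only. The two variable
names come from this commit; build_worker_env and its settings argument
are hypothetical, and the real launcher may resolve values differently.

```python
from typing import Dict, Mapping

# Variables named in the commit message.
FORWARDED_VARS = (
    "VIBEVOICE_FFMPEG_MAX_CONCURRENCY",
    "VLLM_MEDIA_LOADING_THREAD_COUNT",
)


def build_worker_env(parent_env: Mapping[str, str], settings: Mapping[str, object]) -> Dict[str, str]:
    """Copy the parent environment and set the forwarded variables explicitly."""
    env = dict(parent_env)
    for name in FORWARDED_VARS:
        if name in settings:
            # Environment values must be strings.
            env[name] = str(settings[name])
    return env
```

The result can be passed as env= to subprocess.Popen when starting each
worker, so the values no longer depend on the container's startup path.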
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When --dp N is specified (N > 1), the launcher now starts N independent
vLLM processes behind an nginx reverse proxy instead of using vLLM's
built-in DP coordinator. This avoids the single-process HTTP bottleneck
when handling large base64 audio payloads, achieving near-linear scaling
(7.2x with 8 GPUs at 4096 concurrent requests).
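
A hedged sketch of the proxy layout described above: N backends listed in
one nginx upstream block behind a single listening port. The ports, the
upstream name, and render_nginx_conf are assumptions for illustration;
the launcher's generated config may differ.

```python
def render_nginx_conf(dp: int, listen_port: int = 8000, base_port: int = 8001) -> str:
    """Render a minimal nginx config that proxies one port to dp local backends."""
    if dp < 2:
        raise ValueError("the proxy is only used when dp > 1")
    # One upstream entry per independent vLLM process (ports are assumed).
    servers = "\n".join(
        f"        server 127.0.0.1:{base_port + i};" for i in range(dp)
    )
    return (
        "events {}\n"
        "http {\n"
        "    upstream vllm_replicas {\n"
        f"{servers}\n"
        "    }\n"
        "    server {\n"
        f"        listen {listen_port};\n"
        "        location / {\n"
        "            proxy_pass http://vllm_replicas;\n"
        "        }\n"
        "    }\n"
        "}\n"
    )
```

With no balancing directive in the upstream block, nginx distributes
requests round-robin. A real config for large base64 payloads would
likely also need to raise client_max_body_size, which is omitted here.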
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add --dp/--data-parallel-size flag for running independent model replicas
  across multiple GPUs with automatic load balancing behind a single port
- Add --tp/--tensor-parallel-size flag (previously hardcoded to 1)
- Update docs/vibevoice-vllm-asr.md with multi-GPU deployment guide
  covering DP, TP, and hybrid (DP × TP) configurations
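
To illustrate the hybrid (DP × TP) case, a sketch of how GPUs could be
split among replicas. It assumes contiguous GPU ids and that each replica
is pinned via CUDA_VISIBLE_DEVICES; gpu_groups is a hypothetical helper,
and the launcher's actual assignment may differ.

```python
from typing import List


def gpu_groups(dp: int, tp: int) -> List[str]:
    """Return one CUDA_VISIBLE_DEVICES value per replica (dp replicas, tp GPUs each)."""
    if dp < 1 or tp < 1:
        raise ValueError("dp and tp must be at least 1")
    return [
        ",".join(str(gpu) for gpu in range(replica * tp, (replica + 1) * tp))
        for replica in range(dp)
    ]
```

Under these assumptions, --dp 2 --tp 2 on a 4-GPU host would give the
first replica GPUs 0,1 and the second replica GPUs 2,3.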
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add gradio_asr_demo_api_video.py: Gradio web UI supporting audio/video upload,
  streaming output, hotwords, and Cloudflare tunnel
- Add demo/asr_demo/: demo audio and video files for the Gradio interface
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>