- Add --dp/--data-parallel-size flag for running independent model replicas
  across multiple GPUs with automatic load balancing behind a single port
- Add --tp/--tensor-parallel-size flag (previously hardcoded to 1)
- Update docs/vibevoice-vllm-asr.md with multi-GPU deployment guide
  covering DP, TP, and hybrid (DP × TP) configurations
- Add gradio_asr_demo_api_video.py: Gradio web UI supporting audio/video upload,
  streaming output, hotwords, and Cloudflare tunnel
- Add demo/asr_demo/: demo audio and video files for the Gradio interface
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>