VibeVoice

Jianwei Yu 5cd81bb497 fix: restore sequential encoder (batch encoder causes OOM)

Batching the encoder across multiple requests caused GPU OOM when the
vLLM scheduler sent many audio items at once. The encoder intermediates
(~700MB per 69s audio) compete with the KV cache for GPU memory.

Sequential encoding is stable and proven correct. The encoder
(267ms per request) is not the primary throughput bottleneck when
the encoder cache is enabled (default).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

2026-03-27 18:48:06 +00:00
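
The memory argument in the commit message above can be illustrated with a rough sketch. It is not the code in model.py: `peak_encoder_mb` and `encode_sequentially` are hypothetical helpers, and the assumption that encoder intermediates scale linearly with the number of audio items held at once is a simplification. Only the ~700MB figure comes from the commit message.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")

# Figure quoted in the commit message (~700MB of encoder intermediates
# per 69s audio item). Treated here as a rough constant.
ENCODER_MB_PER_ITEM = 700


def peak_encoder_mb(num_items: int, batched: bool) -> int:
    """Estimate peak encoder intermediate memory in MB.

    Assumption: a batched encoder keeps intermediates for every item
    alive at once, while a sequential encoder keeps only one item's
    intermediates alive at a time.
    """
    if num_items <= 0:
        return 0
    live_items = num_items if batched else 1
    return live_items * ENCODER_MB_PER_ITEM


def encode_sequentially(items: Iterable[T], encode_one: Callable[[T], R]) -> List[R]:
    """Encode items one at a time so that only one item's intermediates
    are live at any moment. `encode_one` is a hypothetical per-item encoder."""
    return [encode_one(item) for item in items]


if __name__ == "__main__":
    for n in (1, 4, 16):
        print(n, peak_encoder_mb(n, batched=True), peak_encoder_mb(n, batched=False))
```

Under these assumptions, batched peak memory grows with the number of audio items the scheduler sends at once, while sequential peak memory stays at one item's worth, which is consistent with the trade-off the commit describes.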

scripts

feat: set nginx workers to 2×dp for optimal HTTP throughput

2026-03-27 09:16:05 +00:00

tests

feat: add hotwords support for vLLM ASR

2026-02-04 10:33:20 +00:00

tools

Add vLLM plugin support for high-performance ASR serving

2026-01-23 17:32:24 +00:00

__init__.py

fix

2026-02-06 14:38:16 +00:00

inputs.py

Add vLLM plugin support for high-performance ASR serving

2026-01-23 17:32:24 +00:00

model.py

fix: restore sequential encoder (batch encoder causes OOM)

2026-03-27 18:48:06 +00:00