VibeVoice

Author	SHA1	Message	Date
Jianwei Yu	5cd81bb497	fix: restore sequential encoder (batch encoder causes OOM). Batching the encoder across multiple requests caused GPU OOM when the vLLM scheduler sends many audio items at once. The encoder intermediates (~700 MB per 69 s of audio) compete with the KV cache for GPU memory. Sequential encoding is stable and proven correct. The encoder (267 ms per request) is not the primary throughput bottleneck when the encoder cache is enabled (default). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-27 18:48:06 +00:00
Damon-Salvetore	165e17e5ed	fix: vllm-version-stable	2026-02-25 07:30:43 +00:00
YingboHAO	a4add8e52f	fix backend	2026-02-08 09:58:19 +00:00
YingboHAO	0508c3e86f	fix	2026-02-06 14:38:16 +00:00
YingboHAO	7761242bf3	fix	2026-02-06 05:52:48 +00:00
YingboHAO	4df5b0582f	Add vLLM plugin support for high-performance ASR serving	2026-01-23 17:32:24 +00:00
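
The memory argument in commit 5cd81bb497 can be illustrated with a rough back-of-envelope sketch. Only the ~700 MB per-item figure comes from the commit message. The function name, the linear scaling with batch size, and the assumption that sequential encoding frees each item's intermediates before the next one starts are all hypothetical simplifications, not the repository's actual code.

```python
# Back-of-envelope model of encoder-intermediate memory (illustrative only).
# The ~700 MB figure is taken from the commit message; everything else here
# is an assumption made for the sake of the example.
ENCODER_INTERMEDIATES_MB = 700  # per ~69 s audio item


def peak_encoder_memory_mb(num_items: int, batched: bool) -> int:
    """Estimate the peak memory held by encoder intermediates, in MB."""
    if num_items <= 0:
        return 0
    # Batched: intermediates for every item are alive at the same time
    # (assumed to scale linearly with the number of items).
    # Sequential: only one item's intermediates are alive at a time
    # (assumes they are released before the next item is encoded).
    live_items = num_items if batched else 1
    return live_items * ENCODER_INTERMEDIATES_MB


for n in (1, 8, 32):
    batched = peak_encoder_memory_mb(n, batched=True)
    sequential = peak_encoder_memory_mb(n, batched=False)
    print(f"{n} items: batched ~{batched} MB, sequential ~{sequential} MB")
```

Under these assumptions the batched peak grows with the number of audio items the scheduler sends at once, while the sequential peak stays flat, which is consistent with the commit's observation that batching competes with the KV cache for GPU memory.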