6 Commits

Author SHA1 Message Date
Jianwei Yu 5cd81bb497 fix: restore sequential encoder (batch encoder causes OOM)
Batching the encoder across multiple requests caused GPU OOM when the
vLLM scheduler sent many audio items at once. The encoder intermediates
(~700MB per 69s audio) compete with the KV cache for GPU memory.

Sequential encoding is stable and proven correct. The encoder
(267ms per request) is not the primary throughput bottleneck when
the encoder cache is enabled (default).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-27 18:48:06 +00:00
Damon-Salvetore 165e17e5ed fix: vllm-version-stable 2026-02-25 07:30:43 +00:00
YingboHAO a4add8e52f fix backend 2026-02-08 09:58:19 +00:00
YingboHAO 0508c3e86f fix 2026-02-06 14:38:16 +00:00
YingboHAO 7761242bf3 fix 2026-02-06 05:52:48 +00:00
YingboHAO 4df5b0582f Add vLLM plugin support for high-performance ASR serving 2026-01-23 17:32:24 +00:00
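
The trade-off described in commit 5cd81bb497 can be illustrated with a small sketch. This is not the plugin's actual code: `peak_intermediate_mb`, `encode_sequentially`, the `encode_one` callable, and the dict-based cache are hypothetical names introduced here, and the memory estimate assumes intermediates scale linearly with the number of items encoded at once. Only the ~700MB-per-item figure comes from the commit message.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

# Figure quoted in the commit message: ~700 MB of intermediates per 69 s clip.
PER_ITEM_INTERMEDIATE_MB = 700


def peak_intermediate_mb(num_items: int, batched: bool,
                         per_item_mb: int = PER_ITEM_INTERMEDIATE_MB) -> int:
    """Rough peak encoder-intermediate memory in MB.

    Assumption: a batched pass holds every item's intermediates at once,
    while a sequential pass holds only one item's intermediates at a time.
    """
    if num_items <= 0:
        return 0
    return per_item_mb * (num_items if batched else 1)


def encode_sequentially(
    items: Sequence[Tuple[Hashable, object]],
    encode_one: Callable[[object], object],
    cache: Dict[Hashable, object],
) -> List[object]:
    """Encode (key, audio) pairs one at a time, reusing cached results.

    `encode_one` stands in for a single-item encoder forward pass, and
    `cache` is a plain dict playing the role of an encoder cache.
    """
    outputs: List[object] = []
    for key, audio in items:
        if key not in cache:
            # Only one item's intermediates are alive during this call.
            cache[key] = encode_one(audio)
        outputs.append(cache[key])
    return outputs
```

Under these assumptions, 16 audio items arriving together would need roughly 16 × 700MB of intermediates in a batched pass but only about 700MB at a time when encoded sequentially, and items already in the cache skip the encoder entirely.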