VibeVoice

Author	SHA1	Message	Date
Zhiliang Peng	3c976491d4	Update README with new TTS report and ICLR oral acceptance Updated TTS report link and added conference acceptance note.	2026-03-31 12:24:50 +08:00
Jianwei Yu	c766f12e23	docs: add Vibing download links Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-29 03:28:59 +00:00
Jianwei Yu	8f133837dc	docs: add Vibing demo video to news section Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-29 02:33:10 +00:00
Jianwei Yu	0857b6d59f	docs: fix news bold formatting Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-29 01:48:20 +00:00
Jianwei Yu	c8371b6bb6	docs: add Vibing voice input adoption news Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-29 01:43:59 +00:00
Jianwei Yu	b691f99191	docs: add Trendshift #1 trending badge to README Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-28 17:07:00 +00:00
Jianwei Yu	5cd81bb497	fix: restore sequential encoder (batch encoder causes OOM) Batch encoder across multiple requests caused GPU OOM when vLLM scheduler sends many audio items at once. The encoder intermediates (~700MB per 69s audio) compete with KV cache for GPU memory. Sequential encoding is stable and proven correct. The encoder (267ms per request) is not the primary throughput bottleneck when encoder cache is enabled (default). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-27 18:48:06 +00:00
Jianwei Yu	cd945395d4	feat: set nginx workers to 2×dp for optimal HTTP throughput Nginx worker_processes now defaults to 2×N (where N is the number of DP replicas) instead of 'auto'. This ensures enough HTTP handler processes to fully saturate all GPU backends under heavy concurrent load. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-27 09:16:05 +00:00
Jianwei Yu	e6b65abb9b	fix: auto-tune per-worker env vars in DP mode Pass VIBEVOICE_FFMPEG_MAX_CONCURRENCY and VLLM_MEDIA_LOADING_THREAD_COUNT to each worker subprocess so they inherit the correct settings regardless of how the container is launched (--skip-deps or not). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-27 07:57:49 +00:00
Jianwei Yu	3817f74d46	feat: nginx-based data parallel for optimal ASR throughput When --dp N is specified (N > 1), the launcher now starts N independent vLLM processes behind an nginx reverse proxy instead of using vLLM's built-in DP coordinator. This avoids the single-process HTTP bottleneck when handling large base64 audio payloads, achieving near-linear scaling (7.2x with 8 GPUs at 4096 concurrent requests). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-27 07:43:32 +00:00
JianweiYu	9634518ca4	Add data parallel (DP) support to vLLM server launcher - Add --dp/--data-parallel-size flag for running independent model replicas across multiple GPUs with automatic load balancing behind a single port - Add --tp/--tensor-parallel-size flag (previously hardcoded to 1) - Update docs/vibevoice-vllm-asr.md with multi-GPU deployment guide covering DP, TP, and hybrid (DP × TP) configurations Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-24 11:53:31 +00:00
JianweiYu	09ca114fa3	Add Gradio ASR demo with video support and demo audio/video files - Add gradio_asr_demo_api_video.py: Gradio web UI supporting audio/video upload, streaming output, hotwords, and Cloudflare tunnel - Add demo/asr_demo/: demo audio and video files for the Gradio interface Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-22 06:11:51 +00:00
Zhiliang Peng	4c419978c9	Merge pull request #255 from sd983527/main Add news about VibeVoice ASR Transformers integration	2026-03-06 14:08:47 +08:00
Yan Xia	7e73beec97	Add news about VibeVoice ASR Transformers integration - Added announcement that VibeVoice ASR is now part of Transformers v5.3.0 release - Linked to the official Hugging Face Transformers release page - Positioned as the latest news item with today's date Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-06 13:32:21 +08:00
Li Dong	7ef9dbe300	Merge pull request #247 from Damon-Salvetore/fix/vllm-version-compat fix: vllm-version-stable	2026-02-28 11:12:24 +08:00
Damon-Salvetore	165e17e5ed	fix: vllm-version-stable	2026-02-25 07:30:43 +00:00
Jianwei Yu	1807b858d4	Merge pull request #236 from Damon-Salvetore/main fix backend	2026-02-10 00:07:05 +08:00
YingboHAO	a4add8e52f	fix backend	2026-02-08 09:58:19 +00:00
Jianwei Yu	ce3d40c78f	Merge pull request #233 from Damon-Salvetore/main Add hot words support	2026-02-07 12:32:03 +08:00
YingboHAO	0508c3e86f	fix	2026-02-06 14:38:16 +00:00
YingboHAO	7761242bf3	fix	2026-02-06 05:52:48 +00:00
YingboHAO	bb54f78d0e	feat: add hotwords support for vLLM ASR	2026-02-04 10:33:20 +00:00
YaoyaoChang	0aa8cb4c64	fx default speaker	2026-02-03 00:35:04 -08:00
YaoyaoChang	e43c1e2cdb	streaming use transformers==4.51.3	2026-02-03 00:30:52 -08:00
Jianwei Yu	e16491d65e	Merge pull request #228 from Damon-Salvetore/vllm-1 [Fix] Resolve occasional infinite loops during vLLM inference	2026-02-03 10:38:40 +08:00
YingboHAO	e26f1c263f	1	2026-02-02 13:50:27 +00:00
YingboHAO	0055161273	Add test_api_auto_recover.py and test audio files	2026-02-02 13:49:01 +00:00
Zhiliang Peng	b2aee8015c	Delete docs/VibeVoice-ASR-Report.pdf	2026-01-28 19:33:37 +08:00
YaoyaoChang	2ee94fab1d	update ASR architechture figure	2026-01-27 05:11:35 -08:00
YaoyaoChang	3140709188	update README	2026-01-27 21:06:31 +08:00
YaoyaoChang	c435ae05d5	update README Added a section on LoRA fine-tuning to the ASR documentation.	2026-01-27 21:01:40 +08:00
YaoyaoChang	0e1a0d39fd	update README	2026-01-27 20:59:25 +08:00
YaoyaoChang	142a00112e	update ASR README: multilingual	2026-01-27 20:58:10 +08:00
YaoyaoChang	4648c50ea0	update ASR Technical Report link to Arxiv	2026-01-27 12:58:06 +08:00
MLSDCherryPick	cbbdb69474	add VibeVoice-ASR technique report arxiv link	2026-01-27 02:45:16 +00:00
YaoyaoChang	a69e77c036	1. unify env for TTS and ASR; 2. avoid transformers 5.0.0 temporarily	2026-01-26 03:29:02 -08:00
YaoyaoChang	a00f431e14	tts support latest transformers(4.57.6)	2026-01-26 03:28:10 -08:00
Jianwei Yu	c4ee4fe716	Merge pull request #213 from Damon-Salvetore/vllm-1 Replace install_deps.sh with start_server.py one-click deployment	2026-01-26 16:49:38 +08:00
YingboHAO	1eb04f53a2	Replace install_deps.sh with start_server.py one-click deployment	2026-01-26 07:34:54 +00:00
ikeshav26	d11d756b61	fix: issues in error handling	2026-01-26 14:18:34 +08:00
YaoyaoChang	0926f242ce	add CONTRIBUTING.md	2026-01-25 22:07:40 -08:00
DDXDB	1c5dbc4190	Add XPU sdpa Support	2026-01-26 14:00:31 +08:00
ThanhNguyxn	523713e806	fix(demo): add MPS and CPU support for ASR inference demo - Add MPS device choice and auto-detect MPS availability - Change default attention implementation to 'auto' with smart fallback - Auto-detect flash_attention_2 availability on CUDA, fallback to sdpa - Use sdpa for MPS and CPU devices (flash_attention_2 not supported) - Use float32 dtype for MPS/CPU devices for better compatibility Fixes #206	2026-01-26 13:56:11 +08:00
ThanhNguyxn	5cf026569e	fix: handle torch.dtype serialization in config classes Fixes #199 - Object of type dtype is not JSON serializable When loading models with torch_dtype as a torch.dtype object (e.g., torch.bfloat16), transformers would fail to serialize the config to JSON for logging purposes, raising TypeError. This fix: - Adds _convert_dtype_to_string() helper function to convert torch.dtype objects to their string representation (e.g., 'bfloat16') - Overrides to_dict() method in VibeVoiceConfig, VibeVoiceASRConfig, and VibeVoiceStreamingConfig to apply this conversion The fix is backward compatible - string dtype values and None values continue to work as expected.	2026-01-26 13:45:55 +08:00
YaoyaoChang	e67b15f47d	update	2026-01-25 21:41:42 -08:00
MLSDCherryPick	d9068541cf	1	2026-01-25 16:11:02 +00:00
YaoyaoChang	c28e23f80c	update language distribution figure	2026-01-25 00:15:11 -08:00
MLSDCherryPick	81bf8baa89	1	2026-01-25 05:14:39 +00:00
MLSDCherryPick	e4036e46f4	1	2026-01-24 08:28:05 +00:00
Jianwei Yu	3c50e50d18	Merge pull request #203 from Damon-Salvetore/vibevoice-vllm Add vLLM plugin support for high-performance ASR serving	2026-01-24 16:17:10 +08:00


98 Commits
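
Commits 3817f74d46 and cd945395d4 above describe starting N independent vLLM processes behind an nginx reverse proxy, with worker_processes set to 2×N. The following is a minimal sketch of how such a config could be generated, not the launcher's actual code. The function name render_nginx_conf, the upstream name vllm_backends, and the port numbers are hypothetical and do not come from the repository.

```python
def render_nginx_conf(dp: int, listen_port: int = 8000, base_port: int = 8001) -> str:
    """Render an nginx config that fronts `dp` backend processes.

    Hypothetical sketch: the port layout and the upstream name are assumptions.
    """
    if dp < 2:
        # The commit message states the nginx front end applies only when N > 1.
        raise ValueError("nginx front end is only used when dp > 1")

    # One upstream entry per replica; nginx balances round-robin by default.
    backends = "\n".join(
        f"        server 127.0.0.1:{base_port + i};" for i in range(dp)
    )

    return (
        # 2 x N worker processes, as described in commit cd945395d4.
        f"worker_processes {2 * dp};\n"
        "events {}\n"
        "http {\n"
        "    upstream vllm_backends {\n"
        f"{backends}\n"
        "    }\n"
        "    server {\n"
        f"        listen {listen_port};\n"
        "        location / {\n"
        "            proxy_pass http://vllm_backends;\n"
        "        }\n"
        "    }\n"
        "}\n"
    )


if __name__ == "__main__":
    print(render_nginx_conf(8))
```

With dp=8 the rendered config declares 16 worker processes and eight upstream servers. A real deployment would likely also need a larger client_max_body_size and longer proxy timeouts for base64 audio payloads. Those settings are left out because the commit messages do not specify them.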