VibeVoice

Author	SHA1	Message	Date
Jianwei Yu	cd945395d4	feat: set nginx workers to 2×dp for optimal HTTP throughput Nginx worker_processes now defaults to 2×N (where N is the number of DP replicas) instead of 'auto'. This ensures enough HTTP handler processes to fully saturate all GPU backends under heavy concurrent load. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-27 09:16:05 +00:00
Jianwei Yu	e6b65abb9b	fix: auto-tune per-worker env vars in DP mode Pass VIBEVOICE_FFMPEG_MAX_CONCURRENCY and VLLM_MEDIA_LOADING_THREAD_COUNT to each worker subprocess so they inherit the correct settings regardless of how the container is launched (--skip-deps or not). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-27 07:57:49 +00:00
Jianwei Yu	3817f74d46	feat: nginx-based data parallel for optimal ASR throughput When --dp N is specified (N > 1), the launcher now starts N independent vLLM processes behind an nginx reverse proxy instead of using vLLM's built-in DP coordinator. This avoids the single-process HTTP bottleneck when handling large base64 audio payloads, achieving near-linear scaling (7.2x with 8 GPUs at 4096 concurrent requests). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-27 07:43:32 +00:00
JianweiYu	9634518ca4	Add data parallel (DP) support to vLLM server launcher - Add --dp/--data-parallel-size flag for running independent model replicas across multiple GPUs with automatic load balancing behind a single port - Add --tp/--tensor-parallel-size flag (previously hardcoded to 1) - Update docs/vibevoice-vllm-asr.md with multi-GPU deployment guide covering DP, TP, and hybrid (DP × TP) configurations Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-24 11:53:31 +00:00
JianweiYu	09ca114fa3	Add Gradio ASR demo with video support and demo audio/video files - Add gradio_asr_demo_api_video.py: Gradio web UI supporting audio/video upload, streaming output, hotwords, and Cloudflare tunnel - Add demo/asr_demo/: demo audio and video files for the Gradio interface Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-22 06:11:51 +00:00
Zhiliang Peng	4c419978c9	Merge pull request #255 from sd983527/main Add news about VibeVoice ASR Transformers integration	2026-03-06 14:08:47 +08:00
Yan Xia	7e73beec97	Add news about VibeVoice ASR Transformers integration - Added announcement that VibeVoice ASR is now part of Transformers v5.3.0 release - Linked to the official Hugging Face Transformers release page - Positioned as the latest news item with today's date Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-06 13:32:21 +08:00
Li Dong	7ef9dbe300	Merge pull request #247 from Damon-Salvetore/fix/vllm-version-compat fix: vllm-version-stable	2026-02-28 11:12:24 +08:00
Damon-Salvetore	165e17e5ed	fix: vllm-version-stable	2026-02-25 07:30:43 +00:00
Jianwei Yu	1807b858d4	Merge pull request #236 from Damon-Salvetore/main fix backend	2026-02-10 00:07:05 +08:00
YingboHAO	a4add8e52f	fix backend	2026-02-08 09:58:19 +00:00
Jianwei Yu	ce3d40c78f	Merge pull request #233 from Damon-Salvetore/main Add hot words support	2026-02-07 12:32:03 +08:00
YingboHAO	0508c3e86f	fix	2026-02-06 14:38:16 +00:00
YingboHAO	7761242bf3	fix	2026-02-06 05:52:48 +00:00
YingboHAO	bb54f78d0e	feat: add hotwords support for vLLM ASR	2026-02-04 10:33:20 +00:00
YaoyaoChang	0aa8cb4c64	fix default speaker	2026-02-03 00:35:04 -08:00
YaoyaoChang	e43c1e2cdb	streaming use transformers==4.51.3	2026-02-03 00:30:52 -08:00
Jianwei Yu	e16491d65e	Merge pull request #228 from Damon-Salvetore/vllm-1 [Fix] Resolve occasional infinite loops during vLLM inference	2026-02-03 10:38:40 +08:00
YingboHAO	e26f1c263f	1	2026-02-02 13:50:27 +00:00
YingboHAO	0055161273	Add test_api_auto_recover.py and test audio files	2026-02-02 13:49:01 +00:00
Zhiliang Peng	b2aee8015c	Delete docs/VibeVoice-ASR-Report.pdf	2026-01-28 19:33:37 +08:00
YaoyaoChang	2ee94fab1d	update ASR architecture figure	2026-01-27 05:11:35 -08:00
YaoyaoChang	3140709188	update README	2026-01-27 21:06:31 +08:00
YaoyaoChang	c435ae05d5	update README Added a section on LoRA fine-tuning to the ASR documentation.	2026-01-27 21:01:40 +08:00
YaoyaoChang	0e1a0d39fd	update README	2026-01-27 20:59:25 +08:00
YaoyaoChang	142a00112e	update ASR README: multilingual	2026-01-27 20:58:10 +08:00
YaoyaoChang	4648c50ea0	update ASR Technical Report link to Arxiv	2026-01-27 12:58:06 +08:00
MLSDCherryPick	cbbdb69474	add VibeVoice-ASR technique report arxiv link	2026-01-27 02:45:16 +00:00
YaoyaoChang	a69e77c036	1. unify env for TTS and ASR; 2. avoid transformers 5.0.0 temporarily	2026-01-26 03:29:02 -08:00
YaoyaoChang	a00f431e14	tts support latest transformers(4.57.6)	2026-01-26 03:28:10 -08:00
Jianwei Yu	c4ee4fe716	Merge pull request #213 from Damon-Salvetore/vllm-1 Replace install_deps.sh with start_server.py one-click deployment	2026-01-26 16:49:38 +08:00
YingboHAO	1eb04f53a2	Replace install_deps.sh with start_server.py one-click deployment	2026-01-26 07:34:54 +00:00
ikeshav26	d11d756b61	fix: issues in error handling	2026-01-26 14:18:34 +08:00
YaoyaoChang	0926f242ce	add CONTRIBUTING.md	2026-01-25 22:07:40 -08:00
DDXDB	1c5dbc4190	Add XPU sdpa Support	2026-01-26 14:00:31 +08:00
ThanhNguyxn	523713e806	fix(demo): add MPS and CPU support for ASR inference demo - Add MPS device choice and auto-detect MPS availability - Change default attention implementation to 'auto' with smart fallback - Auto-detect flash_attention_2 availability on CUDA, fallback to sdpa - Use sdpa for MPS and CPU devices (flash_attention_2 not supported) - Use float32 dtype for MPS/CPU devices for better compatibility Fixes #206	2026-01-26 13:56:11 +08:00
ThanhNguyxn	5cf026569e	fix: handle torch.dtype serialization in config classes Fixes #199 - Object of type dtype is not JSON serializable When loading models with torch_dtype as a torch.dtype object (e.g., torch.bfloat16), transformers would fail to serialize the config to JSON for logging purposes, raising TypeError. This fix: - Adds _convert_dtype_to_string() helper function to convert torch.dtype objects to their string representation (e.g., 'bfloat16') - Overrides to_dict() method in VibeVoiceConfig, VibeVoiceASRConfig, and VibeVoiceStreamingConfig to apply this conversion The fix is backward compatible - string dtype values and None values continue to work as expected.	2026-01-26 13:45:55 +08:00
YaoyaoChang	e67b15f47d	update	2026-01-25 21:41:42 -08:00
MLSDCherryPick	d9068541cf	1	2026-01-25 16:11:02 +00:00
YaoyaoChang	c28e23f80c	update language distribution figure	2026-01-25 00:15:11 -08:00
MLSDCherryPick	81bf8baa89	1	2026-01-25 05:14:39 +00:00
MLSDCherryPick	e4036e46f4	1	2026-01-24 08:28:05 +00:00
Jianwei Yu	3c50e50d18	Merge pull request #203 from Damon-Salvetore/vibevoice-vllm Add vLLM plugin support for high-performance ASR serving	2026-01-24 16:17:10 +08:00
MLSDCherryPick	71356b87dd	Language support	2026-01-24 05:17:26 +00:00
MLSDCherryPick	7d12252de3	Language support	2026-01-24 05:11:34 +00:00
MLSDCherryPick	a3e99daedd	Language support	2026-01-24 05:10:47 +00:00
YingboHAO	04f8bc40b0	Update test_api.py	2026-01-23 17:47:31 +00:00
YingboHAO	4df5b0582f	Add vLLM plugin support for high-performance ASR serving	2026-01-23 17:32:24 +00:00
YaoyaoChang	c0c2af984e	update README for finetuning-asr	2026-01-22 06:20:11 -08:00
Zhiliang Peng	05e1a022e5	Update FT README Clarified the purpose of the toy dataset in the README.	2026-01-22 21:49:47 +08:00

91 Commits
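
Commits 3817f74d46 and cd945395d4 describe the DP layout only in prose: with --dp N (N > 1), N independent vLLM processes sit behind an nginx reverse proxy, and worker_processes defaults to 2×N. The sketch below is one way such a config could be rendered. It is not the launcher's actual code: the function name, the ports, the upstream name, and the worker_connections value are all assumptions made for illustration.

```python
def render_nginx_conf(dp: int, base_port: int = 8001, listen_port: int = 8000) -> str:
    """Render a minimal nginx config fronting `dp` backend replicas.

    Hypothetical helper: ports and the upstream name are illustrative only.
    """
    if dp < 2:
        raise ValueError("the nginx fan-out only applies when dp > 1")

    # 2×N worker processes, as stated in commit cd945395d4.
    workers = 2 * dp

    # One upstream entry per replica; consecutive ports are an assumption.
    servers = "\n".join(
        f"        server 127.0.0.1:{base_port + i};" for i in range(dp)
    )

    return (
        f"worker_processes {workers};\n"
        "events { worker_connections 4096; }\n"  # assumed value
        "http {\n"
        "    upstream vllm_backends {\n"
        f"{servers}\n"
        "    }\n"
        "    server {\n"
        f"        listen {listen_port};\n"
        "        location / { proxy_pass http://vllm_backends; }\n"
        "    }\n"
        "}\n"
    )


if __name__ == "__main__":
    print(render_nginx_conf(4))
```

Called with dp=4, this yields worker_processes 8 and four upstream entries. nginx distributes requests round-robin across upstream servers by default, so no explicit balancing directive is needed in the sketch.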