VibeVoice

Author	SHA1	Message	Date
DDXDB	1c5dbc4190	Add XPU sdpa Support	2026-01-26 14:00:31 +08:00
ThanhNguyxn	523713e806	fix(demo): add MPS and CPU support for ASR inference demo - Add MPS device choice and auto-detect MPS availability - Change default attention implementation to 'auto' with smart fallback - Auto-detect flash_attention_2 availability on CUDA, fallback to sdpa - Use sdpa for MPS and CPU devices (flash_attention_2 not supported) - Use float32 dtype for MPS/CPU devices for better compatibility Fixes #206	2026-01-26 13:56:11 +08:00
ThanhNguyxn	5cf026569e	fix: handle torch.dtype serialization in config classes Fixes #199 - Object of type dtype is not JSON serializable When loading models with torch_dtype as a torch.dtype object (e.g., torch.bfloat16), transformers would fail to serialize the config to JSON for logging purposes, raising TypeError. This fix: - Adds _convert_dtype_to_string() helper function to convert torch.dtype objects to their string representation (e.g., 'bfloat16') - Overrides to_dict() method in VibeVoiceConfig, VibeVoiceASRConfig, and VibeVoiceStreamingConfig to apply this conversion The fix is backward compatible - string dtype values and None values continue to work as expected.	2026-01-26 13:45:55 +08:00
YaoyaoChang	e67b15f47d	update	2026-01-25 21:41:42 -08:00
MLSDCherryPick	d9068541cf	1	2026-01-25 16:11:02 +00:00
YaoyaoChang	c28e23f80c	update language distribution figure	2026-01-25 00:15:11 -08:00
MLSDCherryPick	81bf8baa89	1	2026-01-25 05:14:39 +00:00
MLSDCherryPick	e4036e46f4	1	2026-01-24 08:28:05 +00:00
Jianwei Yu	3c50e50d18	Merge pull request #203 from Damon-Salvetore/vibevoice-vllm Add vLLM plugin support for high-performance ASR serving	2026-01-24 16:17:10 +08:00
MLSDCherryPick	71356b87dd	Language support	2026-01-24 05:17:26 +00:00
MLSDCherryPick	7d12252de3	Language support	2026-01-24 05:11:34 +00:00
MLSDCherryPick	a3e99daedd	Language support	2026-01-24 05:10:47 +00:00
YingboHAO	04f8bc40b0	Update test_api.py	2026-01-23 17:47:31 +00:00
YingboHAO	4df5b0582f	Add vLLM plugin support for high-performance ASR serving	2026-01-23 17:32:24 +00:00
YaoyaoChang	c0c2af984e	update README for finetuning-asr	2026-01-22 06:20:11 -08:00
Zhiliang Peng	05e1a022e5	Update FT README Clarified the purpose of the toy dataset in the README.	2026-01-22 21:49:47 +08:00
Zhiliang Peng	59c90e7633	Merge pull request #197 from pengzhiliang/vibevoice_asr_ft add VibeVoice-ASR finetuning code	2026-01-22 21:45:35 +08:00
pengzhiliang	8516386ce4	update ft readme	2026-01-22 05:44:34 -08:00
pengzhiliang	cef628e1b5	update ft code	2026-01-22 05:20:25 -08:00
pengzhiliang	db2f1d9ff3	init vibevoice-asr ft	2026-01-22 05:04:33 -08:00
YaoyaoChang	875115c000	update README	2026-01-22 01:28:21 -08:00
YaoyaoChang	c0d7616e5a	update README	2026-01-22 01:26:44 -08:00
YaoyaoChang	0e0caf2f08	update README	2026-01-22 01:25:30 -08:00
YaoyaoChang	96f8ac6a49	update README	2026-01-22 01:24:58 -08:00
YaoyaoChang	0f8954a600	update README	2026-01-22 01:21:56 -08:00
YaoyaoChang	eb3533d791	update README	2026-01-22 00:51:33 -08:00
YaoyaoChang	5022277022	update README	2026-01-22 00:51:00 -08:00
YaoyaoChang	6c523ec087	update README	2026-01-22 00:49:58 -08:00
YaoyaoChang	883e3acc67	update README	2026-01-22 00:39:49 -08:00
YaoyaoChang	32a7040ce0	restructure README	2026-01-22 00:37:22 -08:00
YaoyaoChang	ce90a49960	fix env bug	2026-01-21 22:03:52 -08:00
MLSDCherryPick	1b6e8b56ea	asr evaluation	2026-01-22 03:44:34 +00:00
MLSDCherryPick	84e469c68e	asr evaluation	2026-01-22 03:43:31 +00:00
MLSDCherryPick	c03a707fd6	add video demo	2026-01-21 19:43:50 +00:00
YaoyaoChang	a3750c229b	Revise VibeVoice-ASR documentation for clarity Updated the description and key features of VibeVoice-ASR to clarify its capabilities and improve accuracy in transcription.	2026-01-22 02:59:10 +08:00
YaoyaoChang	c4352fee63	fx	2026-01-21 10:36:27 -08:00
YaoyaoChang	616a167275	add ASR playground link	2026-01-21 10:26:17 -08:00
YaoyaoChang	f7c6d2dec9	update asr eval results	2026-01-21 09:50:24 -08:00
YaoyaoChang	c9c778cc58	fx	2026-01-21 08:25:53 -08:00
Zhiliang Peng	56cb11e7b2	Add VibeVoice-ASR	2026-01-21 22:18:33 +08:00
YaoyaoChang	6c7369bb31	fix	2025-12-16 17:12:12 -08:00
YaoyaoChang	4adbe76674	more experimental voices	2025-12-16 04:21:09 -08:00
Wenhui Wang	d295d1e1d0	Update vibevoice-realtime-0.5b.md	2025-12-09 12:28:32 +08:00
YaoyaoChang	eb09b39cc3	fix	2025-12-08 20:20:11 -08:00
RaihanulHaque	9b06438560	feat: add __init__.py files to enable module imports Add __init__.py files to vibevoice/modular and vibevoice/processor directories to properly export classes and enable package imports. This allows users to import the package after installation: - from vibevoice import VibeVoiceStreamingForConditionalGenerationInference - from vibevoice.modular import VibeVoiceStreamingConfig - from vibevoice.processor import VibeVoiceStreamingProcessor Fixes import errors when using `pip install -e .`	2025-12-09 10:48:11 +08:00
YaoyaoChang	c1c5e40bef	add star history	2025-12-08 18:41:51 -08:00
Wenhui Wang	73a9711d8e	Update vibevoice_tokenizer_processor.py	2025-12-09 10:16:55 +08:00
YaoyaoChang	04d19f8352	add experimental multi-lingual speakers	2025-12-08 08:29:00 -08:00
hydropix	79470ff576	Fix: Remove unnecessary Path() conversion for HuggingFace model IDs The model_path was being converted to a Path object and then back to string for from_pretrained() calls. This is unnecessary since HuggingFace accepts strings directly, and causes issues on Windows where Path() converts forward slashes to backslashes (e.g., "microsoft/VibeVoice-Realtime-0.5B" becomes "microsoft\VibeVoice-Realtime-0.5B"). This fix: - Keeps model_path as a string (no behavior change on Linux/macOS) - Fixes Windows compatibility for HuggingFace repo IDs - Removes redundant str() conversions	2025-12-08 10:27:58 +08:00
Wenhui Wang	a507d67f8e	Update README	2025-12-05 21:49:07 +08:00
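
Note on commit 79470ff576: the message says that wrapping a HuggingFace repo ID in Path() turns forward slashes into backslashes on Windows. A minimal sketch of that behavior follows, assuming `PureWindowsPath` as a stand-in so it can be reproduced on any OS. The helper names are hypothetical and are not taken from the repository.

```python
from pathlib import PureWindowsPath

# Repo ID quoted in the commit message.
repo_id = "microsoft/VibeVoice-Realtime-0.5B"


def as_windows_path_string(model_path: str) -> str:
    """Mimic str(Path(model_path)) as it behaves on Windows (hypothetical helper)."""
    return str(PureWindowsPath(model_path))


def keep_as_string(model_path: str) -> str:
    """Leave the repo ID untouched, which is the approach the commit describes."""
    return model_path


# The round trip through a Windows path rewrites the separator.
print(as_windows_path_string(repo_id))
# Keeping the plain string preserves the repo ID exactly.
print(keep_as_string(repo_id))
```

The first call yields a backslash-separated string that is no longer a valid repo ID, while the second returns the input unchanged.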

57 Commits
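
Note on commit 5cf026569e: the message describes a helper that converts torch.dtype objects to strings such as 'bfloat16' so that a config dictionary can be serialized to JSON. The sketch below illustrates the idea only. It is not the repository's implementation, and `FakeDtype`, `convert_dtype_to_string`, and `to_json_safe_dict` are hypothetical names. `FakeDtype` is an assumed stand-in for torch.dtype so the example runs without PyTorch installed.

```python
import json


class FakeDtype:
    """Hypothetical stand-in for torch.dtype; prints like 'torch.bfloat16'."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __repr__(self) -> str:
        return f"torch.{self.name}"


def convert_dtype_to_string(value):
    """Return 'bfloat16' for a dtype-like object; pass strings and None through."""
    if value is None or isinstance(value, str):
        return value
    text = str(value)
    if text.startswith("torch."):
        # Drop the 'torch.' prefix, keeping only the dtype name.
        return text.split(".", 1)[1]
    return value


def to_json_safe_dict(config: dict) -> dict:
    """Apply the conversion to every top-level value of a config dictionary."""
    return {key: convert_dtype_to_string(val) for key, val in config.items()}


config = {"torch_dtype": FakeDtype("bfloat16"), "hidden_size": 1024, "name": None}
safe = to_json_safe_dict(config)

# json.dumps(config) would raise TypeError; the converted dictionary serializes.
print(json.dumps(safe))
```

String and None values pass through unchanged, which matches the backward compatibility the commit message mentions.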