57 Commits

Author SHA1 Message Date
DDXDB 1c5dbc4190 Add XPU sdpa Support 2026-01-26 14:00:31 +08:00
ThanhNguyxn 523713e806 fix(demo): add MPS and CPU support for ASR inference demo
- Add MPS device choice and auto-detect MPS availability
- Change default attention implementation to 'auto' with smart fallback
- Auto-detect flash_attention_2 availability on CUDA, fallback to sdpa
- Use sdpa for MPS and CPU devices (flash_attention_2 not supported)
- Use float32 dtype for MPS/CPU devices for better compatibility

Fixes #206
2026-01-26 13:56:11 +08:00
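
A minimal sketch of the fallback logic the commit above describes, not the demo's actual code. The function names, the `flash_attn` import probe, and the `bfloat16` CUDA default are assumptions made for illustration; only the 'auto' default, the sdpa fallback, and float32 on MPS/CPU come from the commit message.

```python
import importlib.util
from typing import Optional


def flash_attn_installed() -> bool:
    """Return True if the flash_attn package is importable (assumed detection method)."""
    return importlib.util.find_spec("flash_attn") is not None


def resolve_attn_implementation(
    device: str,
    requested: str = "auto",
    flash_available: Optional[bool] = None,
) -> str:
    """Pick an attention implementation name for the given device (hypothetical helper)."""
    if requested != "auto":
        # An explicit choice is passed through unchanged.
        return requested
    if device == "cuda":
        if flash_available is None:
            flash_available = flash_attn_installed()
        # Prefer flash_attention_2 on CUDA, fall back to sdpa when it is missing.
        return "flash_attention_2" if flash_available else "sdpa"
    # MPS and CPU: flash_attention_2 is not supported, so use sdpa.
    return "sdpa"


def resolve_dtype_name(device: str, cuda_default: str = "bfloat16") -> str:
    """Use float32 on MPS/CPU; the CUDA default here is an assumption."""
    return "float32" if device in ("mps", "cpu") else cuda_default
```

Passing `flash_available` explicitly keeps the helper testable on machines without a GPU.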
ThanhNguyxn 5cf026569e fix: handle torch.dtype serialization in config classes
Fixes #199 - Object of type dtype is not JSON serializable

When loading models with torch_dtype as a torch.dtype object (e.g.,
torch.bfloat16), transformers would fail to serialize the config to
JSON for logging purposes, raising TypeError.

This fix:
- Adds _convert_dtype_to_string() helper function to convert torch.dtype
  objects to their string representation (e.g., 'bfloat16')
- Overrides to_dict() method in VibeVoiceConfig, VibeVoiceASRConfig,
  and VibeVoiceStreamingConfig to apply this conversion

The fix is backward compatible - string dtype values and None values
continue to work as expected.
2026-01-26 13:45:55 +08:00
YaoyaoChang e67b15f47d update 2026-01-25 21:41:42 -08:00
MLSDCherryPick d9068541cf 1 2026-01-25 16:11:02 +00:00
YaoyaoChang c28e23f80c update language distribution figure 2026-01-25 00:15:11 -08:00
MLSDCherryPick 81bf8baa89 1 2026-01-25 05:14:39 +00:00
MLSDCherryPick e4036e46f4 1 2026-01-24 08:28:05 +00:00
Jianwei Yu 3c50e50d18 Merge pull request #203 from Damon-Salvetore/vibevoice-vllm
Add vLLM plugin support for high-performance ASR serving
2026-01-24 16:17:10 +08:00
MLSDCherryPick 71356b87dd Language support 2026-01-24 05:17:26 +00:00
MLSDCherryPick 7d12252de3 Language support 2026-01-24 05:11:34 +00:00
MLSDCherryPick a3e99daedd Language support 2026-01-24 05:10:47 +00:00
YingboHAO 04f8bc40b0 Update test_api.py 2026-01-23 17:47:31 +00:00
YingboHAO 4df5b0582f Add vLLM plugin support for high-performance ASR serving 2026-01-23 17:32:24 +00:00
YaoyaoChang c0c2af984e update README for finetuning-asr 2026-01-22 06:20:11 -08:00
Zhiliang Peng 05e1a022e5 Update FT README
Clarified the purpose of the toy dataset in the README.
2026-01-22 21:49:47 +08:00
Zhiliang Peng 59c90e7633 Merge pull request #197 from pengzhiliang/vibevoice_asr_ft
add VibeVoice-ASR finetuning code
2026-01-22 21:45:35 +08:00
pengzhiliang 8516386ce4 update ft readme 2026-01-22 05:44:34 -08:00
pengzhiliang cef628e1b5 update ft code 2026-01-22 05:20:25 -08:00
pengzhiliang db2f1d9ff3 init vibevoice-asr ft 2026-01-22 05:04:33 -08:00
YaoyaoChang 875115c000 update README 2026-01-22 01:28:21 -08:00
YaoyaoChang c0d7616e5a update README 2026-01-22 01:26:44 -08:00
YaoyaoChang 0e0caf2f08 update README 2026-01-22 01:25:30 -08:00
YaoyaoChang 96f8ac6a49 update README 2026-01-22 01:24:58 -08:00
YaoyaoChang 0f8954a600 update README 2026-01-22 01:21:56 -08:00
YaoyaoChang eb3533d791 update README 2026-01-22 00:51:33 -08:00
YaoyaoChang 5022277022 update README 2026-01-22 00:51:00 -08:00
YaoyaoChang 6c523ec087 update README 2026-01-22 00:49:58 -08:00
YaoyaoChang 883e3acc67 update README 2026-01-22 00:39:49 -08:00
YaoyaoChang 32a7040ce0 restructure README 2026-01-22 00:37:22 -08:00
YaoyaoChang ce90a49960 fix env bug 2026-01-21 22:03:52 -08:00
MLSDCherryPick 1b6e8b56ea asr evaluation 2026-01-22 03:44:34 +00:00
MLSDCherryPick 84e469c68e asr evaluation 2026-01-22 03:43:31 +00:00
MLSDCherryPick c03a707fd6 add video demo 2026-01-21 19:43:50 +00:00
YaoyaoChang a3750c229b Revise VibeVoice-ASR documentation for clarity
Updated the description and key features of VibeVoice-ASR to clarify its capabilities and improve accuracy in transcription.
2026-01-22 02:59:10 +08:00
YaoyaoChang c4352fee63 fx 2026-01-21 10:36:27 -08:00
YaoyaoChang 616a167275 add ASR playground link 2026-01-21 10:26:17 -08:00
YaoyaoChang f7c6d2dec9 update asr eval results 2026-01-21 09:50:24 -08:00
YaoyaoChang c9c778cc58 fx 2026-01-21 08:25:53 -08:00
Zhiliang Peng 56cb11e7b2 Add VibeVoice-ASR 2026-01-21 22:18:33 +08:00
YaoyaoChang 6c7369bb31 fix 2025-12-16 17:12:12 -08:00
YaoyaoChang 4adbe76674 more experimental voices 2025-12-16 04:21:09 -08:00
Wenhui Wang d295d1e1d0 Update vibevoice-realtime-0.5b.md 2025-12-09 12:28:32 +08:00
YaoyaoChang eb09b39cc3 fix 2025-12-08 20:20:11 -08:00
RaihanulHaque 9b06438560 feat: add __init__.py files to enable module imports
Add __init__.py files to vibevoice/modular and vibevoice/processor
directories to properly export classes and enable package imports.

This allows users to import the package after installation:
- from vibevoice import VibeVoiceStreamingForConditionalGenerationInference
- from vibevoice.modular import VibeVoiceStreamingConfig
- from vibevoice.processor import VibeVoiceStreamingProcessor

Fixes import errors when using `pip install -e .`
2025-12-09 10:48:11 +08:00
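
To illustrate why the `__init__.py` files in the commit above matter, here is a small self-contained sketch that builds a throwaway package and re-exports a class from a subpackage. All names (`demo_pkg`, `configuration.py`, `DemoConfig`) are hypothetical and do not reflect the actual layout of `vibevoice`.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    sub = Path(tmp) / "demo_pkg" / "modular"
    sub.mkdir(parents=True)

    # Top-level package marker.
    (sub.parent / "__init__.py").write_text("")
    # A submodule that defines the class.
    (sub / "configuration.py").write_text("class DemoConfig:\n    pass\n")
    # The subpackage __init__.py re-exports the class for short imports.
    (sub / "__init__.py").write_text(
        "from .configuration import DemoConfig\n__all__ = ['DemoConfig']\n"
    )

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        # Equivalent to: from demo_pkg.modular import DemoConfig
        modular = importlib.import_module("demo_pkg.modular")
        exported = modular.DemoConfig.__name__
    finally:
        sys.path.remove(tmp)
```

Without the re-export line in the subpackage's `__init__.py`, the class would only be reachable through its full module path, `demo_pkg.modular.configuration`.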
YaoyaoChang c1c5e40bef add star history 2025-12-08 18:41:51 -08:00
Wenhui Wang 73a9711d8e Update vibevoice_tokenizer_processor.py 2025-12-09 10:16:55 +08:00
YaoyaoChang 04d19f8352 add experimental multi-lingual speakers 2025-12-08 08:29:00 -08:00
hydropix 79470ff576 Fix: Remove unnecessary Path() conversion for HuggingFace model IDs
The model_path was being converted to a Path object and then back to string
for from_pretrained() calls. This is unnecessary since HuggingFace accepts
strings directly, and causes issues on Windows where Path() converts forward
slashes to backslashes (e.g., "microsoft/VibeVoice-Realtime-0.5B" becomes
"microsoft\VibeVoice-Realtime-0.5B").

This fix:
- Keeps model_path as a string (no behavior change on Linux/macOS)
- Fixes Windows compatibility for HuggingFace repo IDs
- Removes redundant str() conversions
2025-12-08 10:27:58 +08:00
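
The Windows behavior described in the commit above can be reproduced on any platform with `pathlib`'s pure path classes. This is a small demonstration of the separator change only; it does not call `from_pretrained()`.

```python
from pathlib import PurePosixPath, PureWindowsPath

repo_id = "microsoft/VibeVoice-Realtime-0.5B"

# Round-tripping through a Windows path rewrites the separator.
as_windows = str(PureWindowsPath(repo_id))
# On POSIX systems the string survives unchanged.
as_posix = str(PurePosixPath(repo_id))

# Keeping the value as a plain string avoids the problem entirely.
model_path = repo_id
```

Here `as_windows` contains a backslash and is no longer a valid HuggingFace repo ID, while `as_posix` and `model_path` still equal the original string.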
Wenhui Wang a507d67f8e Update README 2025-12-05 21:49:07 +08:00