74 Commits

Author SHA1 Message Date
copilot-swe-agent[bot] 61ecb098d6 Improve error handling and logging for AudioMediaIO compatibility
- Add warnings to inform users which compatibility mode is being used
- Handle both AttributeError and ImportError for better coverage
- Add __init__ method to inherited class for consistency
- Provide clear diagnostic messages when patching fails

Co-authored-by: donglixp <1070872+donglixp@users.noreply.github.com>
2026-01-29 02:24:53 +00:00
copilot-swe-agent[bot] b4cd7c479f Fix vLLM AudioMediaIO compatibility issue
Add try-except blocks to handle both old and new vLLM versions where AudioMediaIO may not exist or may have been moved. This fixes the AttributeError when using newer vLLM versions.

- Handle missing AudioMediaIO by creating standalone implementation
- Add fallback for utils module patching
- Maintain backward compatibility with older vLLM versions

Co-authored-by: donglixp <1070872+donglixp@users.noreply.github.com>
2026-01-29 02:22:47 +00:00
copilot-swe-agent[bot] 11dd7420ec Initial plan 2026-01-29 02:19:04 +00:00
Zhiliang Peng b2aee8015c Delete docs/VibeVoice-ASR-Report.pdf 2026-01-28 19:33:37 +08:00
YaoyaoChang 2ee94fab1d update ASR architecture figure 2026-01-27 05:11:35 -08:00
YaoyaoChang 3140709188 update README 2026-01-27 21:06:31 +08:00
YaoyaoChang c435ae05d5 update README
Added a section on LoRA fine-tuning to the ASR documentation.
2026-01-27 21:01:40 +08:00
YaoyaoChang 0e1a0d39fd update README 2026-01-27 20:59:25 +08:00
YaoyaoChang 142a00112e update ASR README: multilingual 2026-01-27 20:58:10 +08:00
YaoyaoChang 4648c50ea0 update ASR Technical Report link to Arxiv 2026-01-27 12:58:06 +08:00
MLSDCherryPick cbbdb69474 add VibeVoice-ASR technical report arXiv link 2026-01-27 02:45:16 +00:00
YaoyaoChang a69e77c036 1. unify env for TTS and ASR; 2. avoid transformers 5.0.0 temporarily 2026-01-26 03:29:02 -08:00
YaoyaoChang a00f431e14 tts support latest transformers(4.57.6) 2026-01-26 03:28:10 -08:00
Jianwei Yu c4ee4fe716 Merge pull request #213 from Damon-Salvetore/vllm-1
Replace install_deps.sh with start_server.py one-click deployment
2026-01-26 16:49:38 +08:00
YingboHAO 1eb04f53a2 Replace install_deps.sh with start_server.py one-click deployment 2026-01-26 07:34:54 +00:00
ikeshav26 d11d756b61 fix: issues in error handling 2026-01-26 14:18:34 +08:00
YaoyaoChang 0926f242ce add CONTRIBUTING.md 2026-01-25 22:07:40 -08:00
DDXDB 1c5dbc4190 Add XPU sdpa Support 2026-01-26 14:00:31 +08:00
ThanhNguyxn 523713e806 fix(demo): add MPS and CPU support for ASR inference demo
- Add MPS device choice and auto-detect MPS availability
- Change default attention implementation to 'auto' with smart fallback
- Auto-detect flash_attention_2 availability on CUDA, fallback to sdpa
- Use sdpa for MPS and CPU devices (flash_attention_2 not supported)
- Use float32 dtype for MPS/CPU devices for better compatibility

Fixes #206
2026-01-26 13:56:11 +08:00
ThanhNguyxn 5cf026569e fix: handle torch.dtype serialization in config classes
Fixes #199 - Object of type dtype is not JSON serializable

When loading models with torch_dtype as a torch.dtype object (e.g.,
torch.bfloat16), transformers would fail to serialize the config to
JSON for logging purposes, raising TypeError.

This fix:
- Adds _convert_dtype_to_string() helper function to convert torch.dtype
  objects to their string representation (e.g., 'bfloat16')
- Overrides to_dict() method in VibeVoiceConfig, VibeVoiceASRConfig,
  and VibeVoiceStreamingConfig to apply this conversion

The fix is backward compatible - string dtype values and None values
continue to work as expected.
2026-01-26 13:45:55 +08:00
YaoyaoChang e67b15f47d update 2026-01-25 21:41:42 -08:00
MLSDCherryPick d9068541cf 1 2026-01-25 16:11:02 +00:00
YaoyaoChang c28e23f80c update language distribution figure 2026-01-25 00:15:11 -08:00
MLSDCherryPick 81bf8baa89 1 2026-01-25 05:14:39 +00:00
MLSDCherryPick e4036e46f4 1 2026-01-24 08:28:05 +00:00
Jianwei Yu 3c50e50d18 Merge pull request #203 from Damon-Salvetore/vibevoice-vllm
Add vLLM plugin support for high-performance ASR serving
2026-01-24 16:17:10 +08:00
MLSDCherryPick 71356b87dd Language support 2026-01-24 05:17:26 +00:00
MLSDCherryPick 7d12252de3 Language support 2026-01-24 05:11:34 +00:00
MLSDCherryPick a3e99daedd Language support 2026-01-24 05:10:47 +00:00
YingboHAO 04f8bc40b0 Update test_api.py 2026-01-23 17:47:31 +00:00
YingboHAO 4df5b0582f Add vLLM plugin support for high-performance ASR serving 2026-01-23 17:32:24 +00:00
YaoyaoChang c0c2af984e update README for finetuning-asr 2026-01-22 06:20:11 -08:00
Zhiliang Peng 05e1a022e5 Update FT README
Clarified the purpose of the toy dataset in the README.
2026-01-22 21:49:47 +08:00
Zhiliang Peng 59c90e7633 Merge pull request #197 from pengzhiliang/vibevoice_asr_ft
add VibeVoice-ASR finetuning code
2026-01-22 21:45:35 +08:00
pengzhiliang 8516386ce4 update ft readme 2026-01-22 05:44:34 -08:00
pengzhiliang cef628e1b5 update ft code 2026-01-22 05:20:25 -08:00
pengzhiliang db2f1d9ff3 init vibevoice-asr ft 2026-01-22 05:04:33 -08:00
YaoyaoChang 875115c000 update README 2026-01-22 01:28:21 -08:00
YaoyaoChang c0d7616e5a update README 2026-01-22 01:26:44 -08:00
YaoyaoChang 0e0caf2f08 update README 2026-01-22 01:25:30 -08:00
YaoyaoChang 96f8ac6a49 update README 2026-01-22 01:24:58 -08:00
YaoyaoChang 0f8954a600 update README 2026-01-22 01:21:56 -08:00
YaoyaoChang eb3533d791 update README 2026-01-22 00:51:33 -08:00
YaoyaoChang 5022277022 update README 2026-01-22 00:51:00 -08:00
YaoyaoChang 6c523ec087 update README 2026-01-22 00:49:58 -08:00
YaoyaoChang 883e3acc67 update README 2026-01-22 00:39:49 -08:00
YaoyaoChang 32a7040ce0 restructure README 2026-01-22 00:37:22 -08:00
YaoyaoChang ce90a49960 fix env bug 2026-01-21 22:03:52 -08:00
MLSDCherryPick 1b6e8b56ea asr evaluation 2026-01-22 03:44:34 +00:00
MLSDCherryPick 84e469c68e asr evaluation 2026-01-22 03:43:31 +00:00