MuseTalk Mirror (A.I.M.I)
Mirror of TMElyralab/MuseTalk V1.5 plus its inference-time dependencies, re-hosted for stable URLs inside the A.I.M.I desktop product. Contents are unmodified.
MuseTalk re-syncs the lips of an existing video to match a new audio track. It edits only the mouth region; the rest of each frame passes through unchanged. It pairs with our TTS + Voice-Clone stack for full "text → lip-synced video" workflows.
Files
| Folder / File | Upstream | Size | Purpose |
|---|---|---|---|
| musetalkV15/unet.pth | TMElyralab/MuseTalk | 3.24 GB | MuseTalk V1.5 UNet weights |
| musetalkV15/musetalk.json | TMElyralab/MuseTalk | 748 B | UNet config |
| sd-vae-ft-mse/diffusion_pytorch_model.bin | stabilityai/sd-vae-ft-mse | 319 MB | VAE for face latents |
| sd-vae-ft-mse/config.json | stabilityai/sd-vae-ft-mse | 547 B | VAE config |
| whisper/pytorch_model.bin | openai/whisper-tiny | 144 MB | Audio feature extraction (tiny) |
| dwpose/dw-ll_ucoco_384.pth | yzd-v/DWPose | 388 MB | Face bbox + pose detection |
| face-parse-bisent/79999_iter.pth | ManyOtherFunctions/face-parse-bisent | 51 MB | BiSeNet face-region parser |
| face-parse-bisent/resnet18-5c106cde.pth | pytorch.org/models | 45 MB | ResNet18 backbone for face-parser |
Total: ~4.2 GB.
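A minimal sketch for checking a local copy against the file list above. It assumes the files were downloaded into one directory that keeps the folder layout shown in the table. The names `EXPECTED_FILES`, `missing_files`, and `listed_total_gb` are illustrative and are not part of MuseTalk or of this mirror. Sizes are copied from the table and treated as decimal units (1 GB = 1000 MB), which is also an assumption.

```python
from pathlib import Path

# Relative paths and sizes (in MB) copied from the Files table.
# Decimal units (1 GB = 1000 MB, 1 MB = 1,000,000 B) are an assumption;
# the table does not say which convention it uses.
EXPECTED_FILES = {
    "musetalkV15/unet.pth": 3240.0,
    "musetalkV15/musetalk.json": 0.000748,
    "sd-vae-ft-mse/diffusion_pytorch_model.bin": 319.0,
    "sd-vae-ft-mse/config.json": 0.000547,
    "whisper/pytorch_model.bin": 144.0,
    "dwpose/dw-ll_ucoco_384.pth": 388.0,
    "face-parse-bisent/79999_iter.pth": 51.0,
    "face-parse-bisent/resnet18-5c106cde.pth": 45.0,
}


def missing_files(root):
    """Return the expected relative paths that are not regular files under root."""
    root = Path(root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


def listed_total_gb():
    """Sum the sizes listed in the table and return the result in GB."""
    return sum(EXPECTED_FILES.values()) / 1000.0
```

Calling `missing_files("models/musetalk")` (a hypothetical local path) returns an empty list when every listed file is present. The check looks only at file presence; it does not verify sizes or checksums.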
Licenses
| Component | License |
|---|---|
| MuseTalk | MIT (Tencent Music Entertainment Lyra Lab) |
| SD-VAE-ft-MSE | CreativeML Open RAIL-M (Stability AI) |
| Whisper | MIT (OpenAI) |
| DWPose | Apache 2.0 |
| face-parse-bisent | MIT |
| ResNet18 (pretrained) | BSD-3-Clause (PyTorch / Facebook) |
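All components permit commercial use under their respective licenses; note that CreativeML Open RAIL-M additionally imposes use-based restrictions. All files are redistributed unchanged. See the upstream repos for full license texts.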
Attribution
- MuseTalk: Yue Zhang, Minhao Liu, Zhaokang Chen, Bin Wu, Yubin Zeng, Chao Zhan, Yingjie He, Junxin Huang, Wenjiang Zhou, "MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting" (2024).
- Whisper: Alec Radford et al., "Robust Speech Recognition via Large-Scale Weak Supervision" (OpenAI, 2022).
- DWPose: Zhendong Yang, Ailing Zeng, Chun Yuan, Yu Li, "Effective Whole-body Pose Estimation with Two-stages Distillation" (ICCV 2023).
- BiSeNet: Changqian Yu et al., "BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation" (ECCV 2018).