MuseTalk Mirror (A.I.M.I)

Mirror of TMElyralab/MuseTalk V1.5 plus its inference-time dependencies, re-hosted for stable URLs inside the A.I.M.I desktop product. Contents are unmodified.

MuseTalk re-syncs the lips of an existing video to match a new audio track. It edits only the mouth region; the rest of the frame passes through unchanged. It pairs with our TTS + Voice-Clone stack for full "text → lip-synced video" workflows.

Files

| Folder / File | Upstream | Size | Purpose |
|---|---|---|---|
| musetalkV15/unet.pth | TMElyralab/MuseTalk | 3.24 GB | MuseTalk V1.5 UNet weights |
| musetalkV15/musetalk.json | TMElyralab/MuseTalk | 748 B | UNet config |
| sd-vae-ft-mse/diffusion_pytorch_model.bin | stabilityai/sd-vae-ft-mse | 319 MB | VAE for face latents |
| sd-vae-ft-mse/config.json | stabilityai/sd-vae-ft-mse | 547 B | VAE config |
| whisper/pytorch_model.bin | openai/whisper-tiny | 144 MB | Audio feature extraction (tiny) |
| dwpose/dw-ll_ucoco_384.pth | yzd-v/DWPose | 388 MB | Face bbox + pose detection |
| face-parse-bisent/79999_iter.pth | ManyOtherFunctions/face-parse-bisent | 51 MB | BiSeNet face-region parser |
| face-parse-bisent/resnet18-5c106cde.pth | pytorch.org/models | 45 MB | ResNet18 backbone for face-parser |

Total: ~4.2 GB (sum of the sizes listed above).
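To confirm that a local copy of this mirror is complete, the file list above can be checked with a short script. This is a minimal sketch: the relative paths come from the file list, while the function name `missing_files` and the idea of a single local root folder are assumptions made for illustration, not part of MuseTalk or the A.I.M.I product.

```python
from pathlib import Path

# Relative paths copied from the file list above.
EXPECTED_FILES = [
    "musetalkV15/unet.pth",
    "musetalkV15/musetalk.json",
    "sd-vae-ft-mse/diffusion_pytorch_model.bin",
    "sd-vae-ft-mse/config.json",
    "whisper/pytorch_model.bin",
    "dwpose/dw-ll_ucoco_384.pth",
    "face-parse-bisent/79999_iter.pth",
    "face-parse-bisent/resnet18-5c106cde.pth",
]


def missing_files(root):
    """Return the expected relative paths that are not files under `root`.

    `root` is a hypothetical local folder holding a copy of this mirror.
    """
    root = Path(root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

Calling `missing_files("models")` returns an empty list when every listed file is present under `models/`, and otherwise returns the paths still to be downloaded. The check covers presence only; it does not verify sizes or checksums.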

Licenses

| Component | License |
|---|---|
| MuseTalk | MIT (Tencent Music Entertainment Lyra Lab) |
| SD-VAE-ft-MSE | CreativeML Open RAIL-M (Stability AI) |
| Whisper | MIT (OpenAI) |
| DWPose | Apache 2.0 |
| face-parse-bisent | MIT |
| ResNet18 (pretrained) | BSD-3-Clause (PyTorch / Facebook) |

All components are commercial-use-compatible. Redistributed unchanged. See upstream repos for full license texts.

Attribution

  • MuseTalk: Yue Zhang, Minhao Liu, Zhaokang Chen, Bin Wu, Yubin Zeng, Chao Zhan, Yingjie He, Junxin Huang, Wenjiang Zhou. MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting (2024).
  • Whisper: Alec Radford et al. Robust Speech Recognition via Large-Scale Weak Supervision (OpenAI, 2022).
  • DWPose: Zhendong Yang, Ailing Zeng, Chun Yuan, Yu Li. Effective Whole-body Pose Estimation with Two-stages Distillation (ICCV 2023).
  • BiSeNet: Changqian Yu et al. BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation (ECCV 2018).