Multi-Modal Deepfake Detection: trained weights

Trained weights for Multi-Model_Deepfake_Detection, a CLI that scores a video for manipulation by fusing three independent modalities (visual frames, audio track, spoken transcript).

Do not download these files by hand: clone the GitHub repo and run python download_weights.py, which places every file where config.yaml expects it.

Contents

| File | Role | Validation metric |
| --- | --- | --- |
| video_model_colab_ft/ | SigLIP ViT, full fine-tune on FaceForensics++ (c23) face crops | AUC 0.861 per frame, 0.939 per video |
| clf_ep6_torchscript.pt | MLP over WavLM-base-plus embeddings (WaveFake + JSUT) | AUC 0.999 |
| text_model/head_best.pt | Linear head over DistilBERT for transcript claim scoring (FibVID) | AUC 0.909 |
| fusion_model.pkl | HistGradientBoosting meta-classifier over the three probabilities | AUC 0.998 (simulated joint data) |
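
The video model is reported with both a per-frame and a per-video AUC, which implies that per-frame scores are pooled into a single score per video. The pooling rule is not stated in this card, so the sketch below assumes plain mean pooling; `video_score` is a hypothetical helper, not a function from the repository.

```python
from statistics import fmean
from typing import Sequence


def video_score(frame_probs: Sequence[float]) -> float:
    """Pool per-frame fake probabilities into one video-level score.

    Assumption: mean pooling. The project may use a different rule
    (median, top-k mean, ...); check the README before relying on this.
    """
    if not frame_probs:
        raise ValueError("no frames were scored")
    return fmean(frame_probs)


# Four hypothetical per-frame probabilities from the visual model.
frames = [0.2, 0.9, 0.8, 0.7]
print(round(video_score(frames), 3))  # 0.65
```

Averaging dampens the effect of a few misclassified frames, which is one plausible reason a per-video AUC can exceed the per-frame one.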

Validation methodology and known limitations (audio language confound, simulated fusion training data) are documented in the project README.

Licensing

Code and these trained heads: MIT. The video model is a fine-tune of prithivMLmods/deepfake-detector-model-v1 and retains its upstream terms. Training datasets are not redistributed here: FaceForensics++ requires a signed EULA; WaveFake is CC BY-SA; JSUT is for research use.
