Multi-Modal Deepfake Detection — trained weights

Trained weights for Multi-Model_Deepfake_Detection, a CLI that scores a video for manipulation by fusing three independent modalities (visual frames, audio track, spoken transcript).

Do not download these files by hand. Clone the GitHub repo and run `python download_weights.py`, which places every file where `config.yaml` expects it.
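After the download script finishes, a quick check can confirm that the files landed. The sketch below uses only the standard library and makes one assumption: all four paths from the file table are resolved under a single weights directory. The real locations are whatever `config.yaml` specifies. `missing_weights` and `EXPECTED` are hypothetical names, not part of the project.

```python
from pathlib import Path

# Relative paths copied from the file table in this card.
# ASSUMPTION: all of them live under one root directory; config.yaml is the
# authoritative source for where each file actually goes.
EXPECTED = [
    "video_model_colab_ft",      # directory: SigLIP fine-tune (video)
    "clf_ep6_torchscript.pt",    # audio head
    "text_model/head_best.pt",   # text head
    "fusion_model.pkl",          # fusion meta-classifier
]


def missing_weights(weights_root: str) -> list[str]:
    """Return the expected entries that do not exist under weights_root."""
    root = Path(weights_root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]
```

For example, `missing_weights("weights")` returns an empty list when everything is in place, and otherwise returns the entries that still need to be fetched.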

| File | Role | Validation metric |
|---|---|---|
| `video_model_colab_ft/` | SigLIP ViT, full fine-tune on FaceForensics++ (c23) face crops | AUC 0.861 per frame, 0.939 per video |
| `clf_ep6_torchscript.pt` | MLP over WavLM-base-plus embeddings (WaveFake + JSUT) | AUC 0.999 |
| `text_model/head_best.pt` | Linear head over DistilBERT for transcript claim scoring (FibVID) | AUC 0.909 |
| `fusion_model.pkl` | HistGradientBoosting meta-classifier over the three probabilities | AUC 0.998 (simulated joint data) |

Validation methodology and known limitations (audio language confound, simulated fusion training data) are documented in the project README.
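The table reports ROC AUC. For the video model it reports two values: one computed over individual frames and one computed per video. A common way to get a per-video score is to average that video's frame probabilities, but the README is the authority on what this project actually does, so treat mean pooling here as an assumption. The sketch computes AUC in its Mann-Whitney form: the probability that a randomly chosen fake outscores a randomly chosen real, with ties counted as half. `roc_auc` and `video_scores` are illustrative names, and the toy rows are made up.

```python
from collections import defaultdict
from statistics import mean


def roc_auc(scores: list[float], labels: list[int]) -> float:
    """ROC AUC as P(score_fake > score_real), ties counted as 0.5.

    labels: 1 = fake/manipulated, 0 = real. O(n_pos * n_neg), fine for a sketch.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def video_scores(frame_rows):
    """Mean-pool frame probabilities per video (ASSUMED aggregation rule).

    frame_rows: iterable of (video_id, frame_prob, video_label).
    Returns (scores, labels) with one entry per video, sorted by video_id.
    """
    probs = defaultdict(list)
    label = {}
    for vid, p, y in frame_rows:
        probs[vid].append(p)
        label[vid] = y
    vids = sorted(probs)
    return [mean(probs[v]) for v in vids], [label[v] for v in vids]


# Toy data: fake video "a" contains one weak (low-scoring) frame.
rows = [("a", 0.9, 1), ("a", 0.3, 1), ("b", 0.6, 1), ("b", 0.7, 1),
        ("c", 0.4, 0), ("c", 0.2, 0), ("d", 0.1, 0), ("d", 0.5, 0)]
print(roc_auc([p for _, p, _ in rows], [y for _, _, y in rows]))  # 0.875
print(roc_auc(*video_scores(rows)))  # 1.0
```

On this toy data, pooling smooths out the single weak frame, so the video-level AUC exceeds the frame-level AUC. The same direction appears in the table's video row, although the mechanism behind the real numbers is not established here.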

Licensing

Code and these trained heads: MIT. The video model is a fine-tune of prithivMLmods/deepfake-detector-model-v1 and retains its upstream terms. Training datasets are not redistributed here: FaceForensics++ requires a signed EULA; WaveFake is CC BY-SA; JSUT is for research use.


Model tree

Base model: google/siglip2-base-patch16-512, fine-tuned upstream as prithivMLmods/deepfake-detector-model-v1, which this model (AUA27/deepfake-multimodal-weights) fine-tunes further.