	---
	language:
	- en
	pipeline_tag: other
	tags:
	- image-classification
	- vision-language
	- safety
	- multimodal-safety
	- concept-detection
	- twohamsters
	- mccu
	datasets:
	- TwoHamsters
	---

	# MCCU-ViT

	MCCU-ViT is a multi-head visual classifier for auditing whether generated images contain unsafe multi-concept combinations from the MCCU (Multi-Concept Compositional Unsafety) / TwoHamsters benchmark. The model uses an OpenCLIP `ViT-L-14-CLIPA-336` visual backbone pretrained on `datacomp1b`, followed by three lightweight classification heads:

	- `Concept_1` head
	- `Concept_2` head
	- harm-category head

	The classifier is intended for post-hoc evaluation of text-to-image or image-conditioned generation outputs. Given an image, it predicts the most likely first concept, second concept, and harm category. Each head's label set ends with an extra `Safe/Other` class, which covers images outside the known unsafe concept or category set.
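
To illustrate how a per-head prediction with a trailing `Safe/Other` class could be decoded, here is a minimal sketch in plain Python. It assumes each head emits one logit per known label plus one final logit for `Safe/Other`, and that the reported confidence is the softmax probability of the top class; neither detail is confirmed by this card. The function name `decode_head`, the labels, and the logits are hypothetical.

```python
import math

SAFE_LABEL = "Safe/Other"  # fallback class named in this card


def decode_head(logits, known_labels):
    """Map one head's raw logits to (label, confidence).

    Assumption: the head emits len(known_labels) + 1 logits and the
    final index is reserved for Safe/Other.
    """
    labels = list(known_labels) + [SAFE_LABEL]
    if len(logits) != len(labels):
        raise ValueError("expected one logit per known label plus Safe/Other")
    # Numerically stable softmax over the head's logits.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Pick the index with the highest probability.
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Hypothetical labels and logits, for illustration only.
label, confidence = decode_head([0.2, 0.1, 3.0], ["concept_a", "concept_b"])
print(label, round(confidence, 3))
```

In this toy case the last logit dominates, so the image would be treated as falling outside the known unsafe concepts for that head.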

	## Files

	This repository contains:

	| File | Description |
	|---|---|
	| `backbone.pt` | Fine-tuned OpenCLIP visual-backbone weights. |
	| `vit_multi_task_heads.pt` | Multi-task classification heads: `head_c1`, `head_c2`, and `head_cls`. |
	| `TwoHamsters_text-only_17.5k.csv` | CSV used to recover concept/category label vocabularies and provide benchmark prompts. |
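
As a rough sketch of how label vocabularies might be recovered from such a CSV, the snippet below collects the unique values of a column and appends `Safe/Other` as the final label. The column names (`Concept_1`, `Concept_2`, `Category`, `prompt`), the sorted ordering, and the sample rows are assumptions for illustration; check the actual CSV header and the repository's loading code before relying on any index-to-label mapping.

```python
import csv
import io

SAFE_LABEL = "Safe/Other"


def build_vocab(rows, column):
    """Return the sorted unique values of `column`, with Safe/Other last.

    Assumption: sorted order matches the order used at training time.
    """
    values = sorted({row[column].strip() for row in rows if row[column].strip()})
    return values + [SAFE_LABEL]


# Tiny in-memory stand-in for the benchmark CSV.
# Column names and values are hypothetical.
sample_csv = io.StringIO(
    "prompt,Concept_1,Concept_2,Category\n"
    "p1,concept_b,concept_x,category_1\n"
    "p2,concept_a,concept_x,category_2\n"
)
rows = list(csv.DictReader(sample_csv))

vocab_c1 = build_vocab(rows, "Concept_1")
vocab_c2 = build_vocab(rows, "Concept_2")
vocab_cls = build_vocab(rows, "Category")
print(vocab_c1)
```

To read the real file instead, replace `sample_csv` with a handle from `open(path, newline="", encoding="utf-8")`.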

	## Intended Use

	MCCU-ViT is designed for research use in multimodal safety evaluation, especially:

	- detecting whether a generated image contains a target unsafe concept pair;
	- computing attack or erasure metrics such as UAR/MDR/CMDR;
	- auditing generated images from text-to-image, image-editing, and image-fusion systems.
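
The first use case above can be sketched as a simple check over prediction dictionaries. This is an illustrative helper, not the benchmark's definition of UAR, MDR, or CMDR, whose exact formulas are not given in this card. The function names, the `ordered` flag, and the sample predictions are hypothetical; the dictionary keys follow the output format documented in this README.

```python
def contains_target_pair(pred, target_pair, ordered=True):
    """Check whether a prediction matches the target concept pair.

    With ordered=False, (a, b) and (b, a) are treated as the same pair.
    """
    predicted = (pred["concept_1"], pred["concept_2"])
    if ordered:
        return predicted == tuple(target_pair)
    return sorted(predicted) == sorted(target_pair)


def pair_detection_rate(preds, target_pair, ordered=True):
    """Fraction of predictions that match the target pair (0.0 if empty)."""
    if not preds:
        return 0.0
    hits = sum(contains_target_pair(p, target_pair, ordered) for p in preds)
    return hits / len(preds)


# Hypothetical predictions for four generated images.
preds = [
    {"concept_1": "concept_a", "concept_2": "concept_x"},
    {"concept_1": "concept_x", "concept_2": "concept_a"},
    {"concept_1": "Safe/Other", "concept_2": "Safe/Other"},
    {"concept_1": "concept_a", "concept_2": "concept_x"},
]
target = ("concept_a", "concept_x")
strict_rate = pair_detection_rate(preds, target, ordered=True)
loose_rate = pair_detection_rate(preds, target, ordered=False)
print(strict_rate, loose_rate)
```

Whether concept order should matter depends on how the benchmark assigns `Concept_1` and `Concept_2`, so both variants are shown.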


	## Ethical Considerations

	This model is released to support research on text-to-image (T2I) generation safety and robustness. It should not be used to generate harmful content, target individuals or groups, or make high-stakes moderation decisions without human review and additional validation.

	## Quick Start

	Install dependencies:

	```bash
	pip install torch pillow pandas open_clip_torch huggingface_hub
	```

	Run inference:

	```python
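	from PIL import Image
	# Assumes example_inference.py (which defines MCCUViT) is importable from the working directory.
	from example_inference import MCCUViT

	# Pass device="cpu" if no CUDA GPU is available.
	model = MCCUViT.from_pretrained("chaoshuo/TwoHamsters", device="cuda")
	pred = model.predict(Image.open("example.png"))
	print(pred)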
	```

	Output format:

	```python
	{
	    "concept_1": "...",
	    "concept_2": "...",
	    "category": "...",
	    "concept_1_confidence": 0.0,
	    "concept_2_confidence": 0.0,
	    "category_confidence": 0.0,
	}
	```
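
Since the card advises against high-stakes decisions without human review, one possible way to use the confidence fields is to route uncertain predictions to a reviewer. The sketch below assumes the output dictionary has exactly the keys shown above; the helper name `needs_human_review` and the threshold of 0.5 are arbitrary illustrations, not calibrated values from the authors.

```python
CONFIDENCE_KEYS = (
    "concept_1_confidence",
    "concept_2_confidence",
    "category_confidence",
)


def needs_human_review(pred, threshold=0.5):
    """Flag a prediction if any head's confidence falls below `threshold`.

    The default threshold is an arbitrary placeholder; tune it on
    held-out data before use.
    """
    return min(pred[key] for key in CONFIDENCE_KEYS) < threshold


# Hypothetical predictions in the documented output format.
confident = {
    "concept_1": "concept_a",
    "concept_2": "concept_x",
    "category": "category_1",
    "concept_1_confidence": 0.93,
    "concept_2_confidence": 0.88,
    "category_confidence": 0.97,
}
uncertain = dict(confident, concept_2_confidence=0.31)

review_queue = [p for p in (confident, uncertain) if needs_human_review(p)]
print(len(review_queue))
```

Taking the minimum across heads is a conservative choice: a single uncertain head is enough to send the image to review.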

	## Citation

	If you use MCCU-ViT or the MCCU/TwoHamsters benchmark, please cite the following paper:

	~~~bibtex
	@article{zhang2026twohamsters,
	title={TwoHamsters: Benchmarking Multi-Concept Compositional Unsafety in Text-to-Image Models},
	author={Zhang, Chaoshuo and Liang, Yibo and Tian, Mengke and Lin, Chenhao and Zhao, Zhengyu and Yang, Le and Zhang, Chong and Zhang, Yang and Shen, Chao},
	journal={arXiv preprint arXiv:2604.15967},
	year={2026}
	}
	~~~