| --- |
| license: mit |
| datasets: |
| - Parveshiiii/AI-vs-Real |
| base_model: |
| - microsoft/swinv2-tiny-patch4-window16-256 |
| pipeline_tag: image-classification |
| library_name: transformers |
| tags: |
| - safety |
| - Modotte |
| - SoTA |
| --- |
| # Modotte |
|
|
| <p align="center"> |
| <img |
| src="https://cdn-uploads.huggingface.co/production/uploads/677fcdf29b9a9863eba3f29f/m7ddTjuxxLUntdXVk0t5N.png" |
| alt="AIRealNet Banner" |
| width="90%" |
| style="border-radius:15px;" |
| /> |
| </p> |
|
|
| --- |
|
|
| - [GitHub Repository](https://github.com/XenArcAI/AIRealNet) |
| - [Live Demo](https://huggingface.co/spaces/Parveshiiii/AIRealNet) |
| |
| ## Overview |
|
|
In an era of rapidly advancing AI-generated imagery, deepfakes, and synthetic media, the need for reliable detection tools has never been greater. **AIRealNet** is a binary image classifier designed to distinguish **AI-generated images** from **real human photographs**. The model is optimized to detect conventional AI-generated content while adhering to strict privacy standards, avoiding personal or sensitive images.
|
|
| * **Class 0:** AI-generated image |
| * **Class 1:** Real human image |
|
|
| By leveraging the robust **SwinV2 Tiny** architecture as its backbone, AIRealNet achieves a high degree of accuracy while remaining lightweight enough for practical deployment. |
|
|
| --- |
|
|
| ## Key Features |
|
|
1. **High Accuracy on Public Datasets:**
Despite being fine-tuned on only a **14k-image subset of the main fine-tuning split**, AIRealNet demonstrates exceptional accuracy and robustness in detecting AI-generated images.
|
|
2. **Near-Balanced Training Split:**
The dataset contains a roughly even mix of AI-generated and real images, minimizing class-imbalance issues during training.


* **AI-Generated:** 60%
* **Human Images:** 40%
|
|
3. **Ethical Design:**
No personal photos were included, even if edited or AI-modified, respecting privacy and ethical AI principles.


4. **Fast and Scalable:**
Based on a transformer vision model, AIRealNet can be deployed efficiently in both research and production environments.
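Given the 60/40 split above, one common mitigation during fine-tuning is inverse-frequency class weighting in the loss. A minimal sketch, illustrative only and not necessarily how AIRealNet itself was trained:

```python
def inverse_frequency_weights(class_shares):
    """Loss weights inversely proportional to class frequency,
    normalized so they sum to the number of classes."""
    raw = {c: 1.0 / share for c, share in class_shares.items()}
    scale = len(raw) / sum(raw.values())
    return {c: w * scale for c, w in raw.items()}

# Shares from the card: class 0 (AI-generated) 60%, class 1 (real) 40%
weights = inverse_frequency_weights({0: 0.6, 1: 0.4})
print(weights)  # roughly {0: 0.8, 1: 1.2}; minority class weighted up
```

In a PyTorch training loop, such weights could be passed to `torch.nn.CrossEntropyLoss(weight=...)`.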
|
|
| --- |
|
|
| ## Training Data |
|
|
* **Dataset:** `Parveshiiii/AI-vs-Real` (an open-sourced subset of the main dataset)
| * **Size:** 14k images (balanced between AI and human) |
| * **Split:** Used the train split for fine-tuning; validation performed on a separate balanced subset. |
| * **Notes:** Images sourced from public datasets and AI generation tools. Edited personal photos were intentionally excluded. |
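As a rough sanity check on the split described above, per-class shares can be computed from the label column. A minimal standard-library sketch; the `labels` list here is illustrative, standing in for the dataset's real labels:

```python
from collections import Counter

def class_balance(labels):
    """Return each class's share of the total, e.g. {0: 0.6, 1: 0.4}."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

# Illustrative labels: class 0 = AI-generated, class 1 = real human image
labels = [0] * 6 + [1] * 4
shares = class_balance(labels)
print(shares)  # {0: 0.6, 1: 0.4}
```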
|
|
| --- |
|
|
| ## Limitations |
|
|
| While AIRealNet performs exceptionally well on typical AI-generated images, users should note: |
|
|
1. **Subtle Edits:** The model struggles with very subtle, ultra-precise modifications, such as "nano banana"-style edits.
2. **AI-Edited Personal Images:** Images of real people that have been AI-modified are **not detected**, in line with the model's privacy-focused design.
3. **Domain Generalization:** Performance may vary on images from completely unseen AI generators or highly unconventional content.
|
|
| --- |
|
|
| ## Performance Metrics |
|
|
| > Metrics shown are from **Epoch 2**, chosen to illustrate stable performance after fine-tuning. |
|
|
| <p align="center"> |
| <img |
| src="https://cdn-uploads.huggingface.co/production/uploads/677fcdf29b9a9863eba3f29f/3NVa0KLX0iAxTP2e6IlGH.png" |
| alt="AIRealNet Banner" |
| width="90%" |
| style="border-radius:15px;" |
| /> |
| </p> |
|
|
**Note:** The extremely low loss and high accuracy reflect the controlled dataset environment; real-world performance may be lower depending on the image domain. In our testing the model is highly accurate on fully generated images, including edited ones, but it cannot detect "nano banana"-style edits of real photographs.
|
|
| --- |
|
|
| ## Demo and Usage |
|
|
1. **Install dependencies**


```bash
pip install -U transformers
```
| 2. **Loading and running a demo** |
|
|
| ```python |
| from transformers import pipeline |
| |
| pipe = pipeline("image-classification", model="Modotte/AIRealNet") |
| pipe("https://cdn-uploads.huggingface.co/production/uploads/677fcdf29b9a9863eba3f29f/eVkKUTdiInUl6pbIUghQC.png")# example image |
| ``` |
### Demo
|
|
* **Given Image**
|
|
| <p align="center"> |
| <img |
| src="https://cdn-uploads.huggingface.co/production/uploads/677fcdf29b9a9863eba3f29f/eVkKUTdiInUl6pbIUghQC.png" |
| alt="AIRealNet Banner" |
| width="90%" |
| style="border-radius:15px;" |
| /> |
| </p> |
|
|
| * **Model Output** |
|
|
| ```bash |
| [{'label': 'artificial', 'score': 0.9865425825119019}, |
| {'label': 'real', 'score': 0.013457471504807472}] |
| ``` |
**Note:** The prediction is correct, as the image was generated by a diffusion model.
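When building on this output, it is safer to take the top label and flag low-confidence predictions than to trust every score. A small helper sketch; the 0.8 threshold is an arbitrary illustration, not a calibrated value:

```python
def top_prediction(result, threshold=0.8):
    """Return the highest-scoring label, or 'uncertain' below the threshold."""
    top = max(result, key=lambda r: r["score"])
    return top["label"] if top["score"] >= threshold else "uncertain"

# The pipeline output shown above
result = [{"label": "artificial", "score": 0.9865425825119019},
          {"label": "real", "score": 0.013457471504807472}]
print(top_prediction(result))  # artificial
```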
|
|
| --- |
|
|
| ## Intended Use |
|
|
| * Detect AI-generated imagery on social media, research publications, and digital media platforms. |
| * Assist content moderators, researchers, and fact-checkers in identifying synthetic media. |
| * **Not intended** for legal verification without human corroboration. |
|
|
| --- |
|
|
| ## Ethical Considerations |
|
|
| * **Privacy-first Approach:** Personal photos, even if AI-edited, were excluded. |
| * **Responsible Deployment:** Users should combine model predictions with human review to avoid false positives or negatives. |
| * **Transparency:** The model card openly communicates its limitations and dataset design to prevent misuse. |
|
|
| --- |
|
|
| ## How It Works |
|
|
| 1. Images are preprocessed and resized to `256x256`. |
| 2. Features are extracted using the **SwinV2 Tiny** vision transformer backbone. |
| 3. A binary classification head outputs probabilities for AI-generated vs real human images. |
| 4. Predictions are interpreted as class 0 (AI) or class 1 (Human). |
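Steps 3 and 4 above (logits to probabilities to a class) amount to a softmax followed by an argmax. A self-contained sketch with illustrative logits; the real model produces logits in the same two-class layout:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

id2label = {0: "artificial", 1: "real"}  # class 0 = AI, class 1 = human
logits = [2.5, -1.8]                     # illustrative raw model outputs
probs = softmax(logits)
pred = max(range(len(probs)), key=lambda i: probs[i])
print(id2label[pred], round(probs[pred], 4))  # artificial 0.9866
```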
|
|
| --- |
|
|
| ## Future Work |
|
|
| Future iterations aim to: |
|
|
| * Improve detection of subtle AI-generated edits and “nano banana” modifications. |
| * Expand training data with diverse AI generators to enhance generalization. |
| * Explore multi-modal detection capabilities (e.g., video, metadata, and image combined). |
|
|
| --- |
|
|
| ### Citation |
| ```bibtex |
| @misc{Modotte_AIRealNet_2025, |
| title={AIRealNet: A Fine-Tuned Vision Transformer for Detecting AI-Generated vs Real Human Images}, |
| author={Parvesh Rawal}, |
| publisher={Hugging Face}, |
| year={2025}, |
| url={https://huggingface.co/Modotte/AIRealNet} |
| } |
| ``` |
|
|
| ## References |
|
|
| * Microsoft SwinV2 Tiny: [https://github.com/microsoft/Swin-Transformer](https://github.com/microsoft/Swin-Transformer) |
* `Parveshiiii/AI-vs-Real` dataset (open-sourced subset): [https://huggingface.co/datasets/Parveshiiii/AI-vs-Real](https://huggingface.co/datasets/Parveshiiii/AI-vs-Real)
|
|
| --- |
|
|
|
|