Aniimage-1
Aniimage-1 is the first latent diffusion model developed by 8BitStudio. It is a 256×256 anime image generation model that uses a UNet + VAE + CLIP architecture and was trained on 830,001 anime images from Danbooru. It is not based on any existing model; the UNet is trained from scratch.
Model Details
| Resolution | 256×256 |
| Architecture | Latent Diffusion (UNet + VAE + CLIP) |
| Parameters | ~400M |
| Training Steps | 88,000 |
| Batch Size | 64 |
| Dataset | ~830K curated anime images from Danbooru |
| GPU | NVIDIA RTX 5060 Ti 16GB |
| Scheduler | DDIM or DPM++ 2M |
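As a rough sanity check on the figures in the table, the step count and batch size imply how many samples the model saw during training. The sketch below assumes each training step consumes exactly one batch of 64 images with no gradient accumulation, and uses the 830,001 image count quoted in the introduction; neither assumption is confirmed by the model card.

```python
# Rough training-exposure arithmetic from the model details table.
# Assumption: one training step = one batch of 64 images (no gradient accumulation).
training_steps = 88_000
batch_size = 64
dataset_size = 830_001  # image count quoted in the introduction

samples_seen = training_steps * batch_size
approx_epochs = samples_seen / dataset_size

print(f"samples seen: {samples_seen:,}")
print(f"approximate passes over the dataset: {approx_epochs:.2f}")
```

Under those assumptions the model made several passes over the dataset rather than a single one.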
Requirements
- GPU: ~3.4 GB VRAM minimum (recommend 4+ GB)
- CPU: ~2 GB RAM. Image generation is extremely slow on CPU.
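For context on the VRAM figure, the weights account for only part of it. The estimate below is a minimal sketch, assuming the ~400M parameter count covers the whole pipeline and that weights are stored as 16-bit or 32-bit floats (the card does not state the precision). The rest of the memory would go to activations, the CUDA context, and framework overhead.

```python
# Back-of-the-envelope weight memory for a ~400M parameter model.
# Assumption: the precision is not stated in the model card, so both are shown.
PARAMS = 400_000_000
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}

def weight_memory_gb(params: int, bytes_per_param: int) -> float:
    """Return weight storage in GB (10**9 bytes)."""
    return params * bytes_per_param / 1e9

for precision, nbytes in BYTES_PER_PARAM.items():
    print(f"{precision}: {weight_memory_gb(PARAMS, nbytes):.1f} GB")
```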
Quick Start
After downloading, install the dependencies, then run the generation script:
pip install torch torchvision diffusers transformers safetensors pillow huggingface_hub
python generate_hf.py
Recommended settings: DPM++ 2M scheduler, 25 steps, CFG scale of 7.5.
Recommended negative prompt: "low quality, ugly, blurry, distorted, deformed, bad anatomy, bad proportions, extra limbs, missing limbs, watermark, text, signature, washed out, flat colors, manga panel, disfigured, poorly drawn, jpeg artifacts, cropped, out of frame"
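The CFG value of 7.5 recommended above is the classifier-free guidance scale. To illustrate what the scale does in a typical latent diffusion sampler, the sketch below shows the standard formulation; it is not code taken from generate_hf.py, and the function name and toy values are hypothetical. Each denoising step blends an unconditional noise prediction with a prompt-conditioned one. When a negative prompt is supplied, samplers commonly use the negative prompt's prediction in place of the unconditional one.

```python
from typing import List

def apply_cfg(noise_uncond: List[float], noise_cond: List[float], scale: float = 7.5) -> List[float]:
    """Standard classifier-free guidance: uncond + scale * (cond - uncond)."""
    return [u + scale * (c - u) for u, c in zip(noise_uncond, noise_cond)]

# Toy 3-element "noise predictions"; real ones are latent tensors.
uncond = [0.0, 1.0, -1.0]
cond = [1.0, 1.0, 0.0]

guided = apply_cfg(uncond, cond, scale=7.5)
print(guided)
```

A scale of 1.0 reproduces the conditioned prediction unchanged. Larger values push the result further from the unconditional (or negative prompt) prediction, which is why high scales follow the prompt more closely.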
Prompting
Aniimage-1 uses plain-text captions, so write prompts in plain English for the best results.
Do: "A smiling anime girl with red hair and a school uniform"
Don't: "1girl, solo, smile, red_hair, school_uniform, anime_coloring"
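One way to catch tag-style prompts before generation is a small heuristic check. The sketch below is hypothetical and not part of generate_hf.py: it flags prompts that contain underscores or Danbooru-style count tags such as `1girl`. The pattern is illustrative and not exhaustive.

```python
import re

# Danbooru-style count tags such as "1girl" or "2boys" (illustrative, not exhaustive).
COUNT_TAG = re.compile(r"\b\d+(girl|boy)s?\b")

def looks_like_tag_prompt(prompt: str) -> bool:
    """Heuristically detect booru-tag prompts, which the model card advises against."""
    return "_" in prompt or bool(COUNT_TAG.search(prompt.lower()))

for p in (
    "A smiling anime girl with red hair and a school uniform",
    "1girl, solo, smile, red_hair, school_uniform, anime_coloring",
):
    label = "tag-style, consider rewriting" if looks_like_tag_prompt(p) else "plain English"
    print(f"{label}: {p}")
```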
Capabilities
- Anime character generation with varied hair colors and styles
- School uniforms, fantasy outfits, maid dresses, and more
- Background scenes: cherry blossoms, night sky, interiors, nature
Limitations
- 256×256 resolution: fine details like hands and small features can be rough
- Faces can sometimes look similar or 'melty' across different prompts
- Complex multi-character scenes may have merging issues
- Little to no NSFW content: trained on a mostly SFW dataset
- Performs worse when generating men due to dataset bias
What's Next
Aniimage-1.5, a 512×512 fine-tune of this model, is currently in development and is expected to significantly improve detail and clarity. The training code may be released on GitHub at some point.
License
Apache 2.0
