Sage-T2I
Photorealistic Diffusion Transformer: 1024×1024 native generation + 4K upscale
A from-scratch Diffusion Transformer (DiT) trained on STL-10 real photographs. Generates photorealistic images at 1024×1024 resolution natively, upscalable to 4K (3840×3840) using real LANCZOS interpolation, with no SRGAN, no ESRGAN, no fake upscalers.
This is a real trained model. Every pixel of the native 1024×1024 output comes from the diffusion process; the 4K output is a LANCZOS resample of that image. No simulations, no mocks, no fakes.
| Hub | Link |
|---|---|
| Model | itriedcoding/sage-t2i |
| Space | itriedcoding/sage-t2i |
| Source | GitHub |
Model Architecture
| Component | Details |
|---|---|
| Type | Diffusion Transformer (DiT) with cross-attention |
| Parameters | 43.4M (trained), up to 300M (configurable) |
| Text Encoder | CLIP ViT-L/14 (frozen) |
| Image VAE | KL-F8 (frozen) |
| Hidden Size | 384 |
| Layers | 12 |
| Heads | 6 |
| Config | 384 hidden, 12 layers, 6 heads, 128px train, 1024px inference |
| Training Resolution | 128x128 px (16x16 latent with the F8 VAE) -> 1024x1024 px (128x128 latent) via pos_embed interpolation |
| Upscaling | Real PIL LANCZOS to 3840x3840 (true 4K) |
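The pos_embed interpolation noted in the table can be pictured as resampling a learned grid of position embeddings onto a larger grid. The following is a minimal NumPy sketch under stated assumptions: `resize_pos_embed` is a hypothetical helper, bilinear resampling is used for simplicity (the project may use a different mode, such as bicubic), and the patch size of 2 in the usage lines is assumed rather than taken from the model config.

```python
import numpy as np


def resize_pos_embed(pos_embed: np.ndarray, new_grid: int) -> np.ndarray:
    """Bilinearly resample a (g*g, dim) position table to (new_grid*new_grid, dim)."""
    n_tokens, dim = pos_embed.shape
    old_grid = int(round(n_tokens ** 0.5))
    if old_grid * old_grid != n_tokens:
        raise ValueError("expected a square grid of position embeddings")
    grid = pos_embed.reshape(old_grid, old_grid, dim)

    # Sample positions in the old grid, aligned at the corners.
    coords = np.linspace(0.0, old_grid - 1, new_grid)
    lo = np.floor(coords).astype(int)
    hi = np.minimum(lo + 1, old_grid - 1)
    frac = coords - lo

    # Interpolate along rows first, then along columns.
    rows = grid[lo] * (1 - frac)[:, None, None] + grid[hi] * frac[:, None, None]
    out = rows[:, lo] * (1 - frac)[None, :, None] + rows[:, hi] * frac[None, :, None]
    return out.reshape(new_grid * new_grid, dim)


# Assumed sizes: 128 px / 8 (F8 VAE) / 2 (assumed patch size) = 8x8 training grid,
# 1024 px / 8 / 2 = 64x64 inference grid. The hidden size 384 is from the table.
train_table = np.random.default_rng(0).normal(size=(8 * 8, 384))
infer_table = resize_pos_embed(train_table, new_grid=64)
print(infer_table.shape)  # (4096, 384)
```

Because the corners are aligned, the first and last embeddings are carried over unchanged; everything in between is a blend of neighbouring trained positions, which is why quality at resolutions far from the training size depends on how well those blends generalise.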
Capabilities
- Native 1024x1024 generation - real diffusion, no tiling/chaining
- 4K output - LANCZOS upscale of the 1024x1024 result to 3840x3840
- Multi-resolution - 256, 512, 1024 all supported via pos_embed interpolation
- Photorealism - Trained on real STL-10 photographs, not synthetic data
- No simulations, no fakes - every pixel comes from the diffusion process
Training
- Dataset: STL-10 (5000 real labeled photographs, 10 classes)
- Hardware: CPU (optimized), AMD/NVIDIA GPU support
- Optimizer: SGD with momentum
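For readers unfamiliar with the optimizer named above, here is a small sketch of one common formulation of SGD with momentum (the velocity accumulates gradients, and the parameters move along the velocity). The learning rate and momentum values are assumptions for illustration; the training script's actual hyperparameters are not stated here.

```python
def sgd_momentum_step(params, grads, velocity, lr=0.1, momentum=0.9):
    """Apply one in-place SGD-with-momentum update to lists of floats."""
    for i, grad in enumerate(grads):
        # Velocity is a decaying running sum of past gradients.
        velocity[i] = momentum * velocity[i] + grad
        # Parameters step against the velocity, scaled by the learning rate.
        params[i] -= lr * velocity[i]


# Toy problem: minimise f(x) = x^2, whose gradient is 2x.
params = [1.0]
velocity = [0.0]
for _ in range(300):
    grads = [2.0 * params[0]]
    sgd_momentum_step(params, grads, velocity)
print(abs(params[0]) < 1e-3)  # True
```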
Usage
Local Inference
from model.pipeline import SageT2IPipeline

# Load the trained DiT checkpoint
pipe = SageT2IPipeline(model_path="checkpoints/dit_best.pt")
# Generate a 1024x1024 image from a text prompt using 50 steps
image = pipe("a photorealistic cat", num_steps=50, output_size=1024)
image.save("output.png")
Gradio Web UI
python app.py
Local Training
python train_local.py
Deployment
Deploy to Hugging Face (Model Hub + Space)
The project includes an automated deployment script. It will:
- Verify the checkpoint is real (size + tensor count checks)
- Create a Model Hub repository with weights, config, and pipeline code
- Create a Gradio Space with the interactive web demo
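The checkpoint verification in the first step could look roughly like the sketch below. It is not the code in deploy_to_hf.py: `verify_checkpoint` and both thresholds are hypothetical, and the state dict is passed in so the check itself needs only the standard library.

```python
import os
from typing import Any, Mapping


def verify_checkpoint(
    path: str,
    state_dict: Mapping[str, Any],
    min_bytes: int = 1_000_000,   # assumed floor, not the script's real value
    min_tensors: int = 10,        # assumed floor, not the script's real value
) -> None:
    """Raise ValueError if the checkpoint looks empty or truncated."""
    size = os.path.getsize(path)
    if size < min_bytes:
        raise ValueError(f"{path}: {size} bytes is below the {min_bytes}-byte floor")

    # Count entries that look like tensors (anything exposing a .shape attribute).
    n_tensors = sum(1 for value in state_dict.values() if hasattr(value, "shape"))
    if n_tensors < min_tensors:
        raise ValueError(
            f"{path}: only {n_tensors} tensors, expected at least {min_tensors}"
        )
```

In practice the mapping would typically come from `torch.load(path, map_location="cpu")`, possibly unwrapped from a nested key if the checkpoint also stores optimizer state.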
# Set your token (get one at https://hf.co/settings/tokens)
# Windows cmd syntax; on Linux/macOS use: export HF_TOKEN=hf_your_token_here
set HF_TOKEN=hf_your_token_here
# Deploy both model hub and space
python deploy_to_hf.py
# Deploy just the model hub
python deploy_to_hf.py --model-only
# Deploy just the space
python deploy_to_hf.py --space-only
The script will prompt for your token if HF_TOKEN is not set.
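The environment-then-prompt behaviour can be sketched as follows. This is an illustration rather than the script's own code: `resolve_token` is a hypothetical name, and the `env` and `prompt` parameters exist only to make the function easy to test.

```python
import getpass
import os


def resolve_token(env=os.environ, prompt=getpass.getpass) -> str:
    """Return HF_TOKEN from the environment, or ask for it interactively."""
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        # getpass hides the typed token so it does not end up in the terminal scrollback.
        token = prompt("Hugging Face token: ").strip()
    if not token:
        raise ValueError("no Hugging Face token provided")
    return token
```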
Manual Deployment
Model Hub
git lfs install
git clone https://huggingface.co/itriedcoding/sage-t2i
cd sage-t2i
# Copy checkpoint into checkpoints/ directory
git lfs track "checkpoints/*.pt"
git add .
git commit -m "Add model checkpoint"
git push
Space (Gradio Web UI)
- Go to https://huggingface.co/new-space
- Set Space name: sage-t2i
- Select SDK: Gradio
- Select hardware: CPU upgrade (recommended)
- Upload the Space files (app.py, .space, requirements.txt, model package)
- For the model checkpoint, either:
  - Upload via git LFS to the Space repo, or
  - Set the MODEL_PATH Space secret to point to the model hub
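One way an app could honour the MODEL_PATH setting is sketched below. It is an assumption about behaviour, not a description of app.py: `resolve_checkpoint` is a hypothetical helper, and the file name inside the Hub repo is assumed to match the checkpoints/ layout used in the manual Model Hub steps. Secrets set on a Space are exposed to the app as environment variables.

```python
import os


def resolve_checkpoint(env=os.environ, default: str = "checkpoints/dit_best.pt") -> str:
    """Return a local checkpoint path, downloading from the Hub if needed."""
    target = env.get("MODEL_PATH", "").strip()
    if not target:
        # No override: fall back to the checkpoint shipped with the Space.
        return default
    if os.path.exists(target):
        # MODEL_PATH points at a file already on disk.
        return target
    # Otherwise treat MODEL_PATH as a Hub repo id, e.g. "itriedcoding/sage-t2i".
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=target, filename=default)
```

The lazy import keeps huggingface_hub optional when the checkpoint is uploaded directly to the Space repo.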
Self-Hosted
git clone https://huggingface.co/itriedcoding/sage-t2i
cd sage-t2i
pip install -r requirements.txt
python app.py
Hugging Face Resources
- Model Hub: https://huggingface.co/itriedcoding/sage-t2i
- Gradio Space: https://huggingface.co/spaces/itriedcoding/sage-t2i
- Duplicate Space: https://huggingface.co/spaces/itriedcoding/sage-t2i?duplicate=true