JLT: Clean-Latent Prediction in Latent Diffusion Transformers

ImageNet 256×256 samples from JLT-B/1 using 50-step Heun sampling.

Authors

Funing Fu · Tenghui Wang · Guanyu Zhou · Junyong Cen · Qichao Zhu

Overview

JLT investigates whether predicting clean data is better than predicting velocity in latent space. Under the same architecture, training settings, and FLUX.2 VAE representation, clean-latent prediction achieves an FID of 2.50 versus 6.56 for velocity prediction on ImageNet 256×256, a 62% relative reduction in FID.

This checkpoint is trained in the FLUX.2 VAE latent space with a clean-latent prediction target.

Results

Model	Target	FID-50K ↓	IS ↑
JLT-B/1	x (clean)	2.50	232.51
DiT-B/1	v (velocity)	6.56	132.12
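
The relative gap between the two rows can be checked with a few lines of arithmetic. The sketch below only restates the numbers from the table; the variable names are illustrative.

```python
# FID-50K and IS values copied from the results table.
fid_clean, is_clean = 2.50, 232.51        # JLT-B/1, x (clean) target
fid_velocity, is_velocity = 6.56, 132.12  # DiT-B/1, v (velocity) target

# Relative FID reduction of clean prediction over velocity prediction.
relative_fid_reduction = (fid_velocity - fid_clean) / fid_velocity
# Ratio of Inception Scores (higher is better).
is_ratio = is_clean / is_velocity

print(f"FID reduction: {relative_fid_reduction:.1%}")  # prints "FID reduction: 61.9%"
print(f"IS ratio: {is_ratio:.2f}x")
```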

Method

Under the linear corruption path z_t = t * x + (1-t) * epsilon:

Clean prediction (JLT): predict x directly, attenuating low-variance latent directions
Velocity prediction (DiT): predict v = x - epsilon, adding an isotropic unit floor to all directions

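Key insight: because the noise term has unit variance, the velocity target v = x - epsilon has variance of at least 1 in every latent direction. Velocity prediction therefore amplifies low-variance latent directions relative to the data, while clean prediction attenuates them.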
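
As a rough numerical illustration of the two targets, the NumPy sketch below draws synthetic latents with hypothetical per-direction standard deviations (`sigma`, not taken from the FLUX.2 VAE), builds z_t on the linear path, and compares the variance of the x and v targets. It also checks the identity v = (x - z_t) / (1 - t), which follows from the path definition. The sample size, `t`, and `sigma` values are assumptions chosen only for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical per-direction standard deviations of the clean latents:
# one high-variance, one unit-variance, one low-variance direction.
sigma = np.array([2.0, 1.0, 0.1])

x = rng.standard_normal((n, 3)) * sigma   # clean latents
eps = rng.standard_normal((n, 3))         # isotropic unit-variance noise

t = 0.3
z_t = t * x + (1 - t) * eps               # linear corruption path

v = x - eps                               # velocity target
v_from_x = (x - z_t) / (1 - t)            # same velocity, recovered from x and z_t

var_x = x.var(axis=0)                     # close to sigma**2
var_v = v.var(axis=0)                     # close to sigma**2 + 1 (the unit floor)
noise_share = 1.0 / var_v                 # fraction of the v target that is pure noise

for name, values in [("var(x)", var_x), ("var(v)", var_v), ("noise share", noise_share)]:
    print(name, np.round(values, 3))
```

In the low-variance direction the clean target has variance near 0.01, while the velocity target stays near 1 and is almost entirely noise.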

Architecture

Component	Specification
Transformer Blocks	12
Hidden Dimension	768
Attention Heads	12
Parameters	130M
Tokenizer	FLUX.2 VAE (frozen)
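
The table can be cross-checked against the stated parameter count with a back-of-the-envelope estimate. The sketch below assumes a DiT-style block (multi-head attention, an MLP with ratio 4, and adaLN modulation producing six vectors per block); none of these details are stated in this card, and the actual JiT block may differ. `BackboneConfig` and `block_params` are hypothetical names used only here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackboneConfig:
    depth: int = 12        # Transformer Blocks (from the table)
    hidden_dim: int = 768  # Hidden Dimension (from the table)
    num_heads: int = 12    # Attention Heads (from the table)
    mlp_ratio: int = 4     # ASSUMPTION: not stated in the card

    @property
    def head_dim(self) -> int:
        return self.hidden_dim // self.num_heads


def block_params(cfg: BackboneConfig) -> int:
    """Parameter count of one assumed DiT-style block (weights and biases)."""
    d = cfg.hidden_dim
    hidden = cfg.mlp_ratio * d
    attn = (3 * d * d + 3 * d) + (d * d + d)         # qkv projection + output projection
    mlp = (d * hidden + hidden) + (hidden * d + d)   # two linear layers
    adaln = d * 6 * d + 6 * d                        # ASSUMPTION: adaLN with 6 modulation vectors
    return attn + mlp + adaln


cfg = BackboneConfig()
total_block_params = cfg.depth * block_params(cfg)
print(f"head dim: {cfg.head_dim}")
print(f"blocks only: {total_block_params / 1e6:.1f}M parameters")
```

Under these assumptions the blocks alone account for roughly 127.5M parameters, which is in line with the 130M total once embeddings and the output layer are included.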

Usage

Download

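# --local-dir places the file in the current directory instead of the Hugging Face cache
huggingface-cli download dawn-neo/JLT checkpoint-last.pth --local-dir .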
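
The same download can also be done from Python. This is a minimal sketch using `hf_hub_download` from the `huggingface_hub` package; the helper name `download_checkpoint` and the default `local_dir` are illustrative choices, not part of the JLT repository.

```python
from pathlib import Path

REPO_ID = "dawn-neo/JLT"
FILENAME = "checkpoint-last.pth"


def download_checkpoint(local_dir: str = ".") -> Path:
    """Fetch the checkpoint from the Hub and return its local path."""
    # Imported lazily so that defining this helper does not require the package.
    from huggingface_hub import hf_hub_download

    path = hf_hub_download(repo_id=REPO_ID, filename=FILENAME, local_dir=local_dir)
    return Path(path)
```

Calling `download_checkpoint()` should place `checkpoint-last.pth` in the current directory, so that a relative path to it can be passed to the evaluation script.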

Evaluation

# Requires pre-encoded ImageNet latents and torch-fidelity
python main_jit.py \
    --model JiT-B/1 --vae_type flux2 \
    --data_path /path/to/imagenet_latents_256 --use_latent_cache \
    --online_eval --eval_freq 1 --gen_bsz 128 --num_images 50000 \
    --cfg 2.9 --num_sampling_steps 50 \
    --resume checkpoint-last.pth --output_dir ./eval_output

For full training and inference code, see the GitHub repository.
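
The samples at the top of this card use 50-step Heun sampling. A minimal sketch of a Heun (second-order) sampler for a clean-latent predictor on the path z_t = t * x + (1-t) * epsilon follows. It is not the repository's implementation: `predict_x` is a hypothetical stand-in for the network, classifier-free guidance is omitted, and the final Euler step is one assumed way to avoid dividing by zero at t = 1.

```python
import numpy as np


def velocity_from_x(x_hat, z, t):
    """Convert a clean-latent prediction into a velocity on the linear path."""
    return (x_hat - z) / (1.0 - t)


def heun_sample(predict_x, z0, num_steps=50):
    """Integrate dz/dt = v(z, t) from t = 0 (noise) to t = 1 (data) with Heun's method."""
    ts = np.linspace(0.0, 1.0, num_steps + 1)
    z = z0
    for i in range(num_steps):
        t, t_next = ts[i], ts[i + 1]
        h = t_next - t
        v = velocity_from_x(predict_x(z, t), z, t)
        z_euler = z + h * v                      # Euler predictor
        if i == num_steps - 1:
            z = z_euler                          # last step: Euler only, since 1 - t_next = 0
        else:
            v_next = velocity_from_x(predict_x(z_euler, t_next), z_euler, t_next)
            z = z + 0.5 * h * (v + v_next)       # Heun corrector (trapezoidal average)
    return z


# Toy check: if all data sits at a single point x0, the ideal predictor returns x0,
# and the sampler should carry any noise sample to x0.
x0 = np.array([1.0, -2.0, 0.5])
rng = np.random.default_rng(0)
sample = heun_sample(lambda z, t: x0, rng.standard_normal(3), num_steps=50)
print(np.round(sample, 6))
```

With a real model, `predict_x` would wrap a forward pass of the network on the current latent and timestep.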

Citation

@article{fu2026jlt,
  title={{JLT}: {C}lean-{L}atent {P}rediction in {L}atent {D}iffusion {T}ransformers},
  author={Fu, Funing and Wang, Tenghui and Zhou, Guanyu and Cen, Junyong and Zhu, Qichao},
  journal={arXiv preprint arXiv:2605.27102},
  year={2026}
}

Acknowledgements

JiT - Base architecture
FLUX.2 VAE - Latent space
Li & He, "Back to Basics: Let Denoising Generative Models Denoise" (arXiv:2511.13720) - Clean prediction insight

