# 🧪 LiquidGen: Liquid Neural Network Image Generator
A novel attention-free image generation model based on Liquid Neural Network dynamics from MIT CSAIL.
LiquidGen replaces self-attention in diffusion models with Closed-form Continuous-depth (CfC) liquid dynamics, making it fully parallelizable, memory-efficient, and trainable on a single consumer GPU (Colab free-tier T4).
## 🚀 Quick Start (Colab)
- Open `LiquidGen_Colab_Notebook.ipynb` in Google Colab
- Select a dataset preset (see table below)
- Run all cells → latents are pre-cached automatically, then training starts
Training is optimized for the Colab free tier:
- Latent pre-caching: Encode all images with the VAE once → save to disk → train on pure tensors
- No VAE during training – saves ~1 GB VRAM, enables larger batches (32+)
- Small curated datasets that download in seconds (not the 5 GB WikiArt!)
### Dataset Presets
| Preset | Images | Download | Classes | Description |
|---|---|---|---|---|
| `paintings_mini` | ~200 | 1.7 MB | 27 styles | Instant smoke test |
| `paintings` | ~8K | 204 MB | 27 styles | Recommended – best quality/speed tradeoff |
| `cartoon` | ~2.5K | 181 MB | unconditional | Cartoon/anime images |
| `flowers` | ~8K | 331 MB | unconditional | Flower photography |
| `wikiart_stream` | ~80K | streaming | 27 styles | Full WikiArt via streaming (set `max_images`) |
## 🏗️ Architecture
```
Input Image → Flux VAE Encoder → Noisy Latent → LiquidGen Backbone → Predicted Velocity → Euler ODE → VAE Decoder → Output
```
### Key Components
| Component | What it does | Replaces |
|---|---|---|
| `LiquidTimeConstant` | α·x + (1-α)·stimulus with learnable decay α = exp(-softplus(τ)) | Residual connections |
| `GatedDepthwiseStimulusConv` | Local spatial context via gated DW-conv | Self-attention (local) |
| `ZigzagScan1D` | Global context via zigzag-ordered 1D conv | Self-attention (global) |
| `AdaptiveGroupNorm` | Timestep conditioning via scale/shift | AdaLN in DiT |
| U-Net Long Skips | Skip connections from shallow to deep blocks | Standard residual |
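
For intuition, here is a minimal sketch of the zigzag ordering idea behind the global-context scan. The function name `zigzag_flatten` and the exact scan pattern are illustrative assumptions, not the actual `ZigzagScan1D` implementation in `model.py`:

```python
import torch

def zigzag_flatten(x: torch.Tensor) -> torch.Tensor:
    # x: [B, C, H, W] -> [B, C, H*W], reversing every other row so that
    # spatially adjacent pixels stay adjacent across row boundaries.
    B, C, H, W = x.shape
    rows = x.clone()
    rows[:, :, 1::2, :] = rows[:, :, 1::2, :].flip(-1)  # flip odd rows left<->right
    return rows.reshape(B, C, H * W)
```

A 1D convolution applied along the flattened dimension then mixes information globally without the spatial discontinuities of plain row-major flattening.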
### Core Innovation: Liquid Time Constants
From the CfC paper (Hasani et al., Nature Machine Intelligence 2022):
```
x_{t+1} = exp(-Δt/τ_t) · x_t + (1 - exp(-Δt/τ_t)) · h(x_t, u_t)
```
Our parallelizable version (inspired by LiquidTAD 2025):
```
α = exp(-softplus(τ))                     # Per-channel learnable retention
output = α * state + (1 - α) * stimulus   # Exponential relaxation
```
No sequential ODE solving. No attention. Fully parallelizable.
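
A minimal PyTorch sketch of this blend (the class name and shape handling are assumptions; see `model.py` for the actual `LiquidTimeConstant`):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LiquidTimeConstantSketch(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.tau = nn.Parameter(torch.zeros(channels))  # per-channel learnable τ

    def forward(self, state: torch.Tensor, stimulus: torch.Tensor) -> torch.Tensor:
        # state, stimulus: [B, C, H, W]; α in (0, 1) controls how much state is retained
        alpha = torch.exp(-F.softplus(self.tau)).view(1, -1, 1, 1)
        return alpha * state + (1 - alpha) * stimulus
```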
## 📊 Model Sizes
| Model | Params | VRAM (train) | Best For |
|---|---|---|---|
| LiquidGen-S | ~55M | ~4-6 GB | 256px, fast experiments |
| LiquidGen-B | ~140M | ~8-10 GB | 256/512px, balanced |
| LiquidGen-L | ~280M | ~12-14 GB | 512px, high quality |
All fit in 16GB VRAM (Colab free T4). Training on cached latents = no VAE overhead.
## 🔧 Training
```python
from train import TrainConfig, train

config = TrainConfig(
    model_size="small",
    dataset_preset="paintings",  # 8K paintings, 204MB, 27 styles
    image_size=256,
    batch_size=32,               # Large batches OK with cached latents!
    num_epochs=100,
    learning_rate=1e-4,
)
train(config)
```
### Training Pipeline
- Pre-cache: Load dataset → encode all images with the frozen Flux VAE → save latents to disk → unload the VAE (see the sketch after this list)
- Train: Load cached tensors → train the LiquidGen backbone with flow matching → fast iterations!
- Sample: Load the VAE only when generating sample images (lazy loading)
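
A rough sketch of the pre-cache step, assuming the `diffusers` `AutoencoderKL` API for the FLUX.1-schnell VAE; the repo id, file layout, and the omission of the VAE scaling/shift factors are simplifying assumptions:

```python
import torch
from diffusers import AutoencoderKL

@torch.no_grad()
def cache_latents(dataloader, out_path="latents.pt", device="cuda"):
    # Load the frozen VAE once, encode every image, then drop the VAE before training.
    vae = AutoencoderKL.from_pretrained(
        "black-forest-labs/FLUX.1-schnell", subfolder="vae"
    ).to(device).eval()
    latents, labels = [], []
    for images, y in dataloader:                 # images normalized to [-1, 1]
        z = vae.encode(images.to(device)).latent_dist.sample()
        latents.append(z.cpu())
        labels.append(y)
    torch.save({"latents": torch.cat(latents), "labels": torch.cat(labels)}, out_path)
    del vae                                      # free VRAM for training
    torch.cuda.empty_cache()
```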
### Details
- VAE: FLUX.1-schnell (frozen, 16ch latent, 8x compression, Apache 2.0)
- Objective: Flow matching (velocity prediction) – `v = noise - x_0` (see the sketch after this list)
- Optimizer: AdamW (lr=1e-4, weight_decay=0.01)
- Gradient clipping: 2.0 (critical for stability, from ZigMa paper)
- EMA: 0.9999 decay
- Sampling: Euler ODE, 50 steps, classifier-free guidance
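
A hedged sketch of the flow-matching objective on cached latents (argument names and the linear interpolation schedule are assumptions; the actual loop lives in `train.py`):

```python
import torch
import torch.nn.functional as F

def flow_matching_loss(model, x0, cond=None):
    # x0: clean cached latents [B, 16, H/8, W/8]
    noise = torch.randn_like(x0)
    t = torch.rand(x0.size(0), device=x0.device)   # timesteps ~ U(0, 1)
    t_ = t.view(-1, 1, 1, 1)
    xt = (1 - t_) * x0 + t_ * noise                # linear interpolant between data and noise
    target = noise - x0                            # velocity target: v = noise - x_0
    pred = model(xt, t, cond)                      # backbone predicts velocity
    return F.mse_loss(pred, target)
```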
## 📁 Files
```
├── model.py                        # LiquidGen model architecture (~55-280M params)
├── train.py                        # Training pipeline with latent pre-caching
├── LiquidGen_Colab_Notebook.ipynb  # Ready-to-run Colab notebook
└── README.md
```
## 📐 Architecture Diagram
```
Input Latent [B, 16, H/8, W/8]
 │
 ├── Patch Embed (Conv2d, stride=2) ──▶ [B, D, H/16, W/16]
 ├── + Learnable Position Embedding
 ├── Input Projection (DW-Conv + PW-Conv + GELU)
 │
 ├── LiquidBlock × (depth/2) ──▶ save skip connections
 │    ├── AdaGN (timestep conditioned)
 │    ├── GatedDepthwiseStimulusConv (local spatial)
 │    ├── + ZigzagScan1D (global context)
 │    ├── LiquidTimeConstant #1 (CfC blend)
 │    ├── AdaGN
 │    ├── ChannelMixMLP (GELU)
 │    └── LiquidTimeConstant #2 (CfC blend)
 │
 ├── LiquidBlock × (depth/2) ──▶ add skip connections
 │
 ├── GroupNorm + Conv + GELU
 └── Unpatchify (ConvTranspose2d) ──▶ [B, 16, H/8, W/8]
```
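
To make the block structure concrete, here is a hedged sketch of how one LiquidBlock might compose AdaGN, spatial mixing, and the two liquid blends. Module names, the stand-in depthwise conv (in place of the gated conv + zigzag scan), and the group count are assumptions, not the code in `model.py`:

```python
import torch
import torch.nn as nn

class LiquidBlockSketch(nn.Module):
    def __init__(self, dim: int, t_dim: int):
        super().__init__()
        self.norm1 = nn.GroupNorm(8, dim)
        self.norm2 = nn.GroupNorm(8, dim)
        self.ada = nn.Linear(t_dim, 4 * dim)          # AdaGN: timestep -> scale/shift pairs
        self.spatial = nn.Conv2d(dim, dim, 3, padding=1, groups=dim)  # stand-in spatial mixer
        self.mlp = nn.Sequential(nn.Conv2d(dim, 4 * dim, 1), nn.GELU(), nn.Conv2d(4 * dim, dim, 1))
        self.tau1 = nn.Parameter(torch.zeros(dim))
        self.tau2 = nn.Parameter(torch.zeros(dim))

    @staticmethod
    def _blend(tau, state, stimulus):
        # Liquid blend α·state + (1-α)·stimulus instead of a plain residual x + f(x)
        alpha = torch.exp(-nn.functional.softplus(tau)).view(1, -1, 1, 1)
        return alpha * state + (1 - alpha) * stimulus

    def forward(self, x, t_emb):
        s1, b1, s2, b2 = self.ada(t_emb).view(x.size(0), -1, 1, 1).chunk(4, dim=1)
        h = self.norm1(x) * (1 + s1) + b1             # AdaGN (timestep conditioned)
        x = self._blend(self.tau1, x, self.spatial(h))  # spatial mixing, liquid blend #1
        h = self.norm2(x) * (1 + s2) + b2             # AdaGN
        x = self._blend(self.tau2, x, self.mlp(h))    # channel-mix MLP, liquid blend #2
        return x
```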
## 🔬 Research Background
### Liquid Neural Networks
- Liquid Time-constant Networks (Hasani et al., NeurIPS 2020) – ODE-based neurons with input-dependent τ
- Closed-form Continuous-depth Models (Hasani et al., Nature Machine Intelligence 2022) – Analytical solution eliminating ODE solvers
- Neural Circuit Policies (Lechner et al., Nature Machine Intelligence 2020) – Sparse wiring: sensory → inter → command → motor
- LiquidTAD (2025) – Static decay α = exp(-softplus(τ)) for fully parallel liquid dynamics (100× speedup)
### Attention-Free Image Generation
- ZigMa (ECCV 2024) – Zigzag scanning for SSM-based diffusion
- DiMSUM (NeurIPS 2024) – Spatial-frequency Mamba (FID 2.11 on ImageNet 256)
- DiffuSSM (2023) – First attention-free diffusion model
- DiM (2024) – Multi-directional Mamba with padding tokens
### Flow Matching
- Flow Matching for Generative Modeling (Lipman et al., 2023)
- SiT (2024) – Scalable Interpolant Transformers
## ⚡ Design Decisions
- No Attention – O(n) complexity. Liquid dynamics + zigzag conv replace self-attention entirely.
- Liquid over Residual – `α·x + (1-α)·f(x)` instead of `x + f(x)`. Explicit control over retention per channel.
- Zigzag Scanning – Preserves spatial continuity at row boundaries (critical insight from ZigMa).
- Latent Pre-caching – Encode once, train forever. No VAE overhead during training.
- Flow Matching – Straighter ODE trajectories → fewer sampling steps, better quality (see the sampler sketch below).
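
For completeness, a minimal Euler sampler with classifier-free guidance, consistent with the velocity convention `v = noise - x_0` (integrating t from 1 down to 0). The signature, guidance handling, and default values are illustrative assumptions:

```python
import torch

@torch.no_grad()
def sample(model, shape, num_steps=50, cfg_scale=4.0, cond=None, device="cuda"):
    x = torch.randn(shape, device=device)                 # start from pure noise at t = 1
    ts = torch.linspace(1.0, 0.0, num_steps + 1, device=device)
    for i in range(num_steps):
        t = ts[i].expand(shape[0])
        v_cond = model(x, t, cond)                         # conditional velocity
        v_uncond = model(x, t, None)                       # unconditional velocity
        v = v_uncond + cfg_scale * (v_cond - v_uncond)     # classifier-free guidance
        dt = ts[i + 1] - ts[i]                             # negative step: t goes 1 -> 0
        x = x + dt * v                                     # Euler update
    return x                                               # final latents -> decode with the VAE
```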
## 📄 License
MIT