# 🧪 LiquidGen: Liquid Neural Network Image Generator
A novel attention-free image generation model based on Liquid Neural Network dynamics from MIT CSAIL.
LiquidGen replaces self-attention in diffusion models with Closed-form Continuous-depth (CfC) liquid dynamics, making it fully parallelizable, memory-efficient, and trainable on a single consumer GPU (Colab free-tier T4).
## 🚀 Quick Start (Colab)
- Open `LiquidGen_Colab_Notebook.ipynb` in Google Colab
- Select a dataset preset (see table below)
- Run all cells → latents are pre-cached automatically, then training starts
Training is optimized for the Colab free tier:
- Latent pre-caching: Encode all images with the VAE once → save to disk → train on pure tensors
- No VAE during training – saves ~1 GB VRAM, enables larger batches (32+)
- Small curated datasets that download in seconds (not the 5 GB WikiArt!)
### Dataset Presets
| Preset | Images | Download | Classes | Description |
|---|---|---|---|---|
| `paintings_mini` | ~200 | 1.7 MB | 27 styles | Instant smoke test |
| `paintings` | ~8K | 204 MB | 27 styles | Recommended – best quality/speed tradeoff |
| `cartoon` | ~2.5K | 181 MB | unconditional | Cartoon/anime images |
| `flowers` | ~8K | 331 MB | unconditional | Flower photography |
| `wikiart_stream` | ~80K | streaming | 27 styles | Full WikiArt via streaming (set `max_images`) |
## 🏗️ Architecture
```
Input Image → Flux VAE Encoder → Noisy Latent → LiquidGen Backbone → Predicted Velocity → Euler ODE → VAE Decoder → Output
```
### Key Components
| Component | What it does | Replaces |
|---|---|---|
| `LiquidTimeConstant` | α·x + (1-α)·stimulus with learnable decay α = exp(-softplus(τ)) | Residual connections |
| `GatedDepthwiseStimulusConv` | Local spatial context via gated DW-conv | Self-attention (local) |
| `ZigzagScan1D` | Global context via zigzag-ordered 1D conv | Self-attention (global) |
| `AdaptiveGroupNorm` | Timestep conditioning via scale/shift | AdaLN in DiT |
| U-Net Long Skips | Skip connections from shallow to deep blocks | Standard residual |
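
For intuition, here is a minimal sketch of the zigzag ordering idea behind the global-context scan. The function name `zigzag_flatten` and the exact scan pattern are illustrative assumptions, not the actual `ZigzagScan1D` implementation in `model.py`:

```python
import torch

def zigzag_flatten(x: torch.Tensor) -> torch.Tensor:
    # x: [B, C, H, W] -> [B, C, H*W], reversing every other row so that
    # spatially adjacent pixels stay adjacent across row boundaries.
    B, C, H, W = x.shape
    rows = x.clone()
    rows[:, :, 1::2, :] = rows[:, :, 1::2, :].flip(-1)  # flip odd rows left<->right
    return rows.reshape(B, C, H * W)
```

A 1D convolution applied along the flattened dimension then mixes information globally without the spatial discontinuities of plain row-major flattening.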
### Core Innovation: Liquid Time Constants
From the CfC paper (Hasani et al., Nature Machine Intelligence 2022):
```
x_{t+1} = exp(-Δt/τ_t) · x_t + (1 - exp(-Δt/τ_t)) · h(x_t, u_t)
```
Our parallelizable version (inspired by LiquidTAD 2025):
```
α = exp(-softplus(τ))                     # Per-channel learnable retention
output = α * state + (1 - α) * stimulus   # Exponential relaxation
```
No sequential ODE solving. No attention. Fully parallelizable.
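
A minimal PyTorch sketch of this blend (the class name and shape handling are assumptions; see `model.py` for the actual `LiquidTimeConstant`):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LiquidTimeConstantSketch(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.tau = nn.Parameter(torch.zeros(channels))  # per-channel learnable τ

    def forward(self, state: torch.Tensor, stimulus: torch.Tensor) -> torch.Tensor:
        # state, stimulus: [B, C, H, W]; α in (0, 1) controls how much state is retained
        alpha = torch.exp(-F.softplus(self.tau)).view(1, -1, 1, 1)
        return alpha * state + (1 - alpha) * stimulus
```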
## 📊 Model Sizes
| Model | Params | VRAM (train) | Best For |
|---|---|---|---|
| LiquidGen-S | ~55M | ~4-6 GB | 256px, fast experiments |
| LiquidGen-B | ~140M | ~8-10 GB | 256/512px, balanced |
| LiquidGen-L | ~280M | ~12-14 GB | 512px, high quality |
All fit in 16GB VRAM (Colab free T4). Training on cached latents = no VAE overhead.
## 🔧 Training
```python
from train import TrainConfig, train

config = TrainConfig(
    model_size="small",
    dataset_preset="paintings",  # 8K paintings, 204MB, 27 styles
    image_size=256,
    batch_size=32,               # Large batches OK with cached latents!
    num_epochs=100,
    learning_rate=1e-4,
)
train(config)
```
### Training Pipeline
- Pre-cache: Load dataset → encode all images with the frozen Flux VAE → save latents to disk → unload the VAE (see the sketch after this list)
- Train: Load cached tensors → train the LiquidGen backbone with flow matching → fast iterations!
- Sample: Load the VAE only when generating sample images (lazy loading)
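
A rough sketch of the pre-cache step, assuming the `diffusers` `AutoencoderKL` API for the FLUX.1-schnell VAE; the repo id, file layout, and the omission of the VAE scaling/shift factors are simplifying assumptions:

```python
import torch
from diffusers import AutoencoderKL

@torch.no_grad()
def cache_latents(dataloader, out_path="latents.pt", device="cuda"):
    # Load the frozen VAE once, encode every image, then drop the VAE before training.
    vae = AutoencoderKL.from_pretrained(
        "black-forest-labs/FLUX.1-schnell", subfolder="vae"
    ).to(device).eval()
    latents, labels = [], []
    for images, y in dataloader:                 # images normalized to [-1, 1]
        z = vae.encode(images.to(device)).latent_dist.sample()
        latents.append(z.cpu())
        labels.append(y)
    torch.save({"latents": torch.cat(latents), "labels": torch.cat(labels)}, out_path)
    del vae                                      # free VRAM for training
    torch.cuda.empty_cache()
```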
### Details
- VAE: FLUX.1-schnell (frozen, 16ch latent, 8x compression, Apache 2.0)
- Objective: Flow matching (velocity prediction) – `v = noise - x_0` (see the sketch after this list)
- Optimizer: AdamW (lr=1e-4, weight_decay=0.01)
- Gradient clipping: 2.0 (critical for stability, from ZigMa paper)
- EMA: 0.9999 decay
- Sampling: Euler ODE, 50 steps, classifier-free guidance
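
A hedged sketch of the flow-matching objective on cached latents (argument names and the linear interpolation schedule are assumptions; the actual loop lives in `train.py`):

```python
import torch
import torch.nn.functional as F

def flow_matching_loss(model, x0, cond=None):
    # x0: clean cached latents [B, 16, H/8, W/8]
    noise = torch.randn_like(x0)
    t = torch.rand(x0.size(0), device=x0.device)   # timesteps ~ U(0, 1)
    t_ = t.view(-1, 1, 1, 1)
    xt = (1 - t_) * x0 + t_ * noise                # linear interpolant between data and noise
    target = noise - x0                            # velocity target: v = noise - x_0
    pred = model(xt, t, cond)                      # backbone predicts velocity
    return F.mse_loss(pred, target)
```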
## 📁 Files
```
├── model.py                        # LiquidGen model architecture (~55-280M params)
├── train.py                        # Training pipeline with latent pre-caching
├── LiquidGen_Colab_Notebook.ipynb  # Ready-to-run Colab notebook
└── README.md
```
## 📐 Architecture Diagram
```
Input Latent [B, 16, H/8, W/8]
 │
 ├── Patch Embed (Conv2d, stride=2) ──▶ [B, D, H/16, W/16]
 ├── + Learnable Position Embedding
 ├── Input Projection (DW-Conv + PW-Conv + GELU)
 │
 ├── LiquidBlock × (depth/2) ──▶ save skip connections
 │    ├── AdaGN (timestep conditioned)
 │    ├── GatedDepthwiseStimulusConv (local spatial)
 │    ├── + ZigzagScan1D (global context)
 │    ├── LiquidTimeConstant #1 (CfC blend)
 │    ├── AdaGN
 │    ├── ChannelMixMLP (GELU)
 │    └── LiquidTimeConstant #2 (CfC blend)
 │
 ├── LiquidBlock × (depth/2) ──▶ add skip connections
 │
 ├── GroupNorm + Conv + GELU
 └── Unpatchify (ConvTranspose2d) ──▶ [B, 16, H/8, W/8]
```
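
To make the block structure concrete, here is a hedged sketch of how one LiquidBlock might compose AdaGN, spatial mixing, and the two liquid blends. Module names, the stand-in depthwise conv (in place of the gated conv + zigzag scan), and the group count are assumptions, not the code in `model.py`:

```python
import torch
import torch.nn as nn

class LiquidBlockSketch(nn.Module):
    def __init__(self, dim: int, t_dim: int):
        super().__init__()
        self.norm1 = nn.GroupNorm(8, dim)
        self.norm2 = nn.GroupNorm(8, dim)
        self.ada = nn.Linear(t_dim, 4 * dim)          # AdaGN: timestep -> scale/shift pairs
        self.spatial = nn.Conv2d(dim, dim, 3, padding=1, groups=dim)  # stand-in spatial mixer
        self.mlp = nn.Sequential(nn.Conv2d(dim, 4 * dim, 1), nn.GELU(), nn.Conv2d(4 * dim, dim, 1))
        self.tau1 = nn.Parameter(torch.zeros(dim))
        self.tau2 = nn.Parameter(torch.zeros(dim))

    @staticmethod
    def _blend(tau, state, stimulus):
        # Liquid blend α·state + (1-α)·stimulus instead of a plain residual x + f(x)
        alpha = torch.exp(-nn.functional.softplus(tau)).view(1, -1, 1, 1)
        return alpha * state + (1 - alpha) * stimulus

    def forward(self, x, t_emb):
        s1, b1, s2, b2 = self.ada(t_emb).view(x.size(0), -1, 1, 1).chunk(4, dim=1)
        h = self.norm1(x) * (1 + s1) + b1             # AdaGN (timestep conditioned)
        x = self._blend(self.tau1, x, self.spatial(h))  # spatial mixing, liquid blend #1
        h = self.norm2(x) * (1 + s2) + b2             # AdaGN
        x = self._blend(self.tau2, x, self.mlp(h))    # channel-mix MLP, liquid blend #2
        return x
```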
## 🔬 Research Background
### Liquid Neural Networks
- Liquid Time-constant Networks (Hasani et al., NeurIPS 2020) – ODE-based neurons with input-dependent τ
- Closed-form Continuous-depth Models (Hasani et al., Nature Machine Intelligence 2022) – Analytical solution eliminating ODE solvers
- Neural Circuit Policies (Lechner et al., Nature Machine Intelligence 2020) – Sparse wiring: sensory → inter → command → motor
- LiquidTAD (2025) – Static decay α = exp(-softplus(τ)) for fully parallel liquid dynamics (100× speedup)
### Attention-Free Image Generation
- ZigMa (ECCV 2024) – Zigzag scanning for SSM-based diffusion
- DiMSUM (NeurIPS 2024) – Spatial-frequency Mamba (FID 2.11 on ImageNet 256)
- DiffuSSM (2023) – First attention-free diffusion model
- DiM (2024) – Multi-directional Mamba with padding tokens
### Flow Matching
- Flow Matching for Generative Modeling (Lipman et al., 2023)
- SiT (2024) – Scalable Interpolant Transformers
## ⚡ Design Decisions
- No Attention – O(n) complexity. Liquid dynamics + zigzag conv replace self-attention entirely.
- Liquid over Residual – `α·x + (1-α)·f(x)` instead of `x + f(x)`. Explicit control over retention per channel.
- Zigzag Scanning – Preserves spatial continuity at row boundaries (critical insight from ZigMa).
- Latent Pre-caching – Encode once, train forever. No VAE overhead during training.
- Flow Matching – Straighter ODE trajectories → fewer sampling steps, better quality (see the sampler sketch below).
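
For completeness, a minimal Euler sampler with classifier-free guidance, consistent with the velocity convention `v = noise - x_0` (integrating t from 1 down to 0). The signature, guidance handling, and default values are illustrative assumptions:

```python
import torch

@torch.no_grad()
def sample(model, shape, num_steps=50, cfg_scale=4.0, cond=None, device="cuda"):
    x = torch.randn(shape, device=device)                 # start from pure noise at t = 1
    ts = torch.linspace(1.0, 0.0, num_steps + 1, device=device)
    for i in range(num_steps):
        t = ts[i].expand(shape[0])
        v_cond = model(x, t, cond)                         # conditional velocity
        v_uncond = model(x, t, None)                       # unconditional velocity
        v = v_uncond + cfg_scale * (v_cond - v_uncond)     # classifier-free guidance
        dt = ts[i + 1] - ts[i]                             # negative step: t goes 1 -> 0
        x = x + dt * v                                     # Euler update
    return x                                               # final latents -> decode with the VAE
```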
## 📄 License
MIT