# Chorous1
Chorous1 is a suite of three high-performance, patch-based transformer models for multivariate time-series forecasting. Combining RevIN, MAE-style patch masking, and a Flatten Head architecture, Chorous1 delivers state-of-the-art accuracy on real-world benchmark data.
## Model Variants
| Variant | Parameters | Hidden Size | Layers | Query Heads / KV Heads |
|---|---|---|---|---|
| chorous1-100m | ~100M | 768 | 12 | 12 / 4 |
| chorous1-50m | ~50M | 512 | 16 | 8 / 2 |
| chorous1-27m | ~27M | 384 | 16 | 6 / 2 |
## Architecture
| Component | Specification |
|---|---|
| Context Length | 512 steps |
| Forecast Horizon | 96 steps |
| Patch Size | 16 (non-overlapping) |
| Number of Patches | 32 |
| FFN Multiplier | 2.667× |
| Activation | SwiGLU |
| Positional Encoding | RoPE (θ = 500,000) |
| Normalization | RMSNorm |
| Masking Ratio | 25% (training only) |
| Loss Function | Huber Loss + MAE |
| Precision | bfloat16 |
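The context and patch figures in the table are consistent: 512 steps split into non-overlapping 16-step patches yields 32 patches. A minimal sketch of this patchification (illustrative only; `patchify` is not the actual Chorous1 API):

```python
import numpy as np

def patchify(x: np.ndarray, patch_size: int = 16) -> np.ndarray:
    """Split [batch, channels, time] into [batch, channels, n_patches, patch_size]."""
    b, c, t = x.shape
    assert t % patch_size == 0, "context length must be a multiple of the patch size"
    return x.reshape(b, c, t // patch_size, patch_size)

x = np.random.randn(1, 7, 512)  # [Batch, Channels, Time], as in the Quickstart
patches = patchify(x)
print(patches.shape)  # (1, 7, 32, 16)
```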
## How It Works
**Stage 1: RevIN Normalization.** A reversible instance normalization layer removes per-instance mean and variance shifts from the input before encoding, then restores them on the output, eliminating the distribution-shift problem common in real-world deployments.

**Stage 2: Neural Encoding.** The transformer encoder processes patches of the normalized series using RoPE and GQA to capture long-range temporal dependencies and periodic structure.
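The reversible normalization step can be sketched as follows. This `RevIN` class is a simplified stand-in with per-channel statistics over the time axis, not the Chorous1 implementation (which may also learn affine parameters):

```python
import numpy as np

class RevIN:
    """Sketch of reversible instance normalization: normalize the input,
    remember the statistics, and restore them on the model output."""

    def __init__(self, eps: float = 1e-5):
        self.eps = eps

    def normalize(self, x: np.ndarray) -> np.ndarray:
        # x: [batch, channels, time]; stats are stored for later restoration
        self.mean = x.mean(axis=-1, keepdims=True)
        self.std = x.std(axis=-1, keepdims=True) + self.eps
        return (x - self.mean) / self.std

    def denormalize(self, y: np.ndarray) -> np.ndarray:
        # Restore the stored statistics on the forecast
        return y * self.std + self.mean

rev = RevIN()
x = np.random.randn(2, 7, 512) * 5.0 + 3.0  # shifted, scaled input
x_norm = rev.normalize(x)
print(np.allclose(rev.denormalize(x_norm), x))  # True
```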
## Quickstart
```python
import torch
from safetensors.torch import load_file

# `model` must already be an instance of the Chorous1 architecture
# (model class definition not shown here).
# Replace "100m" with "50m" or "27m" as needed.
weights = load_file("./chorous_checkpoint/100m/model.safetensors")
model.load_state_dict(weights)
model.eval()

# Input shape: [Batch, Channels, Time]
x = torch.randn(1, 7, 512)
with torch.no_grad():
    forecast = model(x)  # Output shape: [1, 7, 96]
```
## Performance

| Metric | chorous1-100m | chorous1-50m | chorous1-27m |
|---|---|---|---|
| Weights Size | ~200 MB | ~110 MB | ~65 MB |
| VRAM (Inference) | ~12 GB | ~8 GB | ~6 GB |
## Limitations
- **Fixed Forecast Horizon:** Optimized for 96-step forecasting. Modifying the output head for longer horizons may reduce accuracy.
- **Channel Count Constraint:** The RevIN layer is initialized using the maximum channel count from the training suite. Inputs exceeding this limit are not supported out of the box.
- **Patch Alignment Requirement:** The input context length must be an exact multiple of the patch size (16).
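The patch-alignment constraint can be satisfied by dropping the oldest steps before inference. A minimal sketch, assuming left-truncation (keeping the most recent observations) is acceptable for the use case; `trim_to_patch_multiple` is an illustrative helper, not part of the Chorous1 API:

```python
def trim_to_patch_multiple(series, patch_size: int = 16):
    """Drop the oldest steps so len(series) is an exact multiple of patch_size."""
    usable = len(series) - (len(series) % patch_size)
    return series[len(series) - usable:]

raw = list(range(500))            # 500 is not a multiple of 16
trimmed = trim_to_patch_multiple(raw)
print(len(trimmed))               # 496 (31 full patches of 16)
print(trimmed[0])                 # 4 -> the 4 oldest steps were dropped
```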
## License
Chorous1 is released under the 8f-ai-license-v1.0. Please review the full terms before use in production or commercial applications.