Unlocking Feature Learning in Gated Delta Networks at Scale Paper • 2606.04048 • Published 24 days ago • 2
Residual Stream Duality in Modern Transformer Architectures Paper • 2603.16039 • Published Mar 17 • 4