Mechanical Engineering, Carnegie Mellon University
Advisors: Prof. Burak Kara, Prof. Jessica Zhang
PROPOSED WORK
NEXT PAPER - ICML (International Conference on Machine Learning)
PROGRESS
NEXT STEPS
Aim 2: Causal self-attention with FLARE
Aim 1(a): Rank-adaptive
Aim 1(b): Conditioning mechanism
Progress
Benchmarking (model size: 340M parameters; context length: 2k tokens)
Advantages
Progress
Progress
Memory-Efficient Causal Attention via Latent Routing (FLARE Decoder)
Target Conference: NeurIPS 2026 (deadline: May 15)
1. Contrib: Latent routing formulation of causal attention
2. Contrib: Constant-memory autoregressive decoding algorithm
3. (TODO) Contrib: Linear-memory training via chunkwise recomputation algorithm
4. (TODO) Contrib: Optimized GPU kernels for scalable training and inference
5. (TODO) Contrib: Adaptive latent queries for content-aware compression (new architectural improvement over FLARE)
(from the gated linear attention paper)
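The constant-memory decoding idea in Contribution 2 can be sketched as follows. This is an illustrative toy, not the FLARE implementation: it assumes tokens are routed into a fixed set of M learned latent slots whose running summaries form the entire decoding state, so memory stays O(M*d) no matter how long the generated sequence grows (unlike a standard KV cache, which grows with sequence length). All names (`LatentRoutingDecoder`, `Wq`, `L`) are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class LatentRoutingDecoder:
    """Toy sketch of constant-memory causal decoding via latent routing.

    State is M latent summaries (num/den accumulators), independent of the
    number of tokens decoded so far.
    """

    def __init__(self, d, m, seed=0):
        rng = np.random.default_rng(seed)
        self.Wq = rng.standard_normal((d, d)) / np.sqrt(d)  # token query proj
        self.L = rng.standard_normal((m, d)) / np.sqrt(d)   # latent queries (hypothetical)
        self.num = np.zeros((m, d))   # running value accumulator per slot
        self.den = np.zeros((m, 1))   # running routing-weight normalizer

    def step(self, x):
        # Encode: route the new token into the M latent slots.
        w = softmax(self.L @ x)                          # (m,) routing weights
        self.num += w[:, None] * x[None, :]              # accumulate values
        self.den += w[:, None]
        latents = self.num / np.maximum(self.den, 1e-9)  # (m, d) summaries

        # Decode: the token's query attends over the latent summaries only,
        # which preserves causality (summaries contain only past tokens).
        q = self.Wq @ x
        a = softmax(latents @ q)                         # (m,) read weights
        return a @ latents                               # (d,) output
```

The key property: `num` and `den` are the only sequence-dependent state, and their shapes never change during decoding, whereas causal softmax attention would need to retain all previous keys and values.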
Efficient Causal Attention via Latent Routing (FLARE Decoder)
Progress
Next steps