KVSlimmer: Theoretical Insights and Practical Optimizations for Asymmetric KV Merging

arXiv:2603.00907v1

Abstract: The growing computational and memory demands of the Key-Value (KV) cache significantly limit the ability of Large Language Models (LLMs). While KV merging has emerged as a promising solution, existing methods that rely on empirical observations of KV asymmetry and gradient-based Hessian approximations lack a theoretical foundation and incur suboptimal compression and inference overhead. To bridge these gaps, we establish a theoretical framework that characterizes this asymmetry through the spectral energy distribution of projection weights, demonstrating that concentrated spectra in Query/Key weights induce feature homogeneity, whereas dispersed spectra in Value weights preserve heterogeneity. Then, we introduce KVSlimmer, an efficient algorithm that captures exact Hessian information through a mathematically exact formulation, and derives a closed-form solution utilizing only forward-pass variables, resulting in a gradient-free approach that is both memory- and time-efficient. Extensive experiments across various models and benchmarks demonstrate that KVSlimmer consistently outperforms SOTA methods. For instance, on Llama3.1-8B-Instruct, it improves the LongBench average score by 0.92 while reducing memory costs and latency by 29% and 28%, respectively.
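To make the notion of "spectral energy distribution" concrete, the sketch below measures how much of a matrix's squared singular-value energy sits in its top few singular values. It is only an illustration under stated assumptions, not the paper's procedure: the function name `spectral_energy_fraction`, the choice of `top_k`, and the two synthetic matrices (a near-low-rank matrix standing in for a "concentrated" spectrum and an i.i.d. Gaussian matrix standing in for a "dispersed" one) are all hypothetical and are not taken from any real model's Query/Key or Value weights.

```python
import numpy as np


def spectral_energy_fraction(weight, top_k):
    """Share of squared singular-value energy held by the top_k singular values."""
    # np.linalg.svd returns singular values sorted in descending order.
    singular_values = np.linalg.svd(weight, compute_uv=False)
    energy = singular_values ** 2
    return float(energy[:top_k].sum() / energy.sum())


rng = np.random.default_rng(0)
d = 64

# Hypothetical "concentrated" spectrum: a rank-4 signal plus small noise.
low_rank = rng.standard_normal((d, 4)) @ rng.standard_normal((4, d))
concentrated = low_rank + 0.01 * rng.standard_normal((d, d))

# Hypothetical "dispersed" spectrum: i.i.d. Gaussian entries.
dispersed = rng.standard_normal((d, d))

frac_concentrated = spectral_energy_fraction(concentrated, top_k=4)
frac_dispersed = spectral_energy_fraction(dispersed, top_k=4)

print(f"concentrated: {frac_concentrated:.3f}")
print(f"dispersed:    {frac_dispersed:.3f}")
```

A fraction close to 1 means a handful of directions dominate the projection, which is the kind of concentration the abstract associates with homogeneous features; a small fraction means the energy is spread across many directions. Applying the same measurement to actual projection weights would require loading a model, which is outside the scope of this sketch.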
