awesome-kv-cache-compression
github.com/october2001/awesome-kv-cache-compression
📰 Must-read papers on KV Cache Compression (constantly updating 🤗).
GitHub Stars: 713
Curated Resources: 163
Categories: 4
Last Refreshed: 21 hours ago
⚙️ Project | 📷 Survey | 🔍 Method | 📊 Evaluation
What's inside
🔍 Method
- A2SF: Accumulative Attention Scoring with Forgetting Factor for Token Pruning in Transformer Decoder. (1️⃣ Pruning / Evicting / Sparse)
- Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference. (1️⃣ Pruning / Evicting / Sparse)
- AhaKV: Adaptive Holistic Attention-Driven KV Cache Eviction for Efficient Inference of Large Language Models. (1️⃣ Pruning / Evicting / Sparse)
- AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning. (2️⃣ Merging)
- ALISA: Accelerating Large Language Model Inference via Sparsity-Aware KV Caching. (1️⃣ Pruning / Evicting / Sparse)
- ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction. (1️⃣ Pruning / Evicting / Sparse)
📷 Survey
📊 Evaluation
- Comparative Characterization of KV Cache Management Strategies for LLM Inference.
- KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches.
- More Tokens, Lower Precision: Towards the Optimal Token-Precision Trade-off in KV Cache Compression.
- NexusQuant.
- Rethinking Key-Value Cache Compression Techniques for Large Language Model Serving.
- SCBench: A KV Cache-Centric Analysis of Long-Context Methods.
⚙️ Project
Showing a sample of 163 resources. View the full list on GitHub →