Context Awesome

awesome-kv-cache-compression

github.com/october2001/awesome-kv-cache-compression

📰 Must-read papers on KV Cache Compression (constantly updating 🤗).

GitHub Stars: 726
Curated Resources: 163
Categories: 4
Last Refreshed: 16 hours ago

⚙️ Project · 📷 Survey · 🔍 Method · 📊 Evaluation

Use this list with your AI agent

Add the Context Awesome MCP server to Claude, Cursor, or any MCP client, then ask:

"Show me 1️⃣ pruning / evicting / sparse resources from awesome-kv-cache-compression"


What's inside

🔍 Method

A2SF: Accumulative Attention Scoring with Forgetting Factor for Token Pruning in Transformer Decoder. [1️⃣ Pruning / Evicting / Sparse]
Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference. [1️⃣ Pruning / Evicting / Sparse]
AhaKV: Adaptive Holistic Attention-Driven KV Cache Eviction for Efficient Inference of Large Language Models. [1️⃣ Pruning / Evicting / Sparse]
AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning. [2️⃣ Merging]
ALISA: Accelerating Large Language Model Inference via Sparsity-Aware KV Caching. [1️⃣ Pruning / Evicting / Sparse]
ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction. [1️⃣ Pruning / Evicting / Sparse]

📷 Survey

📊 Evaluation

⚙️ Project

Showing a sample of 163 resources. View the full list on GitHub →