awesome-efficient-llm
github.com/horseee/awesome-efficient-llm ↗
A curated list for Efficient Large Language Models
GitHub Stars: 2k
Curated Resources: 214
Categories: 2 (Full List; Paper from Sep 30, 2024 - Now)
Last Refreshed: 8 hours ago
Use this list with your AI agent
Add the Context Awesome MCP server to Claude, Cursor, or any MCP client, then ask:
"Show me quantization resources from awesome-efficient-llm"
Installation instructions →
What's inside
Paper from Sep 30, 2024 - Now (see Full List from May 22, 2023 here)
- 1-bit AI Infra: Part 1.1, Fast and Lossless BitNet b1.58 Inference on CPUs [Quantization]
- Accelerated AI Inference via Dynamic Execution Methods [Inference Acceleration]
- Accelerated Test-Time Scaling with Model-Free Speculative Sampling [Inference Acceleration]
- A Comprehensive Study on Quantization Techniques for Large Language Models [Quantization]
- Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference [KV Cache Compression]
- Adaptive Pruning for Large Language Models with Structural Importance Awareness [Network Pruning / Sparsity]
Full List
Please check out all the papers by selecting the sub-area you're interested in. On this main page, only papers released in the past 90 days are shown.
- curated list
- efficient_plm/
- project/
Showing a sample of 214 resources. View the full list on GitHub →