awesome-data-efficient-llm
github.com/luo-junyu/awesome-data-efficient-llm
A list of data-efficient and data-centric LLM (Large Language Model) papers. Our survey paper: Towards Efficient LLM Post Training: A Data-centric Perspective
What's inside
❖ Paper List
- Active Instruction Tuning: Improving Cross-Task Generalization by Training on Prompt Sensitive Tasks
Proposes active instruction tuning, which uses prompt uncertainty to select tasks for LLM tuning.
- Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data
Generates Lean 4 proof data to enhance LLM theorem proving, without experimental focus.
- A Guide To Effectively Leveraging LLMs for Low-Resource Text Summarization: Data Augmentation and Semi-supervised Approaches
Proposes two new methods for low-resource text summarization.
- All-in-One Tuning and Structural Pruning for Domain-Specific LLMs
ATP is a unified approach to pruning and fine-tuning LLMs via a trainable generator.
- Alpagasus: Training a Better Alpaca with Fewer Data
Proposes a data selection strategy that filters low-quality data for instruction fine-tuning (IFT), with ALPAGASUS as an example.
- An Empirical Study of LLM-as-a-Judge for LLM Evaluation: Fine-tuned Judge Model is not a General Substitute for GPT-4
Fine-tuned judge models have limitations; an integrated method improves them.
Showing a sample of 204 resources. View the full list on GitHub →