awesome-data-efficient-llm

github.com/luo-junyu/awesome-data-efficient-llm

A list of data-efficient and data-centric LLM (large language model) papers. Our survey paper: Towards Efficient LLM Post Training: A Data-centric Perspective.

GitHub Stars: 204

Curated Resources

What's inside

❖ Paper List

Active Instruction Tuning: Improving Cross-Task Generalization by Training on Prompt Sensitive Tasks
Proposes active instruction tuning (IT), which selects tasks for LLM tuning based on prompt uncertainty.
Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data
Generates Lean 4 proof data to improve LLM theorem proving.
A Guide To Effectively Leveraging LLMs for Low-Resource Text Summarization: Data Augmentation and Semi-supervised Approaches
Proposes two new methods for low-resource text summarization, based on data augmentation and semi-supervised learning.
All-in-One Tuning and Structural Pruning for Domain-Specific LLMs
ATP is a unified approach that jointly performs structural pruning and fine-tuning of LLMs via a trainable generator.
AlpaGasus: Training a Better Alpaca with Fewer Data
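Proposes a data selection strategy that uses a strong LLM (ChatGPT) to rate and filter out low-quality instruction fine-tuning (IFT) data, with AlpaGasus, a model trained on the filtered Alpaca data, as the example.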
An Empirical Study of LLM-as-a-Judge for LLM Evaluation: Fine-tuned Judge Model is not a General Substitute for GPT-4
Finds that fine-tuned judge models have limitations as general evaluators and proposes an integrated method that improves them.
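The AlpaGasus entry above boils down to score-then-filter data selection: rate every instruction-response pair, then keep only the high scorers for fine-tuning. Below is a minimal Python sketch of that filtering step, assuming the quality scores already exist (AlpaGasus obtains them by prompting ChatGPT as a grader; here they are simply given). The `Example` type, the `filter_by_score` helper, and the 4.5 threshold on a 0 to 5 scale are illustrative assumptions, not the paper's code.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Example:
    instruction: str
    response: str
    # Quality score from an external grader (assumed 0 to 5 scale; hypothetical field).
    score: float


def filter_by_score(examples: Iterable[Example], threshold: float = 4.5) -> List[Example]:
    """Keep only examples whose grader score meets the threshold (illustrative default)."""
    return [ex for ex in examples if ex.score >= threshold]


if __name__ == "__main__":
    # Toy pool; in practice this would be the full instruction-tuning dataset.
    pool = [
        Example("Name a primary color.", "Red.", 5.0),
        Example("Summarize the text.", "I don't know.", 2.0),
        Example("Translate 'hola' to English.", "Hello.", 4.5),
    ]
    kept = filter_by_score(pool)
    print(len(kept), "of", len(pool), "examples kept")
```

A fixed threshold keeps the rule simple but lets the size of the selected set vary with the grader's calibration; keeping the top-k examples by score is a common alternative when a fixed data budget matters more.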

Showing a sample of 204 resources. View the full list on GitHub.