A curated list of Large Language Model (LLM) Interpretability resources.

GitHub Stars: 1.5k
Curated Resources: 89
Categories: 5
Last Refreshed: 2 hours ago

Use this list with your AI agent

Add the Context Awesome MCP server to Claude, Cursor, or any MCP client, then ask:

"Show me LLM interpretability article resources from awesome-llm-interpretability"

Installation instructions →

What's inside

LLM Interpretability Articles

LLM Interpretability Groups

  • Alignment Lab AI

    Group of researchers focusing on AI alignment.

  • EleutherAI

    Non-profit AI research lab that focuses on interpretability and alignment of large models.

  • Nous Research

    Research group discussing various topics on interpretability.

  • PAIR

    at Google work on

LLM Interpretability Papers

LLM Survey Paper

  • A Survey of Large Language Models

    This survey paper provides an up-to-date review of the literature on LLMs, which can be a useful resource for both researchers and engineers.

LLM Interpretability Tools

  • Attention Analysis

    Analyzes attention maps from a BERT transformer.

  • Automated Interpretability

    Code for automatically generating, simulating, and scoring explanations of neuron behavior.

  • Awesome-Attention-Heads

    A carefully compiled list that summarizes the diverse functions of attention heads.

  • Comgra

    Comgra helps you analyze and debug neural networks in PyTorch.

  • Copy Suppression

    Designed to help explore different prompts for GPT-2 Small, as part of a research project regarding copy-suppression in LLMs.

  • ecco

    A Python library for exploring and explaining Natural Language Processing models using interactive visualizations.

Showing a sample of 89 resources. View the full list on GitHub →