awesome-transformer-nlp

A curated list of NLP resources focused on Transformer networks, attention mechanisms, GPT, BERT, ChatGPT, LLMs, and transfer learning.

1.1k GitHub stars, 293 curated resources

Accelerating Large Language Model Decoding with Speculative Sampling (paper) [Transformer Architecture]
The speculative sampling algorithm enables the generation of multiple tokens from each transformer call. It achieves a 2–2.5x decoding speedup with Chinchilla in a distributed setup, without compromising sample quality or modifying the model itself.
A Comprehensive Capability Analysis of GPT-3 and GPT-3.5 Series Models [Generative Pre-Training Transformer (GPT)]
A Comprehensive Survey on Pretrained Foundation Models: A History from BERT to ChatGPT (paper) [Large Language Model (LLM)]
My remarks: this paper raises a lot of questions around the term "foundation models", e.g., what is the bare minimum number of parameters a model needs to qualify as a foundation model? It sounds to me like foundation models are an "invented" concept that lacks good validity.
AI And The Limits Of Language: An AI system trained on words and sentences alone will never approximate human understanding [Additional Reading]
What LLMs like ChatGPT can and cannot do, and why AGI is not here yet.
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations (paper) [BERT and Transformer]
A Length-Extrapolatable Transformer (paper) [Transformer Architecture]
This improves
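
To make the speculative sampling entry above more concrete, here is a minimal sketch of one decoding step on a toy vocabulary. It is an illustration under stated assumptions, not the paper's implementation: `draft_model`, `target_model`, and `speculative_step` are hypothetical names, and both models are stand-in callables that map a token prefix to a list of next-token probabilities. The accept test `min(1, p/q)` and the resampling from the clipped difference `max(0, p - q)` follow the scheme as commonly described.

```python
import random


def sample(dist, rng):
    """Draw one token id from a list of probabilities (or unnormalized weights)."""
    return rng.choices(range(len(dist)), weights=dist, k=1)[0]


def speculative_step(prefix, draft_model, target_model, k, rng):
    """Return the tokens emitted by one speculative step (between 1 and k + 1).

    draft_model / target_model are hypothetical callables: prefix -> probabilities.
    """
    # 1. The cheap draft model proposes k tokens autoregressively.
    drafted, draft_dists = [], []
    context = list(prefix)
    for _ in range(k):
        q = draft_model(context)
        token = sample(q, rng)
        drafted.append(token)
        draft_dists.append(q)
        context.append(token)

    # 2. The target model scores all k + 1 positions. A real system would do
    #    this in one batched forward pass; the loop here is only for clarity.
    target_dists = [target_model(list(prefix) + drafted[:i]) for i in range(k + 1)]

    # 3. Accept each drafted token with probability min(1, p / q).
    emitted = []
    for i, token in enumerate(drafted):
        p, q = target_dists[i], draft_dists[i]
        if rng.random() < min(1.0, p[token] / q[token]):
            emitted.append(token)
            continue
        # On the first rejection, resample from the clipped difference and stop.
        residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
        emitted.append(sample(residual, rng))
        return emitted

    # 4. Every draft was accepted: take one extra token from the target model.
    emitted.append(sample(target_dists[k], rng))
    return emitted
```

When the draft and target distributions agree, every drafted token is accepted and the step yields `k + 1` tokens for a single target-model pass, which is where the speedup comes from.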

A Hackers' Guide to Language Models (video)
A quick run-through of all the basic ideas of language models and how to use them (both open models and OpenAI-based models), using code as much as possible.
A visual intro to large language models (LLMs) by Jay Alammar/Cohere
A high-level look at LLMs and some of their applications for language processing. It covers text generation models (like GPT) and representation models (like BERT).
How to train a new language model from scratch using Transformers and Tokenizers [Tutorials]
Interfaces for Explaining Transformer Language Models
A gentle visual guide to Transformer models that looks at input saliency and neuron activations inside neural networks.

algteam/bert-examples
BERT examples.
bigboNed3/bert_serving
Export BERT model for serving.
brightmart/bert_language_understanding
Pre-training of Deep Bidirectional Transformers for Language Understanding: pre-train TextCNN.
guotong1988/BERT-chinese
Pre-training of deep bidirectional transformers for Chinese language understanding.
hanxiao/bert-as-service
Mapping a variable-length sentence to a fixed-length vector using a pretrained BERT model.
HighCWu/keras-bert-tpu
Implementation of BERT that can load official pre-trained models for feature extraction and prediction on TPU.
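
The bert-as-service entry above describes mapping a variable-length sentence to a fixed-length vector. One common way to do that is masked mean pooling over per-token vectors, sketched below in plain Python. This is an assumption chosen for illustration: the list does not say which pooling strategy any of these projects use, and `mean_pool` is a hypothetical helper, not part of any listed library.

```python
def mean_pool(token_vectors, attention_mask):
    """Average the vectors of real tokens, skipping padded positions.

    token_vectors: one vector (list of floats) per token position.
    attention_mask: 1 for a real token, 0 for padding, same length as token_vectors.
    """
    dim = len(token_vectors[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_vectors, attention_mask):
        if keep:
            for j in range(dim):
                total[j] += vector[j]
            count += 1
    if count == 0:
        raise ValueError("attention_mask selects no tokens")
    # The output length depends only on the vector size, not the sentence length.
    return [value / count for value in total]


# Three token positions, the last one is padding and is ignored.
sentence_vector = mean_pool([[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]], [1, 1, 0])
```

Whatever the sentence length, the result has as many components as a single token vector, which is what makes it usable as a fixed-size sentence embedding.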

Alpaca.cpp [Other]
Run a fast ChatGPT-like model locally on your device.
Apple Neural Engine (ANE) Transformers [Other]
Transformer architecture optimized for Apple Silicon.
bojone/bert4keras [Keras]
Lightweight reimplementation of BERT for Keras.
Cformers [Other]
SoTA Transformers with C-backend for fast inference on your CPU.
codertimo/BERT-pytorch [PyTorch]
PyTorch implementation of Google AI's 2018 BERT.
CyberZHG/keras-bert [Keras]
Implementation of BERT that can load official pre-trained models for feature extraction and prediction.

asyml/texar [Text Generation]
Toolkit for Text Generation and Beyond.
benywon/ChineseBert [Question Answering (QA)]
A Chinese BERT model built specifically for question answering.
brightmart/sentiment_analysis_fine_grain [Classification]
Multi-label classification with BERT; Fine Grained Sentiment Analysis from AI challenger.
facebookresearch/SpanBERT [Question Answering (QA)]
Question Answering on SQuAD; improving pre-training by representing and predicting spans.
fooSynaptic/BERT_classifer_trial [Classification]
BERT trial for Chinese corpus classification.
FuYanzhe2/Name-Entity-Recognition [Named-Entity Recognition (NER)]
LSTM-CRF, Lattice-CRF, and recent NER-related papers.

Cognitive Biases in Large Language Models
Core Views on AI Safety: When, Why, What, and How
Discovering Language Model Behaviors with Model-Written Evaluations (paper)
They automatically generate evaluations with LMs. They discover new cases of inverse scaling where LMs get worse with size. They also find some of the first examples of inverse scaling in RLHF, where more RLHF makes LMs worse.
GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models (paper)
The paper argues GPT is a General Purpose Technology.

FastBert
A simple deep learning library that allows developers and data scientists to train and deploy BERT-based models for NLP tasks, beginning with text classification. The work on FastBert is inspired by fast.ai.
gpt2tc
A small program using the GPT-2 LM to complete and compress texts. It has no external dependencies, requires no GPU, and is quite fast. The smallest model (117M parameters) is provided. Larger models can be downloaded as well (no waitlist or sign-up required).
jessevig/bertviz
Tool for visualizing attention in the Transformer model.

Showing a sample of 293 resources.