awesome-nlp

github.com/awesomelistsio/awesome-nlp

A curated list of awesome frameworks, libraries, tools, datasets, tutorials, and research papers for Natural Language Processing (NLP). This list covers a variety of NLP tasks, from text processing and tokenization to state-of-the-art language models and applications like sentiment analysis and machine translation.

What's inside

Frameworks and Libraries

AllenNLP
An open-source NLP research library built on top of PyTorch.
Hugging Face Transformers
A comprehensive library of state-of-the-art NLP models like BERT, GPT, and RoBERTa.
NLTK (Natural Language Toolkit)
A comprehensive library for text processing and analysis.
spaCy
An open-source library for advanced natural language processing in Python.
TextBlob
A simple library for processing textual data in Python.

Research Papers

Attention Is All You Need (2017)
The paper that introduced the Transformer architecture, revolutionizing NLP.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (2018)
The introduction of the BERT model.
ELMo: Deep Contextualized Word Representations (2018)
A model for contextual word embeddings.
GloVe: Global Vectors for Word Representation (2014)
A model for generating word embeddings.
Word2Vec: Efficient Estimation of Word Representations in Vector Space (2013)
The introduction of Word2Vec, a method for learning word embeddings.

NLP Tasks

BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
Fairseq
A sequence-to-sequence modeling toolkit from Facebook AI Research (FAIR).
OpenNMT
A neural machine translation framework.
PEGASUS
A pre-trained model specifically designed for text summarization.
spaCy NER
Stanford NER

Text Processing and Tokenization

BPE (Byte Pair Encoding)
A subword tokenization technique used by models like GPT-2 and RoBERTa (BERT uses the related WordPiece algorithm).
Moses Tokenizer
A widely used tokenizer for machine translation tasks.
RegexpTokenizer (NLTK)
A tokenizer that uses regular expressions to split text into tokens.
SentencePiece
A language-independent tokenization and text processing library.
spaCy Tokenizer
A fast and efficient tokenizer integrated within the spaCy library.
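
To make the BPE entry above concrete, here is a minimal sketch of how BPE merge rules can be learned from a toy word-frequency table. The function names (`learn_bpe`, `get_pair_counts`, `merge_pair`), the `</w>` end-of-word marker, and the sample words are illustrative assumptions, not the API of any library in this list; production tokenizers add byte-level handling, pre-tokenization, and much faster data structures.

```python
from collections import Counter


def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by how often each word occurs."""
    pairs = Counter()
    for symbols, freq in vocab.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(pair, vocab):
    """Rewrite every word so that each occurrence of `pair` becomes one symbol."""
    merged = {}
    for symbols, freq in vocab.items():
        out = []
        i = 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(word_freqs, num_merges):
    """Greedily learn up to `num_merges` merge rules from a word -> count table."""
    # Start from single characters plus a hypothetical end-of-word marker.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append(best)
        vocab = merge_pair(best, vocab)
    return merges, vocab


if __name__ == "__main__":
    toy_counts = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
    merges, vocab = learn_bpe(toy_counts, num_merges=3)
    print(merges)
    print(vocab)
```

Each iteration merges the most frequent adjacent pair, so frequent character sequences such as the suffix "est" end up as single subword units while rare words stay split into smaller pieces.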

Datasets

CoNLL-2003
A dataset for named entity recognition.
GLUE Benchmark
A collection of resources for evaluating natural language understanding systems.
IMDB Reviews
A dataset for sentiment analysis.
SQuAD (Stanford Question Answering Dataset)
A dataset for reading comprehension and question answering tasks.
tiny_qa_benchmark_pp
WikiText
A collection of high-quality text from Wikipedia for language modeling tasks.

Learning Resources

Coursera: Natural Language Processing Specialization
A comprehensive course series on NLP by DeepLearning.AI.
Fast.ai NLP Course
A practical course on NLP using the fastai library.
Hugging Face Tutorials
Official tutorials for using the Hugging Face NLP library.
Stanford CS224N: Natural Language Processing with Deep Learning
A popular university course on NLP.

Pretrained Language Models

DistilBERT
A smaller, faster, and lighter version of BERT.
GPT-3 (Generative Pre-trained Transformer 3)
A 175-billion-parameter generative language model by OpenAI.
RoBERTa
A variant of BERT trained with a more robustly optimized pretraining procedure.
T5 (Text-to-Text Transfer Transformer)
A model that treats every NLP task as a text-to-text problem.
XLNet
A generalized autoregressive pretraining method that its authors report outperforms BERT on several tasks.

Tools and Applications

FastText
A library for efficient text classification and representation learning.
Gensim
A Python library for topic modeling and document similarity.
LexRank
A text summarization library using graph-based ranking algorithms.
Polyglot
A multilingual NLP toolkit supporting various languages.
Stanford CoreNLP
A suite of NLP tools for linguistic analysis.
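
As a rough illustration of the graph-based ranking idea behind the LexRank entry above, the sketch below scores sentences with a PageRank-style iteration over a sentence-similarity graph. It is a simplification under stated assumptions: it uses plain term-frequency cosine similarity rather than the TF-IDF weighting and similarity threshold described in the LexRank paper, and the names `cosine` and `rank_sentences`, the damping value, and the sample sentences are hypothetical, not the API of the listed library.

```python
import math
import re
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_sentences(sentences, damping=0.85, iterations=50):
    """Return one centrality score per sentence (higher means more central)."""
    bags = [Counter(re.findall(r"\w+", s.lower())) for s in sentences]
    n = len(bags)
    # Similarity graph: edge weight is the cosine similarity, no self-loops.
    sim = [
        [cosine(bags[i], bags[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    row_sums = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each sentence j passes on its score in proportion to sim[j][i].
            incoming = sum(
                sim[j][i] / row_sums[j] * scores[j]
                for j in range(n)
                if row_sums[j] > 0
            )
            new_scores.append((1 - damping) / n + damping * incoming)
        scores = new_scores
    return scores


if __name__ == "__main__":
    sentences = [
        "The cat sat on the mat.",
        "The cat lay on the mat.",
        "Stocks fell sharply today.",
    ]
    for sentence, score in zip(sentences, rank_sentences(sentences)):
        print(f"{score:.3f}  {sentence}")
```

An extractive summary would then keep the top-scoring sentences; sentences that share vocabulary with many others rank higher than isolated ones.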

Showing a sample of 46 resources. View the full list on GitHub →