A curated list of resources dedicated to Natural Language Processing (NLP)

GitHub Stars: 19k
Curated Resources: 654
Categories: 10
Last Refreshed: 6 hours ago
Research Summaries and Trends | Prominent NLP Research Labs | Tutorials | Libraries | Tasks and Methods | Datasets | Multilingual NLP Frameworks | Language Models for NLP | NLP per Language | See Also

Use this list with your AI agent

Add the Context Awesome MCP server to Claude, Cursor, or any MCP client, then ask:

"Show me machine translation resources from awesome-nlp"

What's inside

Research Summaries and Trends

  • ACL Anthology

    canonical archive of papers from ACL, EMNLP, NAACL, EACL, COLING, and related venues.

  • ACL Rolling Review

    the rolling review process feeding ACL-affiliated venues.

Libraries

  • A collection of Natural Language Processing (NLP) Ruby libraries, tools and software

  • AllenNLP

    An NLP research library, built on PyTorch, for developing state-of-the-art deep learning models on a wide variety of linguistic tasks.

  • Amazon Comprehend (Services)

    NLP and ML suite covering most common tasks, such as NER, tagging, and sentiment analysis.

  • Anafora (Annotation Tools)

  • Annotation Lab (Annotation Tools)

    Free End-to-End No-Code platform for text annotation and DL model training/tuning. Out-of-the-box support for Named Entity Recognition, Classification, Relation extraction and Assertion Status Spark NLP models. Unlimited support for users, teams, projects, documents. Not FOSS.

  • Argilla (Annotation Tools)

    open-source platform for collecting human feedback, building NLP and LLM datasets, and curating preference data.

Tasks and Methods

Tutorials

Language Models for NLP

  • AfriqueLLM (Multilingual and Cross-Lingual Models)

    suite of open LLMs (4B-14B) continued-pretrained on 26B tokens across 20 African languages with a comprehensive empirical study of data mixing.

  • Alignment Faking in Large Language Models (Bias, Fairness, Safety in NLP)

    models strategically complying with the training objective during training to avoid having their behavior modified.

  • Apple Intelligence Foundation Language Models (Efficient and Small Language Models)

    on-device 3B model using KV-cache sharing and 2-bit QAT for 37.5% cache memory reduction without accuracy loss.

  • A Primer in BERTology (Probing and Interpretability)

    what BERT learns about language.

  • Atomic Calibration (Factuality, Hallucination, Calibration)

    claim-level calibration analysis for long-form generation; models are substantially worse-calibrated on extended outputs than on single claims.

  • AWQ (Efficient and Small Language Models)

    activation-aware weight quantization.

NLP per Language

  • AI4Bharat IndicNLP Suite (Libraries and Tooling)

    tools, datasets, and models across 22 Indic languages.

  • Airavata (Models and Embeddings)

    instruction-tuned Hindi LLM.

  • Albertina (Models)

    encoder-only Portuguese LMs for both PT-PT and PT-BR.

  • ALLaM (Models and Embeddings)

    Arabic-first foundation models.

  • Alpino (NLP in Dutch)

    dependency parser for Dutch (also does POS tagging and lemmatization).

  • AraBERT (Models and Embeddings)

    Arabic BERT family.

See Also

Datasets

  • Common Corpus

    2T-token open-license multilingual corpus.

  • CulturaX

    6.3T tokens across 167 languages.

  • Dolma

    3T-token open pretraining corpus with documented filtering pipeline.

  • FineWeb / FineWeb-Edu

    15T-token cleaned web corpus; FineWeb-Edu filters for educational quality.

  • gensim-data

    data repository for pretrained NLP models and NLP corpora.

Showing a sample of 654 resources. View the full list on GitHub →