awesome-pretrained-models-for-information-retrieval
github.com/ict-bigdatalab/awesome-pretrained-models-for-information-retrieval
A curated list of awesome papers related to pre-trained models for information retrieval (a.k.a. pretraining for IR).
GitHub Stars: 676
Curated Resources: 205
Categories: 8
Survey Papers | First Stage Retrieval | Re-ranking Stage | Jointly Learning Retrieval and Re-ranking | Model-based IR System | LLM and IR | Multimodal Retrieval | Other Resources
What's inside
Multimodal Retrieval
- 12-in-1: Multi-Task Vision and Language Representation Learning. (Multi-stream Architecture Applied on Input)
- Dynamic Modality Interaction Modeling for Image-Text Retrieval. (Unified Single-stream Architecture)
- ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph. (Multi-stream Architecture Applied on Input)
- Learning Transferable Visual Models From Natural Language Supervision. (Multi-stream Architecture Applied on Input)
- M3P: Learning Universal Representations via Multitask Multilingual Multimodal Pre-training. (Multi-stream Architecture Applied on Input)
- M6-v0: Vision-and-Language Interaction for Multi-modal Pretraining. (Multi-stream Architecture Applied on Input)
LLM and IR
- ACID: Abstractive, Content-Based IDs for Document Retrieval with Language Models. (LLM for IR)
- A Setwise Approach for Effective and Highly Efficient Zero-shot Ranking with Large Language Models. (LLM for IR)
- Atlas: Few-shot Learning with Retrieval Augmented Language Models. (Retrieval Augmented LLM)
- Beyond Yes and No: Improving Zero-Shot LLM Rankers via Scoring Fine-Grained Relevance Labels. (LLM for IR)
- Demonstrate–Search–Predict: Composing retrieval and language models for knowledge-intensive NLP. (LLM for IR)
- Enabling Large Language Models to Generate Text with Citations. (LLM for IR)
First Stage Retrieval
- A Contrastive Pre-training Approach to Learn Discriminative Autoencoder for Dense Retrieval. (Dense Retrieval)
- Answering Complex Open-Domain Questions with Multi-Hop Dense Retrieval. (Dense Retrieval)
- Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval. (Dense Retrieval)
- Augmenting Document Representations for Dense Retrieval with Interpolation and Perturbation. (Dense Retrieval)
- BERT-based Dense Retrievers Require Interpolation with BM25 for Effective Passage Retrieval. (Hybrid Retrieval)
- COIL: Revisit Exact Lexical Match in Information Retrieval with Contextualized Inverted List. (Sparse Retrieval)
Survey Papers
- A Deep Look into Neural Ranking Models for Information Retrieval.
- Dense Text Retrieval based on Pretrained Language Models: A Survey.
- Pretrained Transformers for Text Ranking: BERT and Beyond.
- Pre-training Methods in Information Retrieval.
- Semantic Models for the First-stage Retrieval: A Comprehensive Review.
Jointly Learning Retrieval and Re-ranking
Model-based IR System
- A Neural Corpus Indexer for Document Retrieval.
- A Unified Generative Retriever for Knowledge-Intensive Language Tasks via Prompt Learning.
- Autoregressive Search Engines: Generating Substrings as Document Identifiers.
- CorpusBrain: Pre-train a Generative Retrieval Model for Knowledge-Intensive Language Tasks.
- DynamicRetriever: A Pre-training Model-based IR System with Neither Sparse nor Dense Index.
- How Does Generative Retrieval Scale to Millions of Passages?
Re-ranking Stage
- Are Neural Ranking Models Robust? (Other Topics)
- A Unified Pretraining Framework for Passage Ranking and Expansion. (Other Topics)
- Axiomatically Regularized Pre-training for Ad hoc Search. (Other Topics)
- BERT-QE: Contextualized Query Expansion for Document Re-ranking. (Other Topics)
- Beyond 512 Tokens: Siamese Multi-depth Transformer-based Hierarchical Encoder for Long-Form Document Matching. (Long Document Processing Techniques)
- Beyond [CLS] through Ranking by Generation. (Basic Usage)
Other Resources
- BERT-related-papers (Other Resources About Pre-trained Models in NLP)
- Efficient Transformers: A Survey. (Surveys About Efficient Transformers)
- Faiss: a library for efficient similarity search and clustering of dense vectors (Some Retrieval Toolkits)
- MatchZoo: a library consisting of many popular neural text matching models (Some Retrieval Toolkits)
- Pre-trained Language Model Papers from THU-NLP (Other Resources About Pre-trained Models in NLP)
- Pre-trained Models for Natural Language Processing: A Survey. (Other Resources About Pre-trained Models in NLP)
Showing a sample of 205 resources. The full list is available on GitHub.