awesome-information-retrieval
github.com/harpribot/awesome-information-retrieval
A curated list of awesome information retrieval resources.
What's inside
Courses
- 11-442 / 11-642: Search Engines
Jamie Callan (CMU).
- 600.466: Information Retrieval and Web Agents
David Yarowsky (Johns Hopkins University).
- Coursera - Text Retrieval and Search Engines
Prof. ChengXiang Zhai (University of Illinois at Urbana-Champaign).
- CS 172: Introduction to Information Retrieval
Vagelis Hristidis (University of California - Riverside).
- CS 276 / LING 286: Information Retrieval and Web Search
Chris Manning and Pandu Nayak (Stanford University).
- CS 371R: Information Retrieval and Web Search
Raymond J. Mooney (University of Texas at Austin).
Datasets
- 20 Newsgroup dataset
This data set consists of 20,000 newsgroup posts taken from 20 newsgroup topics.
- Advanced Cross-Lingual Information Retrieval and Question Answering (ACLIA)
This dataset is used for cross-lingual question answering; the task is more complex than that of the CLQA dataset.
- Blog
Explore information seeking behavior in the blogosphere.
- Chemical IR
Address challenges in building large chemical testbeds for chemical IR.
- Clinical Decision Support
Investigate techniques to link medical cases to information relevant for patient care.
- CLIR Test Collections
This dataset can be used for cross-lingual IR among the CJKE (Chinese-Japanese-Korean-English) languages. It is suitable for the following tasks:
Software
- Apache Lucene
Open-source search engine that can be used to test information retrieval algorithms. Twitter uses this core for its real-time search.
- Indri Search Engine
Another open-source search engine, a competitor to Apache Lucene.
- Lemur Toolkit
Open Source Toolkit for research in Language Modeling, filtering and categorization.
- The Lemur Project
Open Source Toolkit for research in Language Modeling, filtering and categorization.
Talks
- Beware online "filter bubbles"
Eli Pariser (author of The Filter Bubble, TED Talk).
- Challenges in Building Large-Scale Information Retrieval Systems
Jeff Dean (WSDM Conference, 2009).
- Do we have the right to be forgotten?
Michael Douglas (TEDx SouthBank).
- Extreme Classification: A New Paradigm for Ranking & Recommendation
Manik Verma (Microsoft Research).
- Information Experience - Solution to Information Overload on Web
Doug Imbruce (TechCrunch Disrupt). Doug Imbruce is the founder of Qwiki, Inc., a technology startup in New York, NY, acquired by Yahoo! in 2013.
- Internet Privacy
Dr. Alma Whitten (Google Brussels Tech Talk).
Blogs
- Can Deep Learning help solve Lip Reading
Information Retrieval from Lip Reading.
- Deep Neural Network Learns to Judge Books by Their Covers
Information Extraction.
- Information Retrieval and the Web
Google Research.
- IR Thoughts
Dr. Edel Garcia.
- Neural Network Learns to Identify Criminals by Their Faces
Information Extraction.
- To reduce biases in machine learning start with openly discussing the problem
Bias in Relevance.
Books
- Information Retrieval: A Survey
Ed Greengrass, 2000. (Comprehensive survey of conventional information retrieval, from before the deep learning era).
- Information Retrieval in Practice
B. Croft, D. Metzler, T. Strohman. Pearson Education, 2009.
- Introduction to Information Retrieval
C.D. Manning, P. Raghavan, H. Schütze. Cambridge UP, 2008. (First book for getting started with Information Retrieval).
- Introduction to Modern Information Retrieval
G.G. Chowdhury. Neal-Schuman, 2003. (Intended for students of library and information studies).
- Language Modeling for Information Retrieval
W.B. Croft, J. Lafferty. Springer, 2003. (Covers the language modeling aspect of information retrieval. It also details the probabilistic perspective on this domain extensively, which is worth reading).
- Mining the Web: Analysis of Hypertext and Semi Structured Data
S. Chakrabarti. Morgan Kaufmann, 2002.
Showing a sample of 97 resources. View the full list on GitHub.