indicnlp_catalog
github.com/ai4bharat/indicnlp_catalog
A collaborative catalog of NLP resources for Indic languages
GitHub Stars: 635
Curated Resources: 267
Categories: 11
Last Refreshed: 3 hours ago
- Featured Resources
- Major Indic Language NLP Repositories
- Libraries and Tools
- Evaluation Benchmarks
- Standards
- Text Corpora
- Models
- Speech Corpora
- OCR Corpora
- Multimodal Corpora
- Language Specific Catalogs
Use this list with your AI agent
Add the Context Awesome MCP server to Claude, Cursor, or any MCP client, then ask:
"Show me dialog resources from indicnlp_catalog"
Installation instructions →
What's inside
Speech Corpora
Text Corpora
- A Code-Mixed Medical Task-Oriented Dialog Dataset (Dialog)
- ACTSA corpus for Telugu (Sentiment, Sarcasm, Emotion Analysis)
- A Dataset of Hindi-English Code-Mixed Social Media Text for Hate Speech Detection, 2018 (Hate Speech and Offensive Comments)
- Aggression-annotated Corpus of Hindi-English Code-mixed Data, 2018 (Hate Speech and Offensive Comments)
- AI4Bharat Aksharantar (Parallel Transliteration Corpus)
- AI4Bharat Cross-lingual Semantic Textual Similarity (Lexical Resources and Semantic Similarity)
Major Indic Language NLP Repositories
Models
- AI4Bharat IndicBART (Pre-trained Language Models)
- AI4Bharat IndicBERT (Pre-trained Language Models)
- AI4Bharat IndicFT (Word Embeddings)
- AI4Bharat IndicNER (NER)
- AI4Bharat IndicNLP Project (Morphanalyzers)
- AI4Bharat IndicWav2Vec (Speech Models)
Evaluation Benchmarks
- AI4Bharat IndicGLUE
- AI4Bharat IndicNLG Suite
- AI4Bharat Text Classification
- GLUECoS: Language Identification (LID), POS Tagging (POS), Named Entity Recognition (NER), Sentiment Analysis (SA), Question Answering (QA), Natural Language Inference (NLI)
Standards
Libraries and Tools
OCR Corpora
Showing a sample of 267 resources. View the full list on GitHub →