low-resource-languages
github.com/richardlitt/low-resource-languages
A curated list of resources for the conservation, development, and documentation of low-resource (human) languages.
What's inside
Generic Repositories
- 4lang [Software]
Concept dictionary using Eilenberg machines.
- accentuate.us [Software]
- alignment-with-openfst [Software]
An implementation of the CRF autoencoder framework for four tasks: bitext word alignment, part-of-speech tagging, code switching, and dependency parsing.
- Apertium [Software]
A free/open-source machine translation platform, initially aimed at related-language pairs but expanded to deal with more divergent language pairs (Wikipedia-like army of other MT linguists).
- ark-tweet-nlp [Software]
CMU ARK Twitter Part-of-Speech Tagger.
Organizations
- 7000 Languages [Other OSS Organizations]
Creates free online language learning courses and materials in partnership with Indigenous, minority, and refugee communities.
- African Languages Lab [Other OSS Organizations]
Develops enterprise-grade language AI models (including Mansa LLM) supporting 30+ African languages for translation, transcription, and NLP.
- AI4Bharat [On GitHub]
Open-source datasets, tools, and models for Indian languages from IIT Madras, including IndicTrans2 (translation), Indic-TTS, IndicLID (language identification), and IndicVoices.
- batumi [On GitHub]
Speech recognition and natural language processing for low-resource languages.
- BloomBooks [On GitHub]
- cmusphinx [On GitHub]
Mirror of the SourceForge repositories.
Language Specific Projects
- Afrikaanse rekenaarlinguïstiek (Afrikaans computational linguistics) [Afrikaans]
- aimsigh [Irish]
Source for the now-defunct aimsigh.com Irish search engine.
- akalongman/kautilities [Georgian]
Convert Georgian letters to Latin and vice-versa (PHP).
- android_gl_dict [Galician]
Android Galician (gl_ES) keyboard dictionary.
- an-metri-gal [Galician]
Metrical analysis of verse text in the Galician language (gl-ES).
- apertium-cat-glg [Galician]
Apertium translation pair for Catalan and Galician.
Annotation
- AGTK
AGTK is a suite of software components for building tools for annotating linguistic signals: time-series data that document any kind of linguistic behavior (e.g. audio, video). The internal data structures are based on annotation graphs. The original project is hosted on SourceForge.
- Annotation page
Ethnographic tools for annotation.
- brat
brat rapid annotation tool (brat) for online text annotation.
- brendano/gfl_syntax
Graph Fragment Language for Easy Syntactic Annotation.
- CLAM
Quickly and transparently transforms command-line NLP tools into RESTful webservices with an interface for human end-users.
- eopas
ETHNOER Online Presentation and Annotation System.
Android Applications
- Aikuma
Android software for recording and translation.
- AndroidFieldDB
An Android app which lets the user build a custom visual and auditory vocabulary, useful for guided anomia treatment and for self-designed language lessons by heritage speakers.
- AndroidFieldDBElicitationRecorder
A general purpose video recording tool.
- AndroidLanguageLessons
Lets heritage speakers create self-designed language lessons.
- AndroidProductionExperiment
Android app to run perception experiments.
- Android Speech Recognition Trainer
Speech recognition training app for low-resource languages which interfaces with FieldDB corpora.
FieldDB
- AndroidLanguageLearningClientForFieldDB-sikuli [FieldDB Webservices/Components/Plugins]
Sikuli tests for AndroidLanguageLearningClientForFieldDB.
- AuthenticationWebService [FieldDB Webservices/Components/Plugins]
A Node.js web service which manages users, corpora creation, and authentication.
- bower-fielddb [FieldDB Webservices/Components/Plugins]
A Bower repository which hosts FieldDB core components. Install with: bower install fielddb --save
- bower-fielddb-angular [FieldDB Webservices/Components/Plugins]
A Bower repository which hosts fielddb-angular components. Install with: bower install fielddb-angular --save
- FieldDB
An offline/online field database which adapts to its user's terminology and I-Language, with plugins for data automation routines throughout the process, from primary data collection through cleaning to publication and archival.
- FieldDBActivityFeed [FieldDB Webservices/Components/Plugins]
A FieldDB activity feed widget which can be embedded in other codebases, websites, etc.
Flashcards
- Anki
Anki is a program for making and sharing flashcard decks (including audio) for any language or writing system.
- awesome-anki
A curated list of awesome Anki add-ons, decks and resources.
Audio automation
- arctic-prompts
Generate prompts PDF for CMU ARCTIC dataset.
- Audacity
Free, open source, cross-platform software for recording and editing sounds.
- AudioWebService
A simple Node.js server which accepts audio uploads and runs them through Praat.
- AuToBI
Automatic prosodic annotation tool written in Java.
- BashScriptsForPhonetics
- CMU Sphinx
Open source toolkit for speech recognition, including PocketSphinx, SphinxTrain, Sphinx4, and sphinxbase.
Showing a sample of 476 resources. The full list is on GitHub.