awesome-kaldi

This is a list of features, scripts, blogs and resources for making better use of Kaldi (http://kaldi-asr.org/).

A Bit of Progress in Language Modeling
A foundational and comprehensive article about building language models.
GMM Acoustic Modeling and Feature Extraction
A really good presentation by Andrew Maas for better understanding the GMM-based phoneme alignment.
Semirings and WFST
A good short course (~3 hours) from Nanyang Technological University that covers the idea of WFSTs in a straightforward and visual way.
Speech Recognition with Weighted Finite-State Transducers
The "bible" for understanding WFST-based systems for Speech recognition.
The HTK book
The HTK book is for another ASR toolkit, but it explains the basics of speech recognition in a very intuitive and graphic way.

A time delay neural network architecture for efficient modeling of long temporal contexts
The article that describes the use of TDNNs in Kaldi.
Hybrid speech recognition with Deep Bidirectional LSTM
An article about the basic BLSTM recipe in Kaldi.
The Kaldi Speech Recognition Toolkit
The original article that described Kaldi and the different parts of the project. It should be noted that some parts of that article are outdated.

Building Speech Recognition Systems with the Kaldi Toolkit
This presentation is extremely long but also extremely helpful. It's the most complete source of information about the training process and its development.
Eleanor Chodroff Kaldi Tutorial
A good in-depth tutorial about the training process, with a lot of code examples.
How to start with kaldi and speech recognition
A Medium post (by me) regarding the general structure of the Kaldi project and its different parts.
How to Train a Deep Neural Net Acoustic Model with Kaldi
A tutorial by Josh Meyer specifically about training DNN acoustic models with Kaldi.
Kaldi for Dummies tutorial
The basic tutorial in the Kaldi documentation. It is great for hands-on experience, but it is not very well explained.
Speaker Diarization with Kaldi
A tutorial about X-Vectors and Speaker Diarization.

combine_data.sh
If you have multiple datasets and want to combine them, there is no need to do it manually, file by file. This script takes entire data directories and combines all of their files into a single new directory.
Finetune acoustic model
If you don't have a lot of data, you can always train a Kaldi model on the domain closest to yours and then take the
Kaldi-ONNX project by XiaoMi
A project that helps convert Kaldi models to ONNX so you can easily use them in other frameworks.
perturb_data_dir_speed_3way.sh
This script changes the speaking speed of utterances without creating extra audio files. It does this by applying a SoX command to your wav files and by copying and editing all the other files in your data directory. This script, together with the next one, is used in most state-of-the-art systems and helps your model generalize better.
perturb_data_dir_volume.sh
This script works the same way, but changes the volume of the utterances instead.
resample_data_dir.sh
Want to build a model for a different sampling rate without manually resampling your entire dataset? This script does it for you, again with a SoX command.
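The speed, volume and resampling scripts above share one trick. Instead of writing new audio, they rewrite each wav.scp entry into a command that pipes the audio through SoX, and Kaldi runs that pipe whenever it reads the recording. The following is a minimal Python sketch of that idea, plus a sorted, duplicate-checking merge in the spirit of combine_data.sh. It only handles wav.scp (the real scripts also update utt2spk, spk2utt, segments and the other files), and the exact SoX command strings, the hypothetical `sp<factor>-` id prefix and the 0.125 to 2.0 volume range are assumptions, not the scripts' actual output.

```python
import random
from typing import Dict

# A wav.scp maps a recording id to a file path or to a command ending in "|"
# whose stdout is the audio.
WavScp = Dict[str, str]


def add_sox_effect(rxfile: str, effect: str) -> str:
    """Return a wav.scp value that applies a SoX effect without writing audio to disk."""
    rxfile = rxfile.strip()
    if rxfile.endswith("|"):
        # Already a pipe: feed its output into another sox process.
        return f"{rxfile} sox -t wav - -t wav - {effect} |"
    # Plain file: let sox read it directly and write WAV to stdout.
    return f"sox -t wav {rxfile} -t wav - {effect} |"


def speed_perturb(wav_scp: WavScp, factor: float) -> WavScp:
    """Copy every recording at a new speed; ids get a hypothetical 'sp<factor>-' prefix."""
    return {f"sp{factor}-{rec}": add_sox_effect(rx, f"speed {factor}")
            for rec, rx in wav_scp.items()}


def volume_perturb(wav_scp: WavScp, rng: random.Random,
                   low: float = 0.125, high: float = 2.0) -> WavScp:
    """Scale each recording by a random linear gain (assumed range); ids are unchanged."""
    return {rec: add_sox_effect(rx, f"vol {rng.uniform(low, high):.3f}")
            for rec, rx in wav_scp.items()}


def resample(wav_scp: WavScp, rate_hz: int) -> WavScp:
    """Resample every recording on the fly with the SoX 'rate' effect."""
    return {rec: add_sox_effect(rx, f"rate {rate_hz}") for rec, rx in wav_scp.items()}


def combine(*scps: WavScp) -> WavScp:
    """Merge several wav.scp maps, rejecting duplicate ids and sorting by id."""
    merged: WavScp = {}
    for scp in scps:
        for rec, rx in scp.items():
            if rec in merged:
                raise ValueError(f"duplicate recording id: {rec}")
            merged[rec] = rx
    return dict(sorted(merged.items()))


def write_scp(scp: WavScp) -> str:
    """Serialize back to the 'id value' line format."""
    return "".join(f"{rec} {rx}\n" for rec, rx in scp.items())


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    original = {"rec1": "/data/rec1.wav", "rec2": "flac -c -d -s /data/rec2.flac |"}
    # A "3-way" set: original speed plus 0.9x and 1.1x copies.
    three_way = combine(original,
                        speed_perturb(original, 0.9),
                        speed_perturb(original, 1.1))
    print(write_scp(three_way), end="")
```

Sorting with Python's default string order matches the `LC_ALL=C` order Kaldi expects for ASCII ids. Giving each speed-perturbed copy new ids is what lets the copies be merged with the original data without collisions.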

compile Kaldi for android
You can also compile Kaldi so that it runs directly on Android devices. That might not be a good idea with a heavy model, but it can work for more constrained models.
kaldi-adapt-lm
A tool that helps to adapt nnet3 chain models to a different language model.
kaldi-gstreamer-server
This is a nice project that will help you integrate the Kaldi toolkit and the
kaldi-offline-transcriber
A good example of a project that handles both training and decoding. It was built for Estonian but can easily be adapted to any language.
online2-tcp-nnet3-decode-faster
A new executable that was
tf-kaldi-speaker
A framework that combines TensorFlow and Kaldi for speaker verification/identification tasks. The project includes some pretrained models that were trained on huge datasets.

Decoding graph construction in Kaldi: A visual walkthrough
If you want to understand the different parts of the decoding graph, you should read this. Understanding these concepts is necessary for debugging your graph when developing a new model.
Some Kaldi Notes
Some advanced notes that are highly recommended reading if you want to become a more experienced user.
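As a rough map of what the decoding graph walkthrough covers: the graph is built by composing four transducers, H (HMM structure), C (phonetic context dependency), L (lexicon) and G (grammar or language model), with determinization and minimization applied along the way. Schematically, and ignoring disambiguation symbols and the self-loop handling Kaldi adds at the end, the construction is:

$$
\mathrm{HCLG} = \min\Big(\det\big(H \circ C \circ \min(\det(L \circ G))\big)\Big)
$$

Determinizing and minimizing the smaller LG graph first keeps the later compositions with C and H manageable, which is one reason graph construction errors often show up in the LG stage.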

Showing a sample of 33 resources. View the full list on GitHub →