awesome-audio-speech
github.com/kennethanceyer/awesome-audio-speech
Awesome list of Audio, Speech, and DSP (Digital Signal Processing)
What's inside
Filtering / Denoising
Books
- Adaptive Filter Theory by Simon Haykin
- Digital Signal Processing: Principles, Algorithms, and Applications by John G. Proakis and Dimitris K Manolakis
- Discrete-Time Signal Processing by Alan V. Oppenheim and Ronald W. Schafer
- DSP First: A Multimedia Approach by James H. McClellan and Ronald W. Schafer
- Signals and Systems by Alan V. Oppenheim and Alan S. Willsky
- The Scientist and Engineer's Guide to Digital Signal Processing by Steven W. Smith
Recognition
- Amazon Transcribe
- Deep Speech 2 (Baidu Research)
- Deep Speech (Baidu Research)
- DistilWhisper
Hugging Face's distilled version of Whisper.
- Faster Whisper
A reimplementation of OpenAI's Whisper on the CTranslate2 inference engine, optimized for faster transcription.
- Google Speech-to-Text
Open source projects
- Audacity
A cross-platform audio editor and recorder that supports many formats and provides a user-friendly interface.
- DeepSpeech
A speech-to-text engine developed by Mozilla Research.
- librosa
A Python library for audio and music analysis, with functions for computing features such as MFCCs, chroma, and beat-related descriptors.
- PulseAudio
A cross-platform sound server for Linux, Unix, and Windows systems that sits between applications and the audio hardware, mixing and routing their sound streams.
- PyTorch Audio
A PyTorch-based library of common audio functions, such as spectrogram computation, audio pre-processing, and spectrogram-derived features.
- SoX
A cross-platform audio processing tool that provides a command-line interface for converting, editing, and playing audio files.
Research papers
Synthesis
Diarization
- Fully Supervised Speaker Diarization
A paper proposing UIS-RNN (unbounded interleaved-state recurrent neural network), a speaker diarization method trained in a fully supervised manner.
- NVIDIA's Speaker Diarization
NVIDIA's advanced approach to speaker diarization.
- Speaker Diarization with LSTM
A paper on using LSTM-based speaker embeddings (d-vectors) for speaker diarization.
Showing a sample of 59 resources. The full list is on GitHub.