awesome-audio-visual
github.com/krantiparida/awesome-audio-visual
A curated list of papers and datasets across various areas of audio-visual processing
Datasets
- ACAV100M
Built by processing 140 million full-length videos (total duration 1,030 years) to produce a dataset of 100 million 10-second clips (31 years) with high audio-visual correspondence.
- AIST++
A large-scale 3D human dance motion dataset containing a wide variety of 3D motion paired with music. It is built upon the AIST Dance Database, an uncalibrated multi-view collection of dance videos.
- AudioSet
Audio-Visual Classification
- AudioSet Single Source
Subset of AudioSet videos containing only a single sounding object
- AudioSetZSL
Audio-Visual Zero-shot Learning
- AuDio Visual Aerial sceNe reCognition datasEt (ADVANCE)
Geotagged aerial images and sounds, classified into 13 scene classes