awesome-audio-visual

A curated list of different papers and datasets in various areas of audio-visual processing

775 GitHub Stars

262 Curated Resources

Use this list with your AI agent

Add the Context Awesome MCP server to Claude, Cursor, or any MCP client, then ask:

"Show me resources from awesome-audio-visual"

ACAV100M
Built by processing 140 million full-length videos (total duration 1,030 years) to produce a dataset of 100 million 10-second clips (31 years) with high audio-visual correspondence.
AIST++
A large-scale 3D human dance motion dataset containing a wide variety of 3D motion paired with music. It is built upon the AIST Dance Database, an uncalibrated multi-view collection of dance videos.
AudioSet
Audio-Visual Classification
AudioSet Single Source
Subset of AudioSet videos containing only a single sounding object
AudioSetZSL
Audio-Visual Zero-shot Learning
AuDio Visual Aerial sceNe reCognition datasEt (ADVANCE)
Geotagged aerial images and sounds, classified into 13 scene classes

Showing a sample of 262 resources.