awesome-human-activity-recognition
github.com/Leooo-Huang/awesome-human-activity-recognition
Always up-to-date, most comprehensive HAR resource: continuously scanned and auto-updated from Papers with Code. 53 datasets integrated across all modalities.
Use this list with your AI agent
Add the Context Awesome MCP server to Claude, Cursor, or any MCP client, then ask:
"Show me skeleton action recognition resources from awesome-human-activity-recognition"
Installation instructions →
What's inside
Frameworks and Libraries
- 2s-AGCN [Skeleton Action Recognition]
Two-stream adaptive graph convolutional network for skeleton-based action recognition from CVPR 2019.
- aeon [Wearable Sensor HAR]
Unified Python toolkit for time series including classification, clustering, and anomaly detection.
- CTR-GCN [Skeleton Action Recognition]
Channel-wise topology refinement graph convolution for skeleton-based action recognition from ICCV 2021.
- DeepConvLSTM [Wearable Sensor HAR]
Reference implementation of the convolutional LSTM architecture for wearable activity recognition.
- Hang-Time HAR [Wearable Sensor HAR]
Basketball activity recognition from a single wrist-worn inertial sensor using deep learning.
- HD-GCN [Skeleton Action Recognition]
Hierarchically decomposed graph convolutional network for skeleton-based action recognition from ICCV 2023.
Datasets
- ActivityNet [Vision (RGB / Depth)]
Temporal action detection benchmark with 20k untrimmed YouTube videos across 200 classes.
- ActivityNet Captions [Multimodal and Egocentric]
Dense video captioning and temporal grounding with 20k videos and 100k captions.
- AMASS [Skeleton and Mocap]
Unified SMPL motion capture parameters from 40+ datasets covering 16k minutes and 344 subjects.
- AVA [Vision (RGB / Depth)]
Spatio-temporal action detection with 430 movie clips and 80 atomic action labels with bounding boxes.
- Babel [Skeleton and Mocap]
Motion-language alignment dataset with 43 hours and 3.7k sequences annotated with SMPL and text labels.
- BEHAVE [Emerging and Frontier]
RGB-D human-object interaction with 3D pose spanning 321 sequences of 8 subjects interacting with 20 objects.
Competitions and Challenges
- ActivityNet Challenge
Annual challenge for temporal action detection, proposals, and dense captioning.
- Babel Challenge
Motion-language understanding and temporal action segmentation on mocap data.
- Ego-Exo4D Challenge 2025
CVPR 2025 multi-track benchmark covering ego-pose, action recognition, and language understanding.
- EPIC-Kitchens Challenge
Egocentric action recognition, detection, and anticipation competition.
- SHL Recognition Challenge
Annual challenge for transportation mode recognition from smartphone sensors.
- UAV-Human Challenge
Human behavior understanding from UAV perspectives with multi-modal data.
Key Papers
- Attend and Discriminate [Wearable and Sensor HAR]
Abedin et al., IMWUT 2021, attention mechanisms for multi-sensor HAR.
- C3D: Learning Spatiotemporal Features [Foundational]
Tran et al., ICCV 2015, pioneering 3D convolutions for video feature learning.
- DeepConvLSTM [Wearable and Sensor HAR]
Ordóñez and Roggen, Sensors 2016, establishing deep learning for wearable activity recognition.
- Deep Learning for HAR: A Survey [Surveys]
Li et al., ACM Computing Surveys 2022, comprehensive review of deep learning approaches for HAR.
- I3D: Quo Vadis Action Recognition [Foundational]
Carreira and Zisserman, CVPR 2017, inflating 2D ImageNet architectures to 3D video.
- InternVideo2 [Transformer Era (2020 onwards)]
Wang et al., ECCV 2024, scaling video foundation models to 6B parameters across 60+ benchmarks.
Related Awesome Lists
- Awesome Action Recognition
Action recognition papers and datasets.
- Awesome IMU Sensing
IMU-based sensing for activity recognition and navigation.
- Awesome Pose Estimation
Human pose estimation methods and benchmarks.
- Awesome Self-Supervised Learning
Self-supervised learning methods applicable to video and sensor modalities.
- Awesome Skeleton-based Action Recognition
GCN and transformer methods for skeleton HAR.
- Awesome Video Understanding
Video understanding systems and architectures.
Tutorials and Courses
- Coursera - Motion Planning
University of Pennsylvania course covering motion representations relevant to HAR.
- Dive into Deep Learning - Action Recognition
Interactive textbook chapter on video understanding and action recognition with PyTorch code.
- MMAction2 Tutorials
Step-by-step guide to training action recognition models on custom datasets.
- Motion Diffusion Tutorial
Colab notebook for training text-conditioned human motion diffusion models on HumanML3D.
- Sensor HAR Tutorial by Marius Bock
Comprehensive deep learning tutorial for inertial sensor HAR with PyTorch.
- Stanford CS231N - Video Understanding
Lecture materials covering temporal modeling, two-stream networks, and 3D convolutions for action recognition.
Tools and Utilities
- Decord
Efficient GPU-accelerated video reader for deep learning training pipelines.
- MediaPipe
Google's on-device ML framework for pose estimation, hand tracking, and gesture recognition.
- MMAction2 Model Zoo
Pretrained checkpoints and configs for 100+ action recognition models.
- OpenPose
Real-time multi-person keypoint detection for skeleton extraction from video.
- Papers with Code - HAR Leaderboards
Live SOTA tracking across all major HAR benchmarks.
- vid2player
Character animation from video input, useful for activity recognition visualization.
Pretrained Models
- InternVideo2 Model Zoo
6B-parameter video-language model checkpoints on Hugging Face for action recognition and retrieval.
- MotionBERT Checkpoints
Pretrained motion encoder transferable to 3D pose estimation, action recognition, and mesh recovery.
- MVD
Masked video distillation pretrained model competitive with VideoMAE on downstream action recognition.
- UniFormerV2
Efficient video transformer with multi-scale tokens achieving 90.0% top-1 on Kinetics-400.
- VideoMAE V2
Billion-parameter video foundation model pretrained on millions of clips, finetunable for action recognition.
Showing a sample of 123 resources. View the full list on GitHub →