awesome-video-captioning
github.com/tgc1997/awesome-video-captioning
A curated list of research papers in Video Captioning
GitHub Stars: 121
Curated Resources: 70
Categories: 8 (2015, 2016, 2017, 2018, 2019, 2020, Dense-Captioning, Grounded-Captioning)
Last Refreshed: 1 hour ago
What's inside
Dense-Captioning
- Adversarial Inference for Multi-sentence Video Description
- An Efficient Framework for Dense Video Captioning
- Attend and Interact: Higher-Order Object Interactions for Video Understanding
- Bidirectional Attentive Fusion with Context Gating for Dense Video Captioning
- Dense-Captioning Events in Videos
- Dense Relational Captioning: Triple-stream Networks for Relationship-based Captioning
2017
- Attention-Based Multimodal Fusion for Video Description
- End-to-end Concept Word Detection for Video Captioning, Retrieval, and Question Answering
- Hierarchical Boundary-Aware Neural Encoder for Video Captioning
- Hierarchical LSTM with Adjusted Temporal Attention for Video Captioning
- MAM-RNN: Multi-level Attention Model Based RNN for Video Captioning
- Multi-Task Video Captioning with Video and Entailment Generation
2020
- Controllable Video Captioning with an Exemplar Sentence
- Joint Commonsense and Relation Reasoning for Image and Video Captioning
- Learning Modality Interaction for Temporal Sentence Localization and Event Captioning in Videos
- Learning Semantic Concepts and Temporal Alignment for Narrated Video Procedural Captioning
- Learning to Discretely Compose Reasoning Module Networks for Video Captioning
- Object Relational Graph with Teacher-Recommended Learning for Video Captioning
2019
- Controllable Video Captioning With POS Sequence Guidance Based on Gated Fusion Network
- Fully Convolutional Video Captioning with Coarse-to-Fine and Inherited Attention
- Joint Syntax Representation Learning and Visual Cue Translation for Video Captioning
- Learning to Compose Topic-Aware Mixture of Experts for Zero-Shot Video Captioning
- Memory-Attended Recurrent Network for Video Captioning
- Motion Guided Spatial Attention for Video Captioning
2015
2018
- ECO: Efficient Convolutional Network for Online Video Understanding
- Fine-grained Video Captioning for Sports Narrative
- Interpretable Video Captioning via Trajectory Structured Localization
- Less Is More: Picking Informative Frames for Video Captioning
- M3: Multimodal Memory Modelling for Video Captioning
- Reconstruction Network for Video Captioning
Grounded-Captioning
2016
- Hierarchical Recurrent Neural Encoder for Video Representation with Application to Captioning
- Jointly Modeling Embedding and Translation to Bridge Video and Language
- MSR-VTT: A Large Video Description Dataset for Bridging Video and Language
- Video Description using Bidirectional Recurrent Neural Networks
- Video Paragraph Captioning Using Hierarchical Recurrent Neural Networks
This is a sample of the list's 70 resources. The full list is on GitHub.