awesome-vision-and-language-pre-training
github.com/phellonchen/awesome-vision-and-language-pre-training
Recent Advances in Vision and Language Pre-training (VLP)
GitHub Stars: 297
Curated Resources: 99
Categories: 3
Last Refreshed: 6 hours ago
Representation Learning | Task-specific | Other Analysis
Use this list with your AI agent
Add the Context Awesome MCP server to Claude, Cursor, or any MCP client, then ask:
"Show me vqa resources from awesome-vision-and-language-pre-training"
Installation instructions
What's inside
Other Analysis
- 12-in-1: Multi-Task Vision and Language Representation Learning
- A Closer Look at the Robustness of Vision-and-Language Pre-trained Models
- A Comprehensive Survey of Deep Learning for Image Captioning
- ActBERT: Learning Global-Local Video-Text Representations
- Adaptive Transformers for Learning Multimodal Representations
- A repository of vision and language papers
Task-specific
- BERT Can See Out of the Box: On the Cross-modal Transferability of Text Representations [VQA]
- Cross-Probe BERT for Efficient and Effective Cross-Modal Search [Text-Image Retrieval]
- Dynamic Contrastive Distillation for Image-Text Retrieval [Text-Image Retrieval]
- Fusion of Detected Objects in Text for Visual Question Answering [VQA]
- ImageBERT: Cross-Modal Pre-training with Large-scale Weak-supervised Image-text Data [Text-Image Retrieval]
- Iterative Answer Prediction with Pointer-Augmented Multimodal Transformers for TextVQA [VQA]
Representation Learning
- BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
- BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
- CAPT: Contrastive Pre-Training for Learning Denoised Sequence Representations
- CoCa: Contrastive Captioners are Image-Text Foundation Models
- DeVLBert: Learning Deconfounded Visio-Linguistic Representations
- ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph
Showing a sample of 99 resources. View the full list on GitHub →