awesome-prompting-on-vision-language-model
github.com/jindonggu/awesome-prompting-on-vision-language-model
This repo lists relevant papers summarized in our survey paper: A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models.
GitHub Stars: 510
Curated Resources: 115
Categories: 3
Last Refreshed: 23 hours ago
# :paperclips: Awesome Papers
Prompting Model in Image-Text Matching (e.g. on CLIP)
Prompting Model in Text-to-Image Generation (e.g. on Stable Diffusion)
Use this list with your AI agent
Add the Context Awesome MCP server to Claude, Cursor, or any MCP client, then ask:
"Show me applications & responsible ai resources from awesome-prompting-on-vision-language-model"
Installation instructions →
What's inside
Prompting Model in Text-to-Image Generation (e.g. on Stable Diffusion)
- Adding Conditional Control to Text-to-Image Diffusion Models
IEEE/CVF
- An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion
ICLR
- A Pilot Study of Query-Free Adversarial Attack Against Stable Diffusion
CVPR
- Are Diffusion Models Vulnerable to Membership Inference Attacks?
ICML
- A Reproducible Extraction of Training Images from Diffusion Models
arXiv
- Denoising Diffusion Probabilistic Models
NeurIPS
Prompting Model in Image-Text Matching (e.g. on CLIP)
- Align before Fuse: Vision and Language Representation Learning with Momentum Distillation
NeurIPS
- AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation
NeurIPS
- BadEncoder: Backdoor Attacks to Pre-trained Encoders in Self-Supervised Learning
Applications & Responsible AI
IEEE
- CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning
Applications & Responsible AI
ICLR Workshop
- Compositional Prompt Tuning with Motion Cues for Open-vocabulary Video Relation Detection
Applications & Responsible AI
ICLR
- Conditional Prompt Learning for Vision-Language Models
CVPR
# :paperclips: Awesome Papers
- Benchmarking Robustness of Adaptation Methods on Pre-trained Vision-Language Models
Prompting Models in Multimodal-to-Text Generation (e.g. on Flamingo)
NeurIPS
- BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Prompting Models in Multimodal-to-Text Generation (e.g. on Flamingo)
ICML
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Prompting Models in Multimodal-to-Text Generation (e.g. on Flamingo)
NeurIPS
- Compositional Exemplars for In-context Learning
Prompting Models in Multimodal-to-Text Generation (e.g. on Flamingo)
ICML
- Flamingo: a Visual Language Model for Few-Shot Learning
Prompting Models in Multimodal-to-Text Generation (e.g. on Flamingo)
NeurIPS
- InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning
Prompting Models in Multimodal-to-Text Generation (e.g. on Flamingo)
NeurIPS
Showing a sample of 115 resources. View the full list on GitHub →