Latest Advances on Multimodal Large Language Models

GitHub Stars: 18k
Curated Resources: 548
Categories: 15
Last Refreshed: 7 hours ago
Categories

  • ✨ Highlights of NJU-MiG
  • Multimodal Instruction Tuning (& Latest Works)
  • Multimodal Hallucination
  • Multimodal In-Context Learning
  • Multimodal Chain-of-Thought
  • LLM-Aided Visual Reasoning
  • Foundation Models
  • Evaluation
  • Multimodal RLHF
  • Others
  • Datasets of Multimodal Instruction Tuning
  • Datasets of In-Context Learning
  • Datasets of Multimodal Chain-of-Thought
  • Datasets of Multimodal RLHF
  • Benchmarks for Evaluation

What's inside

Datasets of Multimodal Instruction Tuning

  • ALLaVA-4V

    ALLaVA: Harnessing GPT4V-synthesized Data for A Lite Vision-Language Model

  • BuboGPT

    BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs

  • CAP2QA

    Visually Dehallucinative Instruction Generation

  • cc-sbu-align

    MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models

  • ChartLlama

    ChartLlama: A Multimodal LLM for Chart Understanding and Generation

  • ComVint

    What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Instruction Tuning

Benchmarks for Evaluation

  • BenchLMM

    BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models

  • Bingo

    Holistic Analysis of Hallucination in GPT-4V(ision): Bias and Interference Challenges

  • Charting-New-Territories

    Charting New Territories: Exploring the Geographic and Geospatial Capabilities of Multimodal LLMs

  • CharXiv

    CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs

  • CMMMU

    CMMMU: A Chinese Massive Multi-discipline Multimodal Understanding Benchmark

  • CoBSAT

    Can MLLMs Perform Text-to-Image In-Context Learning?

Showing a sample of 548 resources. The full list is on GitHub.