awesome-multimodal-search
github.com/mixpeek/awesome-multimodal-search
A collection of multimodal search libraries, services, and research papers
Use this list with your AI agent
Add the Context Awesome MCP server to Claude, Cursor, or any MCP client, then ask:
"Show me cloud services & apis resources from awesome-multimodal-search"
Installation instructions
What's inside
Cloud Services & APIs
- Algolia
Search API with AI-powered vector search
- Anthropic Claude API
Text + image understanding
- AWS Rekognition + Kendra + Transcribe
Image, text, audio
- Cohere
Text embeddings with multilingual support
- Elastic AI Search
Enterprise search with vector capabilities
- Microsoft Azure AI Search
Text, images, PDFs, audio transcription
Landmark Papers
Tutorials & Demos
- Building Multimodal Search Engines
Text, Image
- ChromaDB Multimodal Examples
Text, Image
- FAISS Tutorial with Images
Image similarity
- Haystack Multimodal Pipelines
Text, Image, Audio
- Hugging Face CLIP Demo
Text-Image
- ImageBind + Deep Lake
Unified search
Libraries & Frameworks
- ChromaDB
Embedding database for building AI applications with multimodal data.
- CLIP Retrieval
Lightweight toolkit to search CLIP-embedded LAION datasets.
- DocArray
Data structure for multimodal and nested data, pairs with Jina.
- FAISS
Library for efficient similarity search from Meta Research, supports image vectors.
- Haystack
End-to-end framework for building search pipelines with multimodal support.
- Jina AI
Flow-based neural search framework for text, image, video, and audio.
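Most of the libraries above share one core operation: items from different modalities are embedded as vectors in a common space, and a query is answered by ranking stored vectors by similarity. The following is a minimal sketch of that idea in plain Python, assuming brute-force cosine similarity. It does not use the API of FAISS, ChromaDB, or any other library listed here; the file names and the 3-dimensional vectors are hypothetical, hand-made stand-ins for real model embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, index, k=2):
    """Return the k (item_id, score) pairs most similar to the query, best first."""
    scored = [(item_id, cosine_similarity(query, vec)) for item_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical embeddings: an image, another image, and an audio clip
# that are assumed to live in the same vector space.
index = {
    "photo_of_cat.jpg": [0.9, 0.1, 0.0],
    "photo_of_dog.jpg": [0.1, 0.9, 0.0],
    "meow.wav": [0.8, 0.0, 0.2],
}

# Hypothetical embedding of the text query "a cat".
query = [1.0, 0.0, 0.0]

results = search(query, index, k=2)
for item_id, score in results:
    print(f"{item_id}: {score:.3f}")
```

Brute-force scoring like this compares the query against every stored vector, so it only suits small collections. Dedicated similarity-search libraries exist largely to avoid that linear scan on large ones.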
Multimodal Monday Blog Posts
- Multimodal Monday #1 - State of the Stack
Researchers introducing new methods to replace embeddings with discrete IDs for faster cross-modal search.
- Multimodal Monday #2 - From Tiny VLMs to 10M-Token Titans
Major multimodal model releases including Meta's Llama 4 Scout & Maverick and Microsoft's Phi-4-Multimodal, marking the start of a new era of natively multimodal AI.
- Multimodal Monday #3 - Scaling Multimodal AI: Laws, Lightweights & Large Releases
Apple's new scaling law research redefines how multimodal models are built, while Moonshot and OpenGVLab drop powerful open-source VLMs with reasoning and tool-use.
Showing a sample of 59 resources. View the full list on GitHub.