awesome-vector-databases

github.com/ever-works/awesome-vector-databases

A curated list of vector database solutions, libraries, and resources for AI applications - https://vectordb.ever.works

GitHub Stars

978

Curated Resources

Use this list with your AI agent

Add the Context Awesome MCP server to Claude, Cursor, or any MCP client, then ask:

"Show me vector db research & surveys resources from awesome-vector-databases"


What's inside

Vector DB Research & Surveys

A Brief Survey of Vector Databases
BigDIA 2023 survey paper giving a concise overview of vector databases, ANN algorithms, supporting technologies, and applications. It reviews core indexing methods and benchmarks and highlights gaps between theory and practice in scalability. Useful as an entry point to the vector DB literature for academic and research work; a high-level 2023 overview to read alongside earlier surveys and more recent (2026) benchmarks.
A Comprehensive Survey on Vector Database
arXiv 2023 survey paper that categorizes ANN algorithms (hash-, tree-, graph-, and quantization-based) for vector databases and covers architecture, storage, retrieval, and LLM integration. It details the benchmarks reviewed and the accuracy-scalability trade-offs. Suited to academic and research use when selecting an ANN method; its 2023 algorithmic depth complements earlier system-level surveys and more recent (2026) benchmarks.
Approximate Nearest Neighbour Search on Dynamic Datasets: An Investigation
arXiv 2024 research paper investigating ANN search performance on dynamic datasets, that is, datasets that receive updates. It reviews benchmarks for the adaptability and efficiency of vector indexes. Suited to academic and research use in dynamic vector DB scenarios; complements earlier static benchmarks and more recent (2026) work on dynamic workloads.

Research Papers & Surveys

Accelerating ANNS in Hierarchical Graphs via Shortcuts
VLDB 2025 paper proposing efficient level navigation with shortcuts for accelerating approximate nearest neighbor search in hierarchical graph indexes, improving traversal speed across multi-layer graph structures.
Accelerating Graph-based ANNS with Adaptive Awareness
SIGKDD 2025 paper proposing adaptive awareness capabilities for graph-based approximate nearest neighbor search, enabling the search algorithm to dynamically adjust its strategy based on local graph characteristics and query properties.
Accelerating Graph Indexing for ANNS on Modern CPUs
SIGMOD 2025 paper proposing optimizations for graph-based approximate nearest neighbor search indexing on modern CPU architectures, leveraging SIMD instructions and cache-aware algorithms for improved index construction performance.
AdaptiveIndex — Adaptive Indexing in High-Dimensional Metric Spaces
VLDB 2023 paper introducing an adaptive indexing approach for high-dimensional metric spaces that dynamically adjusts its structure based on query workloads to improve search performance over time.
Approximate Nearest Neighbor Search in Recommender Systems
Technical article by Yury Malkov covering approximate nearest neighbor search applications in recommender systems. Discusses how ANN algorithms accelerate candidate generation in large-scale recommendation pipelines.
ARKGraph — All-Range Approximate K-Nearest-Neighbor Graph
VLDB 2023 paper proposing ARKGraph, a graph-based method for all-range approximate k-nearest neighbor search that adapts to various recall requirements.

LLM Frameworks

ACE Framework
Agentic Context Engineering framework for self-improving LLMs with structured context management, tool guides, and vector-based memory for agent behavior optimization.
AG2
Open-source multi-agent AI framework (formerly Microsoft AutoGen) with event-driven core, async-first execution, and pluggable orchestration strategies for building AI agent systems.
AutoGen
Microsoft's open-source framework for multi-agent conversations with tool use, memory persistence, and vector retrieval integration for collaborative LLM agents and chat systems.
AutoRAG
Automated framework for optimizing Retrieval Augmented Generation pipelines using AutoML-style techniques to find the best RAG module combinations and parameters for specific datasets.

research-papers-surveys

ACL 2023 Tutorial: Retrieval-Based Language Models and Applications
This ACL 2023 tutorial reviews retrieval-based language models, which often rely on vector databases and vector search systems to retrieve relevant context. The tutorial covers methods and applications central to the use of vector databases in modern NLP systems.
ACORN
ACORN is a performant and predicate-agnostic search system for vector embeddings and structured data, enhancing the capability of vector databases to handle complex queries over high-dimensional data efficiently.
Adanns
Adanns is a framework for adaptive semantic search, focusing on efficient and scalable similarity search in high-dimensional vector spaces. It supports advanced vector search techniques suited to AI and machine learning applications.
AiSAQ
AiSAQ is an all-in-storage approximate nearest neighbor search system that uses product quantization to enable DRAM-free vector similarity search, serving as a specialized vector search/indexing approach for large-scale information retrieval.
BANG
BANG is a billion-scale approximate nearest neighbor search system optimized for single GPU execution, enabling high-performance vector search in vector database environments at massive scale.
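
Several entries above rely on product quantization (PQ) to shrink vectors so that they fit in storage or memory. The following is a minimal, standard-library-only sketch of the general idea and is not the implementation used by AiSAQ or any other listed system. All function names are hypothetical, and the codebooks are built by sampling training sub-vectors rather than by the per-subspace k-means that PQ normally uses.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def split(vec, m):
    """Cut a vector into m equal-length sub-vectors."""
    step = len(vec) // m
    return [vec[i * step:(i + 1) * step] for i in range(m)]

def train_codebooks(train, m, n_codes, rng):
    """One codebook per subspace. Simplification (assumption): codewords are
    sampled training sub-vectors; real PQ runs k-means in each subspace."""
    subspaces = list(zip(*(split(v, m) for v in train)))
    return [rng.sample(list(sub), n_codes) for sub in subspaces]

def encode(vec, codebooks):
    """Replace each sub-vector by the id of its nearest codeword."""
    return [min(range(len(book)), key=lambda c: sq_dist(book[c], sub))
            for sub, book in zip(split(vec, len(codebooks)), codebooks)]

def decode(code, codebooks):
    """Rebuild an approximate vector by concatenating the chosen codewords."""
    return [x for c, book in zip(code, codebooks) for x in book[c]]

def adc_table(query, codebooks):
    """Per-subspace distances from the raw query to every codeword."""
    return [[sq_dist(sub, word) for word in book]
            for sub, book in zip(split(query, len(codebooks)), codebooks)]

def adc_distance(code, table):
    """Approximate query-to-vector distance using table lookups only."""
    return sum(table[j][c] for j, c in enumerate(code))

rng = random.Random(0)
data = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(500)]
codebooks = train_codebooks(data, m=4, n_codes=32, rng=rng)
codes = [encode(v, codebooks) for v in data]  # 4 small ints per vector

query = [rng.gauss(0, 1) for _ in range(16)]
table = adc_table(query, codebooks)
best = min(range(len(codes)), key=lambda i: adc_distance(codes[i], table))
print(best, adc_distance(codes[best], table))
```

Each 16-dimensional vector is stored as four codeword ids, and a query is scored against the compressed codes through one lookup table per subspace, so the original vectors are not needed at search time. The price is that distances are computed to the decoded approximation rather than to the original vector.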

Concepts & Definitions

ACORN Algorithm for Filtered Vector Search
Advanced algorithm designed to make hybrid searches combining metadata filters and vector similarity more efficient, implemented in Apache Solr and other vector search systems.
Agentic Chunking
An advanced RAG chunking strategy that uses LLMs to decide dynamically how to split a document based on its semantic meaning and content structure. Agentic chunking analyzes each document's characteristics and adapts the chunking approach per document to improve retrieval accuracy.
Agentic RAG
An advanced RAG architecture where an AI agent autonomously decides which questions to ask, which tools to use, when to retrieve information, and how to aggregate results. Listed as a major 2026 trend toward more intelligent and adaptive retrieval systems.
Approximate Nearest Neighbors (ANN)
Algorithms and techniques for finding nearest neighbors in high-dimensional vector spaces with speed-accuracy trade-offs. ANN methods like HNSW, IVF, and DiskANN enable billion-scale vector search by sacrificing small amounts of recall for massive performance gains over exact search.
ASMR Technique
Agentic Search and Memory Retrieval technique by Supermemory that uses parallel reader agents and search agents and achieved ~99% accuracy on the LongMemEval benchmark.
Asymmetric Search
A search paradigm where queries and documents are encoded differently, optimized for scenarios where queries are short and documents are long. Common in information retrieval and modern embedding models designed specifically for search.
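
The speed-accuracy trade-off described under Approximate Nearest Neighbors (ANN) can be illustrated with a toy inverted-file (IVF-style) index. This is a minimal sketch under stated assumptions, not the algorithm of any listed library: the function names are hypothetical, centroids are sampled at random instead of being trained with k-means, and everything runs in pure Python on synthetic data.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def exact_search(vectors, query, k):
    """Brute force: score every vector, return the ids of the k closest."""
    order = sorted(range(len(vectors)), key=lambda i: sq_dist(vectors[i], query))
    return order[:k]

def build_ivf(vectors, n_lists, rng):
    """Toy inverted-file index: pick random centroids (assumption: no k-means
    training) and put each vector id into the list of its nearest centroid."""
    centroids = rng.sample(vectors, n_lists)
    lists = [[] for _ in range(n_lists)]
    for i, v in enumerate(vectors):
        nearest = min(range(n_lists), key=lambda c: sq_dist(centroids[c], v))
        lists[nearest].append(i)
    return centroids, lists

def ivf_search(vectors, index, query, k, n_probe):
    """Scan only the n_probe lists whose centroids are closest to the query.
    Returns the top-k ids and the number of vectors actually scanned."""
    centroids, lists = index
    probe = sorted(range(len(centroids)),
                   key=lambda c: sq_dist(centroids[c], query))[:n_probe]
    candidates = [i for c in probe for i in lists[c]]
    candidates.sort(key=lambda i: sq_dist(vectors[i], query))
    return candidates[:k], len(candidates)

def recall(approx_ids, exact_ids):
    """Fraction of the true nearest neighbors that the approximate search found."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

rng = random.Random(0)
vectors = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(2000)]
query = [rng.gauss(0, 1) for _ in range(16)]
index = build_ivf(vectors, n_lists=20, rng=rng)
truth = exact_search(vectors, query, k=10)

for n_probe in (1, 5, 20):
    ids, scanned = ivf_search(vectors, index, query, k=10, n_probe=n_probe)
    print(n_probe, scanned, recall(ids, truth))
```

Probing fewer lists scans fewer vectors but can miss true neighbors that fell into unprobed lists; probing all 20 lists scans everything and reproduces the exact result. Production ANN indexes expose a similar knob for trading recall against query cost.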

Vector Database Engines

Actian VectorAI DB
Edge-native vector database enabling sub-15ms ANN queries on remote devices without cloud dependency, using efficient disk-based indexing for real-time processing. Supports offline operation with synchronization capabilities and is optimized for low-resource environments. Suited to edge RAG, facial recognition, and IoT recommendations; more compact than Milvus for disconnected setups, and edge-focused in contrast to Qdrant's distributed architecture.
AlayaDB
Hybrid database-inference engine that converts documents to tensors via an LLM forward pass and stores them in a KV cache for optimized retrieval. Features integrated storage and inference with advanced indexing for fast context retrieval in RAG pipelines. Suited to LLM applications and semantic search; differs from Milvus by embedding inference in the engine, and is more specialized than Qdrant's pure vector storage.
BBANN
High-performance out-of-core vector index and winner of the NeurIPS'21 billion-scale ANN competition, leveraging disk-based structures for massive datasets beyond RAM limits. Employs advanced approximate search algorithms for high QPS on limited hardware. Applicable to large-scale recommendations and search; competitive with the DiskANN baseline and reported to outperform it in benchmarks, and out-of-core in contrast to purely in-memory setups such as Qdrant.

Multimodal Vector Databases

Activeloop Deep Lake
Multi-modal tensor DB for vectors, images, texts, and videos with hybrid embedding plus metadata/tensor search. Supports multimodal RAG datasets with versioning. Operates at data-lake scale, in contrast to pure vector stores such as Qdrant.
ApertureDB
Graph-vector DB for multimodal data (images, videos, docs, embeddings) with hybrid vector similarity, graph traversal, and metadata/keyword filtering. Enables complex multimodal RAG queries. Combines FAISS vector indexes with graphs, unlike pure vector DBs such as Qdrant.

vector-database-engines

Aerospike
A multi-model AI database designed for high-throughput vector processing at scale, supporting real-time AI use cases with a patented Hybrid Memory Architecture and efficient infrastructure usage, capable of handling large volumes of data and concurrent users.
AllegroGraph
A database that incorporates neuro-symbolic AI and offers a managed service (AllegroGraph Cloud) for neuro-symbolic AI knowledge graphs, indicating its relevance to advanced AI applications, likely including vector capabilities.
Amazon Web Services Vector Search
AWS has introduced vector search in several of its managed database services, including OpenSearch, Bedrock, MemoryDB, Neptune, and Amazon Q, making it a comprehensive platform for vector search solutions.

Showing a sample of 978 resources. View the full list on GitHub.