The proliferation of complex, multimodal datasets has exposed a critical gap between the capabilities of specialized vector databases and traditional graph databases. While vector databases excel at semantic similarity search, they lack the capacity for deep relational querying; conversely, graph databases master complex traversals but are not natively optimized for high-dimensional vector search. This paper introduces the Hybrid Multimodal Graph Index (HMGI), a novel framework that bridges this gap by providing a unified system for efficient hybrid queries over multimodal data. HMGI builds on native graph database architectures with integrated vector search capabilities, exemplified by platforms such as Neo4j, to combine Approximate Nearest Neighbor Search (ANNS) with expressive graph traversal queries. Key innovations of the framework include modality-aware partitioning of embeddings, which optimizes index structure and query performance, and an adaptive, low-overhead index update mechanism that supports dynamic data ingestion, drawing inspiration from the architectural principles of systems such as TigerVector. By integrating semantic similarity search directly with relational context, HMGI aims to outperform pure vector databases such as Milvus in complex, relationship-heavy query scenarios and to achieve sub-linear query times for hybrid tasks.
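The modality-aware partitioning idea can be illustrated with a minimal Python sketch. This is not the paper's implementation: the class name, brute-force cosine scan, and partition layout are illustrative assumptions; the point is only that embeddings are bucketed by modality so a query scans just the partition matching its modality, rather than the whole collection (a real system would replace the scan with ANNS).

```python
import math
from collections import defaultdict

def cosine_sim(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

class ModalityPartitionedIndex:
    """Illustrative (hypothetical) index: embeddings are grouped by
    modality, so a query only touches the matching partition."""

    def __init__(self):
        # modality -> list of (item_id, embedding vector)
        self.partitions = defaultdict(list)

    def add(self, item_id, modality, vector):
        self.partitions[modality].append((item_id, vector))

    def query(self, modality, vector, k=1):
        # Brute-force scan of a single partition; an ANNS structure
        # (e.g. HNSW) would stand in here in a real system.
        scored = [(cosine_sim(vector, v), item_id)
                  for item_id, v in self.partitions[modality]]
        scored.sort(reverse=True)
        return [item_id for _, item_id in scored[:k]]

idx = ModalityPartitionedIndex()
idx.add("img1", "image", [1.0, 0.0])
idx.add("img2", "image", [0.0, 1.0])
idx.add("txt1", "text", [1.0, 0.0])
# Only the "image" partition is scanned; "txt1" is never compared.
print(idx.query("image", [0.9, 0.1], k=1))
```

In a full HMGI-style pipeline, the IDs returned from the vector stage would then seed a graph traversal (e.g. a Cypher query in Neo4j) to add relational context to the similarity hits.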