Exact similarity search over large collections of data series is a fundamental operation in modern applications, yet existing solutions are often fragmented, specialized, or tied to a specific execution environment. In this paper, we present DaiSy, a unified library for exact data series similarity search that integrates multiple state-of-the-art algorithms within a single, coherent framework. DaiSy is the first library to support exact similarity search across diverse execution environments, with implementations for disk-based, in-memory, GPU-accelerated, and distributed search. Although designed for data series, DaiSy is also directly applicable to exact similarity search over vector data, which extends its use to a broader range of applications. The library provides both C++ and Python interfaces, so users can easily integrate its functionality into a variety of tasks. DaiSy is open source and available at: https://github.com/MChatzakis/DaiSy.