This study introduces De-DSI, a novel framework that fuses large language models (LLMs) with genuine decentralization for information retrieval, applying the differentiable search index (DSI) concept in a decentralized setting. De-DSI focuses on efficiently connecting novel user queries with document identifiers without direct document access, operating solely on query-docid pairs. To enhance scalability, an ensemble of DSI models is introduced in which the dataset is partitioned into smaller shards for individual model training. This approach maintains accuracy by reducing the amount of data each model must handle, and it facilitates scalability by aggregating outcomes from multiple models. The aggregation uses beam search to identify the top docids and applies a softmax function for score normalization, selecting the documents with the highest scores for retrieval. The decentralized implementation achieves retrieval success comparable to centralized methods, with the added benefit that computational load can be distributed across the network. This setup also enables the retrieval of multimedia items through magnet links, eliminating the need for platforms or intermediaries.
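The ensemble aggregation described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `per_model_beams` interface (a list of `(docid, score)` beam candidates per shard model) and the merge rule for docids proposed by several models (keeping the maximum normalized score) are assumptions made for the example; the softmax normalization and top-score selection follow the description in the abstract.

```python
import math

def aggregate(per_model_beams, k=1):
    """Merge beam-search outputs from an ensemble of DSI shard models.

    per_model_beams: one list of (docid, raw_score) candidates per model
    (hypothetical interface). Each model's scores are softmax-normalized
    so they are comparable across models; docids are then ranked by their
    highest normalized score and the top-k are returned.
    """
    merged = {}
    for beams in per_model_beams:
        # Numerically stable softmax over this model's beam scores.
        m = max(score for _, score in beams)
        exps = [(docid, math.exp(score - m)) for docid, score in beams]
        z = sum(e for _, e in exps)
        for docid, e in exps:
            # Assumed merge rule: keep the best normalized score per docid.
            merged[docid] = max(merged.get(docid, 0.0), e / z)
    return sorted(merged, key=merged.get, reverse=True)[:k]

# Two shard models propose overlapping docids; "b" wins after normalization.
top = aggregate([[("a", 2.0), ("b", 1.0)],
                 [("b", 3.0), ("c", 0.5)]], k=2)
```

In a decentralized deployment, each peer would run beam search over its own shard model and ship only its small candidate list, so the aggregation step stays cheap regardless of how many peers participate.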