In this paper, we present a method for improving lemmatization in full-text search, based on the OpenCorpora dataset and a custom paradigm retrieval algorithm. Our primary aim is to simplify the extraction of a word's base form, or lemma, which is a crucial step in full-text search. We also propose a compact dictionary storage scheme that improves the speed and precision of lemma retrieval.