This paper introduces KSW, a Khmer-specific approach to keyword extraction built on a specialized stop word dictionary. Because natural language processing resources for Khmer are scarce, effective keyword extraction has remained a significant challenge. KSW addresses this by developing a tailored stop word dictionary and a preprocessing step that removes stop words before extraction, so that the remaining terms are more likely to be meaningful keywords. Our experiments show that KSW substantially improves accuracy and relevance over previous methods, highlighting its potential to advance Khmer text processing and information retrieval. The KSW resources, including the stop word dictionary, are available on GitHub: https://github.com/back-kh/KSWv2-Khmer-Stop-Word-based-Dictionary-for-Keyword-Extraction.git
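As a rough illustration of the stop word preprocessing idea, the following Python sketch filters stop words from a token list and ranks the remaining terms by frequency. It is not the KSW implementation: the `STOP_WORDS` set is a small hypothetical stand-in for the published dictionary, the function name `extract_keywords` is invented for this example, frequency ranking is only one simple choice of scoring, and the input is assumed to be already segmented into words (Khmer text is not space-delimited, so a separate word segmenter would be needed in practice).

```python
from collections import Counter

# Hypothetical stand-in for the KSW stop word dictionary (a few common Khmer
# function words); a real run would load the dictionary from the repository.
STOP_WORDS = {"និង", "នៃ", "ជា", "នេះ", "ដែល", "បាន"}


def extract_keywords(tokens, stop_words, top_k=3):
    """Drop stop words, then return the top_k most frequent remaining tokens.

    Assumes `tokens` is already word-segmented; segmentation is out of scope.
    """
    kept = [t for t in tokens if t and t not in stop_words]
    counts = Counter(kept)
    # most_common orders by count; ties keep first-seen order.
    return [word for word, _ in counts.most_common(top_k)]


# Pre-segmented sample sentence (assumed segmentation, for illustration only).
sample_tokens = ["ភាសា", "ខ្មែរ", "ជា", "ភាសា", "នៃ", "ប្រទេស", "កម្ពុជា", "និង", "ភាសា", "ខ្មែរ"]
keywords = extract_keywords(sample_tokens, STOP_WORDS, top_k=2)
print(keywords)
```

Keeping the stop word list as a plain set makes each lookup constant time and lets the dictionary be swapped for the published one without changing the extraction logic.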