We present PyThaiNLP, a free and open-source natural language processing (NLP) library for the Thai language, implemented in Python. It provides a wide range of software, models, and datasets for Thai. We first give a brief historical account of Thai language tools that predate PyThaiNLP. We then outline the functionality it provides, along with its datasets and pre-trained language models. Next, we summarize its development milestones and discuss our experience during its development. We conclude by demonstrating how industrial and research communities use PyThaiNLP in their work. The library is freely available at https://github.com/pythainlp/pythainlp.