In this article, we propose R2GQA, a Retriever-Reader-Generator Question Answering system with three main components: a Document Retriever, a Machine Reader, and an Answer Generator. The Retriever module employs advanced information retrieval techniques to retrieve relevant article contexts from a dataset of legal regulation documents. The Machine Reader module uses state-of-the-art natural language understanding models to comprehend the retrieved documents and extract answers. Finally, the Generator module synthesizes the extracted answers into concise, informative responses to students' questions about legal regulations. We also built the ViRHE4QA dataset in the domain of university training regulations, comprising 9,758 question-answer pairs produced through a rigorous construction process. This is the first Vietnamese dataset in the higher education regulations domain with multiple answer types, both extractive and abstractive. In addition, R2GQA is the first system to offer abstractive answers in Vietnamese. This paper discusses the design and implementation of each module of the R2GQA system on the ViRHE4QA dataset, highlighting their functions and interactions. We also present experimental results demonstrating the effectiveness and utility of the proposed system in supporting students' comprehension of legal regulations in higher education settings. Overall, the R2GQA system and the ViRHE4QA dataset promise to contribute significantly to related research and to help students navigate complex legal documents and regulations, so that they can make informed decisions and adhere to institutional policies. Our dataset is available for research purposes.
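The three-stage flow described in the abstract (retrieve a context, extract an answer from it, then generate a response) can be illustrated with a minimal sketch. This is not the R2GQA implementation: the abstract does not specify the models used, so each stage here is a toy stand-in (token overlap for retrieval and reading, a fixed template for generation). All function names, the `Answer` type, and the sample passages are hypothetical and exist only to show how the stages hand data to one another.

```python
import re
from dataclasses import dataclass


def tokenize(text: str) -> set[str]:
    """Lowercase word tokens; a crude stand-in for a real tokenizer."""
    return set(re.findall(r"\w+", text.lower()))


@dataclass
class Answer:
    context: str      # passage chosen by the retriever stage
    extractive: str   # span chosen by the reader stage
    abstractive: str  # final response produced by the generator stage


def retrieve(question: str, passages: list[str]) -> str:
    """Retriever stand-in: pick the passage sharing the most tokens with the question."""
    q = tokenize(question)
    return max(passages, key=lambda p: len(q & tokenize(p)))


def read(question: str, context: str) -> str:
    """Reader stand-in: pick the sentence in the context that best overlaps the question."""
    q = tokenize(question)
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", context) if s.strip()]
    return max(sentences, key=lambda s: len(q & tokenize(s)))


def generate(question: str, extracted: str) -> str:
    """Generator stand-in: wrap the extracted span in a template (no model involved)."""
    return f"According to the regulations: {extracted}"


def answer(question: str, passages: list[str]) -> Answer:
    """Run the three stages in order: retriever, then reader, then generator."""
    context = retrieve(question, passages)
    extracted = read(question, context)
    return Answer(context, extracted, generate(question, extracted))


# Made-up sample passages for illustration only; not taken from ViRHE4QA.
SAMPLE_PASSAGES = [
    "Students must register for courses before the semester starts. "
    "Late registration requires approval.",
    "A student graduates after earning 120 credits. "
    "The thesis is worth 10 credits.",
]

if __name__ == "__main__":
    result = answer("How many credits does a student need to graduate?", SAMPLE_PASSAGES)
    print(result.abstractive)
```

In a real system each stand-in would be replaced by a trained model, but the interface between stages (question and passages in, context, extractive span, and generated response out) would stay roughly the same.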