In this study, we present a novel and challenging multilabel Vietnamese dataset (RMDM) designed to assess the performance of large language models (LLMs) in verifying electronic information related to legal contexts, focusing on fake news as potential input for electronic evidence. The RMDM dataset comprises four labels: real, mis, dis, and mal, representing real information, misinformation, disinformation, and mal-information, respectively. By including these diverse labels, RMDM captures the complexities of different fake news categories and offers insights into how well different language models handle the various types of information that could form part of electronic evidence. The dataset consists of 1,556 samples in total, with 389 samples for each label. Preliminary tests on the dataset using GPT-based and BERT-based models reveal variations in the models' performance across labels, indicating that the dataset effectively challenges the ability of various language models to verify the authenticity of such information. Our findings suggest that verifying electronic information related to legal contexts, including fake news, remains a difficult problem for language models, warranting further attention from the research community to advance toward more reliable AI models for potential legal applications.