In this paper, we propose a new annotation scheme for classifying different types of clauses in Terms-and-Conditions contracts, with the ultimate goal of helping legal experts quickly identify and assess problematic issues in this type of legal document. To this end, we built a small corpus of Terms-and-Conditions contracts and finalized an annotation scheme of 14 categories, eventually reaching an inter-annotator agreement of 0.92. For 11 of these categories, we then ran binary classification experiments using few-shot prompting with a multilingual T5 and fine-tuned versions of two BERT-based LLMs for Italian. The experiments indicate that automatic classification of our categories is feasible, with accuracies ranging from 0.79 to 0.95 on the validation tasks.