We propose and evaluate an automated pipeline for discovering significant topics in legal decision texts. Features synthesized with topic models are passed through penalised regressions and post-selection significance tests. The method yields three outputs: case topics significantly correlated with outcomes, topic-word distributions that can be manually interpreted to gain insight into those topics, and case-topic weights that can be used to identify representative cases for each topic. We demonstrate the method on a new dataset of domain name disputes and a canonical dataset of European Court of Human Rights violation cases, evaluating topic models based on latent semantic analysis as well as language model embeddings. We show that the topics derived by the pipeline are consistent with legal doctrines in both areas and can be useful in other related legal analysis tasks.