The analysis of public affairs documents is crucial for citizens, as it promotes transparency, accountability, and informed decision-making. It allows citizens to understand government policies, participate in public discourse, and hold representatives accountable. It is equally important, and sometimes a matter of life or death, for companies whose operations depend on certain regulations. Large Language Models (LLMs) have the potential to greatly enhance the analysis of public affairs documents by effectively processing and understanding the complex language used in them. In this work, we analyze the performance of LLMs in classifying public affairs documents. Because a single document can belong to several topics at once, this is naturally a multi-label task, which presents important challenges. We use a regex-powered tool to collect a database of public affairs documents with more than 33K samples and 22.5M tokens. Our experiments assess the performance of 4 different Spanish LLMs in classifying up to 30 different topics in the data under different configurations. The results show that LLMs can be of great use in processing domain-specific documents, such as those in the domain of public affairs.