Automatic readability assessment plays a key role in ensuring effective and accessible written communication. Despite significant progress, the field is hindered by inconsistent definitions of readability and by measurements that rely on surface-level text properties. In this work, we investigate the factors shaping human perceptions of readability through an analysis of 897 judgments, finding that, beyond surface-level cues, information content and topic strongly shape text comprehensibility. Furthermore, we evaluate 15 popular readability metrics across five English datasets and contrast them with six more nuanced, model-based metrics. Our results show that four model-based metrics consistently place among the top four in rank correlations with human judgments, while the best-performing traditional metric achieves an average rank of 8.6. These findings highlight a mismatch between current readability metrics and human perceptions, pointing to model-based approaches as a more promising direction.
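To illustrate the kind of evaluation described above, the following minimal sketch correlates a traditional surface-level metric with human readability judgments using Spearman's rank correlation. This is not the paper's evaluation pipeline: the texts, human scores, and the choice of Flesch Reading Ease (via the `textstat` package) are illustrative assumptions.

```python
# Minimal sketch (not the paper's pipeline): rank correlation between a
# traditional surface-level readability metric and human judgments.
# Texts and human ratings below are placeholder data for illustration only.
from scipy.stats import spearmanr
import textstat  # provides classic metrics such as Flesch Reading Ease

texts = [
    "The cat sat on the mat.",
    "Quantum entanglement defies classical notions of locality.",
    "Preheat the oven to 180 degrees and bake for twenty minutes.",
]
# Hypothetical mean human readability judgments (1 = hard, 5 = easy).
human_scores = [4.8, 2.1, 4.2]

# Score each text with the surface-level metric.
metric_scores = [textstat.flesch_reading_ease(t) for t in texts]

# Spearman's rho measures how well the metric's ranking matches human perception.
rho, p_value = spearmanr(metric_scores, human_scores)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
```

A model-based metric would slot into the same comparison by replacing `metric_scores` with, for example, scores derived from a language model's predictions, which is what the reported rank-correlation comparison across metrics amounts to.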