Natural language conveys information at varying levels of granularity, from fine-grained references to broad descriptions. While granularity is fundamental to human communication, existing measures mostly capture surface detail or sentence specificity. We introduce Granuscore, a reference-free measure of granularity that leverages structural properties of a hierarchical embedding space. Granuscore reliably recovers hierarchical orderings on the Granola-EQ dataset and captures expected differences in granularity across discourse contexts. Across domains, we further show that Granuscore explains non-linear variation in sentence specificity beyond sentence length. Finally, we apply Granuscore to four question-answering benchmarks and analyze how granularity differs for questions, gold answers, and model outputs across response outcomes. The analysis reveals consistent differences in model behavior and provides a principled lens for characterizing the difficulty of QA datasets. Together, the results position Granuscore as a scalable, broadly applicable tool for analyzing granularity in text.