We introduce CogME, a cognition-inspired, multi-dimensional evaluation metric for AI models that perform story understanding. CogME is grounded in human thinking strategies and in the story elements involved in story understanding. By breaking down the questions in a benchmark according to these components, it yields a nuanced assessment that reveals not only a model's particular strengths and weaknesses but also the characteristics of the benchmark dataset itself. A case study on the DramaQA dataset demonstrates this finer-grained analysis of both the model and the benchmark. We argue that evaluation metrics should be based on an understanding of the nature of the task and designed to align closely with human cognitive processes. This approach provides insights beyond traditional overall scores and paves the way for more sophisticated AI development targeting higher cognitive functions.