This paper introduces the Word Synchronization Challenge, a novel benchmark for evaluating large language models (LLMs) in Human-Computer Interaction (HCI). The benchmark uses a dynamic, game-like framework to test LLMs' ability to mimic human cognitive processes through word associations. By simulating complex human interactions, it assesses how LLMs interpret and align with human thought patterns during conversational exchanges, a capability essential for effective social partnerships in HCI. Initial findings highlight the influence of model sophistication on performance, offering insights into the models' capacity to engage in meaningful social interaction and to adapt their behavior in human-like ways. This research advances the understanding of LLMs' potential to replicate or diverge from human cognitive functions, paving the way for more nuanced and empathetic human-machine collaborations.