This paper introduces the Word Synchronization Challenge, a novel benchmark for evaluating large language models (LLMs) in Human-Computer Interaction (HCI). The benchmark uses a dynamic, game-like framework to test LLMs' ability to mimic human cognitive processes through word associations. By simulating complex human interactions, it assesses how LLMs interpret and align with human thought patterns during conversational exchanges, a capability essential for effective social partnerships in HCI. Initial findings highlight the influence of model sophistication on performance, offering insights into the models' capacity to engage in meaningful social interactions and adapt their behavior in human-like ways. This research advances the understanding of LLMs' potential to replicate or diverge from human cognitive functions, paving the way for more nuanced and empathetic human-machine collaborations.