One of the fundamental cognitive abilities of humans is to quickly resolve uncertainty by generating hypotheses and testing them via active trials. When they encounter a novel phenomenon with ambiguous cause-effect relationships, humans form hypotheses from data, draw inferences from observations, test their theories through experimentation, and revise them when inconsistencies arise. This iterative process continues until the underlying mechanism becomes clear. In this work, we devise the IVRE (pronounced as "ivory") environment for evaluating artificial agents' reasoning ability under uncertainty. IVRE is an interactive environment featuring rich scenarios centered around Blicket detection. Agents in IVRE are placed into environments with various ambiguous action-effect pairs and asked to determine each object's role. They are encouraged to propose effective and efficient experiments that validate their hypotheses based on observations and to actively gather new information. The game ends when all uncertainties are resolved or the maximum number of trials is consumed. By evaluating modern artificial agents in IVRE, we observe that today's learning methods fall clearly short of human performance. This inefficacy in interactive reasoning under uncertainty calls for future research toward building human-like intelligence.
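The hypothesize-and-test loop described above can be illustrated with a minimal sketch. It is not IVRE's actual interface or rule set: it assumes a simple disjunctive Blicket machine (the machine lights up if at least one Blicket is placed on it), and every name here (`machine_lights`, `choose_trial`, `play`, `max_trials`) is hypothetical. The agent keeps the set of hypotheses consistent with all observations, proposes the trial whose worst-case outcome leaves the fewest hypotheses, and stops when a single hypothesis remains or the trial budget is spent.

```python
from itertools import combinations


def all_subsets(items):
    """Every subset of items, as frozensets, ordered by size."""
    items = sorted(items)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def machine_lights(placed, blickets):
    """Assumed rule: the machine lights up if any placed object is a Blicket."""
    return bool(set(placed) & set(blickets))


def consistent(hypotheses, placed, lit):
    """Keep only hypotheses that predict the observed outcome."""
    return [h for h in hypotheses if machine_lights(placed, h) == lit]


def choose_trial(objects, hypotheses):
    """Pick the non-empty object set with the smallest worst-case remainder."""
    best, best_score = None, None
    for placed in all_subsets(objects):
        if not placed:
            continue
        on = sum(machine_lights(placed, h) for h in hypotheses)
        score = max(on, len(hypotheses) - on)  # hypotheses left in the worst case
        if best_score is None or score < best_score:
            best, best_score = placed, score
    return best


def play(objects, true_blickets, context, max_trials=10):
    """Run trials until one hypothesis remains or the budget is used up."""
    hypotheses = all_subsets(objects)
    for placed, lit in context:  # initial, possibly ambiguous observations
        hypotheses = consistent(hypotheses, placed, lit)
    trials = 0
    while len(hypotheses) > 1 and trials < max_trials:
        placed = choose_trial(objects, hypotheses)
        lit = machine_lights(placed, true_blickets)  # simulated environment
        hypotheses = consistent(hypotheses, placed, lit)
        trials += 1
    return hypotheses, trials


context = [({"A", "B"}, True), ({"C"}, False)]
remaining, used = play(["A", "B", "C"], {"A"}, context)
```

In this toy setting the two context observations leave three candidate explanations (`{A}`, `{B}`, `{A, B}`), and the agent needs further trials to isolate the true one. The greedy worst-case criterion is only one possible choice; an expected-information-gain criterion would be a natural alternative.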