Joint planning through language-based interactions is a key area of human-AI teaming. Planning problems in the open world often involve incomplete information and unknowns, e.g., the objects involved and human goals or intents, which lead to knowledge gaps in joint planning. We consider the problem of discovering optimal interaction strategies for AI agents to actively elicit human inputs in object-driven planning. To this end, we propose the Minimal Information Neuro-Symbolic Tree (MINT) to reason about the impact of knowledge gaps, and we leverage self-play with MINT to optimize the AI agent's elicitation strategies and queries. More precisely, MINT builds a symbolic tree by making propositions of possible human-AI interactions and by consulting a neural planning policy to estimate the uncertainty in planning outcomes caused by the remaining knowledge gaps. Finally, we leverage an LLM to search and summarize MINT's reasoning process and curate a set of queries that optimally elicit human inputs for the best planning performance. By considering a family of extended Markov decision processes with knowledge gaps, we analyze the return guarantee for a given MINT with active human elicitation. Our evaluation on three benchmarks involving unseen/unknown objects of increasing realism shows that MINT-based planning attains near-expert returns by issuing a limited number of questions per task while achieving significantly improved rewards and success rates.
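The idea of asking the question that most reduces uncertainty in planning outcomes can be illustrated with a small sketch. This is not the authors' implementation: the neural planning policy is replaced by a hypothetical toy function `toy_return`, knowledge gaps are modeled as a dictionary of unknown variables with finitely many candidate values, uncertainty is measured as the variance of the estimated return under a uniform prior, and all names (`completions`, `return_uncertainty`, `best_question`, `GAPS`) are assumptions made for illustration.

```python
from itertools import product
from statistics import pvariance


def completions(gaps, known):
    """Enumerate every full assignment consistent with the answers so far."""
    names = list(gaps)
    pools = [[known[n]] if n in known else gaps[n] for n in names]
    return [dict(zip(names, values)) for values in product(*pools)]


def return_uncertainty(gaps, known, estimate_return):
    """Variance of the estimated return over the remaining unknowns."""
    returns = [estimate_return(a) for a in completions(gaps, known)]
    return pvariance(returns)


def best_question(gaps, known, estimate_return):
    """Pick the unknown whose answer leaves the least expected uncertainty.

    Assumes a uniform prior over each unknown's candidate answers.
    """
    best, best_score = None, None
    for name in gaps:
        if name in known:
            continue
        # Average the remaining uncertainty over each possible human answer.
        score = sum(
            return_uncertainty(gaps, {**known, name: value}, estimate_return)
            for value in gaps[name]
        ) / len(gaps[name])
        if best_score is None or score < best_score:
            best, best_score = name, score
    return best, best_score


# Hypothetical knowledge gaps about an unseen object.
GAPS = {
    "mug_is_fragile": [True, False],
    "table_color": ["red", "blue"],
}


def toy_return(assignment):
    """Stand-in for a planning policy: only fragility affects the return."""
    return 0.2 if assignment["mug_is_fragile"] else 1.0


question, remaining = best_question(GAPS, {}, toy_return)
print(question, remaining)
```

In this toy setting the table color never changes the estimated return, so asking about it leaves the uncertainty untouched, whereas asking about fragility resolves it completely. A tree-based method would apply this kind of comparison recursively over sequences of questions rather than greedily over a single one.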