Good teachers always tailor their explanations to the learners. Cognitive scientists model this process under the rationality principle: teachers try to maximise the learner's utility while minimising teaching costs. To this end, human teachers seem to build mental models of the learner's internal state, a capacity known as Theory of Mind (ToM). Inspired by cognitive science, we build on Bayesian ToM mechanisms to design teacher agents that, like humans, tailor their teaching strategies to the learners. Our ToM-equipped teachers construct models of learners' internal states from observations and leverage them to select demonstrations that maximise the learners' rewards while minimising teaching costs. Our experiments in simulated environments demonstrate that learners taught this way are more efficient than those taught in a learner-agnostic way. This effect gets stronger when the teacher's model of the learner better aligns with the learner's actual state, either through a more accurate prior or after accumulating observations of the learner's behaviour. This work is a first step towards social machines that teach us and each other; see https://teacher-with-tom.github.io.
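To make the teacher's decision rule concrete, the following is a minimal sketch, not the implementation used in this work. It assumes a discrete set of hypothetical learner types, a Bayesian belief update from one observed behaviour, and a choice of demonstration that maximises expected learner reward minus teaching cost. All names (`update_belief`, `select_demo`, the learner types and demonstrations) and all numbers are illustrative assumptions.

```python
from typing import Dict


def update_belief(prior: Dict[str, float],
                  likelihood: Dict[str, float]) -> Dict[str, float]:
    """Bayes' rule: posterior(type) is proportional to prior(type) * P(obs | type)."""
    unnormalised = {t: prior[t] * likelihood[t] for t in prior}
    total = sum(unnormalised.values())
    if total == 0:
        raise ValueError("observation has zero probability under every learner type")
    return {t: p / total for t, p in unnormalised.items()}


def select_demo(belief: Dict[str, float],
                learner_reward: Dict[str, Dict[str, float]],
                demo_cost: Dict[str, float]) -> str:
    """Pick the demonstration with the highest expected reward minus cost."""
    def expected_utility(demo: str) -> float:
        expected_reward = sum(belief[t] * learner_reward[demo][t] for t in belief)
        return expected_reward - demo_cost[demo]
    return max(demo_cost, key=expected_utility)


# Illustrative setup (assumed values, not taken from the experiments).
prior = {"novice": 0.5, "expert": 0.5}
demo_cost = {"short": 1.0, "long": 3.0}
# Reward each learner type is assumed to obtain after seeing each demonstration.
learner_reward = {
    "short": {"novice": 2.0, "expert": 10.0},
    "long": {"novice": 10.0, "expert": 10.0},
}

# With an uninformed prior, the teacher plays safe and pays for the long demo.
choice_before = select_demo(prior, learner_reward, demo_cost)

# An observed behaviour that is far more likely for an expert shifts the belief.
posterior = update_belief(prior, {"novice": 0.1, "expert": 0.9})
choice_after = select_demo(posterior, learner_reward, demo_cost)
```

In this toy setting the teacher chooses the costly `long` demonstration under the uniform prior and switches to the cheaper `short` one once its belief concentrates on the `expert` type, which mirrors the claim that teaching improves as the teacher's model of the learner becomes more accurate.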