The olfactory search POMDP (partially observable Markov decision process) is a sequential decision-making problem designed to mimic the task faced by insects searching for a source of odor in turbulence, and its solutions have applications to sniffer robots. Because exact solutions are out of reach, the challenge is to find the best possible approximate solutions at a reasonable computational cost. We quantitatively benchmark a solver based on deep reinforcement learning against traditional approximate POMDP solvers. We show that deep reinforcement learning is a competitive alternative to standard methods, in particular for generating lightweight policies suitable for robots.