We consider the problem of olfactory search in a turbulent environment. We focus on agents that respond solely to odor stimuli, with no access to spatial perception or prior information about the odor source location. We ask whether navigation strategies to a target can be learned robustly within a sequential decision-making framework. We develop a reinforcement learning algorithm using a small set of interpretable olfactory states and train it with realistic turbulent odor cues. By introducing a temporal memory, we demonstrate that two salient features of odor traces, discretized into a few olfactory states, are sufficient to learn navigation in a realistic odor plume. Performance is dictated by the sparse nature of turbulent plumes. An optimal memory exists which ignores blanks within the plume and activates a recovery strategy outside the plume. We obtain the best performance by letting agents learn their recovery strategy and show that it is mostly casting crosswind, similar to behavior observed in flying insects. The optimal strategy is robust to substantial changes in the odor plumes, suggesting that minor parameter tuning may be sufficient to adapt to different environments.
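The approach described above can be illustrated with a minimal sketch: tabular Q-learning over a handful of discretized olfactory states, where a temporal memory tolerates blanks inside the plume and switches to a recovery state once the blank exceeds the memory. The toy plume model, the state definitions, the reward values, and all parameters below are illustrative assumptions, not the actual setup used in the paper.

```python
# Hypothetical minimal sketch: Q-learning with a small set of
# interpretable olfactory states and a temporal memory.
# All names and parameters here are assumptions for illustration.
import random

random.seed(0)

GRID = 25                                      # square arena; source at (0, GRID // 2)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # upwind, downwind, cross-left, cross-right
MEMORY = 5                                     # blanks tolerated before declaring "lost"

def detect(x, y):
    """Toy intermittent plume: detection probability decays downwind and
    off-axis, so the odor signal is sparse, as in turbulent plumes."""
    if x <= 0:
        return False
    p = max(0.0, 0.8 - 0.02 * x - 0.15 * abs(y - GRID // 2))
    return random.random() < p

def olfactory_state(blanks):
    """Collapse the odor trace into three interpretable states."""
    if blanks == 0:
        return 0          # odor detected this step
    if blanks <= MEMORY:
        return 1          # blank, but memory still treats us as inside the plume
    return 2              # lost the plume: recovery behavior should activate

def run_episode(Q, eps=0.1, alpha=0.1, gamma=0.99, max_steps=400):
    x, y = GRID - 1, GRID // 2                 # start far downwind, on axis
    blanks = MEMORY + 1                        # start in the "lost" state
    for _ in range(max_steps):
        s = olfactory_state(blanks)
        if random.random() < eps:
            a = random.randrange(4)
        else:
            a = max(range(4), key=lambda i: Q[s][i])
        dx, dy = ACTIONS[a]
        x = min(max(x + dx, 0), GRID - 1)
        y = min(max(y + dy, 0), GRID - 1)
        blanks = 0 if detect(x, y) else blanks + 1
        done = (x == 0 and y == GRID // 2)     # reached the source
        r = 1.0 if done else -0.01             # small step cost, terminal reward
        s2 = olfactory_state(blanks)
        target = r + gamma * (0.0 if done else max(Q[s2]))
        Q[s][a] += alpha * (target - Q[s][a])  # tabular Q-learning update
        if done:
            return True
    return False

Q = [[0.0] * 4 for _ in range(3)]              # 3 olfactory states x 4 actions
succ = sum(run_episode(Q) for _ in range(2000))
```

With only three states, the learned policy can still express the structure the abstract describes: surge-like upwind motion while detections are fresh, and a distinct recovery action once the memory declares the plume lost. No expected numbers are asserted here, since outcomes depend on the toy plume and the random seed.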