Earth-observing satellites are powerful tools for collecting scientific information about our planet, but they have limitations: they cannot easily deviate from their orbital trajectories, their sensors have a limited field of view, and pointing and operating these sensors can consume a large share of the spacecraft's resources. These satellites must therefore optimize the data they collect, keeping only the most important or informative measurements. Dynamic targeting is an emerging concept in which satellite resources and data from a lookahead instrument are used to intelligently reconfigure and point a primary instrument. Simulation studies have shown that dynamic targeting increases the amount of scientific information gathered compared with conventional sampling strategies. In this work, we present two learning-based approaches to dynamic targeting, one using reinforcement learning and the other using imitation learning. Both methods build on a dynamic programming solution to plan a sequence of sampling locations. We evaluate our approaches against existing heuristic methods for dynamic targeting and show the benefits of learning for this application. Imitation learning performs on average 10.0% better than the best heuristic method, while reinforcement learning performs on average 13.7% better. We also show that both learning methods can be trained effectively with small amounts of data.