SEArch: Optimistic Policy Selection Between Scene Noise and Drift for UAV Radar Search

Unmanned Aerial Vehicles (UAVs) equipped with radar sensors are deployed for target search missions in diverse environments, where targets exhibit characteristic signatures (e.g., respiration micro-motion in human search) that remain detectable through occlusions. A fundamental challenge is that radar statistics shift as the UAV moves through a dynamic and potentially non-stationary environment, which renders any fixed signal-processing strategy suboptimal, yet perception and adaptation must run in real time onboard a resource-constrained aerial node. Since no single detector performs well across all conditions, we adopt a multi-policy paradigm and formulate UAV target search as an online policy selection problem over a library of specialized detectors, with performance measured by regret, the cumulative loss gap relative to the best policy in each scene. The setting couples in-scene stochastic noise with inter-scene shifts. Whereas prior methods capture only one of these regimes, we account for both through the Stochastically Extended Adversary (SEA) framework, without requiring oracle knowledge of scene dynamics. Because adaptation must run onboard the UAV, we instantiate SEA through \textsc{SEArch}, a lightweight Optimistic Follow-the-Regularized-Leader (OFTRL) selector with an adaptive learning rate, achieving regret $O(\bar{\sigma}_T \sqrt{T} + \sqrt{J})$, where $\bar{\sigma}_T$ captures radar measurement noise and $J$ is the number of scene transitions over the mission horizon $T$. To enable rapid adaptation under frequent scene changes, we further introduce \textsc{W-SEArch}, a windowed variant that restarts every $w$ rounds and achieves regret $O(\bar{\sigma}_I \sqrt{w})$ under at most one transition per window. Experiments show up to 30\% regret reduction compared to non-adaptive baselines across a range of non-stationary settings.
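
To make the selection mechanism concrete, the following is a minimal sketch of a generic optimistic FTRL selector over a detector library, together with a periodic-restart wrapper in the spirit of the windowed variant. It is not the authors' implementation: the entropic regularizer, the use of the previous loss vector as the optimistic hint, the particular learning-rate schedule, and the names `OptimisticSelector` and `run_selector` are all assumptions made for illustration.

```python
import numpy as np


class OptimisticSelector:
    """Hypothetical OFTRL selector over n policies (entropic regularizer)."""

    def __init__(self, n_policies):
        self.n = n_policies
        self.reset()

    def reset(self):
        self.cum_loss = np.zeros(self.n)  # cumulative observed losses
        self.hint = np.zeros(self.n)      # optimistic guess of the next loss
        self.err = 0.0                    # accumulated squared hint error

    def weights(self):
        # Assumed adaptive rate: shrinks as the hints prove less accurate.
        eta = np.sqrt(np.log(self.n) / (1.0 + self.err))
        z = -eta * (self.cum_loss + self.hint)
        z -= z.max()                      # shift for numerical stability
        w = np.exp(z)
        return w / w.sum()

    def update(self, loss):
        loss = np.asarray(loss, dtype=float)
        self.err += float(np.max((loss - self.hint) ** 2))
        self.cum_loss += loss
        self.hint = loss                  # assumption: last loss as next hint


def run_selector(losses, window=None):
    """Return the total expected loss; restart every `window` rounds if set."""
    n_rounds, n_policies = losses.shape
    selector = OptimisticSelector(n_policies)
    total = 0.0
    for t in range(n_rounds):
        if window is not None and t > 0 and t % window == 0:
            selector.reset()              # windowed restart
        total += float(selector.weights() @ losses[t])
        selector.update(losses[t])
    return total


# Toy mission: two detectors, one scene transition at round 100.
losses = np.vstack([
    np.tile([0.1, 0.9], (100, 1)),        # scene A: detector 0 is better
    np.tile([0.9, 0.1], (100, 1)),        # scene B: detector 1 is better
])
print("no restart  :", run_selector(losses))
print("window = 25 :", run_selector(losses, window=25))
```

In this toy setup the restart lets the selector drop stale cumulative losses after the scene change, which is the intuition behind restarting every $w$ rounds. The sketch makes no claim to reproduce the regret bounds stated above.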
