Information seeking on mobile devices is often fragmented: users are trapped in repetitive cycles of context switching and data re-entry, which increase cognitive load and disrupt workflow. Existing mobile agents provide limited cross-source integration and are largely opaque, presenting progress as a linear feed with few opportunities to intervene, steer, or take control. We present DroidRetriever, a transparent, steerable system for cross-source mobile information seeking. It accepts voice or typed input; a multi-LLM system then decomposes the task, navigates to target pages, takes screenshots, and synthesizes a concise report with citation-linked screenshots. A progress dashboard makes the process transparent by combining sub-task progress with real-time exploration maps, so users can take over seamlessly. DroidRetriever also pauses when it detects privacy-sensitive or high-risk screens and prompts the user to intervene. Across 35 tasks spanning 24 apps, experiments and user studies show improved coverage and transparency and reduced workload. We release our code at https://github.com/AkimotoAyako/DroidRetriever.