The performance and generalization of foundation models for interactive systems critically depend on the availability of large-scale, realistic training data. While recent advances in large language models (LLMs) have improved GUI understanding, progress in desktop automation remains constrained by the scarcity of high-quality, publicly available desktop interaction data, particularly for macOS. We introduce GUIRILLA, a scalable data crawling framework for automated exploration of desktop GUIs. GUIRILLA is not an autonomous agent; instead, it systematically collects realistic interaction traces and accessibility metadata intended to support the training, evaluation, and stabilization of downstream foundation models and GUI agents. The framework targets macOS, a largely underrepresented platform in existing resources, and organizes explored interfaces into hierarchical MacApp Trees derived from accessibility states and user actions. As part of this work, we release these MacApp Trees as a reusable structural representation of macOS applications, enabling downstream analysis, retrieval, testing, and future agent training. We additionally release macapptree, an open-source library for reproducible accessibility-driven GUI data collection, along with the full framework implementation to support open research in desktop autonomy.