Machine learning models have been trained to predict semantic information about user interfaces (UIs) in order to make apps more accessible, easier to test, and easier to automate. Currently, most models rely on datasets that are collected and labeled by human crowd-workers, a process that is costly and surprisingly error-prone for certain tasks. For example, it is possible to guess whether a UI element is "tappable" from a screenshot (i.e., based on visual signifiers) or from potentially unreliable metadata (e.g., a view hierarchy), but one way to know for certain is to programmatically tap the UI element and observe the effects. We built the Never-ending UI Learner, an app crawler that automatically installs real apps from a mobile app store and crawls them to discover new and challenging training examples to learn from. The Never-ending UI Learner has crawled for more than 5,000 device-hours, performing over half a million actions on 6,000 apps to train three computer vision models for i) tappability prediction, ii) draggability prediction, and iii) screen similarity.
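The idea of labeling tappability by tapping an element and observing the effects can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Device` interface, the `Element` structure, the pixel-difference heuristic, and the 5% threshold are hypothetical and are not taken from the Never-ending UI Learner itself, whose actual labeling heuristics are not described in this abstract.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence

# Rows of grayscale pixel values; a hypothetical screenshot format.
Screenshot = Sequence[Sequence[int]]


@dataclass(frozen=True)
class Element:
    """A UI element identified by its tap point (hypothetical structure)."""
    x: int
    y: int


class Device(Protocol):
    """Hypothetical device interface; not an API from the source."""
    def screenshot(self) -> Screenshot: ...
    def tap(self, x: int, y: int) -> None: ...


def changed_fraction(before: Screenshot, after: Screenshot) -> float:
    """Fraction of pixels that differ between two same-sized screenshots."""
    total = sum(len(row) for row in before)
    if total == 0:
        return 0.0
    differing = sum(
        1
        for row_b, row_a in zip(before, after)
        for p_b, p_a in zip(row_b, row_a)
        if p_b != p_a
    )
    return differing / total


def label_tappable(device: Device, element: Element, threshold: float = 0.05) -> bool:
    """Tap the element and label it tappable if enough of the screen changes.

    The 5% threshold is an illustrative assumption, not a value from the source.
    """
    before = device.screenshot()
    device.tap(element.x, element.y)
    after = device.screenshot()
    return changed_fraction(before, after) > threshold


class FakeDevice:
    """In-memory stand-in for a real device: only a tap at (0, 0) changes the screen."""
    def __init__(self) -> None:
        self._screen = [[0, 0], [0, 0]]

    def screenshot(self) -> Screenshot:
        # Return a copy so earlier screenshots are not mutated by later taps.
        return [row[:] for row in self._screen]

    def tap(self, x: int, y: int) -> None:
        if (x, y) == (0, 0):
            self._screen = [[255, 255], [255, 255]]
```

With the fake device, `label_tappable(FakeDevice(), Element(0, 0))` yields a positive label and `label_tappable(FakeDevice(), Element(1, 1))` a negative one. A real crawler would likely need something more robust than raw pixel differences, since animations, ads, or loading spinners can change the screen without any tap.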