Extracting semantic representations from mobile user interfaces (UIs) and using those representations in designers' decision-making have shown potential as effective computational design support. Current approaches rely on machine learning models trained on small mobile UI datasets to extract semantic vectors, and they use screenshot-to-screenshot comparison to retrieve similar-looking UIs for a given query screenshot. However, the usability of these methods is limited: they are often not open-sourced, their training pipelines are too complex for practitioners to follow, and they cannot perform screenshot set-to-set (i.e., app-to-app) retrieval. To this end, we (1) employ vision models trained on large, web-scale image collections and test whether they can extract UI representations in a zero-shot way and outperform existing specialized models, and (2) use mathematically founded methods to enable app-to-app retrieval and design consistency analysis. Our experiments show that our methods not only improve upon previous retrieval models but also enable multiple new applications.