From a visual perception perspective, modern graphical user interfaces (GUIs) comprise a complex, graphics-rich, two-dimensional visuospatial arrangement of text, images, and interactive objects such as buttons and menus. While existing models can accurately predict regions and objects that are likely to attract attention ``on average'', so far no scanpath model is capable of predicting scanpaths for an individual. To close this gap, we introduce EyeFormer, which leverages a Transformer architecture as a policy network to guide a deep reinforcement learning algorithm that controls gaze locations. Our model has the unique capability of producing personalized predictions when given a few user scanpath samples. It can predict full scanpath information, including fixation positions and durations, across individuals and various stimulus types. Additionally, we demonstrate applications in GUI layout optimization driven by our model. Our software and models will be publicly available.