A Comparative Study on Reward Models for UI Adaptation with Reinforcement Learning

Adapting the User Interface (UI) of software systems to user requirements and the context of use is challenging. The main difficulty lies in suggesting the right adaptation at the right time and in the right place so that it is valuable to end-users. We believe that recent progress in Machine Learning techniques provides useful ways to support adaptation more effectively. In particular, Reinforcement Learning (RL) can be used to personalise interfaces for each context of use in order to improve the user experience (UX). However, determining the reward of each adaptation alternative is a challenge in RL for UI adaptation. Recent research has explored the use of reward models to address this challenge, but there is currently no empirical evidence on this type of model. In this paper, we propose a confirmatory study design that aims to investigate the effectiveness of two approaches to generating reward models for UI adaptation using RL: (1) a reward model derived exclusively from predictive Human-Computer Interaction (HCI) models (HCI), and (2) predictive HCI models augmented by Human Feedback (HCI&HF). The controlled experiment will use an AB/BA crossover design with two treatments: HCI and HCI&HF. We shall determine how these two treatments affect the UX when users interact with adaptive user interfaces (AUIs). The UX will be measured in terms of user engagement and user satisfaction, which will be operationalised by means of predictive HCI models and the Questionnaire for User Interaction Satisfaction (QUIS), respectively. By comparing the two reward models in terms of their ability to adapt to user preferences in order to improve the UX, our study contributes to the understanding of how reward modelling can facilitate UI adaptation using RL.
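To make the distinction between the two treatments concrete, the following is a minimal sketch of how an HCI-only reward and an HCI&HF reward might differ. It is not the study's implementation: the function names, the [0, 1] scaling of the predicted engagement score and the feedback rating, the linear blend, and the `hf_weight` parameter are all assumptions introduced here for illustration.

```python
from typing import Optional


def _clamp01(value: float) -> float:
    """Clamp a value to the [0, 1] interval."""
    return max(0.0, min(1.0, value))


def hci_reward(predicted_engagement: float) -> float:
    """Reward from a predictive HCI model alone (the HCI treatment).

    `predicted_engagement` is a hypothetical model output scaled to [0, 1].
    """
    return _clamp01(predicted_engagement)


def hci_hf_reward(predicted_engagement: float,
                  human_feedback: Optional[float],
                  hf_weight: float = 0.5) -> float:
    """Reward from the HCI prediction augmented by human feedback (HCI&HF).

    `human_feedback` is a hypothetical rating in [0, 1]; None means the user
    gave no feedback, so the HCI-only reward is used as a fallback.
    The linear blend and `hf_weight` are illustrative assumptions.
    """
    base = hci_reward(predicted_engagement)
    if human_feedback is None:
        return base
    return (1.0 - hf_weight) * base + hf_weight * _clamp01(human_feedback)
```

In this sketch, an adaptation that the predictive model scores highly but the user rates poorly receives a lower reward under HCI&HF than under HCI, which is the kind of difference the two treatments are intended to expose.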
