A Comparative Study on Reward Models for UI Adaptation with Reinforcement Learning

Adapting the User Interface (UI) of software systems to user requirements and the context of use is challenging. The main difficulty lies in suggesting the right adaptation at the right time and in the right place so that it is valuable to end-users. We believe that recent progress in Machine Learning techniques provides useful ways to support adaptation more effectively. In particular, reinforcement learning (RL) can be used to personalise interfaces for each context of use in order to improve the user experience (UX). However, determining the reward of each adaptation alternative is a challenge in RL for UI adaptation. Recent research has explored the use of reward models to address this challenge, but there is currently no empirical evidence on this type of model. In this paper, we propose a confirmatory study design that aims to investigate the effectiveness of two different approaches to generating reward models in the context of UI adaptation using RL: (1) a reward model derived exclusively from predictive Human-Computer Interaction (HCI) models (HCI), and (2) predictive HCI models augmented by Human Feedback (HCI&HF). The controlled experiment will use an AB/BA crossover design with two treatments: HCI and HCI&HF. We shall determine how the manipulation of these two treatments affects the UX when interacting with adaptive user interfaces (AUIs). The UX will be measured in terms of user engagement and user satisfaction, which will be operationalised by means of predictive HCI models and the Questionnaire for User Interaction Satisfaction (QUIS), respectively. By comparing the performance of the two reward models in terms of their ability to adapt to user preferences with the purpose of improving the UX, our study contributes to the understanding of how reward modelling can facilitate UI adaptation using RL.
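
The AB/BA crossover design mentioned above can be illustrated with a minimal sketch. It assumes a generic balanced randomisation in which half of the participants receive HCI first and HCI&HF second (sequence AB) and the other half receive the reverse order (sequence BA). The function name `assign_sequences`, the participant identifiers, and the seed are hypothetical and are not taken from the study protocol.

```python
import random

# Treatment labels from the abstract; the A/B mapping is an assumption.
TREATMENTS = {"A": "HCI", "B": "HCI&HF"}


def assign_sequences(participant_ids, seed=0):
    """Randomly split participants into balanced AB and BA sequence groups.

    Returns a dict mapping each participant id to the ordered list of
    treatments for period 1 and period 2.
    """
    rng = random.Random(seed)  # fixed seed so the allocation is reproducible
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    schedule = {}
    for i, pid in enumerate(shuffled):
        # First half of the shuffled list gets AB, the remainder gets BA.
        order = "AB" if i < half else "BA"
        schedule[pid] = [TREATMENTS[t] for t in order]
    return schedule


# Hypothetical example with eight participants.
schedule = assign_sequences(range(1, 9))
for pid in sorted(schedule):
    print(pid, schedule[pid])
```

Because every participant experiences both treatments, each one acts as their own control, and balancing the two sequences helps separate treatment effects from period effects.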
