Reinforcement Learning from Human Feedback (RLHF) is a methodology that aligns agent behavior with human preferences by integrating user feedback into the agent's training process. This paper introduces a framework that guides agent training through implicit neural signals, with a focus on the neural classification problem. Our work presents and releases a novel dataset of functional near-infrared spectroscopy (fNIRS) recordings collected from 25 human participants across three domains: Pick-and-Place Robot, Lunar Lander, and Flappy Bird. We train multiple classifiers to predict varying levels of agent performance (optimal, suboptimal, or worst-case) from windows of preprocessed fNIRS features, achieving an average F1 score of 67% for binary and 46% for multi-class classification across conditions and domains. We also train multiple regressors to predict the degree of deviation between an agent's chosen action and a set of near-optimal policy actions, providing a continuous measure of performance. Finally, we evaluate cross-subject generalization and show that fine-tuning pre-trained models with a small sample of subject-specific data increases average F1 scores by 17% and 41% for binary and multi-class models, respectively. Our results demonstrate that mapping implicit fNIRS signals to agent performance is feasible and can be improved, laying the foundation for future Reinforcement Learning from Neural Feedback (RLNF) systems.