InteraRec: Screenshot Based Recommendations Using Multimodal Large Language Models

Weblogs, which record user activity on a website, offer valuable insight into user preferences, behavior, and interests. Many recommendation algorithms, using strategies such as collaborative filtering, content-based filtering, and hybrid methods, rely on data mined from these weblogs to provide personalized recommendations. Despite the abundance of information in weblogs, identifying and extracting the pertinent information and key features requires extensive engineering effort. The intricate nature of the data also makes it hard to interpret, especially for non-experts. In this study, we introduce InteraRec, an interactive recommendation framework that diverges from conventional approaches that depend exclusively on weblogs to generate recommendations. The InteraRec framework captures high-frequency screenshots of web pages as users navigate a website. Leveraging state-of-the-art multimodal large language models (MLLMs), it extracts insights into user preferences from these screenshots by generating a textual summary based on predefined keywords. An LLM-integrated optimization setup then uses this summary to generate tailored recommendations. We also explore integrating session-based recommendation systems into the InteraRec framework to enhance its overall performance. Finally, we curate a new dataset of screenshots from product web pages on the Amazon website to validate the InteraRec framework. Detailed experiments demonstrate the efficacy of InteraRec in delivering valuable, personalized recommendations tailored to individual user preferences.
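
The data flow described above (screenshots, then a keyword-based textual summary, then recommendations) can be sketched roughly as follows. This is a minimal illustration, not the InteraRec implementation: the keyword set, the `Screenshot` type, and every function name are hypothetical, the MLLM call is replaced by a stub that returns a fixed summary, and the LLM-integrated optimization step is replaced by a plain attribute-matching score over a toy catalog.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# Hypothetical keyword set; the abstract only says "predefined keywords".
KEYWORDS = ("category", "brand", "price_range")


@dataclass
class Screenshot:
    timestamp: float    # seconds since the session started
    image_bytes: bytes  # raw image data that a real MLLM would inspect


# Pluggable stages: a real system would back these with an MLLM and an
# LLM-integrated optimizer; here they are ordinary callables.
Summarizer = Callable[[Sequence[Screenshot], Sequence[str]], Dict[str, str]]
Recommender = Callable[[Dict[str, str], int], List[str]]


def recommend_from_screenshots(
    shots: Sequence[Screenshot],
    summarize: Summarizer,
    recommend: Recommender,
    top_k: int = 3,
) -> List[str]:
    """Run the screenshot -> summary -> recommendation flow."""
    ordered = sorted(shots, key=lambda s: s.timestamp)
    summary = summarize(ordered, KEYWORDS)
    # Keep only the predefined keywords so later stages see a fixed schema.
    summary = {k: v for k, v in summary.items() if k in KEYWORDS}
    return recommend(summary, top_k)


def stub_summarize(shots: Sequence[Screenshot], keywords: Sequence[str]) -> Dict[str, str]:
    """Stand-in for the MLLM: ignores the images and returns made-up preferences."""
    return {
        "category": "headphones",
        "brand": "Acme",
        "price_range": "under $50",
        "noise": "not a predefined keyword",
    }


# Toy catalog, invented purely for this example.
CATALOG = [
    {"id": "item-a", "category": "headphones", "brand": "Acme", "price_range": "under $50"},
    {"id": "item-b", "category": "headphones", "brand": "Other", "price_range": "over $50"},
    {"id": "item-c", "category": "laptop", "brand": "Acme", "price_range": "over $50"},
]


def stub_recommend(summary: Dict[str, str], top_k: int) -> List[str]:
    """Stand-in for the optimizer: rank items by how many summary fields they match."""
    def score(item: Dict[str, str]) -> int:
        return sum(1 for k, v in summary.items() if item.get(k) == v)

    ranked = sorted(CATALOG, key=score, reverse=True)
    return [item["id"] for item in ranked[:top_k]]


if __name__ == "__main__":
    session = [Screenshot(0.5, b"second"), Screenshot(0.0, b"first")]
    print(recommend_from_screenshots(session, stub_summarize, stub_recommend, top_k=2))
```

Passing the summarizer and recommender as callables keeps the flow itself independent of any particular model or optimizer, so either stage can be swapped out without touching the rest.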
