Comparing Large Language Model AI and Human-Generated Coaching Messages for Behavioral Weight Loss

Automated coaching messages for weight control can save time and costs, but their repetitive, generic nature may limit their effectiveness compared to human coaching. Large language model (LLM)-based artificial intelligence (AI) chatbots, such as ChatGPT, could draw on their data-processing abilities to offer more personalized and novel messages that address this repetition. While LLM AI shows promise for encouraging healthier lifestyles, studies have yet to examine the feasibility and acceptability of LLM-based behavioral weight loss (BWL) coaching. In this study, 87 adults in a weight-loss trial rated the helpfulness of ten coaching messages (five human-written, five ChatGPT-generated) on a 5-point Likert scale and gave open-ended feedback to justify their ratings. Participants also identified which messages they believed were AI-generated. The evaluation occurred in two phases: messages in Phase 1 were perceived as impersonal and negative, which prompted revisions for the Phase 2 messages. In Phase 1, AI-generated messages were rated less helpful than human-written ones, with 66% receiving a helpfulness rating of 3 or higher. In Phase 2, however, the AI messages matched the human-written ones in helpfulness, with 82% receiving a rating of 3 or higher. Additionally, 50% of the AI messages were misidentified as human-written, suggesting AI's sophistication in mimicking human-generated content. A thematic analysis of the open-ended feedback revealed that participants appreciated the AI messages' empathy and personalized suggestions but found them more formulaic, less authentic, and too data-focused. This study shows the preliminary feasibility and acceptability of using LLM AIs, such as ChatGPT, to craft potentially effective weight control coaching messages. Our findings also highlight areas for future improvement.

