The release of ChatGPT in 2022 triggered a rapid surge in generative artificial intelligence (Gen-AI) mobile apps. Despite their widespread adoption, little is known about how end users perceive and evaluate the Gen-AI features of these apps. We conduct a user-centered analysis of 1,035,342 reviews of 171 Gen-AI apps on the Google Play Store. We propose SARA (Selection, Acquisition, Refinement, and Analysis), a four-phase framework that uses prompt-based large language models (LLMs) for large-scale review analysis. We validate the reliability of LLM-based topic extraction and assignment on 4,353 manually evaluated reviews, achieving 91% accuracy with five-shot prompting and filtering of non-informative reviews. We identify the top ten topics (e.g., AI Performance and Emotional Connection) and compare them across platforms with Apple App Store reviews. Through a qualitative analysis of 762 reviews, we uncover three opportunities (AI for Accessibility and Wellbeing, AI as a Collaborative Creative Tool, and AI Versatility) and three challenges (Managing User Expectations and AI Limitations, Balancing Content Moderation and Creative Freedom, and Strategic Integration of Gen-AI Features). Finally, we analyze temporal trends, revealing how user concerns shift as users mature. Our findings help researchers and developers better leverage the capabilities of Gen-AI apps and address their challenges.
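The validation step pairs three ingredients: a k-shot prompt (k = 5), a filter for non-informative reviews, and an accuracy score against manual labels. The following is a minimal sketch of how these pieces might fit together. It is not SARA's actual implementation. The prompt wording, the example reviews, and the word-count filter are all hypothetical stand-ins (SARA's real filter may well be LLM-based), and the LLM call itself is left out.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """A labeled review used as an in-context (few-shot) example."""
    review: str
    topic: str


def build_few_shot_prompt(examples: list[Example], review: str, k: int = 5) -> str:
    """Assemble a k-shot topic-assignment prompt (hypothetical format)."""
    if len(examples) < k:
        raise ValueError(f"need at least {k} examples, got {len(examples)}")
    lines = ["Assign one topic to the app review. Answer with the topic only.", ""]
    for ex in examples[:k]:
        lines.append(f"Review: {ex.review}")
        lines.append(f"Topic: {ex.topic}")
        lines.append("")
    # The target review goes last, with an open "Topic:" slot for the LLM to fill.
    lines.append(f"Review: {review}")
    lines.append("Topic:")
    return "\n".join(lines)


def is_informative(review: str, min_words: int = 3) -> bool:
    """Crude stand-in filter (hypothetical): drop very short reviews such as 'good'."""
    return len(review.split()) >= min_words


def accuracy(predicted: list[str], gold: list[str]) -> float:
    """Fraction of LLM-assigned topics that match the manual labels."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("predicted and gold must be non-empty and equal length")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


if __name__ == "__main__":
    # Hypothetical labeled examples; topic names are taken from the abstract.
    shots = [
        Example("Answers are fast and accurate", "AI Performance"),
        Example("It keeps forgetting what I said", "AI Performance"),
        Example("My AI friend helps me feel less lonely", "Emotional Connection"),
        Example("I talk to my companion every night", "Emotional Connection"),
        Example("Responses are slow and often wrong", "AI Performance"),
    ]
    reviews = ["good", "The chatbot feels like a real friend to me"]
    for r in filter(is_informative, reviews):
        print(build_few_shot_prompt(shots, r))
```

Only the second review survives the filter, so one prompt is printed. In a real pipeline, each prompt would be sent to an LLM, and the returned topics would be scored with `accuracy` against the manually evaluated labels.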