Despite growing interest in leveraging Large Language Models (LLMs) for content analysis, current studies have focused primarily on text-based content. In this work, we explored the potential of LLMs to assist video content analysis by conducting a case study that followed a new workflow of LLM-assisted multimodal content analysis. The workflow encompasses codebook design, prompt engineering, LLM processing, and human evaluation. We strategically crafted annotation prompts to obtain LLM Annotations in structured form and explanation prompts to generate LLM Explanations, improving our understanding of the LLM's reasoning and its transparency. To test the LLM's video annotation capabilities, we analyzed 203 keyframes extracted from 25 YouTube short videos about depression. Comparing the LLM Annotations with those of two human coders, we found that the LLM achieves higher accuracy in object and activity annotations than in emotion and genre annotations. Moreover, we identified the potential and limitations of the LLM's capabilities in annotating videos. Based on these findings, we explore opportunities and challenges for future research and improvements to the workflow. We also discuss ethical concerns surrounding future studies based on LLM-assisted video analysis.