Despite growing interest in leveraging Large Language Models (LLMs) for content analysis, current studies have primarily focused on text-based content. In this work, we explore the potential of LLMs to assist video content analysis through a case study that follows a new workflow for LLM-assisted multimodal content analysis. The workflow encompasses codebook design, prompt engineering, LLM processing, and human evaluation. We strategically crafted annotation prompts to obtain LLM Annotations in structured form, and explanation prompts to generate LLM Explanations that improve the transparency of, and our understanding of, the LLM's reasoning. To test the LLM's video annotation capabilities, we analyzed 203 keyframes extracted from 25 YouTube short videos about depression. Comparing the LLM Annotations with those of two human coders, we found that the LLM achieves higher accuracy on object and activity Annotations than on emotion and genre Annotations. Moreover, we identify both the potential and the limitations of the LLM's capabilities in annotating videos. Based on these findings, we explore opportunities and challenges for future research and for improving the workflow, and we discuss ethical concerns surrounding future studies based on LLM-assisted video analysis.