Despite growing interest in leveraging Large Language Models (LLMs) for content analysis, current studies have focused primarily on text-based content. In this work, we explored the potential of LLMs to assist video content analysis by conducting a case study that followed a new workflow of LLM-assisted multimodal content analysis. The workflow encompasses codebook design, prompt engineering, LLM processing, and human evaluation. We strategically crafted annotation prompts to obtain LLM Annotations in structured form and explanation prompts to generate LLM Explanations, improving our understanding of the LLM's reasoning and its transparency. To test the LLM's video annotation capabilities, we analyzed 203 keyframes extracted from 25 YouTube short videos about depression. Comparing the LLM Annotations with those of two human coders, we found that the LLM achieves higher accuracy in object and activity annotations than in emotion and genre annotations. Moreover, we identified the potential and limitations of the LLM's capabilities in annotating videos. Based on these findings, we explore opportunities and challenges for future research and improvements to the workflow. We also discuss ethical concerns surrounding future studies based on LLM-assisted video analysis.