NurViD: A Large Expert-Level Video Database for Nursing Procedure Activity Understanding

Applying deep learning to nursing procedure activity understanding has the potential to greatly enhance the quality and safety of nurse-patient interactions. Such techniques can facilitate training and education, improve quality control, and enable operational compliance monitoring. However, the development of automatic recognition systems in this field is currently hindered by the scarcity of appropriately labeled datasets. Existing video datasets have several limitations: 1) they are too small to support comprehensive investigations of nursing activity; 2) they primarily focus on single procedures and lack expert-level annotations for diverse nursing procedures and action steps; and 3) they lack temporally localized annotations, which prevents the effective localization of targeted actions within longer video sequences. To mitigate these limitations, we propose NurViD, a large video dataset with expert-level annotations for nursing procedure activity understanding. NurViD consists of over 1.5k videos totaling 144 hours, making it approximately four times longer than the largest existing nursing activity datasets. Notably, it encompasses 51 distinct nursing procedures and 177 action steps, providing much more comprehensive coverage than existing datasets, which primarily focus on a limited number of procedures. To evaluate the efficacy of current deep learning methods on nursing activity understanding, we establish three benchmarks on NurViD: procedure recognition on untrimmed videos, procedure and action recognition on trimmed videos, and action detection. Our benchmark and code will be available at \url{https://github.com/minghu0830/NurViD-benchmark}.
