We present an end-to-end computer vision pipeline for detecting non-nutritive sucking (NNS), an infant sucking pattern in which no nutrition is delivered, as a potential biomarker for developmental delays, using off-the-shelf baby monitor video footage. One barrier to clinical (or algorithmic) assessment of NNS is its sparsity: experts must wade through hours of footage to find minutes of relevant activity. Our NNS activity segmentation algorithm addresses this problem by identifying periods of NNS with high certainty, reaching up to 94.0\% average precision and 84.9\% average recall across 30 heterogeneous 60 s clips drawn from our manually annotated NNS clinical in-crib dataset of 183 hours of overnight baby monitor footage from 19 infants. The method builds on an underlying NNS action recognition algorithm, which uses spatiotemporal deep learning networks and infant-specific pose estimation and achieves 94.9\% accuracy in binary classification on a balanced set of 960 2.5 s NNS vs. non-NNS clips. Tested on our second, independent, and public NNS in-the-wild dataset, NNS recognition reaches 92.3\% classification accuracy, and NNS segmentation achieves 90.8\% precision and 84.2\% recall.
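To make the relationship between clip-level recognition and segmentation concrete, the following is a minimal sketch of how per-window NNS decisions could be merged into segments and scored against annotations. It is an illustration under stated assumptions, not the evaluation protocol used in this work: the non-overlapping 2.5 s windows, the duration-based definition of precision and recall, and all function names (`windows_to_segments`, `precision_recall`) are hypothetical.

```python
from typing import List, Optional, Tuple

Interval = Tuple[float, float]  # (start_s, end_s), in seconds


def windows_to_segments(preds: List[int], window_s: float = 2.5) -> List[Interval]:
    """Merge runs of consecutive positive windows into (start, end) intervals.

    Assumes non-overlapping windows of equal length (an assumption of this sketch).
    """
    segments: List[Interval] = []
    start: Optional[float] = None
    for i, p in enumerate(preds):
        if p and start is None:
            start = i * window_s                     # a positive run begins
        elif not p and start is not None:
            segments.append((start, i * window_s))   # the run ends here
            start = None
    if start is not None:                            # run reaches the end of the clip
        segments.append((start, len(preds) * window_s))
    return segments


def overlap_seconds(a: List[Interval], b: List[Interval]) -> float:
    """Total overlap in seconds; intervals within each list must be disjoint."""
    return sum(max(0.0, min(e1, e2) - max(s1, s2)) for s1, e1 in a for s2, e2 in b)


def total_seconds(intervals: List[Interval]) -> float:
    return sum(end - start for start, end in intervals)


def precision_recall(pred: List[Interval], truth: List[Interval]) -> Tuple[float, float]:
    """Duration-based precision and recall (hypothetical metric for illustration)."""
    tp = overlap_seconds(pred, truth)
    pred_total, truth_total = total_seconds(pred), total_seconds(truth)
    precision = tp / pred_total if pred_total else 0.0
    recall = tp / truth_total if truth_total else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up example: a 60 s clip split into 24 windows of 2.5 s each.
    window_preds = [0] * 4 + [1] * 4 + [0] * 16
    predicted = windows_to_segments(window_preds)
    annotated = [(12.5, 22.5)]                       # made-up annotation
    print(predicted, precision_recall(predicted, annotated))
```

In this made-up example the predicted segment spans 10.0 s to 20.0 s and overlaps the annotated segment for 7.5 s, so both precision and recall come out to 0.75. An evaluation based on intersection-over-union thresholds per segment would be a different, equally plausible choice.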