Involution Fused ConvNet for Classifying Eye-Tracking Patterns of Children with Autism Spectrum Disorder

Autism Spectrum Disorder (ASD) is a complex neurodevelopmental condition that is challenging to diagnose. Numerous studies show that children diagnosed with autism have difficulty sustaining attention and exhibit less focused gaze. Eye-tracking technology has drawn particular attention in the context of ASD, since gaze anomalies have long been recognized as a defining feature of autism. Deep learning (DL) approaches combined with eye-tracking sensors are opening up new capabilities for advancing ASD diagnosis and its applications. By learning complex nonlinear input-output mappings, DL models can accurately recognize diverse gaze and eye-tracking patterns and adapt to the data. However, convolutions alone are insufficient to capture the important spatial information in gaze or eye-tracking patterns, because a convolution applies the same kernel at every spatial location. Involution, an operation whose kernels are generated dynamically for each location from the input itself, can improve the efficiency of classifying gaze-pattern and eye-tracking data. In this paper, we use two different image-processing operations, convolution and involution, to examine how each learns eye-tracking patterns. Since these patterns depend primarily on spatial information, we fuse involution with convolution into a hybrid model, which adds location-specific capability to a deep learning model. The proposed model follows a simple yet effective design, which makes it easier to apply in real-world settings. We also investigate why our approach works well for classifying eye-tracking patterns. For comparative analysis, we experiment on two separate datasets as well as on a combination of both. The results show that the Involution Fused ConvNet (IC) with three involution layers outperforms previous approaches.
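The contrast between convolution and involution can be made concrete with a small sketch. The NumPy code below is a minimal illustration, assuming the standard involution formulation: at every pixel, a K×K kernel per channel group is generated from that pixel's own features (a channel reduction, a ReLU, then a projection), and is shared across the channels in the group. A plain convolution, by contrast, uses one learned kernel everywhere. The `hybrid_block` ordering (convolution, ReLU, then involution), the omission of batch normalization in the kernel generator, and the names `w_reduce`, `w_span`, `conv2d`, `involution` and `hybrid_block` are illustrative assumptions, not the authors' IC architecture.

```python
import numpy as np


def conv2d(x, weight):
    """Same-padded 2-D convolution. x: (H, W, C_in); weight: (K, K, C_in, C_out).
    The same weight tensor is applied at every spatial location (spatial-agnostic)."""
    k = weight.shape[0]
    pad = k // 2
    h, w, _ = x.shape
    x_pad = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))  # zero padding
    out = np.zeros((h, w, weight.shape[-1]))
    for i in range(h):
        for j in range(w):
            patch = x_pad[i:i + k, j:j + k, :]  # (K, K, C_in)
            out[i, j] = np.tensordot(patch, weight, axes=([0, 1, 2], [0, 1, 2]))
    return out


def involution(x, w_reduce, w_span, kernel_size=3, groups=1):
    """Same-padded involution. x: (H, W, C); w_reduce: (C, C // r);
    w_span: (C // r, K * K * groups). Kernels differ per pixel (spatial-specific)
    and are shared by all channels within a group (channel-agnostic)."""
    h, w, c = x.shape
    k = kernel_size
    assert c % groups == 0, "channels must split evenly into groups"
    pad = k // 2
    # Kernel generation from each pixel's own feature vector (BN omitted here).
    hidden = np.maximum(x @ w_reduce, 0.0)                    # (H, W, C // r)
    kernels = (hidden @ w_span).reshape(h, w, k, k, groups)   # (H, W, K, K, G)
    x_pad = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    out = np.zeros((h, w, c))
    ch_per_group = c // groups
    for i in range(h):
        for j in range(w):
            patch = x_pad[i:i + k, j:j + k, :]                # (K, K, C)
            # Channel ch uses the kernel of group ch // ch_per_group.
            kern = np.repeat(kernels[i, j], ch_per_group, axis=-1)  # (K, K, C)
            out[i, j] = (patch * kern).sum(axis=(0, 1))
    return out


def hybrid_block(x, conv_w, w_reduce, w_span, groups=1):
    """Hypothetical fused block: convolution + ReLU followed by involution."""
    y = np.maximum(conv2d(x, conv_w), 0.0)
    return involution(y, w_reduce, w_span, kernel_size=3, groups=groups)


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 4))                   # toy 8x8 feature map, 4 channels
conv_w = rng.standard_normal((3, 3, 4, 8)) * 0.1     # conv: 4 -> 8 channels
w_reduce = rng.standard_normal((8, 2)) * 0.1         # reduction ratio r = 4
w_span = rng.standard_normal((2, 3 * 3 * 2)) * 0.1   # K = 3, G = 2
y = hybrid_block(x, conv_w, w_reduce, w_span, groups=2)
print(y.shape)  # (8, 8, 8)
```

Because involution keeps the channel count unchanged, stacking several such layers (as in the three-involution-layer variant the abstract reports) does not alter the feature-map shape; the loops are written for clarity rather than speed.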
