In this work, we present an AVR application for audio-visual humor detection. While humor detection has traditionally centered on textual analysis, recent advances have spotlighted multimodal approaches. However, these methods still lean on textual cues as a modality, necessitating ASR systems to transcribe the audio data. This heavy reliance on ASR accuracy can pose challenges in real-world applications. To address this bottleneck, we propose an audio-visual humor detection system that circumvents textual reliance, eliminating the need for ASR models. Instead, the proposed approach hinges on the interplay between audio and visual content for effective humor detection.
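The core idea of bypassing ASR is to fuse audio and visual representations directly and feed them to a humor classifier. A minimal late-fusion sketch is shown below; the function name, embedding dimensions, and the simple linear head are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

def fuse_and_score(audio_emb, visual_emb, w, b):
    """Late fusion sketch (illustrative, not the paper's model):
    concatenate per-clip audio and visual embeddings, then apply
    a linear humor-detection head followed by a sigmoid."""
    fused = np.concatenate([audio_emb, visual_emb], axis=-1)
    logit = fused @ w + b
    return 1.0 / (1.0 + np.exp(-logit))  # probability the clip is humorous

# Toy example: hypothetical 4-dim audio and 4-dim visual embeddings for one clip.
rng = np.random.default_rng(0)
audio_emb = rng.standard_normal(4)
visual_emb = rng.standard_normal(4)
w = rng.standard_normal(8)   # weights of the fused (8-dim) linear head
p = fuse_and_score(audio_emb, visual_emb, w, b=0.0)
print(f"humor probability: {p:.3f}")
```

Because no transcript is involved, a pipeline like this has no ASR error term: its accuracy depends only on the quality of the audio and visual encoders and the fusion head.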