Estimating the intimacy of a text has recently gained importance as NLP systems increasingly interact directly with humans. Intimacy is an important aspect of natural language and has a substantial impact on everyday communication, so the level of intimacy can give deeper insight into conversations and reveal richer semantics. In this paper, we present our work on SemEval shared task 9, which involves predicting the level of intimacy of a given text. The dataset consists of tweets in ten languages, of which only six appear in the training data. We conduct several experiments and show that an ensemble of multilingual models combined with a language-specific monolingual model performs best. We also evaluate data augmentation methods such as translation and present the results. Lastly, we analyze the results in detail and present some noteworthy insights into this problem.