Determining the intended, context-dependent meanings of noun compounds like "shoe sale" and "fire sale" remains a challenge for NLP. Previous work has relied on inventories of semantic relations that capture the different meanings between compound members. Focusing on Romanian compounds, whose morphosyntax differs from that of their English counterparts, we propose a new set of relations and test it with human annotators and a neural net classifier. Results show an alignment of the network's predictions and human judgments, even where the human agreement rate is low. Agreement tracks with the frequency of the selected relations, regardless of structural differences. However, the most frequently selected relation was none of the sixteen labeled semantic relations, indicating the need for a better relation inventory.
翻译:确定名词复合词(如“shoe sale”和“fire sale”)在语境中的特定含义仍是自然语言处理领域的一项挑战。以往研究依赖语义关系库来捕捉复合词成员间的不同含义。本研究聚焦于形态句法与英语不同的罗马尼亚语复合词,提出一组新的关系集合,并通过人工标注和神经网络分类器进行测试。结果表明,即使在人工一致性较低的情况下,网络的预测结果与人工判断仍具有一致性。无论结构差异如何,一致性均与所选关系的频率相关。然而,最常被选择的关系并非十六种标注语义关系中的任何一种,这表明需要更优的关系库。