Parsing is the process of analyzing a sentence's syntactic structure by breaking it down into its grammatical components, and it is critical for various linguistic applications. Urdu is a low-resource, free word-order language with complex morphology. The literature suggests that dependency parsing is well suited to such languages. Our approach begins with a basic feature model encompassing word location, head word identification, and dependency relations, followed by a more advanced model that integrates part-of-speech (POS) tags and morphological attributes (e.g., suffixes, gender). We manually annotated a corpus of news articles of varying complexity. Using MaltParser with the NivreEager algorithm, we achieved a best labeled accuracy (LA) of 70% and an unlabeled attachment score (UAS) of 84%, demonstrating the feasibility of dependency parsing for Urdu.
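To make the two reported metrics concrete, the following minimal sketch shows how UAS and LA are conventionally computed per token, assuming each token is represented as a (head index, dependency label) pair. The function name `attachment_scores`, the toy sentence, and the UD-style labels are illustrative assumptions, not the annotation scheme or evaluation code used in this work.

```python
from typing import Dict, List, Tuple

# One arc per token: (index of the head word, dependency label).
# Head index 0 denotes the artificial root. Hypothetical representation.
Arc = Tuple[int, str]


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Dict[str, float]:
    """Compute token-level UAS, LA, and LAS for aligned gold/predicted arcs."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")
    n = len(gold)
    # UAS: the predicted head matches the gold head (label ignored).
    head_ok = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LA: the predicted label matches the gold label (head ignored).
    label_ok = sum(g[1] == p[1] for g, p in zip(gold, pred))
    # LAS: both head and label match.
    both_ok = sum(g == p for g, p in zip(gold, pred))
    return {"UAS": head_ok / n, "LA": label_ok / n, "LAS": both_ok / n}


# Toy four-token sentence with made-up arcs (illustrative only).
gold = [(2, "nsubj"), (0, "root"), (4, "amod"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (2, "amod"), (2, "nmod")]
scores = attachment_scores(gold, pred)
```

Because LA ignores the head and UAS ignores the label, the two scores can diverge, as they do in the results reported above.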