While LLMs excel in zero-shot tasks, their performance in linguistic challenges like syntactic parsing has been less scrutinized. This paper studies state-of-the-art open-weight LLMs on the task by comparing them to baselines that do not have access to the input sentence, including baselines that have not been used in this context such as random projective trees or optimal linear arrangements. The results show that most of the tested LLMs cannot outperform the best uninformed baselines, with only the newest and largest versions of LLaMA doing so for most languages, and still achieving rather low performance. Thus, accurate zero-shot syntactic parsing is not forthcoming with open LLMs.