Linguistic commentary on LLMs, heavily influenced by the theoretical frameworks of de Saussure and Chomsky, is often speculative and unproductive. Critics challenge whether LLMs can legitimately model language, citing the need for "deep structure" or "grounding" to achieve an idealized linguistic "competence." We argue for a radical shift in perspective towards the empiricist principles of Witold Mańczak, a prominent general and historical linguist. He defines language not as a "system of signs" or a "computational system of the brain" but as the totality of all that is said and written. Above all, he identifies frequency of use of particular language elements as language's primary governing principle. Using his framework, we challenge prior critiques of LLMs and provide a constructive guide for designing, evaluating, and interpreting language models.