Comparative knowledge (e.g., steel is stronger and heavier than styrofoam) is an essential component of our world knowledge, yet it remains understudied in prior literature. In this paper, we study the task of comparative knowledge acquisition, motivated by the dramatic improvements in the capabilities of extreme-scale language models like GPT-3, which have fueled efforts to harvest their knowledge into knowledge bases. However, access to the inference APIs of such models is limited, which restricts the scope and diversity of knowledge acquisition. We thus ask a seemingly implausible question: can more accessible, yet considerably smaller and weaker models such as GPT-2 be used to acquire comparative knowledge whose quality is on par with that of their large-scale counterparts? We introduce NeuroComparatives, a novel framework for comparative knowledge distillation that uses lexically-constrained decoding followed by stringent filtering of the generated knowledge. Our framework acquires comparative knowledge between everyday objects and yields a corpus of 8.7M comparisons over 1.74M entity pairs, 10X larger and 30% more diverse than existing resources. Moreover, human evaluations show that NeuroComparatives outperforms existing resources (up to 32% absolute improvement), even including GPT-3, despite using a 100X smaller model. Our results motivate neuro-symbolic manipulation of smaller models as a cost-effective alternative to the currently dominant practice of relying on extreme-scale language models with limited inference access.