Empirical Basis of Engineering Design Knowledge

Engineering design knowledge is embodied in natural language text through an intricate placement of entities and relationships. The ontological constructs of design knowledge often limit the performance of NLP techniques in extracting it. Large language models may also be of limited use for generating and explicating design knowledge, as they are trained predominantly on common-sense text. In this article, we present the constituents of design knowledge based on empirical observations from patent documents. We obtain a sample of 33,881 patents and populate over 24 million facts from their sentences. We conduct Zipf distribution analyses using the frequencies of the unique entities and relationships present in these facts. While the literal entities cannot be generalised from the sample of patents, the relationships largely capture attributes ('of'), structure ('in', 'with'), purpose ('to', 'for'), hierarchy ('include'), exemplification ('such as'), and behaviour ('to', 'from'). The analyses reveal that over half of the entities and relationships could be generalised to 64 and 24 linguistic syntaxes, respectively, while hierarchical relationships span 75 syntaxes. These syntaxes represent the linguistic basis of engineering design knowledge. We combine the facts within each patent into a knowledge graph, from which we discover motifs, i.e., statistically over-represented subgraph patterns. Across all patents in the sample, we identify eight patterns that could be simplified into sequence [->...->], aggregation [->...<-], and hierarchy [<-...->]; these form the structural basis of engineering design knowledge. We propose regulatory precepts for concretising abstract entities and relationships within subgraphs and for explicating hierarchical structures. These precepts could support better construction and management of knowledge in a design environment.
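A minimal sketch of a rank-frequency (Zipf) analysis of the kind described above, in Python. It assumes the facts have already been reduced to a flat list of relationship labels (or entity syntaxes). The helper names `rank_frequency`, `zipf_exponent`, and `items_to_cover` are hypothetical, and the ordinary least-squares fit on log-log axes is one common way to estimate the exponent, not necessarily the procedure used in this work. `items_to_cover` mirrors statements such as "over half of the relationships could be generalised to N syntaxes" by returning how many top-ranked labels account for a given share of all occurrences.

```python
import math
from collections import Counter


def rank_frequency(labels):
    """Return (rank, frequency) pairs, most frequent label first, ranks from 1."""
    freqs = sorted(Counter(labels).values(), reverse=True)
    return list(enumerate(freqs, start=1))


def zipf_exponent(pairs):
    """Estimate s in f(r) ~ C / r**s by least squares on log f = log C - s * log r."""
    if len(pairs) < 2:
        raise ValueError("need at least two distinct labels to fit a slope")
    xs = [math.log(rank) for rank, _ in pairs]
    ys = [math.log(freq) for _, freq in pairs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx  # the fitted slope equals -s


def items_to_cover(pairs, share=0.5):
    """Smallest number of top-ranked labels whose counts reach `share` of all occurrences."""
    total = sum(freq for _, freq in pairs)
    running = 0
    for rank, freq in pairs:
        running += freq
        if running >= share * total:
            return rank
    return len(pairs)


# Synthetic toy counts (NOT data from the paper): frequency is exactly 60 / rank.
relations = (["of"] * 60 + ["in"] * 30 + ["to"] * 20
             + ["with"] * 15 + ["for"] * 12 + ["include"] * 10)
pairs = rank_frequency(relations)
print(round(zipf_exponent(pairs), 6))  # 1.0
print(items_to_cover(pairs))           # 2
```

The toy counts are chosen to lie exactly on a Zipf line with exponent 1; relationship frequencies from real patent facts would only approximate such a line, so the fitted exponent and the coverage count are the quantities of interest.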

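One way to read the three simplified shapes is as the three ways two directed edges can meet at a shared middle node: a chain (sequence, a -> b -> c), two edges converging (aggregation, a -> b <- c), or two edges diverging (hierarchy, a <- b -> c). The Python sketch below counts these shapes in a per-patent knowledge graph built from (head, relation, tail) facts. The function name `count_two_edge_patterns` and the example facts are hypothetical. It only counts occurrences; establishing that a pattern is a motif would additionally require comparing these counts against randomised graphs, which is omitted here.

```python
from collections import defaultdict
from itertools import combinations, product


def count_two_edge_patterns(edges):
    """Count sequence (a->b->c), aggregation (a->b<-c) and hierarchy (a<-b->c)
    patterns, each counted once at its middle node b."""
    incoming = defaultdict(list)  # node -> sources of edges pointing into it
    outgoing = defaultdict(list)  # node -> targets of edges leaving it
    for src, dst in edges:
        if src != dst:  # ignore self-loops
            outgoing[src].append(dst)
            incoming[dst].append(src)

    counts = {"sequence": 0, "aggregation": 0, "hierarchy": 0}
    for node in set(incoming) | set(outgoing):
        ins, outs = incoming[node], outgoing[node]
        # Skip pairs whose outer endpoints coincide (e.g. a->b->a).
        counts["sequence"] += sum(1 for a, c in product(ins, outs) if a != c)
        counts["aggregation"] += sum(1 for a, c in combinations(ins, 2) if a != c)
        counts["hierarchy"] += sum(1 for a, c in combinations(outs, 2) if a != c)
    return counts


# Hypothetical facts for illustration only; relation labels are ignored here.
facts = [
    ("device", "include", "housing"),
    ("device", "include", "sensor"),
    ("sensor", "in", "housing"),
    ("signal", "from", "sensor"),
]
edges = [(head, tail) for head, _, tail in facts]
print(count_two_edge_patterns(edges))
# {'sequence': 2, 'aggregation': 2, 'hierarchy': 1}
```

Dropping the relation labels keeps the sketch small; a fuller analysis would likely also distinguish patterns by the relationships on their edges, since those carry the attribute, structure, purpose, and hierarchy semantics.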