Approaches form the foundation of scientific research. Querying approaches across a vast body of scientific papers is extremely time-consuming, and without a well-organized management framework, researchers face significant challenges in finding and using relevant approaches. Organizing approaches along multiple dimensions and managing them through those dimensions offers an efficient solution. First, this paper identifies approach patterns in a top-down manner, refining the patterns through four linguistic levels: the semantic, discourse, syntactic, and lexical levels. Approaches are extracted from scientific papers based on these patterns, and five dimensions for categorizing approaches are identified using the same patterns. The paper then proposes representing each step as a tree and measuring the similarity between steps with a tree-structure-based similarity measure that focuses on syntactic-level similarity. A collection similarity measure is proposed to compute the similarity between approaches. A bottom-up clustering algorithm constructs class trees for the approach components within each dimension by merging, in each iteration, each approach component or class with its most similar approach component or class. The class labels generated during clustering indicate the common semantics of the step components within the approach components of each class and are used to manage the approaches in that class. The class trees of the five dimensions together form a multi-dimensional approach space. Applying approach queries to this space demonstrates that querying within it ensures strong relevance between user queries and results and rapidly reduces the search space through a class-based query mechanism.
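To make the pipeline described above more concrete, the following is a minimal Python sketch of the three pieces: a similarity between step trees, a collection similarity between sets of steps, and bottom-up construction of a class tree. It is an illustration under stated assumptions, not the paper's method: the abstract does not define the measures, so `tree_sim`, `collection_sim`, the `(label, children)` tree encoding, the label rule (shared root labels), and the toy components `c1` to `c3` are all hypothetical stand-ins. The clustering is also simplified to merging the single most similar pair per iteration.

```python
from itertools import combinations

# A step is a tree encoded as (label, [children]). This encoding is an
# assumption; the abstract does not specify the tree schema.

def tree_sim(a, b):
    """Toy tree similarity in [0, 1]: root match plus best-match child overlap."""
    label_a, kids_a = a
    label_b, kids_b = b
    root = 1.0 if label_a == label_b else 0.0
    if not kids_a and not kids_b:
        return root
    if not kids_a or not kids_b:
        return 0.5 * root
    # Average each child's best match in both directions, so the score is symmetric.
    fwd = sum(max(tree_sim(x, y) for y in kids_b) for x in kids_a) / len(kids_a)
    bwd = sum(max(tree_sim(x, y) for x in kids_a) for y in kids_b) / len(kids_b)
    return 0.5 * root + 0.5 * (fwd + bwd) / 2

def collection_sim(steps_a, steps_b):
    """Toy collection similarity: mean best-match step similarity, both directions."""
    fwd = sum(max(tree_sim(s, t) for t in steps_b) for s in steps_a) / len(steps_a)
    bwd = sum(max(tree_sim(s, t) for s in steps_a) for t in steps_b) / len(steps_b)
    return (fwd + bwd) / 2

def leaf(name, steps):
    """Wrap one approach component as a leaf node of the class tree."""
    return {"name": name, "steps": steps, "children": [],
            "label": {s[0] for s in steps}}

def merge(a, b):
    """Create a class node; its label keeps only root labels shared by both sides."""
    return {"name": f"({a['name']}+{b['name']})",
            "steps": a["steps"] + b["steps"],
            "children": [a, b],
            "label": a["label"] & b["label"]}

def build_class_tree(components):
    """Simplified agglomerative clustering: merge the most similar pair each round."""
    nodes = list(components)
    while len(nodes) > 1:
        a, b = max(combinations(nodes, 2),
                   key=lambda p: collection_sim(p[0]["steps"], p[1]["steps"]))
        nodes = [n for n in nodes if n is not a and n is not b]
        nodes.append(merge(a, b))
    return nodes[0]

# Hypothetical approach components, each a list of step trees.
c1 = leaf("c1", [("extract", [("patterns", [])]), ("cluster", [("components", [])])])
c2 = leaf("c2", [("extract", [("features", [])]), ("cluster", [("components", [])])])
c3 = leaf("c3", [("train", [("model", [])])])

tree = build_class_tree([c1, c2, c3])
print(tree["name"])
```

In this toy data, `c1` and `c2` share their step structure and are merged first, and `c3` joins only at the root. A query could then descend from the root and skip whole subtrees whose labels do not match, which is the kind of search-space reduction the class-based query mechanism is described as providing.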