Criteria-first, semantics-later: reproducible structure discovery in image-based sciences

Across the natural and life sciences, images have become a primary measurement modality, yet the dominant analytic paradigm remains semantics-first: structure is recovered by predicting or enforcing domain-specific labels. This paradigm fails systematically under the very conditions that make image-based science most valuable: open-ended scientific discovery, cross-sensor and cross-site comparability, and long-term monitoring in which domain ontologies and their label sets drift culturally, institutionally, and ecologically. A deductive inversion is proposed: criteria-first, semantics-later. A unified framework for criteria-first structure discovery is introduced that separates criterion-defined, semantics-free structure extraction from downstream semantic mapping into domain ontologies or vocabularies, providing a domain-general scaffold for reproducible analysis across image-based sciences. It is argued that reproducible science requires the first analytic layer to perform criterion-driven, semantics-free structure discovery, yielding stable partitions, structural fields, or hierarchies defined by explicit optimality criteria rather than by local domain ontologies. Semantics is not discarded; it is relocated downstream as an explicit mapping from the discovered structural product to a domain ontology or vocabulary, which enables plural interpretations and explicit crosswalks without rewriting the upstream extraction. Grounded in cybernetics, observation-as-distinction, and information theory's separation of information from meaning, the argument is supported by cross-domain evidence showing that criteria-first components recur whenever labels do not scale. Finally, consequences are outlined for validation beyond class accuracy and for treating structural products as FAIR, AI-ready digital objects for long-term monitoring and digital twins.
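The two-layer separation described above can be illustrated with a minimal sketch. It is not the framework itself: the criterion used here (an Otsu-style maximization of between-class variance on a one-dimensional intensity list), the function names `criterion_partition` and `map_semantics`, and the two vocabularies are all hypothetical choices made for illustration. The point is only that the structural product is computed without any labels, and that different domain vocabularies can then be attached to it without rerunning the extraction.

```python
from statistics import mean


def criterion_partition(values):
    """Semantics-free layer (illustrative): split values into two segments
    at the threshold that maximizes between-class variance. No domain
    labels are used; the result depends only on the explicit criterion."""
    candidates = sorted(set(values))
    if len(candidates) < 2:
        raise ValueError("need at least two distinct values to partition")
    best_t, best_score = None, -1.0
    # The largest candidate is skipped: it would leave the upper class empty.
    for t in candidates[:-1]:
        low = [v for v in values if v <= t]
        high = [v for v in values if v > t]
        w_low = len(low) / len(values)
        w_high = len(high) / len(values)
        score = w_low * w_high * (mean(low) - mean(high)) ** 2
        if score > best_score:
            best_t, best_score = t, score
    segments = [0 if v <= best_t else 1 for v in values]
    return segments, best_t


def map_semantics(segments, vocabulary):
    """Downstream layer (illustrative): attach a domain vocabulary to the
    structural product. Swapping the vocabulary leaves segments untouched."""
    return [vocabulary[s] for s in segments]


# Hypothetical intensity values standing in for an image.
pixels = [12, 15, 14, 200, 210, 13, 205]
segments, threshold = criterion_partition(pixels)

# Two hypothetical vocabularies mapped onto the same structural product.
ecology = {0: "water", 1: "vegetation"}
histology = {0: "background", 1: "nucleus"}
ecology_labels = map_semantics(segments, ecology)
histology_labels = map_semantics(segments, histology)
```

In this sketch, a change in either vocabulary touches only the dictionaries passed to `map_semantics`; the partition and the criterion that produced it stay fixed and can be archived as a standalone product.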
