A fundamental but largely unaddressed obstacle in symbolic regression (SR) is structural redundancy: every expression DAG admits many distinct node-numbering schemes that all encode the same expression, each occupying a separate point in the search space and consuming fitness evaluations without adding diversity. We present IsalSR (Instruction Set and Language for Symbolic Regression), a representation framework that encodes expression DAGs as strings over a compact two-tier alphabet and computes a pruned canonical string, a complete labeled-DAG isomorphism invariant, that collapses all equivalent representations into a single canonical form.
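To make the redundancy concrete, the following minimal sketch shows two node numberings of the same expression DAG, sin(x) + x, producing different raw encodings but the same canonical form. It is an illustration only: the function names (`relabel`, `brute_force_canonical`) and the (labels, edges) encoding are hypothetical, the canonical form is found by brute force over all node permutations rather than by IsalSR's string-based procedure, and operand order is ignored, which is safe here only because addition is commutative.

```python
from itertools import permutations


def relabel(labels, edges, perm):
    """Renumber nodes so that old node i becomes perm[i]."""
    new_labels = [None] * len(labels)
    for old, new in enumerate(perm):
        new_labels[new] = labels[old]
    new_edges = sorted((perm[u], perm[v]) for u, v in edges)
    return tuple(new_labels), tuple(new_edges)


def brute_force_canonical(labels, edges):
    """Smallest encoding over all numberings (factorial cost, illustration only)."""
    n = len(labels)
    return min(relabel(labels, edges, p) for p in permutations(range(n)))


# Numbering A of sin(x) + x: node 0 = x, 1 = sin, 2 = add.
labels_a = ["x", "sin", "add"]
edges_a = [(0, 1), (1, 2), (0, 2)]  # x -> sin, sin -> add, x -> add

# Numbering B of the same DAG: node 0 = add, 1 = x, 2 = sin.
labels_b = ["add", "x", "sin"]
edges_b = [(1, 2), (2, 0), (1, 0)]

raw_a = (tuple(labels_a), tuple(sorted(edges_a)))
raw_b = (tuple(labels_b), tuple(sorted(edges_b)))

canon_a = brute_force_canonical(labels_a, edges_a)
canon_b = brute_force_canonical(labels_b, edges_b)

print(raw_a == raw_b)      # False: two distinct points in a naive search space
print(canon_a == canon_b)  # True: one canonical representative
```

The brute-force minimum is an isomorphism invariant for these small labeled DAGs, but its cost grows factorially with the number of nodes, which is why a dedicated canonicalization procedure is needed in practice.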