Many computational tasks benefit from being formulated as the composition of neural networks followed by a discrete symbolic program. The goal of neurosymbolic learning is to train the neural networks using only end-to-end input-output labels of the composite. We introduce CTSketch, a novel, scalable neurosymbolic learning algorithm. CTSketch uses two techniques to improve the scalability of neurosymbolic inference: decomposing the symbolic program into sub-programs and summarizing each sub-program with a sketched tensor. This strategy allows us to approximate the output distribution of the program with simple tensor operations over the input distributions and summaries. We provide theoretical insight into the maximum error of the approximation. Furthermore, we evaluate CTSketch on many benchmarks from the neurosymbolic literature, including some designed to evaluate scalability. Our results show that CTSketch pushes neurosymbolic learning to previously unattainable scales, attaining high accuracy on tasks involving over one thousand inputs.
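To make the core idea concrete, the following is a minimal illustrative sketch, not the authors' implementation: a symbolic sub-program is summarized as a tensor, and its output distribution is approximated by contracting that tensor with the neural networks' categorical input distributions. All names here (`summary_tensor`, `output_distribution`) are hypothetical, and the summary is stored densely rather than sketched or compressed.

```python
import numpy as np

def summary_tensor(program, in_dims, out_dim):
    """Enumerate a small sub-program into a one-hot summary tensor of shape
    (*in_dims, out_dim). A real system would sketch/compress this tensor."""
    T = np.zeros((*in_dims, out_dim))
    for idx in np.ndindex(*in_dims):
        T[idx + (program(*idx),)] = 1.0
    return T

def output_distribution(T, input_dists):
    """Contract the summary tensor with independent categorical input
    distributions to obtain the (approximate) output distribution."""
    out = T
    for p in input_dists:
        out = np.tensordot(p, out, axes=([0], [0]))
    return out

# Example: two-digit addition as the symbolic sub-program.
add = lambda a, b: a + b
T = summary_tensor(add, in_dims=(10, 10), out_dim=19)

p1 = np.full(10, 0.1)    # uniform belief over the first digit
p2 = np.eye(10)[7]       # the second digit is known to be 7
print(output_distribution(T, [p1, p2]))  # probability mass spread over sums 7..16
```

In this toy setting the contraction is exact; the scalability argument in the abstract rests on replacing the dense summary with a low-rank sketch and composing many such sub-program summaries.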