The description complexity of a model is the length of the shortest formula that defines the model. We study the description complexity of unary structures in first-order logic FO, also drawing links to semantic complexity in the form of entropy. The class of unary structures provides, e.g., a simple way to represent tabular Boolean data sets as relational structures. We define structures with FO-formulas that are strictly linear in the size of the model as opposed to using the naive quadratic ones, and we use arguments based on formula size games to obtain related lower bounds for description complexity. For a typical structure the upper and lower bounds in fact match up to a sublinear term, leading to a precise asymptotic result on the expected description complexity of a randomly selected structure. We then give bounds on the relationship between Shannon entropy and description complexity. We extend this relationship also to Boltzmann entropy by establishing an asymptotic match between the two entropies. Despite the simplicity of unary structures, our arguments require the use of formula size games, Stirling's approximation and Chernoff bounds.
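The asymptotic match between the two entropies can be illustrated numerically. The following is a minimal sketch, not the construction used in the paper. It rests on two assumptions that the abstract does not spell out: that Shannon entropy is taken over the empirical distribution of element types (the distinct rows of the Boolean table), and that Boltzmann entropy is the base-2 logarithm of the number of arrangements sharing the same type counts, i.e. a multinomial coefficient. All function names are hypothetical. By Stirling's approximation, the second quantity divided by the domain size approaches the first.

```python
import math
from collections import Counter


def type_counts(rows):
    """Count how many elements realize each type (distinct Boolean row).

    Assumption: each row is a sequence of 0/1 values, one per unary predicate.
    """
    return list(Counter(tuple(r) for r in rows).values())


def shannon_entropy_bits(counts):
    """Shannon entropy (in bits) of the empirical type distribution."""
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)


def log2_multinomial(counts):
    """log2 of n! / (n_1! * ... * n_k!), computed via log-gamma.

    Assumption: this count of arrangements stands in for the number of
    microstates in a Boltzmann-style entropy.
    """
    n = sum(counts)
    ln_value = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
    return ln_value / math.log(2)


if __name__ == "__main__":
    # Two types, equally frequent, over growing domain sizes.
    for half in (5, 50, 5000):
        counts = [half, half]
        n = sum(counts)
        per_element = log2_multinomial(counts) / n
        print(n, shannon_entropy_bits(counts), per_element)
```

For the balanced two-type case the Shannon entropy is exactly one bit, and the per-element value of the log-multinomial approaches one bit from below as the domain grows, with a gap of order (log n)/n.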