We study networked binary classification on a directed acyclic graph (DAG) in which each agent observes only a subset of the feature columns of a shared dataset. Agents act sequentially along the DAG: each receives prediction columns from its parents (if any), augments its local features with these columns, fits a logistic predictor by minimizing binary cross-entropy (BCE), and forwards its prediction column to its outgoing neighbors. We ask whether this sequential distributed training procedure achieves information aggregation, meaning that some agent attains small excess loss relative to the best logistic predictor trained with access to all feature columns. This question was studied for linear regression under squared loss by Kearns, Roth, and Ryu (SODA 2026). Extending their guarantees to classification is nontrivial because their analysis relies on quadratic structure that does not directly transfer to BCE with a logistic link. We analyze the resulting sequential logit-passing protocol and prove: (i) an excess loss upper bound of $O(M/\sqrt{D})$ on depth-$D$ paths, under the condition that every contiguous subsequence of $M$ agents collectively observes all features, and (ii) a close lower bound exhibiting instances with excess loss $\Omega(k/D)$, where $k$ is the dimension of the feature space. Together, these results identify network depth as a fundamental bottleneck for information aggregation in networked logistic regression.
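The protocol described above can be illustrated with a minimal sketch on a path graph. This is not the paper's implementation: the function names (`fit_logistic`, `logit_passing_path`), the synthetic data, the one-feature-per-agent views, and the warm-started gradient descent with step halving are all assumptions made for illustration. Each agent appends its parent's logit column to its own features, fits a logistic predictor by (approximately) minimizing training BCE, and forwards its own logit column.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def logits(w, X):
    return [sum(wi * xi for wi, xi in zip(w, row)) for row in X]

def bce(w, X, y):
    """Mean binary cross-entropy of the predictor sigmoid(w . x)."""
    # Per-sample loss log(1 + e^z) - t*z, written in an overflow-safe form.
    return sum(max(z, 0.0) + math.log1p(math.exp(-abs(z))) - t * z
               for z, t in zip(logits(w, X), y)) / len(y)

def fit_logistic(X, y, w0, steps=200):
    """Gradient descent on BCE with step halving; the loss never increases."""
    w, lr = list(w0), 1.0
    loss = bce(w, X, y)
    for _ in range(steps):
        resid = [sigmoid(z) - t for z, t in zip(logits(w, X), y)]
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / len(y)
                for j in range(len(w))]
        cand = [wi - lr * g for wi, g in zip(w, grad)]
        cand_loss = bce(cand, X, y)
        if cand_loss < loss:
            w, loss = cand, cand_loss   # accept only improving steps
        else:
            lr *= 0.5                   # otherwise shrink the step size
    return w

def logit_passing_path(X, y, views):
    """Agent i sees feature columns views[i] plus its parent's logit column."""
    parent, losses = None, []
    for cols in views:
        Z = [[row[c] for c in cols] + ([parent[r]] if parent else [])
             for r, row in enumerate(X)]
        # Warm start (an assumption): weight 1 on the parent's logit
        # reproduces the parent's predictor, so the fit starts at its loss.
        w0 = [0.0] * len(cols) + ([1.0] if parent else [])
        w = fit_logistic(Z, y, w0)
        losses.append(bce(w, Z, y))
        parent = logits(w, Z)           # logit column forwarded downstream
    return losses

# Hypothetical synthetic instance: k features, depth-D path, M = k.
random.seed(0)
k, D, n = 4, 8, 200
w_true = [1.0, -1.0, 0.5, -0.5]
X = [[random.gauss(0.0, 1.0) for _ in range(k)] for _ in range(n)]
y = [1.0 if random.random() < sigmoid(z) else 0.0 for z in logits(w_true, X)]
views = [[i % k] for i in range(D)]     # every M contiguous agents cover all features

losses = logit_passing_path(X, y, views)
central_loss = bce(fit_logistic(X, y, [0.0] * k), X, y)
print("per-agent training BCE:", [round(v, 4) for v in losses])
print("all-features training BCE:", round(central_loss, 4))
```

The sketch measures training loss only, whereas the abstract's bounds concern excess loss relative to the best all-features logistic predictor. The gap between the last agent's loss and `central_loss` is an empirical stand-in for that quantity, not a verification of the stated rates.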