We propose a definition of graph subshifts of finite type that can be seen as extending both the notion of subshifts of finite type from classical symbolic dynamics and that of finitely presented groups from combinatorial group theory. These are sets of graphs that are defined by forbidding finitely many local patterns. In this paper, we focus on the question of whether such local conditions can enforce a specific support graph, and thus relate the model to classical symbolic dynamics. We prove that the subshifts that contain only infinite graphs are either aperiodic or have a period group that is not residually finite, yielding non-trivial examples as well as two natural undecidability theorems.




We prove that Hamiltonicity in maximum-degree-3 grid graphs (directed or undirected) is ASP-complete, i.e., it has a parsimonious reduction from every NP search problem (including a polynomial-time bijection between solutions). As a consequence, given k Hamiltonian cycles, it is NP-complete to find another; and counting Hamiltonian cycles is #P-complete. If we require the grid graph's vertices to form a full $m \times n$ rectangle, then we show that Hamiltonicity remains ASP-complete if the edges are directed or if we allow removing some edges (whereas including all undirected edges is known to be easy). These results enable us to develop a stronger "T-metacell" framework for proving ASP-completeness of rectangular puzzles, which requires building just a single gadget representing a degree-3 grid-graph vertex. We apply this general theory to prove ASP-completeness of 38 pencil-and-paper puzzles where the goal is to draw a loop subject to given constraints: Slalom, Onsen-meguri, Mejilink, Detour, Tapa-Like Loop, Kouchoku, Icelom; Masyu, Yajilin, Nagareru, Castle Wall, Moon or Sun, Country Road, Geradeweg, Maxi Loop, Mid-loop, Balance Loop, Simple Loop, Haisu, Reflect Link, Linesweeper; Vertex/Touch Slitherlink, Dotchi-Loop, Ovotovata, Building Walk, Rail Pool, Disorderly Loop, Ant Mill, Koburin, Mukkonn Enn, Rassi Silai, (Crossing) Ichimaga, Tapa, Canal View, Aqre, and Paintarea. The last 14 of these puzzles were not even known to be NP-hard. Along the way, we prove ASP-completeness of some simple forms of Tree-Residue Vertex-Breaking (TRVB), including planar multigraphs with degree-6 breakable vertices, or with degree-4 breakable and degree-1 unbreakable vertices.
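
The counting problem mentioned above can be made concrete on tiny instances. The following is a minimal brute-force sketch, not the paper's reduction: the function name `count_hamiltonian_cycles` and the `removed_edges` parameter are assumptions introduced here for illustration, and the search is exponential, so it is only usable on very small grids.

```python
from itertools import product


def count_hamiltonian_cycles(rows, cols, removed_edges=frozenset()):
    """Count undirected Hamiltonian cycles in a rows x cols grid graph.

    removed_edges is an iterable of vertex pairs ((r, c), (r2, c2)) that are
    deleted from the full rectangular grid (hypothetical parameter).
    """
    vertices = list(product(range(rows), range(cols)))
    removed = {frozenset(edge) for edge in removed_edges}

    def neighbors(v):
        r, c = v
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            w = (r + dr, c + dc)
            inside = 0 <= w[0] < rows and 0 <= w[1] < cols
            if inside and frozenset((v, w)) not in removed:
                yield w

    n = len(vertices)
    if n < 3:
        return 0  # a simple graph needs at least 3 vertices for a cycle
    start = vertices[0]
    visited = {start}

    def extend(v, length):
        # length = number of vertices on the current path, ending at v
        if length == n:
            return 1 if start in neighbors(v) else 0
        total = 0
        for w in neighbors(v):
            if w not in visited:
                visited.add(w)
                total += extend(w, length + 1)
                visited.remove(w)
        return total

    # Every undirected cycle is found once per traversal direction.
    return extend(start, 1) // 2
```

For example, `count_hamiltonian_cycles(4, 4)` enumerates the cycles of the full 4 x 4 grid, and passing a removed edge shows how deleting edges changes the count.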




We prove that any $n$-qubit unitary can be implemented (i) approximately in time $\tilde O\big(2^{n/2}\big)$ with query access to an appropriate classical oracle, and also (ii) exactly by a circuit of depth $\tilde O\big(2^{n/2}\big)$ with one- and two-qubit gates and $2^{O(n)}$ ancillae. The proofs involve similar reductions to Grover search. The proof of (ii) also involves a linear-depth construction of arbitrary quantum states using one- and two-qubit gates (in fact, this can be improved to constant depth with the addition of fanout and generalized Toffoli gates) which may be of independent interest. We also prove a matching $\Omega\big(2^{n/2}\big)$ lower bound for (i) and (ii) for a certain class of implementations.
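
The $2^{n/2}$ scaling is the familiar square-root cost of Grover search over $2^n$ items. The sketch below only illustrates that scaling with a classical state-vector simulation of textbook Grover search for a single marked item; it does not reproduce the paper's reductions, and the function name `grover_success_probability` is an assumption made for this example.

```python
import math


def grover_success_probability(n_qubits, marked):
    """Simulate textbook Grover search for one marked index among 2**n_qubits.

    Returns (iterations, probability of measuring the marked index).
    """
    size = 2 ** n_qubits
    # Uniform superposition over all basis states.
    amplitudes = [1 / math.sqrt(size)] * size
    # About (pi / 4) * sqrt(size) iterations, i.e. on the order of 2**(n/2).
    iterations = math.floor(math.pi / 4 * math.sqrt(size))
    for _ in range(iterations):
        # Oracle: flip the phase of the marked state.
        amplitudes[marked] = -amplitudes[marked]
        # Diffusion: reflect every amplitude about the mean.
        mean = sum(amplitudes) / size
        amplitudes = [2 * mean - a for a in amplitudes]
    return iterations, amplitudes[marked] ** 2
```

With `n_qubits = 6` the search space has 64 entries and the loop runs 6 times, which is the square-root behavior the bounds above refer to.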




We study the monomer--dimer partition function on the configuration model of random $d$-regular, $l$-uniform hypergraphs. For fixed $d,l\ge2$, we prove quenched free-energy limits in explicit parameter regimes. The proof combines fixed-density first-moment asymptotics, a two-overlap second-moment variational analysis, and a subgraph-conditioning argument for the short cycles of the incidence structure. The main technical point is to identify regimes in which the replica-symmetric saddle is the unique global maximizer of the second-moment rate function. In those regimes the normalized logarithm of the total matching partition function converges in probability to an explicit variational value. We also prove the corresponding result for the weighted partition function whenever the maximizing density lies in the verified replica-symmetric region, give an additional checkable criterion for that region, and record a first-moment upper tail estimate for the maximum matching size.
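
To make the object of study concrete, here is a minimal brute-force sketch of a matching partition function on a small hypergraph. It assumes the common convention $Z(\lambda) = \sum_M \lambda^{|M|}$, where $M$ ranges over sets of pairwise disjoint hyperedges; the abstract does not spell out its weighting, so the `weight` parameter and the function name are assumptions for illustration only.

```python
def matching_partition_function(hyperedges, weight=1.0):
    """Sum of weight**|M| over all matchings M (sets of disjoint hyperedges).

    With weight = 1 this counts matchings, including the empty one.
    """
    edges = [frozenset(edge) for edge in hyperedges]

    def total(index, used):
        # used = vertices already covered by the chosen hyperedges
        if index == len(edges):
            return 1.0
        result = total(index + 1, used)  # leave edges[index] out
        if not (edges[index] & used):  # edge is disjoint from the matching
            result += weight * total(index + 1, used | edges[index])
        return result

    return total(0, frozenset())
```

On a triangle (three 2-element edges) the matchings are the empty set and the three single edges, so the function returns $1 + 3\lambda$. The enumeration is exponential in the number of hyperedges and is meant only for sanity checks on toy inputs.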




We present a knowledge compilation approach for existential and universal quantification in alternating automata. Knowledge compilation transforms formulas into normal forms with special properties that enable efficient answering of questions of interest. For Boolean formulas, several normal forms that have proven effective for existential/universal quantification, and even for functional synthesis, have been studied in the literature. For infinite word automata, quantification is a fundamental operation in verification tasks such as QPTL satisfiability checking and HyperLTL model checking. Existing algorithms rely on nondeterministic infinite word automata, where existential projection can be efficiently performed state-wise, but universal projection requires complementation. Complementing nondeterministic infinite word automata, however, is expensive, making existing algorithms infeasible for the automata that arise in practice. Towards addressing this problem, we propose novel knowledge compilation techniques for existential and universal quantification on alternating safety automata. Our approach compiles alternating automata into normal forms where projection can be applied uniformly and efficiently to each state's transition function. Using the compilations for each type of quantification, we can effectively eliminate a sequence of alternating quantifiers in formulas without complementation. Our BDD-based prototype demonstrates the practical effectiveness of our algorithms on a suite of QPTL satisfiability benchmarks.




This paper presents a comprehensive documentation of RenCon 2025, the revival of the expressive performance rendering competition which took place at ISMIR 2025 in Daejeon, Korea. The competition attracted 9 entries from international research groups, representing diverse approaches to expressive piano performance rendering. The two-phase assessment structure comprised a preliminary online evaluation and live real-time rendering at the conference. We analyze the competition format, participant demographics, system performance, and lessons learned for future iterations. The results demonstrate significant advances in expressive rendering capabilities while highlighting remaining challenges in achieving human-level musical expression.




While current federated multimodal continual learning over mixture-of-experts low-rank adaptation (MoE-LoRA) is built on the unverified assumption that routing isolates task-specific knowledge into disjoint experts, we argue that routing operates per-sample, while forgetting accumulates across the task sequence, and gradient conflict persists within each expert even when routing is maximally polarized. Moreover, activation-subspace protection can also fail because, under parameter-efficient fine-tuning, it entangles tasks due to a dimension-counting bound, and federated averaging (FedAvg) disrupts client-side orthogonality. To address this, we propose PRISM (Per-expert Routing-projection Interference-informed Subspace Method), which maintains a per-expert gradient subspace basis whose orthogonality is preserved under FedAvg and reinterprets MoE routing as a capacity allocator. Our results show that, on LLaVA-1.5-7B, LLaVA-1.5-13B, and Qwen2.5-VL-7B across CoIN-6 and CoIN-Long-10, PRISM outperforms sixteen state-of-the-art baselines in average accuracy. Compared to the best federated multimodal baseline, the performance margin increases from +3.23 pp on CoIN-6 to +6.06 pp on CoIN-Long-10.




The Glivenko--Cantelli theorem is a uniform version of the strong law of large numbers. It states that for every IID sequence of random variables, the empirical measure converges to the underlying distribution (in the sense of uniform convergence of the CDF). In this work, we provide tools to study such limits of empirical measures in categorical probability. We propose two axioms, namely permutation invariance and empirical adequacy, that a morphism of type $X^{\mathbb{N}} \to X$ should satisfy to be interpretable as taking an infinite sequence as input and producing a sample from its empirical measure as output. Since not all sequences have a well-defined empirical measure, such \emph{empirical sampling morphisms} live in quasi-Markov categories, which, unlike Markov categories, allow for partial morphisms. Given an empirical sampling morphism and a few other properties, we prove representability as well as abstract versions of the de Finetti theorem, the Glivenko--Cantelli theorem and the strong law of large numbers. We provide several concrete constructions of empirical sampling morphisms as partially defined Markov kernels on standard Borel spaces. Instantiating our abstract results then recovers the standard Glivenko--Cantelli theorem and the strong law of large numbers for random variables with finite first moment. Our work thus provides a joint proof of these two theorems in conjunction with the de Finetti theorem from first principles.
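
The classical statement recovered at the end can be checked numerically. The sketch below is a plain Monte Carlo illustration and has nothing to do with the categorical machinery: it computes the sup-distance between the empirical CDF of a sample and the CDF $F(x) = x$ of the uniform distribution on $[0, 1]$. The choice of the uniform distribution and the function name are assumptions made for this example.

```python
import random


def sup_distance_uniform(sample):
    """Sup-distance between the empirical CDF of `sample` and F(x) = x.

    The empirical CDF is a step function, so the supremum is attained just
    before or at one of the sorted sample points.
    """
    xs = sorted(sample)
    n = len(xs)
    return max(
        max((i + 1) / n - x, x - i / n)  # gap at the jump and just before it
        for i, x in enumerate(xs)
    )


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is reproducible
    for n in (100, 1_000, 10_000):
        sample = [rng.random() for _ in range(n)]
        print(n, sup_distance_uniform(sample))
```

The printed distances should shrink as the sample grows, which is what uniform convergence of the empirical CDF predicts for IID draws.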




We investigate the Gerver-Ramsey collinearity problem of determining the maximum number of points in a north-east lattice path without $k$ collinear points. Using a satisfiability solver, up to isomorphism we enumerate all north-east lattice paths avoiding $k$ collinear points for $k \leq 6$. We also find a north-east lattice path avoiding $k = 7$ collinear points with 327 steps, improving on the previous best length of 260 steps found by Shallit.
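
The constraint is easy to check directly for a given path. The following sketch is a plain verifier rather than the satisfiability encoding used in the paper: it assumes a path is given as a string of `"N"` and `"E"` steps starting at the origin, and all function names are introduced here for illustration.

```python
from math import gcd


def path_points(steps):
    """Lattice points visited by a north-east path starting at (0, 0)."""
    x = y = 0
    points = [(0, 0)]
    for step in steps:
        if step == "E":
            x += 1
        elif step == "N":
            y += 1
        else:
            raise ValueError(f"unknown step {step!r}")
        points.append((x, y))
    return points


def max_collinear(points):
    """Largest number of the given distinct points lying on one line."""
    best = min(len(points), 2)
    for i, (x0, y0) in enumerate(points):
        counts = {}
        for x1, y1 in points[i + 1:]:
            dx, dy = x1 - x0, y1 - y0
            g = gcd(dx, dy)  # positive because the points are distinct
            direction = (dx // g, dy // g)
            counts[direction] = counts.get(direction, 0) + 1
            best = max(best, counts[direction] + 1)
    return best


def avoids_k_collinear(steps, k):
    """True if no k points of the path are collinear."""
    return max_collinear(path_points(steps)) < k
```

Every later point of a north-east path lies weakly above and to the right of an earlier one, so the reduced direction vector identifies the line through the base point. Each line is therefore fully counted from its earliest point on the path.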




LTLf synthesis under partial observability requires reasoning about unobservable environment variables, which is typically handled by constructing a belief-state DFA via subset construction that universally quantifies these variables. Existing approaches perform this construction as a separate step prior to game solving, often generating belief states that are unnecessary in practice. We propose an on-the-fly approach to LTLf synthesis under partial observability based on observable progression. Our method incrementally builds the belief-state DFA by progressing the specification with respect to observable variables only, universally quantifying unobservable variables on the fly. We prove the correctness of the construction and show that it naturally enables on-the-fly game solving, leading to a fully on-the-fly synthesis framework. Our implementation leverages DFAs represented using Multi-Terminal Binary Decision Diagrams: a compact representation that has proven highly effective for LTLf synthesis under full observability. Experimental results demonstrate that our approach significantly outperforms existing methods and further highlight the practical benefits of integrating on-the-fly game solving with belief-state construction.
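
The belief-state idea can be sketched at the level of an explicit DFA. This is a simplified illustration under stated assumptions, not the formula-progression method of the paper: the DFA is given by a hypothetical transition function `delta(state, observable, unobservable)`, a belief is the set of DFA states consistent with the observations so far, and a belief counts as accepting only if every state in it is accepting (the universal quantification over unobservable values). Only beliefs reachable from the initial one are ever built.

```python
from collections import deque


def explore_beliefs(initial, delta, observable_letters, unobservable_letters, accepting):
    """Build the reachable part of the belief-state automaton on the fly."""
    start = frozenset([initial])
    seen = {start}
    transitions = {}
    queue = deque([start])
    while queue:
        belief = queue.popleft()
        for o in observable_letters:
            # Collect successors for every possible unobservable value.
            successor = frozenset(
                delta(q, o, u) for q in belief for u in unobservable_letters
            )
            transitions[(belief, o)] = successor
            if successor not in seen:
                seen.add(successor)
                queue.append(successor)
    # Universal reading: a belief accepts only if all its states accept.
    accepting_beliefs = {b for b in seen if b <= accepting}
    return seen, transitions, accepting_beliefs


def toy_delta(state, observable, unobservable):
    """Hypothetical 3-state DFA: from 0 on 'a', the hidden bit picks 1 or 2."""
    if state == 0 and observable == "a":
        return 1 if unobservable else 2
    return state
```

On the toy DFA only two beliefs are reachable, `{0}` and `{1, 2}`, whereas an eager subset construction would range over all seven non-empty subsets of the three states. That gap is the motivation for building beliefs lazily.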




Continual learning systems face a fundamental tension between plasticity -- acquiring new knowledge -- and stability -- retaining prior knowledge. We introduce MPCS (Multi-Plasticity Continual System), a neuroplastic architecture that integrates eleven complementary mechanisms: task-driven neurogenesis, Fourier-encoded inputs, EWC regularization, meta-replay, mixed consolidation, hybrid gating, synapse pruning/regeneration, Hebbian updates, task similarity routing, adaptive growth control, and continuous neuron importance tracking. We evaluate MPCS on MEP-BENCH, a multi-track benchmark spanning 31 tasks across regression, classification, logic, and mixed domains, using a three-dimensional Pareto criterion over task performance (Perf), representation diversity (RD), and gradient conflict rate (GCR). Across 15 ablation configurations (3 seeds x 4 tracks x 2000 epochs), MPCS achieves a Normalized Efficiency Score of 94.2, placing it on the Pareto frontier among 9 of 14 gate-passing systems. Key findings: (i) Fourier encoding is the single most critical component (removal drops Perf by 30.7 pp and fails the MEP gate on 14% of tasks); (ii) global EWC degrades performance (NES = -4.2); topology-local EWC reduces this penalty (NES 90.5->91.8) but does not eliminate it; removing EWC entirely yields MPCS_EFFICIENT, the highest-Perf system -- establishing a monotone relationship in the high task-similarity regime (s_bar ~= 0.95): global EWC < topology EWC < no EWC; (iii) the Pareto status assessment is predictive: removing the two Pareto-dominated components (EWC + Hebbian) jointly yields MPCS_EFFICIENT, which improves Perf by 0.6 pp at 4.7x lower compute cost (127 vs. 602 min), validating the Pareto frontier as an actionable model-compression guide.
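
A three-dimensional Pareto criterion of this kind can be sketched in a few lines. The code below assumes that higher Perf and RD are better and that a lower GCR is better, which the abstract suggests but does not state. The system names and scores in the usage note are made up for illustration and are not results from the paper.

```python
def dominates(a, b):
    """True if score tuple a = (perf, rd, gcr) Pareto-dominates b.

    Assumed orientation: maximize perf and rd, minimize gcr.
    """
    no_worse = a[0] >= b[0] and a[1] >= b[1] and a[2] <= b[2]
    strictly_better = a[0] > b[0] or a[1] > b[1] or a[2] < b[2]
    return no_worse and strictly_better


def pareto_frontier(systems):
    """Names of the systems that no other system dominates, sorted."""
    return sorted(
        name
        for name, score in systems.items()
        if not any(
            dominates(other_score, score)
            for other, other_score in systems.items()
            if other != name
        )
    )
```

For instance, with hypothetical scores `{"A": (0.9, 0.5, 0.1), "B": (0.8, 0.6, 0.2), "C": (0.7, 0.4, 0.3)}`, system C is dominated by A on all three axes, while A and B trade Perf against RD and both stay on the frontier.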




High-capacity associative memories based on Kernel Logistic Regression (KLR) exhibit strong storage capabilities, but the dynamical and geometric mechanisms underlying their stability remain poorly understood. This paper investigates the global geometry of attractor basins and the physical determinants of the storage limit in KLR-trained Hopfield networks. We combine empirical evaluations using random sequences and real-world image embeddings (CIFAR-10) with phenomenological morphing experiments and statistical Signal-to-Noise Ratio (SNR) analysis. Our experiments reveal that the network achieves a storage capacity for random sequences up to $P/N \approx 16$, and maintains stable retrieval for structured data at effective loads near $P/N \approx 20$. Through morphing analysis, we reveal that attractors on the "Ridge of Optimization" are separated by sharp, phase-transition-like boundaries, characterized by steep effective potential barriers and critical slowing down. Furthermore, by contrasting an SNR analysis with a geometric reference point inspired by Cover's theorem, we show that the ultimate storage limit is constrained primarily not by a lack of geometric separability in the feature space, but by the loss of dynamical stability against crosstalk noise. These findings suggest that KLR networks function as highly localized, exemplar-based memories that operate optimally just before the onset of dynamical collapse, providing new insights into the design of robust, large-scale retrieval systems.




Modern Vision-Language Models (VLMs) achieve impressive performance but are limited by the quadratic complexity of self-attention, which prevents their deployment on edge devices and makes their understanding of high-resolution images and long-context videos prohibitively expensive. To address this challenge, we introduce LinMU (Linear-complexity Multimodal Understanding), a VLM design that achieves linear complexity for the language model decoder without using any quadratic-complexity modules while maintaining the performance of global-attention-based VLMs. LinMU replaces every self-attention layer in the language model decoder with an M-MATE block: a dual-branch module that combines a bidirectional state-space model for global context (Flex-MA branch) with localized Swin-style window attention (Local-Swin branch) for adjacent correlations. To transform a pre-trained VLM into the LinMU architecture, we propose a three-stage distillation framework that (i) initializes both branches with self-attention weights and trains the Flex-MA branch alone, (ii) unfreezes the Local-Swin branch and fine-tunes it jointly with the Flex-MA branch, and (iii) unfreezes the remaining blocks and fine-tunes them using LoRA adapters, while regressing on hidden states and token-level logits of the frozen VLM teacher. On MMMU, TextVQA, LongVideoBench, Video-MME, and other benchmarks, LinMU matches the performance of teacher models, yet reduces Time-To-First-Token (TTFT) by up to 2.7$\times$ and improves token throughput by up to 9.0$\times$ on minute-length videos. Ablations confirm the importance of each distillation stage and the necessity of the two branches of the M-MATE block. The proposed framework demonstrates that state-of-the-art multimodal reasoning can be achieved without quadratic attention, thus opening up avenues for long-context VLMs that can deal with high-resolution images and long videos.




We analyze the sum-of-squares rank of unweighted instances of the Minimum Knapsack (MK) problem, i.e., minimization of $\sum_{i=1}^n x_i$ for 0/1 variables under the constraint $\sum_{i=1}^n x_i \geq q$, with $q \in \mathbb{R}$. Such instances have long served as a testbed for understanding the limitations of lift-and-project methods in Boolean optimization. For example, both the Lovász-Schrijver and Sherali-Adams hierarchies require (maximal) rank $n$ to solve them, already when $q=1/2$ is constant. The SOS hierarchy requires only \emph{sublinear} rank $O(\sqrt{n})$ to solve unweighted MK when $q=1/2$. On the other hand, when $q$ is allowed to vary with $n$, the SOS rank of the problem may become linear. Interestingly, this is known to happen both when $q$ is large, and when $q$ is very small ($0<q \leq 2^{-n}$). This raises the question of whether we should think of hard instances of unweighted MK as being typical for the SOS hierarchy, or as a consequence of very specific choices of the threshold parameter $q$. In this paper, we address this question by showing new upper and lower bounds on the SOS rank of unweighted MK in the whole regime of the parameter $q$. For $n-q \leq O(1)$, we show that the SOS rank is constant. In contrast, when $q \leq O(1)$, a linear rank is needed if $q$ is exponentially close to an integer. As our main positive result, we show that linear rank is very rare for $q \leq O(1)$. This can be expressed in the language of smoothed analysis: after perturbing $q$ by a Gaussian with mean $0$ and variance $\sigma^2$, the expected SOS rank of MK is $O(\sqrt{n} \log (n/\sigma))$.
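
The gap that the hierarchies have to close is visible on tiny instances. The sketch below does not compute any SOS rank. It only compares the 0/1 optimum of unweighted MK, found by enumeration, with the value of the natural linear relaxation over $[0,1]^n$, which for $0 \le q \le n$ equals $q$. The function names are assumptions made for this example.

```python
from itertools import product


def min_knapsack_integer(n, q):
    """Minimum of sum(x) over x in {0,1}^n with sum(x) >= q, or None."""
    feasible = [sum(x) for x in product((0, 1), repeat=n) if sum(x) >= q]
    return min(feasible) if feasible else None


def min_knapsack_lp(n, q):
    """Optimum of the relaxation over [0,1]^n with the same constraint.

    The objective and the constraint use the same linear form, so the
    optimum is max(q, 0) whenever the instance is feasible (q <= n).
    """
    return max(q, 0.0) if q <= n else None
```

With `n = 4` and `q = 0.5` the integer optimum is 1 while the relaxation attains 0.5. A threshold just above an integer, such as `q = 2.000001`, pushes the integer optimum to 3 although the relaxation barely moves, which gives some intuition for why thresholds very close to an integer are the delicate case.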




We characterise the computational power of recurrent graph neural networks (GNNs) in terms of arithmetic circuits over the real numbers. Our networks are not restricted to aggregate-combine GNNs or other particular types. Generalising similar notions from the literature, we introduce the model of recurrent arithmetic circuits, which can be seen as arithmetic analogues of sequential or logical circuits. These circuits utilise so-called memory gates which are used to store data between iterations of the recurrent circuit. While (recurrent) GNNs work on labelled graphs, we construct arithmetic circuits that obtain encoded labelled graphs as real-valued tuples and then compute the same function. For the other direction we construct recurrent GNNs which are able to simulate the computations of recurrent circuits. These GNNs are given the circuit input as initial feature vectors and then, after the GNN computation, have the circuit output among the feature vectors of their nodes. In this way we establish an exact correspondence between the expressivity of recurrent GNNs and recurrent arithmetic circuits operating over real numbers. Our results both deepen our understanding of the capabilities of trained neural networks and open new approaches to study recurrent neural networks using the lens of circuit complexity theory.


