We consider estimation of high-dimensional long-run covariance matrices for time series with nonconstant means, a setting in which conventional estimators can be severely biased. To address this difficulty, we propose a difference-based initial estimator that is robust to a broad class of mean variations, and combine it with hard thresholding, soft thresholding, and tapering to obtain sparse long-run covariance estimators for high-dimensional data. We derive convergence rates for the resulting estimators under general temporal dependence and time-varying mean structures, showing explicitly how the rates depend on covariance sparsity, mean variation, dimension, and sample size. Numerical experiments show that the proposed methods perform favorably in high dimensions, especially when the mean evolves over time.
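A minimal numerical sketch of the difference-based idea above, under simplifying assumptions (lag-one differencing, i.i.d. rows): differencing cancels a slowly varying or piecewise-constant mean, after which off-diagonal entries can be soft-thresholded. The paper's estimator handles general temporal dependence, which this toy version does not.

```python
import numpy as np

def difference_covariance(X):
    """Difference-based covariance estimate for rows of X (n x p).

    Successive differences D_t = X_{t+1} - X_t cancel a mean that varies
    slowly (or jumps a bounded number of times); for independent rows
    E[D_t D_t^T] = 2 * Sigma, hence the division by 2.
    """
    D = np.diff(X, axis=0)
    return (D.T @ D) / (2.0 * D.shape[0])

def soft_threshold(S, lam):
    """Soft-threshold the off-diagonal entries of S at level lam."""
    T = np.sign(S) * np.maximum(np.abs(S) - lam, 0.0)
    np.fill_diagonal(T, np.diag(S))  # the diagonal is not shrunk
    return T
```

Even with an abrupt mean shift halfway through the sample, the difference-based estimate stays close to the true covariance, whereas the ordinary sample covariance would be badly inflated.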
While the automated detection of cryptographic API misuses has progressed significantly, its precision diminishes for intricate targets due to the reliance on manually defined patterns. Large Language Models (LLMs) offer promising context-aware understanding to address this shortcoming, yet their stochastic nature and hallucination issues pose challenges to their application in precise security analysis. This paper presents the first systematic study of LLMs' application to cryptographic API misuse detection. Our findings are noteworthy: the instability of directly applying LLMs results in over half of the initial reports being false positives. Despite this, the reliability of LLM-based detection can be significantly enhanced by aligning detection scopes with realistic scenarios and employing a novel code and analysis validation technique, achieving nearly 90% detection recall. This improvement substantially surpasses traditional methods and leads to the discovery of previously unknown vulnerabilities in established benchmarks. Nevertheless, we identify recurring failure patterns that illustrate current LLMs' blind spots. Leveraging these findings, we deploy an LLM-based detection system and uncover 63 new vulnerabilities (47 confirmed, 7 already fixed) in open-source Java and Python repositories, including prominent projects like Apache.
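One way to picture the report-validation idea above is a cheap consistency check of an LLM finding against the source it claims to describe. This is only an illustrative sketch (the paper's technique also re-analyses the model's reasoning, which is not reproduced here); the rule for the classic ECB-mode finding is a hypothetical example.

```python
import re

def validate_report(code, report_api, report_issue):
    """Reject an LLM misuse report that is inconsistent with the code:
    the flagged API must actually occur in the source, and for an
    ECB-mode finding the insecure transformation string must be present.
    """
    if report_api not in code:
        return False  # hallucinated API call -> reject the report
    if report_issue == "ECB mode":
        return bool(re.search(r'Cipher\.getInstance\(\s*"[^"]*ECB', code))
    return True
```

A report that names an API absent from the code, or claims ECB mode where GCM is used, is filtered out before it reaches the analyst.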
Watermarking for large language models (LLMs) has emerged as an effective tool for distinguishing AI-generated text from human-written content. Statistically, watermark schemes induce dependence between generated tokens and a pseudo-random sequence, reducing watermark detection to a hypothesis testing problem on independence. We develop a unified framework for LLM watermark detection based on e-processes, providing anytime-valid guarantees for online testing. We propose various methods to construct empirically adaptive e-processes that can enhance the detection power. The proposed methods are applicable to any sequential testing problem where independent pivotal statistics are available. In addition, theoretical results are established to characterize the power properties of the proposed procedures. Experiments demonstrate that the proposed framework achieves competitive performance compared to existing watermark detection methods.
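The e-process mechanism above can be sketched with a standard p-to-e calibrator (a generic construction, not the paper's adaptive one): under the null each pivotal p-value is uniform, each calibrated e-value has mean one, and the running product is a test supermartingale, so Ville's inequality gives anytime validity.

```python
import numpy as np

def e_process(pvals, s=0.5):
    """Running product of calibrated e-values e = s * p**(s-1), s in (0,1).

    Under H0 each e-value has expectation 1, so the product M_t is a
    nonnegative supermartingale and P(sup_t M_t >= 1/alpha) <= alpha.
    """
    e = s * np.asarray(pvals, dtype=float) ** (s - 1.0)
    return np.cumprod(e)

def detect(pvals, alpha=0.01, s=0.5):
    """Return the first index at which the e-process crosses 1/alpha, else None."""
    M = e_process(pvals, s)
    hits = np.nonzero(M >= 1.0 / alpha)[0]
    return int(hits[0]) if hits.size else None
```

Small p-values (a watermark signal) drive the product up quickly, and the crossing time can be read off online without a fixed sample size.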
Learning systems that preserve privacy often inject noise into hierarchical visual representations; a central challenge is to \emph{model} how such perturbations align with a declared privacy budget in a way that is interpretable and applicable across vision backbones and vision--language models (VLMs). We propose \emph{Bodhi VLM}, a \emph{privacy-alignment modeling} framework for \emph{hierarchical neural representations}: it (1) links sensitive concepts to layer-wise grouping via NCP and MDAV-based clustering; (2) locates sensitive feature regions using bottom-up (BUA) and top-down (TDA) strategies over multi-scale representations (e.g., feature pyramids or vision-encoder layers); and (3) uses an Expectation-Maximization Privacy Assessment (EMPA) module to produce an interpretable \emph{budget-alignment signal} by comparing the fitted sensitive-feature distribution to an evaluator-specified reference (e.g., Laplace or Gaussian with scale $c/\epsilon$). The output is reference-relative and is \emph{not} a formal differential-privacy estimator. We formalize BUA/TDA over hierarchical feature structures and validate the framework on object detectors (YOLO, PPDPTS, DETR) and on the \emph{visual encoders} of VLMs (CLIP, LLaVA, BLIP). BUA and TDA yield comparable deviation trends; EMPA provides a stable alignment signal under the reported setups. We compare with generic discrepancy baselines (Chi-square, K-L, MMD) and with task-relevant baselines (MomentReg, NoiseMLE, Wass-1). Results are reported as mean$\pm$std over multiple seeds with confidence intervals in the supplementary materials. This work contributes a learnable, interpretable modeling perspective for privacy-aligned hierarchical representations rather than a post hoc audit only. Source code: \href{https://github.com/mabo1215/bodhi-vlm.git}{Bodhi-VLM GitHub repository}
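One hypothetical reading of the reference-comparison step above: fit a Laplace scale to the observed perturbations and report the KL divergence to the evaluator's reference Laplace with scale $c/\epsilon$. This is a toy alignment signal in the same spirit, not the EMPA module itself.

```python
import numpy as np

def laplace_alignment(noise, c, eps):
    """Budget-alignment signal: KL(fitted Laplace || reference Laplace(c/eps)).

    Fits the scale b_hat = mean |x - median(x)| (the Laplace MLE) and, for
    zero-centred Laplace laws, uses KL = log(b_ref/b_hat) + b_hat/b_ref - 1,
    which is 0 exactly when the fitted scale matches the declared budget.
    """
    b_hat = float(np.mean(np.abs(noise - np.median(noise))))
    b_ref = c / eps
    return np.log(b_ref / b_hat) + b_hat / b_ref - 1.0
```

The signal is reference-relative, mirroring the caveat in the abstract: a small value means the observed noise is consistent with the declared budget, not that differential privacy formally holds.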
Accurately forecasting air quality is critical to protecting the general public from lung and heart diseases. This is a challenging task due to the complicated interactions among distinct pollution sources and various other influencing factors. Existing air quality forecasting methods cannot effectively model the diffusion processes of air pollutants between cities and monitoring stations, which may suddenly deteriorate the air quality of a region. In this paper, we propose HighAir, a hierarchical graph neural network-based air quality forecasting method, which adopts an encoder-decoder architecture and considers complex air quality influencing factors, e.g., weather and land usage. Specifically, we construct a city-level graph and station-level graphs from a hierarchical perspective, which consider city-level and station-level patterns, respectively. We design two strategies, upper delivery and lower updating, to implement the inter-level interactions, and introduce a message-passing mechanism to implement the intra-level interactions. We dynamically adjust edge weights based on wind direction to model the correlations between dynamic factors and air quality. We compare HighAir with state-of-the-art air quality forecasting methods on the dataset of the Yangtze River Delta city group, which covers 10 major cities within 61,500 km². The experimental results show that HighAir significantly outperforms the other methods.
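The wind-based edge reweighting above can be sketched as follows, under an assumed convention (one of several plausible ones): an edge aligned with the wind direction keeps full weight, while upwind edges are clipped to zero, so pollutant messages propagate downwind.

```python
import numpy as np

def wind_edge_weights(pos, edges, wind_dir_deg):
    """Reweight directed graph edges by alignment with the wind.

    pos: (N, 2) station coordinates; edges: list of (src, dst) pairs;
    wind_dir_deg: direction the wind blows toward, in degrees.
    Weight = max(0, cos(angle between edge direction and wind direction)).
    """
    wind = np.array([np.cos(np.radians(wind_dir_deg)),
                     np.sin(np.radians(wind_dir_deg))])
    weights = []
    for s, d in edges:
        v = pos[d] - pos[s]
        v = v / np.linalg.norm(v)
        weights.append(max(0.0, float(v @ wind)))  # upwind edges get weight 0
    return weights
```

In a message-passing layer these weights would multiply the adjacency entries, so that a station only receives pollutant information from its upwind neighbors.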
Understanding the economic intent of Ethereum transactions is critical for user safety, yet current tools expose only raw on-chain data or surface-level intent, leading to widespread "blind signing" (approving transactions without understanding them). Through interviews with 16 Web3 users, we find that effective explanations should be structured, risk-aware, and grounded at the token-flow level. Motivated by these findings, we formulate TxSum, a new user-centered NLP task for Ethereum transaction understanding, and construct a dataset of 187 complex Ethereum transactions annotated with transaction-level summaries and token flow-level semantic labels. We further introduce MATEX, a grounded multi-agent framework for high-stakes transaction explanation. It selectively retrieves external knowledge under uncertainty and audits explanations against raw traces to improve token-flow-level factual consistency. MATEX achieves the strongest overall explanation quality, especially on micro-level factuality and intent quality. It improves user comprehension on complex transactions from 52.9% to 76.5% over the strongest baseline and raises malicious-transaction rejection from 36.0% to 88.0%, while maintaining a low false-rejection rate on benign transactions.
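A minimal reading of the audit step above: every token flow the explanation claims must be grounded in the raw transfer trace, and trace flows missing from the explanation are surfaced as omissions. This is an illustrative consistency check, not the MATEX agent pipeline.

```python
def audit_explanation(trace_flows, claimed_flows):
    """Token-flow factuality audit for a transaction explanation.

    trace_flows / claimed_flows: iterables of hashable flow tuples, e.g.
    (token, sender, receiver, amount). An explanation is consistent only
    if every claimed flow appears verbatim in the raw trace.
    """
    trace = set(trace_flows)
    claimed = set(claimed_flows)
    unsupported = [f for f in claimed_flows if f not in trace]
    omitted = [f for f in trace_flows if f not in claimed]
    return {"consistent": not unsupported,
            "unsupported": unsupported,
            "omitted": omitted}
```

An explanation that invents a flow (e.g. a wrong amount) is flagged as unsupported rather than shown to the user, which is the kind of grounding the abstract credits for the factuality gains.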
Communication is a core enabler for multi-robot systems (MRS), providing the mechanism through which robots exchange state information, coordinate actions, and satisfy safety constraints. While many MRS autonomy algorithms assume reliable and timely message delivery, realistic wireless channels introduce delay, erasures, and ordering stalls that can degrade performance and compromise the safety-critical decisions of robotic tasks. In this paper, we investigate how transport-layer reliability mechanisms that mitigate communication losses and delays shape the autonomy-communication loop. We show that conventional non-coded retransmission-based protocols introduce long delays that are misaligned with the timeliness requirements of MRS applications, and may render the received data irrelevant. As an alternative, we advocate for adaptive and causal network coding, which proactively injects coded redundancy to achieve the desired delay and throughput that enable relevant data delivery to the robotic task. Specifically, this method adapts to channel conditions between robots and causally tunes the communication rates via efficient algorithms. We present two case studies: cooperative localization under delayed and lossy inter-robot communication, and a safety-critical overtaking maneuver where timely vehicle-to-vehicle message availability determines whether an ego vehicle can abort to avoid a crash. Our results demonstrate that coding-based communication significantly reduces in-order delivery stalls, preserves estimation consistency under delay, and improves deadline reliability relative to retransmission-based transport. Overall, the study highlights the need to jointly design autonomy algorithms and communication mechanisms, and positions network coding as a principled tool for dependable multi-robot operation over wireless networks.
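The coded-redundancy idea above can be sketched with random linear network coding over GF(2): the sender streams random XOR combinations of the k source packets, and the receiver decodes as soon as it has collected k linearly independent combinations, with no per-packet retransmission round trips. The paper's adaptive *causal* scheme additionally tunes the redundancy rate to channel feedback, which this sketch does not model.

```python
import random

def packets_needed(k, loss=0.3, rng=None):
    """Transmissions until a receiver behind an erasure channel can decode.

    Each transmission is a uniformly random GF(2) combination of the k
    source packets and is erased independently with probability `loss`.
    Coefficient vectors are k-bit integers; linear independence is
    tracked with an XOR basis (Gaussian elimination over GF(2)).
    """
    rng = rng or random.Random(0)
    basis, rank, sent = [], 0, 0
    while rank < k:
        sent += 1
        if rng.random() < loss:
            continue                  # packet erased in the channel
        v = rng.getrandbits(k)        # random coefficient vector
        for b in basis:               # reduce against the current basis
            v = min(v, v ^ b)
        if v:                         # independent -> rank grows
            basis.append(v)
            basis.sort(reverse=True)
            rank += 1
    return sent
```

Because any k independent combinations suffice, the delivery delay degrades gracefully with loss instead of stalling on one missing in-order packet, which is the contrast with retransmission-based transport drawn above.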
This study presents a conditional flow matching framework for solving physics-constrained Bayesian inverse problems. In this setting, samples from the joint distribution of inferred variables and measurements are assumed available, while explicit evaluation of the prior and likelihood densities is not required. We derive a simple and self-contained formulation of both the unconditional and conditional flow matching algorithms, tailored specifically to inverse problems. In the conditional setting, a neural network is trained to learn the velocity field of a probability flow ordinary differential equation that transports samples from a chosen source distribution directly to the posterior distribution conditioned on observed measurements. This black-box formulation accommodates nonlinear, high-dimensional, and potentially non-differentiable forward models without restrictive assumptions on the noise model. We further analyze the behavior of the learned velocity field in the regime of finite training data. Under mild architectural assumptions, we show that overtraining can induce degenerate behavior in the generated conditional distributions, including variance collapse and a phenomenon termed selective memorization, wherein generated samples concentrate around training data points associated with similar observations. A simplified theoretical analysis explains this behavior, and numerical experiments confirm it in practice. We demonstrate that standard early-stopping criteria based on monitoring test loss effectively mitigate such degeneracy. The proposed method is evaluated on several physics-based inverse problems. We investigate the impact of different choices of source distributions, including Gaussian and data-informed priors. Across these examples, conditional flow matching accurately captures complex, multimodal posterior distributions while maintaining computational efficiency.
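The conditional flow-matching objective above reduces, for the common linear interpolation path, to a simple regression: along $x_t = (1-t)x_0 + t x_1$ the path velocity is $x_1 - x_0$, and the network $v(x_t, t, y)$ is fit to that target. A minimal Monte Carlo sketch of the loss (the linear path is an assumption; the paper's formulation is more general):

```python
import numpy as np

def cfm_loss(v, x0, x1, y, rng):
    """Monte Carlo conditional flow-matching loss.

    Samples t ~ Uniform(0,1) per pair, forms x_t on the linear path, and
    returns the mean squared error between v(x_t, t, y) and the path
    velocity x1 - x0. `v` is any callable; `y` carries the measurement
    conditioning.
    """
    t = rng.uniform(size=(x0.shape[0], 1))
    xt = (1.0 - t) * x0 + t * x1
    target = x1 - x0
    pred = v(xt, t, y)
    return float(np.mean(np.sum((pred - target) ** 2, axis=1)))
```

In training, `v` would be a neural network and minimizing this loss over joint samples $(x_1, y)$ yields the velocity field whose probability-flow ODE transports source samples to the posterior.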
Reducing latency and energy consumption is critical to improving the efficiency of memory systems in modern computing. This work introduces ReLMXEL (Reinforcement Learning for Memory Controller with Explainable Energy and Latency Optimization), an explainable multi-agent online reinforcement learning framework that dynamically optimizes memory controller parameters using reward decomposition. ReLMXEL operates within the memory controller, leveraging detailed memory behavior metrics to guide decision-making. Experimental evaluations across diverse workloads demonstrate consistent performance gains over baseline configurations, with refinements driven by workload-specific memory access behavior. By incorporating explainability into the learning process, ReLMXEL not only enhances performance but also increases the transparency of control decisions, paving the way for more accountable and adaptive memory system designs.
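The reward-decomposition mechanism above can be illustrated generically: keep separate value estimates per reward component (here latency and energy), act on their sum, and report each component's contribution as the explanation for why an action won. This is a sketch of the decomposition idea, not the ReLMXEL architecture itself.

```python
import numpy as np

def choose_and_explain(q_latency, q_energy, state):
    """Pick the action maximizing the summed decomposed Q-values and
    report each component's contribution, so a controller can state
    *why* an action was preferred (e.g. 'chosen mainly for energy').

    q_latency, q_energy: (n_states, n_actions) per-component value tables.
    """
    total = q_latency[state] + q_energy[state]
    a = int(np.argmax(total))
    return a, {"latency": float(q_latency[state, a]),
               "energy": float(q_energy[state, a])}
```

Because the decomposition is additive, the explanation is faithful to the actual decision rule rather than a post hoc rationalization.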
Decision-making problems often feature uncertainty stemming from heterogeneous and context-dependent human preferences. To address this, we propose a sequential learning-and-optimization pipeline to learn preference distributions and leverage them to solve downstream problems, for example, risk-averse formulations. We focus on human choice settings that can be formulated as (integer) linear programs. In such settings, existing inverse optimization and choice modelling methods infer preferences from observed choices but typically produce point estimates or fail to capture contextual shifts, making them unsuitable for risk-averse decision-making. Using a bounded-variance score function gradient estimator, we train a predictive model mapping contextual features to a rich class of parameterizable distributions. This approach yields a maximum likelihood estimate. The model generates scenarios for unseen contexts in the subsequent optimization phase. In a synthetic ridesharing environment, our approach reduces average post-decision surprise by up to 114$\times$ compared to a risk-neutral approach with perfect predictions and up to 25$\times$ compared to leading risk-averse baselines.
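The score-function gradient idea above can be shown in its simplest form for a Gaussian with unknown mean: $\nabla_\mu \mathbb{E}_{z\sim N(\mu,1)}[f(z)] = \mathbb{E}[f(z)\,(z-\mu)]$, estimated by Monte Carlo with a mean baseline to reduce variance. This is the generic REINFORCE estimator, not the paper's bounded-variance construction.

```python
import numpy as np

def score_function_grad(f, mu, n=200000, rng=None):
    """Score-function gradient of E_{z~N(mu,1)}[f(z)] with respect to mu.

    Uses grad = E[f(z) * d/dmu log N(z; mu, 1)] = E[f(z) * (z - mu)],
    subtracting the sample mean of f as a baseline (the baseline leaves
    the expectation unchanged because E[z - mu] = 0).
    """
    rng = rng or np.random.default_rng(0)
    z = rng.normal(mu, 1.0, size=n)
    fz = f(z)
    return float(np.mean((fz - fz.mean()) * (z - mu)))
```

The same template extends to richer parameterizable distributions: only the score $\nabla_\theta \log p_\theta(z)$ changes, which is what lets a predictive model map contexts to distribution parameters and still be trained by maximum likelihood.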
We study the fair allocation of indivisible items under relevance constraints, where each agent has a set of relevant items and can only receive items that are relevant to them. While the relevance constraint has been studied in recent years, existing work has largely focused on envy-freeness. Our work extends this study to other key fairness criteria -- such as proportionality, equitability, and their relaxations -- in settings where the items may be goods, chores, or a mixture of both. We complement the literature by presenting a systematic picture of the existence and computational complexity of allocations satisfying the considered criteria.
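As a concrete anchor for one of the criteria above, proportionality asks that every agent value their bundle at least $1/n$ of their value for all items. A minimal checker (relevance constraints would additionally restrict which bundles are feasible; that filtering is not shown here):

```python
def is_proportional(valuations, allocation):
    """Check proportionality of an allocation of indivisible items.

    valuations: dict agent -> dict item -> value;
    allocation: dict agent -> set of items received.
    Agent i's bundle must be worth at least v_i(all items) / n.
    """
    n = len(valuations)
    for agent, v_i in valuations.items():
        share = sum(v_i.values()) / n
        got = sum(v_i[item] for item in allocation.get(agent, ()))
        if got < share - 1e-12:  # small slack for float valuations
            return False
    return True
```

With chores (negative values) the same inequality applies, which is why the goods/chores/mixed distinction in the abstract changes existence and complexity but not the definition.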
We study the problem of coded information retrieval for block-structured data, motivated by DNA-based storage systems where a database is partitioned into multiple files that must each be recoverable as an atomic unit. We initiate and formalize the block-structured retrieval problem, wherein $k$ information symbols are partitioned into two files $F_1$ and $F_2$ of sizes $s_1$ and $s_2 = k - s_1$. The objective is to characterize the set of achievable expected retrieval time pairs $\bigl(E_1(G), E_2(G)\bigr)$ over all $[n,k]$ linear codes with generator matrix $G$. We derive a family of linear lower bounds via mutual exclusivity of recovery sets, and develop a nonlinear geometric bound via column projection. For codes with no mixed columns, this yields the hyperbolic constraint $s_1/E_1 + s_2/E_2 \le 1$, which we conjecture to hold universally whenever $\max\{s_1,s_2\} \ge 2$. We analyze explicit codes, such as the identity code, file-dedicated MDS codes, and the systematic global MDS code, and compute their exact expected retrieval times. For file-dedicated codes we prove MDS optimality within the family and verify the hyperbolic constraint. For global MDS codes, we establish dominance by the proportional local MDS allocation via a combinatorial subset-counting argument, providing a significantly simpler proof compared to recent literature and formally extending the result to the asymmetric case. Finally, we characterize the limiting achievability region as $n \to \infty$: the hyperbolic boundary is asymptotically achieved by file-dedicated MDS codes, and is conjectured to be the exact boundary of the limiting achievability region.
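A Monte Carlo sketch of expected retrieval time for a file-dedicated MDS code, under an assumed with-replacement read model (plausible for DNA sampling, where each read draws a random stored molecule; the paper's exact model may differ). In this model the closed form is $E = \sum_{j=0}^{s-1} n/(n_{\text{file}} - j)$, and the hyperbolic quantity $s_1/E_1 + s_2/E_2$ stays below 1.

```python
import numpy as np

def expected_reads_mds(n, n_file, s, trials=20000, rng=None):
    """Monte Carlo expected number of reads to recover one file.

    Each read draws one of the n stored symbols uniformly *with
    replacement*; a file encoded with a dedicated [n_file, s] MDS code is
    recovered once s distinct symbols among its n_file dedicated columns
    (labelled 0..n_file-1 here) have been seen.
    """
    rng = rng or np.random.default_rng(0)
    total = 0
    for _ in range(trials):
        seen, reads = set(), 0
        while len(seen) < s:
            reads += 1
            c = int(rng.integers(n))
            if c < n_file:
                seen.add(c)
        total += reads
    return total / trials
```

For $n=10$ with files of sizes $s_1=2$ (4 dedicated columns) and $s_2=3$ (6 dedicated columns), the simulated pair satisfies $s_1/E_1 + s_2/E_2 \le 1$, consistent with the constraint stated above.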
The AIED community envisions AI evolving "from tools to teammates," yet our understanding of AI teammates remains limited to dyadic human-AI interactions. We offer a different vantage point: a rapidly growing ecosystem of AI agent platforms where over 167,000 agents participate, interact as peers, and develop learning behaviors without researcher intervention. Drawing on a month of daily qualitative observations across multiple platforms including Moltbook, The Colony, and 4claw, we identify four phenomena with implications for AIED: (1) humans who configure their agents undergo a "bidirectional scaffolding" process, learning through teaching; (2) peer learning emerges without any designed curriculum, complete with idea cascades and quality hierarchies; (3) agents converge on shared memory architectures that mirror open learner model design; and (4) trust dynamics and platform mortality reveal design constraints for networked educational AI. Rather than presenting empirical findings, we argue that these organic phenomena offer a naturalistic window into dynamics that can inform principled design of multi-agent educational systems. We sketch an illustrative curriculum design, "Learn by Teaching Your AI Agent Teammate," and outline potential research directions and open problems to show how these observations might inform future AIED practice and inquiry.
In text-driven 3D scene generation, object layout serves as a crucial intermediate representation that bridges high-level language instructions with detailed geometric output. It not only provides a structural blueprint for ensuring physical plausibility but also supports semantic controllability and interactive editing. However, the learning capabilities of current 3D indoor layout generation models are constrained by the limited scale, diversity, and annotation quality of existing datasets. To address this, we introduce M3DLayout, a large-scale, multi-source dataset for 3D indoor layout generation. M3DLayout comprises 21,367 layouts and over 433k object instances, integrating three distinct sources: real-world scans, professional CAD designs, and procedurally generated scenes. Each layout is paired with detailed structured text describing global scene summaries, relational placements of large furniture, and fine-grained arrangements of smaller items. This diverse and richly annotated resource enables models to learn complex spatial and semantic patterns across a wide variety of indoor environments. To assess the potential of M3DLayout, we establish a benchmark using both a text-conditioned diffusion model and a text-conditioned autoregressive model. Experimental results demonstrate that our dataset provides a solid foundation for training layout generation models. Its multi-source composition enhances diversity, notably through the Inf3DLayout subset, which provides rich small-object information, enabling the generation of more complex and detailed scenes. We hope that M3DLayout can serve as a valuable resource for advancing research in text-driven 3D scene synthesis. The dataset and code will be made public upon acceptance.
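A hypothetical record schema mirroring the dataset description above (the actual M3DLayout format is not published in the abstract, so field names here are illustrative): each layout carries its source, placed object boxes, and the three levels of structured text annotation.

```python
from dataclasses import dataclass, field

@dataclass
class ObjectInstance:
    """One placed object: category plus an oriented 3D box."""
    category: str
    position: tuple          # (x, y, z) centre
    size: tuple              # (width, height, depth)
    yaw: float = 0.0         # rotation about the vertical axis

@dataclass
class LayoutRecord:
    """One layout with its multi-level text annotations."""
    source: str                       # "scan" | "cad" | "procedural"
    objects: list = field(default_factory=list)
    scene_summary: str = ""           # global scene summary
    furniture_relations: str = ""     # relational placement of large furniture
    small_item_details: str = ""      # fine-grained small-object arrangement
```

Splitting the text annotation into these three granularities is what lets a conditional generator be supervised at the scene, furniture, and small-object levels separately.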
Hidden Markov models (HMMs) are powerful tools for analysing time series data that depend on discrete underlying but unobserved states. As such, they have gained prominence across numerous empirical disciplines, in particular ecology, medicine, and economics. However, the increasing complexity of empirical data is often accompanied by additional latent structure such as spatial effects, temporal trends, or measurement perturbations. Gaussian fields provide an attractive building block for incorporating such structured latent variation into HMMs. Fast inference methods for Gaussian fields have emerged through the stochastic partial differential equation (SPDE) approach. Due to their sparse representation, these integrate well with novel frequentist estimation methods for random-effects models via the use of automatic differentiation and the Laplace approximation. Scaling to high dimensions requires tools such as (R)TMB to exploit sparsity in the Hessian with respect to the latent variables, a property satisfied by SPDE fields but violated by the HMM likelihood. We present a modified forward algorithm to compute the HMM likelihood, constructing sparsity in the Hessian and consequently enabling fast and scalable inference. We demonstrate practical feasibility and usefulness through simulations and two case studies exploring the detection of stellar flares as well as modelling the movement of lions.
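For reference, the *standard* scaled forward recursion that the modified algorithm above builds on can be sketched as follows; the paper's modification restructures this recursion so that the Hessian with respect to the latent variables stays sparse, a refinement not reproduced here.

```python
import numpy as np

def hmm_loglik(log_dens, Gamma, delta):
    """Forward algorithm for the HMM log-likelihood (standard version).

    log_dens: (T, N) state-dependent log-densities of the observations;
    Gamma: (N, N) transition probability matrix; delta: (N,) initial
    distribution. Rescaling phi at every step keeps the recursion
    numerically stable for long series.
    """
    T, N = log_dens.shape
    phi = delta * np.exp(log_dens[0])
    ll = np.log(phi.sum())
    phi = phi / phi.sum()
    for t in range(1, T):
        phi = (phi @ Gamma) * np.exp(log_dens[t])
        ll += np.log(phi.sum())
        phi = phi / phi.sum()
    return float(ll)
```

Because each forward step mixes all states, the dependence of the likelihood on every latent quantity is dense, which is exactly the sparsity obstruction the abstract describes for tools like (R)TMB.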