In this study, we present a multimodal framework for predicting neuro-facial disorders by capturing both vocal and facial cues. We hypothesize that explicitly disentangling shared and modality-specific representations within multimodal foundation model embeddings can enhance clinical interpretability and generalization. To validate this hypothesis, we propose DIVINE, a fully disentangled multimodal framework that operates on representations extracted from state-of-the-art (SOTA) audio and video foundation models, incorporating hierarchical variational bottlenecks, sparse gated fusion, and learnable symptom tokens. DIVINE operates in a multitask learning setup to jointly predict diagnostic categories (Healthy Control, ALS, Stroke) and severity levels (Mild, Moderate, Severe). The model is trained using synchronized audio and video inputs and evaluated on the Toronto NeuroFace dataset under full (audio-video) as well as single-modality (audio-only and video-only) test conditions. Our proposed approach, DIVINE, achieves SOTA results, with the DeepSeek-VL2 and TRILLsson combination reaching 98.26% accuracy and 97.51% F1-score. Under modality-constrained scenarios, the framework performs well, showing strong generalization when tested with video-only or audio-only inputs. It consistently yields superior performance compared to unimodal models and baseline fusion techniques. To the best of our knowledge, DIVINE is the first framework that combines cross-modal disentanglement, adaptive fusion, and multitask learning to comprehensively assess neurological disorders using synchronized speech and facial video.
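The abstract does not release code, so as a rough sketch only: one way a "sparse gated fusion" step over audio and video embeddings could look, where a sigmoid gate scores each fused dimension and only the top-k gated dimensions are kept. All names (`sparse_gated_fusion`, `W_gate`, `k`) and the top-k sparsification choice are our own assumptions, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

def sparse_gated_fusion(audio_emb, video_emb, W_gate, k=2):
    """Fuse modality embeddings with a sparse gate (illustrative sketch).

    A sigmoid gate scores each fused dimension; only the top-k gated
    dimensions are kept, zeroing the rest (the "sparse" part).
    """
    concat = np.concatenate([audio_emb, video_emb])
    gate = 1.0 / (1.0 + np.exp(-W_gate @ concat))   # sigmoid gate per fused dim
    fused = gate * (audio_emb + video_emb)          # gated elementwise fusion
    keep = np.argsort(gate)[-k:]                    # indices of the top-k gates
    sparse = np.zeros_like(fused)
    sparse[keep] = fused[keep]
    return sparse

d = 4
audio = rng.standard_normal(d)
video = rng.standard_normal(d)
W = rng.standard_normal((d, 2 * d))
out = sparse_gated_fusion(audio, video, W, k=2)
print(np.count_nonzero(out))  # at most k entries survive
```

In a full model the gated fused vector would then feed shared multitask heads (diagnosis and severity); the sketch stops at the fusion step.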



Traditional reinforcement learning usually assumes either episodic interactions with resets or continuous operation to minimize average or cumulative loss. While episodic settings have many theoretical results, resets are often unrealistic in practice. The infinite-horizon setting avoids this issue but lacks non-asymptotic guarantees in online scenarios with unknown dynamics. In this work, we move towards closing this gap by introducing a reset-free framework called the periodic framework, where the goal is to find periodic policies: policies that not only minimize cumulative loss but also return the agents to their initial state distribution after a fixed number of steps. We formalize the problem of finding optimal periodic policies and identify sufficient conditions under which it is well-defined for tabular Markov decision processes. To evaluate algorithms in this framework, we introduce the periodic regret, a measure that balances cumulative loss with the terminal law constraint. We then propose the first algorithms for computing periodic policies in two multi-agent settings and show they achieve sublinear periodic regret of order $\tilde O(T^{3/4})$. This provides the first non-asymptotic guarantees for reset-free learning in the setting of $M$ homogeneous agents, for $M > 1$.
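The abstract does not spell out the periodic regret; purely as an illustrative sketch in our own notation (the paper's exact definition will differ), a measure that balances cumulative loss against a terminal-law constraint over a period $p$ might take a form like:

```latex
\mathrm{PReg}(T) \;=\;
\underbrace{\sum_{t=1}^{T} \ell(s_t, a_t) \;-\; T\,\rho^{\star}}_{\text{cumulative loss vs.\ best periodic policy}}
\;+\;
\underbrace{\sum_{j=1}^{\lfloor T/p \rfloor} d\big(\mathcal{L}(s_{jp}),\, \mu_0\big)}_{\text{return-to-start constraint}}
```

where $\mu_0$ is the initial state distribution, $\rho^{\star}$ the optimal periodic average loss, $\mathcal{L}(s_{jp})$ the law of the state at the end of the $j$-th period, and $d$ some distance between probability laws. A sublinear bound of order $\tilde O(T^{3/4})$ on such a quantity would force both terms to vanish on average.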



A central problem in machine learning theory is to characterize how learning dynamics select particular solutions among the many compatible with the training objective, a phenomenon called implicit bias, which remains only partially characterized. In the present work, we identify a general mechanism for the emergence of implicit biases, expressed as an explicit geometric correction of the learning dynamics, arising from the interaction between continuous symmetries in the model's parametrization and stochasticity in the optimization process. Our viewpoint is constructive in two complementary directions: given model symmetries, one can derive the implicit bias they induce; conversely, one can inverse-design a wide class of different implicit biases by computing specific redundant parameterizations. More precisely, we show that, when the dynamics is expressed in the quotient space obtained by factoring out the symmetry group of the parameterization, the resulting stochastic differential equation gains a closed-form geometric correction in the stationary distribution of the optimizer dynamics favoring orbits with small local volume. We compute the resulting symmetry-induced bias for a range of architectures, showing how several well-known results fit into a single unified framework. The approach also provides a practical methodology for deriving implicit biases in new settings, and it yields concrete, testable predictions that we confirm by numerical simulations on toy models trained on synthetic data, leaving more complex scenarios for future work. Finally, we test the implicit bias inverse-design procedure in notable cases, including biases toward sparsity in linear features or in spectral properties of the model parameters.
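As a schematic illustration of the kind of correction described (our notation; the paper's precise statement will differ): writing $x$ for coordinates on the quotient space obtained by factoring out the symmetry group $G$, a stationary law that "favors orbits with small local volume" can be pictured as

```latex
\pi_{\infty}(x) \;\propto\; \exp\!\big(-L(x)/\eta\big)\,\cdot\,\mathrm{vol}\big(\mathcal{O}(x)\big)^{-\alpha}
```

where $\mathcal{O}(x)$ is the $G$-orbit through $x$, $L$ the loss on the quotient, $\eta$ the noise scale, and $\alpha>0$ a parameterization-dependent exponent. The second factor is the volume-dependent tilt the abstract refers to; its exact form is what the closed-form geometric correction specifies.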



Yannakakis' seminal algorithm is optimal for acyclic joins, yet it has not been widely adopted due to its poor performance in practice. This paper briefly surveys recent advancements in making Yannakakis' algorithm more practical, in terms of both efficiency and ease of implementation, and points out several avenues for future research.
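For readers unfamiliar with the algorithm being surveyed: the core of Yannakakis' method is a semijoin pass that removes dangling tuples before any join is materialized. A minimal sketch (toy relations as lists of dicts; a real engine would use hashed columnar operators):

```python
def semijoin(R, S, on):
    """Keep only the tuples of R that join with at least one tuple of S
    on the shared attributes `on` (the semijoin R ⋉ S)."""
    keys = {tuple(s[a] for a in on) for s in S}
    return [r for r in R if tuple(r[a] for a in on) in keys]

# Acyclic join R(a,b) ⋈ S(b,c): one bottom-up semijoin pass eliminates
# dangling tuples, so the subsequent join does no wasted work.
R = [{"a": 1, "b": 10}, {"a": 2, "b": 20}]
S = [{"b": 10, "c": 7}]
R_reduced = semijoin(R, S, on=["b"])
print(R_reduced)  # [{'a': 1, 'b': 10}] — the b=20 tuple is dangling
```

On an acyclic join tree, a bottom-up then top-down sweep of such semijoins (a full reducer) guarantees every surviving tuple contributes to the output, which is the source of the algorithm's optimality.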



Large language models (LLMs) demonstrate remarkable capabilities in natural language understanding and generation. Despite being trained on large-scale, high-quality data, LLMs still fail to outperform traditional static analysis tools in specialized domains like smart contract vulnerability detection. To address this issue, this paper proposes a post-training algorithm based on atomic task decomposition and fusion. This algorithm aims to achieve combinatorial generalization under limited data by decomposing complex reasoning tasks. Specifically, we decompose the reentrancy vulnerability detection task into four linearly independent atomic tasks: identifying external calls, identifying state updates, identifying data dependencies between external calls and state updates, and determining their data flow order. These tasks form the core components of our approach. Through synthetic data generation, we construct three compiler-verified datasets for training. We then employ the Slither tool to extract structural information from the control flow graph and data flow graph, which is used to fine-tune the LLM's adapter. Experimental results demonstrate that low-rank normalization fusion with the LoRA adapter improves the LLM's reentrancy vulnerability detection accuracy to 98.2%, surpassing state-of-the-art methods. On 31 real-world contracts, the algorithm achieves a 20% higher recall than traditional analysis tools.
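The paper's "low-rank normalization fusion" is not specified in the abstract; the following is one plausible sketch of fusing per-atomic-task LoRA updates $\Delta W_i = B_i A_i$ into a single weight update, with Frobenius normalization as an assumed interpretation of the normalization step. Function and variable names are ours.

```python
import numpy as np

def fuse_lora_deltas(deltas, weights=None):
    """Fuse per-task low-rank updates dW_i = B_i @ A_i into one update.

    Each delta is Frobenius-normalized before the weighted sum, so no
    single atomic-task adapter dominates the fused update. This is an
    assumed reading of "low-rank normalization fusion"; the paper's
    actual rule may differ.
    """
    n = len(deltas)
    weights = weights if weights is not None else [1.0 / n] * n
    fused = np.zeros_like(deltas[0])
    for w, d in zip(weights, deltas):
        fused += w * d / (np.linalg.norm(d) + 1e-8)
    return fused

rng = np.random.default_rng(1)
r, d_in, d_out = 2, 8, 8   # rank-2 adapters for an 8x8 weight matrix
deltas = [rng.standard_normal((d_out, r)) @ rng.standard_normal((r, d_in))
          for _ in range(4)]                       # one delta per atomic task
W_update = fuse_lora_deltas(deltas)
print(W_update.shape)  # (8, 8)
```

The fused `W_update` would then be added to the frozen base weight, so the four atomic skills combine into a single reentrancy detector without retraining from scratch.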



Retrieval-Augmented Generation (RAG) has emerged as the predominant paradigm for grounding Large Language Model outputs in factual knowledge, effectively mitigating hallucinations. However, conventional RAG systems operate under a "retrieve-always" assumption, querying vector databases for every input regardless of query complexity. This static approach incurs substantial computational overhead and inference latency, particularly problematic for high-throughput production deployments. We introduce L-RAG (Lazy Retrieval-Augmented Generation), an adaptive framework that implements hierarchical context management through entropy-based gating. L-RAG employs a two-tier architecture: queries are first processed with a compact document summary, and expensive chunk retrieval is triggered only when the model's predictive entropy exceeds a calibrated threshold, signaling genuine uncertainty. Through experiments on SQuAD 2.0 (N=500) using the Phi-2 model, we demonstrate that L-RAG provides a tunable accuracy-efficiency trade-off: at a conservative threshold (tau=0.5), L-RAG achieves 78.2% accuracy, matching Standard RAG (77.8%), with 8% retrieval reduction; at a balanced threshold (tau=1.0), retrieval reduction increases to 26% with modest accuracy trade-off (76.0%). Latency analysis shows that L-RAG saves 80-210ms per query when retrieval latency exceeds 500ms. Analysis of entropy distributions reveals statistically significant separation (p < 0.001) between correct predictions (H=1.72) and errors (H=2.20), validating entropy as a reliable uncertainty signal. L-RAG offers a practical, training-free approach toward more efficient RAG deployment, providing system architects with a configurable knob to balance accuracy and throughput requirements.
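The entropy gate at the heart of L-RAG can be sketched in a few lines: compute the Shannon entropy of the model's next-token distribution under the summary-only prompt, and fall through to chunk retrieval only when it exceeds the threshold. This is our minimal reconstruction from the abstract, not the authors' code; the function names and toy logits are assumptions.

```python
import numpy as np

def predictive_entropy(logits):
    """Shannon entropy (in nats) of the next-token distribution."""
    z = logits - logits.max()                 # shift for numerical stability
    p = np.exp(z) / np.exp(z).sum()
    return float(-(p * np.log(p + 1e-12)).sum())

def answer_with_lazy_retrieval(logits_from_summary, tau, retrieve_chunks):
    """Tier 1 answers from the compact document summary; tier 2 (full
    chunk retrieval) fires only when entropy signals real uncertainty."""
    H = predictive_entropy(logits_from_summary)
    if H > tau:
        return "retrieved", retrieve_chunks()
    return "summary-only", None

# A peaked (confident) distribution stays in tier 1 at tau = 1.0:
confident = np.array([8.0, 0.1, 0.1, 0.1])
mode, _ = answer_with_lazy_retrieval(confident, tau=1.0,
                                     retrieve_chunks=lambda: ["chunk"])
print(mode)  # summary-only
```

Lowering `tau` makes the gate more conservative (more retrieval, higher accuracy); raising it trades accuracy for throughput, which is exactly the tunable knob the abstract describes.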



Large Language Models (LLMs) are transforming enterprise workflows but introduce security and ethics challenges when employees inadvertently share confidential data or generate policy-violating content. This paper proposes SafeGPT, a two-sided guardrail system preventing sensitive data leakage and unethical outputs. SafeGPT integrates input-side detection/redaction, output-side moderation/reframing, and human-in-the-loop feedback. Experiments demonstrate SafeGPT effectively reduces data leakage risk and biased outputs while maintaining satisfaction.
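The input-side detection/redaction stage can be illustrated with a toy gate that replaces sensitive spans with tags before the prompt reaches the model. The patterns below are deliberately simplistic placeholders; a real deployment such as the one described would rely on a proper PII/secret detector, and nothing here is the paper's actual implementation.

```python
import re

# Illustrative patterns only — placeholders, not production detectors.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "API_KEY": re.compile(r"\bsk-[A-Za-z0-9]{8,}\b"),
}

def redact(prompt):
    """Input-side gate: replace detected sensitive spans with tags
    before the prompt ever reaches the LLM."""
    for tag, pat in PATTERNS.items():
        prompt = pat.sub(f"[{tag}]", prompt)
    return prompt

out = redact("Contact jane.doe@corp.com, key sk-abc12345XYZ")
print(out)  # Contact [EMAIL], key [API_KEY]
```

A symmetric output-side stage would run the model's response through a moderation classifier and reframe or block flagged content, with human feedback closing the loop.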



Early identification of student success is crucial for enabling timely interventions, reducing dropout rates, and promoting on-time graduation. In educational settings, AI-powered systems have become essential for predicting student performance due to their advanced analytical capabilities. However, effectively leveraging diverse student data to uncover latent and complex patterns remains a key challenge. While prior studies have explored this area, the potential of dynamic data features and multi-category entities has been largely overlooked. To address this gap, we propose a framework that integrates heterogeneous graph deep learning models to enhance early and continuous student performance prediction, using traditional machine learning algorithms for comparison. Our approach employs a graph metapath structure and incorporates dynamic assessment features, which progressively influence the student success prediction task. Experiments on the Open University Learning Analytics (OULA) dataset demonstrate promising results, achieving a 68.6% validation F1 score with only 7% of the semester completed, and reaching up to 89.5% near the semester's end. Our approach outperforms top machine learning models by 4.7% in validation F1 score during the critical early 7% of the semester, underscoring the value of dynamic features and heterogeneous graph representations in student success prediction.
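To make "graph metapath structure" concrete: in a heterogeneous graph, a metapath such as Student–Assessment–Student links two students who attempted a common assessment, and its induced adjacency is obtained by composing the bipartite incidence matrix with its transpose. The entity schema below is our own illustration, not the paper's exact graph.

```python
import numpy as np

# Toy incidence matrix: rows = students, columns = assessments.
S_A = np.array([[1, 0, 1],    # student 0 took assessments 0 and 2
                [0, 1, 1],    # student 1 took assessments 1 and 2
                [1, 0, 0]])   # student 2 took assessment 0

# Metapath S-A-S: compose incidence with its transpose; a nonzero
# entry (i, j) means students i and j share at least one assessment.
SAS = (S_A @ S_A.T > 0).astype(int)
np.fill_diagonal(SAS, 0)      # drop self-loops
print(SAS[0])  # student 0 linked to 1 (shared a2) and 2 (shared a0)
```

A heterogeneous GNN would aggregate messages along several such metapath adjacencies, with dynamic assessment features updating the node attributes as the semester progresses.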



We present the SER modeling language for automatically verifying serializability of concurrent programs, i.e., whether every concurrent execution of the program is equivalent to some serial execution. SER programs are suitably restricted to make this problem decidable, while still allowing for an unbounded number of concurrent threads of execution, each potentially running for an unbounded number of steps. Building on prior theoretical results, we give the first automated end-to-end decision procedure that either proves serializability by producing a checkable certificate, or refutes it by producing a counterexample trace. We also present a network-system abstraction to which SER programs compile. Our decision procedure then reduces serializability in this setting to a Petri net reachability query. Furthermore, in order to scale, we curtail the search space via multiple optimizations, including Petri net slicing, semilinear-set compression, and Presburger-formula manipulation. We extensively evaluate our framework and show that, despite the theoretical hardness of the problem, it can successfully handle various models of real-world programs, including stateful firewalls, BGP routers, and more.
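To illustrate the target of the reduction: a Petri net reachability query asks whether a target marking (vector of token counts per place) is reachable from an initial one by firing transitions. The toy breadth-first search below conveys the idea; it terminates only on bounded nets, which is why the paper needs the symbolic machinery (slicing, semilinear sets, Presburger formulas) it describes. This sketch is ours, not the paper's procedure.

```python
from collections import deque

def reachable(initial, transitions, target):
    """BFS over Petri-net markings (tuples of token counts).
    Each transition is a (consume, produce) pair of vectors."""
    seen, frontier = {initial}, deque([initial])
    while frontier:
        m = frontier.popleft()
        if m == target:
            return True
        for consume, produce in transitions:
            if all(mi >= ci for mi, ci in zip(m, consume)):  # enabled?
                nxt = tuple(mi - ci + pi
                            for mi, ci, pi in zip(m, consume, produce))
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append(nxt)
    return False

# Two places, one transition moving a token from place 0 to place 1.
t = [((1, 0), (0, 1))]
print(reachable((2, 0), t, (0, 2)))  # True: fire the transition twice
```

In the paper's setting, a "yes" answer to the right reachability query corresponds to a counterexample trace, and a "no" answer to a serializability certificate.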



Individualized treatment regimes (ITRs) aim to improve clinical outcomes by assigning treatment based on patient-specific characteristics. However, existing methods often struggle with high-dimensional covariates, limiting accuracy, interpretability, and real-world applicability. We propose a novel sufficient dimension reduction approach that directly targets the contrast between potential outcomes and identifies a low-dimensional subspace of the covariates capturing treatment effect heterogeneity. This reduced representation enables more accurate estimation of optimal ITRs through outcome-weighted learning. To accommodate observational data, our method incorporates kernel-based covariate balancing, allowing treatment assignment to depend on the full covariate set and avoiding the restrictive assumption that the subspace sufficient for modeling heterogeneous treatment effects is also sufficient for confounding adjustment. We show that the proposed method achieves universal consistency, i.e., its risk converges to the Bayes risk, under mild regularity conditions. We demonstrate its finite sample performance through simulations and an analysis of intensive care unit sepsis patient data to determine who should receive transthoracic echocardiography.
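The outcome-weighted learning step the abstract builds on can be sketched on synthetic data: the optimal regime maximizes $E[\,Y\,\mathbb{1}\{A = d(X)\}/p(A\mid X)\,]$, i.e. a classification of the observed treatment with sample weights $Y$ over the propensity. The cell-wise weighted vote below stands in for the weighted classifier (an SVM in the original OWL formulation); the data-generating choices are ours, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

n = 2000
X = rng.standard_normal((n, 1))
A = rng.choice([-1, 1], size=n)            # randomized treatment, p = 0.5
Y = 1.0 + A * np.sign(X[:, 0]) + 0.1 * rng.standard_normal(n)
w = np.clip(Y, 0, None) / 0.5              # outcome weight / propensity

def rule(threshold=0.0):
    """Weighted majority vote within the x<t and x>=t cells: each cell
    recommends the treatment arm carrying more outcome-weighted mass."""
    signs = []
    for side in (X[:, 0] < threshold, X[:, 0] >= threshold):
        score = (np.sum(w[side] * (A[side] == 1))
                 - np.sum(w[side] * (A[side] == -1)))
        signs.append(1 if score > 0 else -1)
    return signs

print(rule())  # expect [-1, 1]: treat (A=+1) exactly when x >= 0
```

The paper's contribution is upstream of this step: reducing $X$ to a low-dimensional subspace that captures the treatment-effect contrast, so that the weighted classifier above operates on far fewer, interpretable directions.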

