Deploying massive diffusion models for real-time, infinite-duration, audio-driven avatar generation presents a significant engineering challenge, primarily due to the conflict between computational load and strict latency constraints. Existing approaches often compromise visual fidelity by enforcing strictly unidirectional attention mechanisms or reducing model capacity. To address this problem, we introduce \textbf{SoulX-LiveTalk}, a 14B-parameter framework optimized for high-fidelity real-time streaming. Diverging from conventional unidirectional paradigms, we use a \textbf{Self-correcting Bidirectional Distillation} strategy that retains bidirectional attention within video chunks. This design preserves critical spatiotemporal correlations, significantly enhancing motion coherence and visual detail. To ensure stability during infinite generation, we incorporate a \textbf{Multi-step Retrospective Self-Correction Mechanism}, enabling the model to autonomously recover from accumulated errors and preventing collapse. Furthermore, we engineered a full-stack inference acceleration suite incorporating hybrid sequence parallelism, Parallel VAE, and kernel-level optimizations. Extensive evaluations confirm that SoulX-LiveTalk is the first 14B-scale system to achieve a \textbf{sub-second start-up latency (0.87s)} while reaching a real-time throughput of \textbf{32 FPS}, setting a new standard for high-fidelity interactive digital human synthesis.




3D scene graphs have empowered robots with semantic understanding for navigation and planning, yet they often lack the functional information required for physical manipulation, particularly regarding articulated objects. Existing approaches for inferring articulation mechanisms from static observations are prone to visual ambiguity, while methods that estimate parameters from state changes typically rely on constrained settings such as fixed cameras and unobstructed views. Furthermore, fine-grained functional elements like small handles are frequently missed by general object detectors. To bridge this gap, we present ArtiSG, a framework that constructs functional 3D scene graphs by encoding human demonstrations into structured robotic memory. Our approach leverages a robust articulation data collection pipeline utilizing a portable setup to accurately estimate 6-DoF articulation trajectories and axes even under camera ego-motion. We integrate these kinematic priors into a hierarchical and open-vocabulary graph while utilizing interaction data to discover inconspicuous functional elements missed by visual perception. Extensive real-world experiments demonstrate that ArtiSG significantly outperforms baselines in functional element recall and articulation estimation precision. Moreover, we show that the constructed graph serves as a reliable functional memory that effectively guides robots to perform language-directed manipulation tasks in real-world environments containing diverse articulated objects.




High-fidelity simulations and physical experiments are essential for engineering analysis and design, yet their high cost often makes two critical tasks--global sensitivity analysis (GSA) and optimization--prohibitively expensive. This limitation motivates the common use of Gaussian processes (GPs) as proxy regression models that provide uncertainty-aware predictions from a limited number of high-quality observations. GPs naturally enable efficient sampling strategies that support informed decision-making under uncertainty by extracting information from a subset of possible functions for the model of interest. However, direct sampling from GPs is inefficient due to their infinite-dimensional nature and the high cost associated with large covariance matrix operations. Although GPs are popular in the machine learning and statistics communities, sampling from them has received little attention in the engineering optimization community. In this paper, we present the formulation and detailed implementation of two notable sampling methods--random Fourier features and pathwise conditioning--for generating posterior samples from GPs at reduced computational cost. Alternative approaches are briefly described. Importantly, we detail how the generated samples can be applied in GSA, single-objective optimization, and multi-objective optimization. We show successful applications of these sampling methods through a series of numerical examples.
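To make the random Fourier feature idea concrete, here is a minimal sketch of drawing one approximate posterior sample from a GP. It is not the paper's implementation: the squared-exponential kernel with unit signal variance, the function names `make_rff` and `sample_posterior_function`, and all numeric settings are assumptions chosen for illustration. The kernel is approximated by a finite cosine feature map, so the GP becomes Bayesian linear regression on the feature weights and a posterior sample is a cheap closed-form function.

```python
import numpy as np

def make_rff(dim, num_features, lengthscale, rng):
    """Return a feature map phi with phi(x) @ phi(y) ~ exp(-|x - y|^2 / (2 l^2)).

    Assumes a squared-exponential kernel with unit signal variance.
    """
    W = rng.normal(0.0, 1.0 / lengthscale, size=(num_features, dim))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)

    def features(X):
        return np.sqrt(2.0 / num_features) * np.cos(X @ W.T + b)

    return features

def sample_posterior_function(features, X_train, y_train, noise_std, rng):
    """Draw one approximate GP posterior sample as a callable f(X).

    With prior weights w ~ N(0, I) and Gaussian noise, the weight posterior is
    N(A^{-1} Phi^T y, noise_std^2 A^{-1}) with A = Phi^T Phi + noise_std^2 I.
    """
    Phi = features(X_train)                          # shape (n, m)
    m = Phi.shape[1]
    A = Phi.T @ Phi + noise_std**2 * np.eye(m)
    mean = np.linalg.solve(A, Phi.T @ y_train)
    L = np.linalg.cholesky(A)                        # A = L L^T
    z = rng.standard_normal(m)
    # noise_std * L^{-T} z has covariance noise_std^2 * A^{-1}
    w = mean + noise_std * np.linalg.solve(L.T, z)
    return lambda X: features(X) @ w

# Illustrative usage on a toy 1-D problem (all values are assumptions).
rng = np.random.default_rng(0)
X_train = np.linspace(-3.0, 3.0, 8).reshape(-1, 1)
y_train = np.sin(X_train[:, 0])
phi = make_rff(dim=1, num_features=500, lengthscale=1.0, rng=rng)
f = sample_posterior_function(phi, X_train, y_train, noise_std=0.05, rng=rng)
X_test = np.linspace(-3.0, 3.0, 50).reshape(-1, 1)
f_test = f(X_test)                                   # one sampled function on a grid
```

Because the sample is an ordinary function of `X`, it can be evaluated or optimized anywhere without further covariance-matrix operations, which is what makes such samples convenient for sensitivity analysis and optimization loops.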




We derive novel concentration inequalities that bound the statistical error for a large class of stochastic optimization problems, focusing on the case of unbounded objective functions. Our derivations utilize the following key tools: 1) A new form of McDiarmid's inequality that is based on sample-dependent one-component mean-difference bounds and which leads to a novel uniform law of large numbers result for unbounded functions. 2) A new Rademacher complexity bound for families of functions that satisfy an appropriate sample-dependent Lipschitz property, which allows for application to a large class of distributions with unbounded support. As an application of these results, we derive statistical error bounds for denoising score matching (DSM), an application that inherently requires one to consider unbounded objective functions and distributions with unbounded support, even in cases where the data distribution has bounded support. In addition, our results quantify the benefit of sample-reuse in algorithms that employ easily-sampled auxiliary random variables in addition to the training data, e.g., as in DSM, which uses auxiliary Gaussian random variables.




A graph $G$ factors into graphs $H$ and $K$ via a matrix product if $A = BC$, where $A$, $B$, and $C$ are the adjacency matrices of $G$, $H$, and $K$, respectively. The graph $G$ is prime if, in every such factorization, one of the factors is a perfect matching, that is, it corresponds to a permutation matrix. We characterize all prime graphs, then using this result we classify all factorable forests, answering a question of Akbari et al. [\emph{Linear Algebra and its Applications} (2025)]. We prove that every torus is factorable, and we characterize all possible factorizations of grids, addressing two questions posed by Maghsoudi et al. [\emph{Journal of Algebraic Combinatorics} (2025)].
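The definition can be checked numerically on a small case. The sketch below assumes simple undirected graphs on a shared vertex set, so each adjacency matrix is symmetric, 0/1-valued, and has a zero diagonal. The helper names are hypothetical. The example factors the 4-cycle as a perfect matching times the 4-cycle itself, which is a factorization in which one factor is a permutation matrix.

```python
import numpy as np

def is_adjacency_matrix(M):
    """True if M is a square, symmetric 0/1 matrix with zero diagonal."""
    M = np.asarray(M)
    return bool(
        M.ndim == 2
        and M.shape[0] == M.shape[1]
        and np.isin(M, (0, 1)).all()
        and (M == M.T).all()
        and not M.diagonal().any()
    )

def is_factorization(A, B, C):
    """True if A = B C and all three are adjacency matrices of simple graphs."""
    A, B, C = (np.asarray(M) for M in (A, B, C))
    return all(is_adjacency_matrix(M) for M in (A, B, C)) and np.array_equal(A, B @ C)

def is_perfect_matching(M):
    """True if M is the adjacency matrix of a perfect matching (a permutation matrix)."""
    M = np.asarray(M)
    return is_adjacency_matrix(M) and bool((M.sum(axis=0) == 1).all())

# 4-cycle on vertices 0-1-2-3-0.
cycle4 = np.array([[0, 1, 0, 1],
                   [1, 0, 1, 0],
                   [0, 1, 0, 1],
                   [1, 0, 1, 0]])

# Perfect matching pairing opposite vertices: 0-2 and 1-3.
matching = np.array([[0, 0, 1, 0],
                     [0, 0, 0, 1],
                     [1, 0, 0, 0],
                     [0, 1, 0, 0]])

ok = is_factorization(cycle4, matching, cycle4)
```

Multiplying by `matching` on the left swaps rows 0 and 2 and rows 1 and 3 of `cycle4`; those row pairs are identical, so the product is `cycle4` again.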




Selecting the number of communities is a fundamental challenge in network clustering. The silhouette score offers an intuitive, model-free criterion that balances within-cluster cohesion and between-cluster separation. Despite its widespread use in clustering analysis, its performance in network-based community detection remains insufficiently characterized. In this study, we comprehensively evaluate the performance of the silhouette score across unweighted, weighted, and fully connected networks, examining how network size, separation strength, and community size imbalance influence its performance. Simulation studies show that the silhouette score accurately identifies the true number of communities when clusters are well separated and balanced, but it tends to underestimate under strong imbalance or weak separation and to overestimate in sparse networks. Extending the evaluation to a real airline reachability network, we demonstrate that silhouette-based clustering can recover geographically interpretable and market-oriented clusters. These findings provide empirical guidance for applying the silhouette score in network clustering and clarify the conditions under which its use is most reliable.
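As a rough illustration of the selection criterion, the sketch below computes the mean silhouette score from a precomputed pairwise dissimilarity matrix and picks the candidate partition with the highest score. It makes several assumptions not stated in the abstract: nodes come with a dissimilarity matrix (for a network this might be shortest-path distance), singleton clusters score 0 by convention, and the candidate partitions for each number of communities are supplied by some external clustering step. The names `silhouette_score` and `select_num_communities` are hypothetical.

```python
import numpy as np

def silhouette_score(dist, labels):
    """Mean silhouette over all nodes, given pairwise dissimilarities and labels."""
    dist = np.asarray(dist, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = np.zeros(len(labels))
    for i in range(len(labels)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue  # convention: a singleton cluster contributes 0
        # a: mean dissimilarity to the other members of the node's own cluster
        a = dist[i, own].sum() / (n_own - 1)
        # b: smallest mean dissimilarity to any other cluster
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        if denom > 0:
            scores[i] = (b - a) / denom
    return scores.mean()

def select_num_communities(dist, candidate_labels):
    """Return the k whose candidate partition has the highest silhouette score.

    candidate_labels maps k -> label vector produced by any clustering method.
    """
    return max(candidate_labels, key=lambda k: silhouette_score(dist, candidate_labels[k]))

# Toy example: four nodes at positions 0, 1, 10, 11 on a line.
positions = np.array([0.0, 1.0, 10.0, 11.0])
D = np.abs(positions[:, None] - positions[None, :])
best_k = select_num_communities(D, {2: [0, 0, 1, 1], 3: [0, 0, 1, 2]})
```

In this toy case the two tight pairs are well separated and balanced, which is the regime where the abstract reports the score to be reliable.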




Visual Place Recognition (VPR) is a major challenge for robotics and autonomous systems, with the goal of predicting the location of an image based solely on its visual features. State-of-the-art (SOTA) models extract global descriptors using the powerful foundation model DINOv2 as backbone. These models either explore the cross-image correlation or propose a time-consuming two-stage re-ranking strategy to achieve better performance. However, existing works only utilize the final output of DINOv2, and the current cross-image correlation causes unstable retrieval results. To produce both discriminative and constant global descriptors, this paper proposes a stable cross-image correlation enhanced model for VPR, called SciceVPR. This model explores the full potential of DINOv2 in providing useful feature representations that implicitly encode valuable contextual knowledge. Specifically, SciceVPR first uses a multi-layer feature fusion module to capture increasingly detailed task-relevant channel and spatial information from the multi-layer output of DINOv2. Second, SciceVPR considers the invariant correlation between images within a batch as valuable knowledge to be distilled into the proposed self-enhanced encoder. In this way, SciceVPR can acquire fairly robust global features regardless of domain shifts (e.g., changes in illumination, weather and viewpoint between pictures taken in the same place). Experimental results demonstrate that the base variant, SciceVPR-B, outperforms SOTA one-stage methods with single input on multiple datasets with varying domain conditions. The large variant, SciceVPR-L, performs on par with SOTA two-stage models, scoring over 3% higher in Recall@1 compared to existing models on the challenging Tokyo24/7 dataset. Our code will be released at https://github.com/shuimushan/SciceVPR.




We present SpaceTimePilot, a video diffusion model that disentangles space and time for controllable generative rendering. Given a monocular video, SpaceTimePilot can independently alter the camera viewpoint and the motion sequence within the generative process, re-rendering the scene for continuous and arbitrary exploration across space and time. To achieve this, we introduce an effective animation time-embedding mechanism in the diffusion process, allowing explicit control of the output video's motion sequence with respect to that of the source video. As no datasets provide paired videos of the same dynamic scene with continuous temporal variations, we propose a simple yet effective temporal-warping training scheme that repurposes existing multi-view datasets to mimic temporal differences. This strategy effectively supervises the model to learn temporal control and achieve robust space-time disentanglement. To further enhance the precision of dual control, we introduce two additional components: an improved camera-conditioning mechanism that allows altering the camera from the first frame, and CamxTime, the first synthetic space-and-time full-coverage rendering dataset that provides fully free space-time video trajectories within a scene. Joint training on the temporal-warping scheme and the CamxTime dataset yields more precise temporal control. We evaluate SpaceTimePilot on both real-world and synthetic data, demonstrating clear space-time disentanglement and strong results compared to prior work. Project page: https://zheninghuang.github.io/Space-Time-Pilot/ Code: https://github.com/ZheningHuang/spacetimepilot




Online joint estimation of unknown parameters and states in a dynamical system with uncertainty quantification is crucial in many applications. For example, digital twins (DTs) dynamically update their knowledge of model parameters and states to support prediction and decision-making. Reliability and computational speed are vital for DTs. Online parameter-state estimation ensures computational efficiency, while uncertainty quantification is essential for making reliable predictions and decisions. In parameter-state estimation, the joint distribution of the state and model parameters conditioned on the data, termed the joint posterior, provides accurate uncertainty quantification. Because the joint posterior is generally intractable to compute, this paper presents an online variational inference framework to compute its approximation at each time step. The approximation is factorized into a marginal distribution over the model parameters and a state distribution conditioned on the parameters. This factorization enables recursive updates through a two-stage procedure: first, the parameter posterior is approximated via variational inference; second, the state distribution conditioned on the parameters is computed using Gaussian filtering based on the estimated parameter posterior. The algorithmic design is supported by a theorem establishing upper bounds on the joint posterior approximation error. Numerical experiments demonstrate that the proposed method (i) matches the performance of the joint particle filter in low-dimensional problems, accurately inferring both unobserved states and unknown parameters of dynamical and observation models; (ii) remains robust under noisy, partial observations and model discrepancies in a chaotic Lorenz 96 system; and (iii) scales effectively to a high-dimensional convection-diffusion system, where it outperforms the joint ensemble Kalman filter.




The concept of fair orientations in graphs was introduced by Christodoulou, Fiat, Koutsoupias, and Sgouritsa in 2023, naturally modeling fair division scenarios in which resources are only contested by neighbors. In this model, vertices represent agents and undirected edges represent goods; edges have to be oriented towards one of their endpoints, i.e., allocated to one of their adjacent agents. Although EFX orientations (envy-free up to any good) have been extensively studied in this setting, EF orientations (envy-free) remain unexplored. In this work, we initiate their study, mostly under the lens of parameterized complexity, presenting various tractable cases, hardness results, and parameterizations. Our results concern both simple graphs and multigraphs. Interestingly, many of our results transfer to EFX orientations, thus complementing and improving upon previous work; notably, we answer an open question regarding the structural parameterized complexity of the latter problem on graphs of polynomially-bounded valuations. We also show that EF orientations are tractable in cases in which EFX orientations are not, particularly for binary valuations. Lastly, we consider charity in the orientation setting, establishing algorithms for finding the minimum number of edges that have to be removed from a graph in order for EF(X) orientations to exist.
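The orientation model can be illustrated with a brute-force search on tiny graphs. This sketch is exponential in the number of edges and is not one of the paper's algorithms. It assumes additive valuations in which an agent assigns value only to edges incident to it, and it encodes each edge as a hypothetical tuple `(u, v, value_to_u, value_to_v)`. An orientation is envy-free if no agent values another agent's bundle strictly more than its own.

```python
from itertools import product

def agent_value(agent, bundle, edges):
    """Additive value an agent assigns to a bundle; non-incident edges are worth 0."""
    total = 0
    for idx in bundle:
        u, v, val_u, val_v = edges[idx]
        if agent == u:
            total += val_u
        elif agent == v:
            total += val_v
    return total

def is_envy_free(orientation, edges, agents):
    """orientation[i] is the endpoint that receives edge i."""
    bundles = {a: [] for a in agents}
    for idx, owner in enumerate(orientation):
        bundles[owner].append(idx)
    return all(
        agent_value(a, bundles[a], edges) >= agent_value(a, bundles[b], edges)
        for a in agents
        for b in agents
    )

def find_ef_orientation(edges):
    """Try every orientation; return the first envy-free one, or None."""
    agents = sorted({x for u, v, _, _ in edges for x in (u, v)})
    for orientation in product(*[(u, v) for u, v, _, _ in edges]):
        if is_envy_free(orientation, edges, agents):
            return orientation
    return None

# Unit-valued triangle: orienting the edges around the cycle gives each agent one good.
triangle = [(0, 1, 1, 1), (1, 2, 1, 1), (2, 0, 1, 1)]
# Unit-valued path on three vertices: some agent is always left envious.
path = [(0, 1, 1, 1), (1, 2, 1, 1)]

triangle_result = find_ef_orientation(triangle)
path_result = find_ef_orientation(path)
```

The path example shows why existence is not guaranteed for EF orientations: with two goods and three agents, an agent with an empty bundle always values a neighbor's bundle positively.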


