Data tensors of order 2 and higher are now routinely generated, and these collections are large and growing. Many scientific and medical data tensors are tensor fields (e.g., images, videos, geographic data) in which spatial neighborhoods carry important information. Directly accessing such large tensor collections for information has become prohibitively expensive. We learn approximate full-rank yet compact tensor sketches whose decompositive representations provide compact spatial, temporal, and spectral embeddings of tensor fields. All information querying and post-processing on the original tensor field can then be performed more efficiently, and with customizable accuracy, on these compact factored sketches in a latent generative space. We produce an optimal rank-r sketchy Tucker decomposition of data tensors of arbitrary order by building compact factor matrices from a sample-efficient sub-sampling of tensor slices. The sampling policy is learned via adaptive stochastic Thompson sampling using Dirichlet distributions with conjugate priors.
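To make the pipeline concrete, the following is a minimal sketch (not the authors' implementation) of a rank-r Tucker sketch built from sub-sampled data: for each mode, a Thompson draw from a Dirichlet posterior over column importance selects a subset of fibers of the mode unfolding, an SVD of that sub-sample yields the factor matrix, and the core tensor is formed by mode products. All function names, the uniform prior, and the fixed sample budget `n_cols` are illustrative assumptions.

```python
import numpy as np

def thompson_pick(alpha, n_pick, rng):
    """Thompson step: draw importance probabilities from a Dirichlet
    posterior (conjugate prior) and keep the n_pick most probable columns."""
    probs = rng.dirichlet(alpha)
    return np.sort(np.argsort(probs)[-n_pick:])

def mode_unfold(X, mode):
    """Matricize X along the given mode: shape (n_mode, prod of the rest)."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def mode_multiply(X, U, mode):
    """Mode-k product X x_k U, where U has shape (r, n_mode)."""
    Xm = np.moveaxis(X, mode, 0)
    out = np.tensordot(U, Xm, axes=(1, 0))
    return np.moveaxis(out, 0, mode)

def sketchy_tucker(X, ranks, n_cols, seed=0):
    """Illustrative rank-r Tucker sketch from Thompson-sampled fibers."""
    rng = np.random.default_rng(seed)
    factors = []
    for mode, r in enumerate(ranks):
        M = mode_unfold(X, mode)
        alpha = np.ones(M.shape[1])  # uniform Dirichlet prior (assumption)
        cols = thompson_pick(alpha, min(n_cols, M.shape[1]), rng)
        U, _, _ = np.linalg.svd(M[:, cols], full_matrices=False)
        factors.append(U[:, :r])     # n_mode x r factor matrix
    G = X
    for mode, U in enumerate(factors):
        G = mode_multiply(G, U.T, mode)  # project to the rank-r core
    return G, factors

def reconstruct(G, factors):
    """Expand the compact sketch back to a full tensor."""
    X = G
    for mode, U in enumerate(factors):
        X = mode_multiply(X, U, mode)
    return X
```

If the data tensor is exactly low-rank and the sampled fibers span each mode's column space, the sketch reconstructs the tensor to numerical precision; in general, accuracy is tuned by the chosen ranks and the sampling budget.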