We introduce a relevant yet challenging problem named Personalized Dictionary Learning (PerDL), where the goal is to learn sparse linear representations from heterogeneous datasets that share some commonality. In PerDL, we model each dataset's shared and unique features as global and local dictionaries, respectively. The challenges of PerDL are not only inherited from classical dictionary learning (DL), but also arise because the shared and unique features are unknown in advance. In this paper, we rigorously formulate this problem and provide conditions under which the global and local dictionaries can be provably disentangled. Under these conditions, we provide a meta-algorithm called Personalized Matching and Averaging (PerMA) that can recover both global and local dictionaries from heterogeneous datasets. PerMA is highly efficient; it converges to the ground truth at a linear rate under suitable conditions. Moreover, it automatically borrows strength from strong learners to improve the prediction of weak learners. As a general framework for extracting global and local dictionaries, PerDL applies to a range of learning tasks, which we demonstrate on training with imbalanced datasets and video surveillance.
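To make the "matching and averaging" idea concrete, the following is a minimal illustrative sketch and not the PerMA algorithm from the paper. It assumes each dataset already has its own dictionary estimate with unit-norm columns, treats an atom as shared if every dataset contains a close match (up to sign), and averages the matched atoms to form a global atom. The function name `match_and_average`, the greedy matching against the first dataset, and the `threshold` parameter are all hypothetical choices made for illustration.

```python
import numpy as np

def match_and_average(dicts, threshold=0.9):
    """Split per-dataset dictionaries into shared (global) and unique (local) atoms.

    dicts: list of (d, r_i) arrays with unit-norm columns (one per dataset).
    threshold: hypothetical cutoff on |inner product| for two atoms to "match".
    Returns (global_atoms, local_atoms), where global_atoms has shape (d, n_global)
    and local_atoms is a list with one (d, n_local_i) array per dataset.
    """
    ref = dicts[0]
    d = ref.shape[0]
    used = [set() for _ in dicts]  # column indices already assigned as global
    global_atoms = []

    for j in range(ref.shape[1]):
        atom = ref[:, j]
        aligned, picks = [atom], [j]
        for k, D in enumerate(dicts[1:], start=1):
            corr = D.T @ atom
            best = int(np.argmax(np.abs(corr)))
            if abs(corr[best]) < threshold or best in used[k]:
                aligned = None  # no match in dataset k: atom is not shared
                break
            # Flip the sign so matched atoms point the same way before averaging.
            aligned.append(np.sign(corr[best]) * D[:, best])
            picks.append(best)
        if aligned is None:
            continue
        for k, idx in enumerate(picks):
            used[k].add(idx)
        avg = np.mean(aligned, axis=0)
        global_atoms.append(avg / np.linalg.norm(avg))

    G = np.column_stack(global_atoms) if global_atoms else np.zeros((d, 0))
    # Whatever was not matched across all datasets is kept as a local atom.
    local_atoms = [
        D[:, [c for c in range(D.shape[1]) if c not in used[k]]]
        for k, D in enumerate(dicts)
    ]
    return G, local_atoms
```

Averaging matched atoms is what lets a noisy estimate from one dataset benefit from cleaner estimates elsewhere, which is the intuition behind "borrowing strength" in this sketch. The conditions under which such disentangling provably works are those given in the paper, not anything this toy code guarantees.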