Dictionary learning is traditionally formulated as an $L_1$-regularized signal reconstruction problem. While recent developments have incorporated discriminative, hierarchical, or generative structures, most approaches encourage representation sparsity only over individual samples, overlooking how atoms are shared across samples and thus yielding redundant, sub-optimal dictionaries. We introduce a parsimony-promoting regularizer based on the row-wise $L_\infty$ norm of the coefficient matrix. This additional penalty encourages entire rows of the coefficient matrix to vanish, thereby reducing the number of dictionary atoms activated across the dataset. We derive the formulation from a probabilistic model with Beta-Bernoulli priors, which yields a Bayesian interpretation linking the regularization parameters to prior distributions. We further establish a theoretical criterion for optimal hyperparameter selection and connect our formulation to Minimum Description Length, Bayesian model selection, and pathlet learning. Extensive experiments on benchmark datasets demonstrate that our method achieves substantially improved reconstruction quality (a 20\% reduction in RMSE) and enhanced representation sparsity, using fewer than one-tenth of the available dictionary atoms, while empirically validating our theoretical analysis.
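To make the penalty concrete, a minimal sketch of the objective implied above, in notation not taken from the paper: assume a data matrix $X$, a dictionary $D$ with $K$ atoms as columns, a coefficient matrix $A$ whose $j$-th row is $A_{j,:}$, and regularization weights $\lambda, \mu > 0$. One plausible form of the regularized reconstruction problem is
$$
\min_{D,\,A}\ \tfrac{1}{2}\,\|X - DA\|_F^2 \;+\; \lambda\,\|A\|_1 \;+\; \mu \sum_{j=1}^{K} \|A_{j,:}\|_\infty ,
$$
where the $L_1$ term enforces per-sample sparsity and the row-wise $L_\infty$ term drives entire rows of $A$ to zero, deactivating the corresponding dictionary atoms across the whole dataset.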