Accurately estimating the proportion of true signals among a large number of variables is crucial for enhancing the precision and reliability of scientific research. Traditional signal proportion estimators often assume independence among variables and specific signal sparsity conditions, limiting their applicability in real-world scenarios where such assumptions may not hold. This paper introduces a novel signal proportion estimator that leverages arbitrary covariance dependence information among variables, thereby improving performance across a wide range of sparsity levels and dependence structures. Building on previous work that provides lower confidence bounds for signal proportions, we extend this approach by incorporating the principal factor approximation procedure to account for variable dependence. Our theoretical insights offer a deeper understanding of how signal sparsity, signal intensity, and covariance dependence interact. By comparing the conditions for estimation consistency before and after dependence adjustment, we highlight the advantages of integrating dependence information across different contexts. This theoretical foundation not only validates the effectiveness of the new estimator but also guides its practical application, ensuring reliable use in diverse scenarios. Through extensive simulations, we demonstrate that our method outperforms state-of-the-art estimators in both estimation accuracy and the detection of weaker signals that might otherwise go undetected.