Traditional metrics such as accuracy, F1-score, and precision are widely used to evaluate machine learning models, but they may be insufficient for small, imbalanced, or high-dimensional datasets. This study presents a dataset-adaptive, normalized metric that incorporates dataset characteristics such as size, feature dimensionality, class imbalance, and signal-to-noise ratio. The proposed metric provides early insight into a model's performance potential under challenging conditions, offering a scalable and adaptable evaluation framework. Experimental validation across classification, regression, and clustering tasks demonstrates the metric's ability to predict model scalability and performance, supporting robust assessment in data-limited settings. This approach has significant implications for efficient resource allocation and model optimization in machine learning workflows.
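The abstract does not specify the metric's functional form. As a purely hypothetical sketch, one way such a dataset-adaptive normalization might combine the four named characteristics (size, dimensionality, imbalance, signal-to-noise ratio) is to fold them into a single difficulty factor and adjust a raw score by it; the function name, weightings, and reference constants below are illustrative assumptions, not the paper's actual metric.

```python
import numpy as np

def adaptive_score(base_score, n_samples, n_features, class_counts, snr,
                   ref_samples=10_000):
    """Hypothetical dataset-adaptive normalization of a base metric.

    Adjusts a raw score in (0, 1) (e.g. accuracy) by a difficulty factor
    derived from dataset size, dimensionality, class imbalance, and
    signal-to-noise ratio. All weightings are illustrative assumptions.
    """
    # Size factor: grows with dataset size relative to a reference size,
    # so scores on small datasets are discounted less harshly.
    size_factor = np.log1p(n_samples) / np.log1p(ref_samples)

    # Dimensionality factor: a high feature-to-sample ratio shrinks it,
    # reflecting the added difficulty of high-dimensional data.
    dim_factor = 1.0 / (1.0 + n_features / max(n_samples, 1))

    # Imbalance factor: normalized entropy of the class distribution
    # (1.0 = perfectly balanced, approaching 0 as one class dominates).
    p = np.asarray(class_counts, dtype=float)
    p = p / p.sum()
    k = len(p)
    imbalance_factor = (
        -(p * np.log(p + 1e-12)).sum() / np.log(k) if k > 1 else 1.0
    )

    # SNR factor: squashes signal-to-noise ratio into (0, 1).
    snr_factor = snr / (1.0 + snr)

    # Difficulty near 0 means a hard dataset; near (or above) 1, an easy one.
    difficulty = size_factor * dim_factor * imbalance_factor * snr_factor

    # Exponentiating keeps the result in (0, 1): the same raw score maps
    # to a higher adjusted score on harder datasets.
    return float(base_score ** difficulty)
```

With this sketch, an 0.8 raw score on a small, imbalanced, noisy dataset yields a higher adjusted score than the same 0.8 on a large, balanced, clean one, which matches the abstract's goal of comparable evaluation across dataset regimes.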