Bayesian Optimization (BO) is a standard tool for hyperparameter tuning thanks to its sample efficiency on expensive black-box functions. While most BO pipelines begin with uniform random initialization, default hyperparameter values shipped with popular ML libraries such as scikit-learn encode implicit expert knowledge and could serve as informative starting points that accelerate convergence. This hypothesis, despite its intuitive appeal, has remained largely unexamined. We formalize the idea by initializing BO with points drawn from truncated Gaussian distributions centered at library defaults and compare the resulting trajectories against a uniform-random baseline. We conduct an extensive empirical evaluation spanning three BO back-ends (BoTorch, Optuna, Scikit-Optimize), three model families (Random Forests, Support Vector Machines, Multilayer Perceptrons), and five benchmark datasets covering classification and regression tasks. Performance is assessed through convergence speed and final predictive quality, and statistical significance is determined via one-sided binomial tests. Across all conditions, default-informed initialization yields no statistically significant advantage over purely random sampling, with p-values ranging from 0.141 to 0.908. A sensitivity analysis on the prior variance confirms that, while tighter concentration around the defaults improves early evaluations, this transient benefit vanishes as optimization progresses, leaving final performance unchanged. Our results provide no evidence that default hyperparameters encode useful directional information for optimization. We therefore recommend that practitioners treat hyperparameter tuning as an integral part of model development and favor principled, data-driven search strategies over heuristic reliance on library defaults.
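The default-informed initialization described above can be sketched with a minimal rejection sampler. The search space, bounds, and default values below are hypothetical placeholders, not the actual scikit-learn defaults; the `width` parameter stands in for the prior-variance knob varied in the sensitivity analysis.

```python
import random

def truncated_gaussian(center, low, high, sigma):
    """Rejection-sample a Gaussian centered at a library default,
    truncated to the hyperparameter's valid range [low, high]."""
    while True:
        x = random.gauss(center, sigma)
        if low <= x <= high:
            return x

# Hypothetical search space: per-parameter bounds plus a
# library-style default that anchors the initialization prior.
space = {
    "max_depth":    {"low": 1.0, "high": 50.0, "default": 10.0},
    "max_features": {"low": 0.1, "high": 1.0,  "default": 0.5},
}

def sample_init(space, n_points, width=0.1):
    """Draw BO initial points concentrated near the defaults.
    sigma is a fraction `width` of each parameter's range, so a
    smaller `width` gives tighter concentration around defaults."""
    points = []
    for _ in range(n_points):
        p = {}
        for name, s in space.items():
            sigma = width * (s["high"] - s["low"])
            p[name] = truncated_gaussian(s["default"], s["low"],
                                         s["high"], sigma)
        points.append(p)
    return points

init = sample_init(space, n_points=5)
```

Setting `width` large recovers something close to the uniform-random baseline, while a small `width` reproduces the tight-concentration regime whose early advantage was found to be transient.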