With the development of new sensors and monitoring devices, more data sources become available as inputs to machine learning models. On the one hand, these new inputs can improve a model's accuracy; on the other hand, combining them with historical data remains a challenge that has not yet been studied in sufficient detail. In this work, we propose a transfer learning algorithm that combines new and historical data with different input dimensions. The approach is easy to implement and efficient, with computational complexity equivalent to that of ordinary least squares, and it requires no hyperparameter tuning, which makes it straightforward to apply when new data are limited. Unlike other approaches, our method comes with a rigorous theoretical analysis of its robustness, showing that it cannot be outperformed by a baseline that uses only the new data. Our approach achieves state-of-the-art performance on nine real-life datasets, outperforming linear DSFT, another linear transfer learning algorithm, and performing comparably to non-linear DSFT.
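The abstract does not specify the algorithm itself, so the following sketch only illustrates the problem setting under stated assumptions. It shows historical data with `d_old` features, a small new dataset with `d_old + d_new` features, the new-data-only OLS baseline, and a simple two-stage scheme that is our own hypothetical illustration, not the proposed method. The names `ols`, `fit_new_only`, `fit_frozen_transfer`, and `d_old` are assumptions introduced for this example, and the data are synthetic and noise-free.

```python
import numpy as np


def ols(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ..., bd]."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef


def predict(coef, X):
    """Linear prediction b0 + X @ b for coefficients produced by ols()."""
    return coef[0] + X @ coef[1:]


def fit_new_only(X_full, y):
    """Baseline: OLS on the new data only, using old and new features."""
    coef = ols(X_full, y)
    return lambda X: predict(coef, X)


def fit_frozen_transfer(X_hist, y_hist, X_full, y, d_old):
    """Hypothetical two-stage transfer scheme (NOT the paper's algorithm).

    Stage 1: OLS on the historical data, which has only the d_old old features.
    Stage 2: keep those coefficients frozen and fit an intercept plus
             coefficients for the new features only, on the residuals of the
             new data. Stage 2 needs just d_new + 1 parameters.
    """
    coef_hist = ols(X_hist, y_hist)
    residual = y - predict(coef_hist, X_full[:, :d_old])
    coef_new = ols(X_full[:, d_old:], residual)

    def model(X):
        return predict(coef_hist, X[:, :d_old]) + predict(coef_new, X[:, d_old:])

    return model


rng = np.random.default_rng(0)
# Historical data: 200 samples, 2 old features, y = 1 + 2*x1 - x2 (noise-free).
X_hist = rng.normal(size=(200, 2))
y_hist = 1 + 2 * X_hist[:, 0] - X_hist[:, 1]
# New data: only 3 samples with an extra sensor reading z; y = 1 + 2*x1 - x2 + 3*z.
X_new = rng.normal(size=(3, 3))
y_new = 1 + 2 * X_new[:, 0] - X_new[:, 1] + 3 * X_new[:, 2]

model = fit_frozen_transfer(X_hist, y_hist, X_new, y_new, d_old=2)
baseline = fit_new_only(X_new, y_new)  # 4 parameters from 3 samples: underdetermined

X_test = rng.normal(size=(5, 3))
y_test = 1 + 2 * X_test[:, 0] - X_test[:, 1] + 3 * X_test[:, 2]
print(np.allclose(model(X_test), y_test))  # True
```

In this toy case the frozen scheme fits only `d_new + 1 = 2` parameters on the new data, so three new samples suffice, while the new-data-only baseline must estimate four. The scheme carries no robustness guarantee, however: if the new features are correlated with the old ones, the historical coefficients absorb part of their effect (omitted-variable bias), and the frozen scheme can then do worse than the baseline.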