A rigorous introduction to linear models

This book introduces linear models and the theory behind them. Our goal is to give a rigorous introduction for readers with prior exposure to ordinary least squares. In machine learning, the output is usually a nonlinear function of the input, and deep learning seeks a nonlinear dependence through many layers, which requires a large amount of computation. However, most of these algorithms build upon simple linear models. We therefore describe linear models from different perspectives and examine their properties and the theory behind them. The linear model is the main technique in regression problems, and its primary tool is the least squares approximation, which minimizes a sum of squared errors. This is a natural choice when we are interested in finding the regression function that minimizes the corresponding expected squared error. The book is primarily a summary of the purpose and significance of the important theories behind linear models, e.g., distribution theory and the minimum variance estimator. We first describe ordinary least squares from three different points of view, and then disturb the model with random noise and Gaussian noise. With Gaussian noise, the model gives rise to a likelihood, which leads us to the maximum likelihood estimator. The Gaussian disturbance also lets us develop some distribution theory. The distribution theory of least squares helps us answer various questions and introduces related applications. We then prove that least squares is the best unbiased linear model in the sense of mean squared error and, most importantly, that it actually approaches the theoretical limit. We conclude with linear models under the Bayesian approach and beyond.
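As a small illustration of what "minimizes a sum of squared errors" means, the following is a minimal sketch for the simplest case of one input and an intercept, y ≈ a + b·x. The function names `fit_line` and `sse` and the toy data are assumptions made for this example only; they are not taken from the book, which treats the general matrix formulation.

```python
def fit_line(xs, ys):
    """Least squares fit of y ≈ a + b*x; returns (a, b).

    Uses the closed-form solution for a single input with an intercept:
    b = S_xy / S_xx and a = mean(y) - b * mean(x).
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("all x values are equal; the slope is not identifiable")
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b


def sse(a, b, xs, ys):
    """Sum of squared errors of the line a + b*x on the data."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical toy data, roughly following y = 1 + 2x with small deviations.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]

a_hat, b_hat = fit_line(xs, ys)
print("intercept:", a_hat, "slope:", b_hat)
print("SSE at the fit:", sse(a_hat, b_hat, xs, ys))
print("SSE after nudging the slope:", sse(a_hat, b_hat + 0.1, xs, ys))
```

Nudging either coefficient away from the fitted values can only increase the sum of squared errors, which is the defining property of the least squares solution.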
