This article and its sequel form an introduction to the field of regression analysis. We start with a brief panorama of regression models: linear models, generalized linear models, nonparametric and semiparametric regression models, and nonlinear models. In the remainder of the text we focus on simple linear regression, the situation where the mean of a variable Y, the response variable, depends linearly on another variable called the regressor. We work in the standard context where, among other conditions, the error variable is assumed to be Gaussian. In this setting we present the estimates obtained by least squares methods and their basic properties. We then develop operational tools for statistical inference: confidence and prediction intervals, significance tests and ANOVA tables. The issue of diagnostics for detecting observations that play particular roles in the regression is addressed in some detail, covering outliers, high-leverage observations and influential observations. Finally, we indicate some ways to deal with situations where some of the model assumptions are not satisfied. It is worth emphasizing that throughout the text we make a special effort to find a compromise between mathematical rigor and applied statistical concerns: we give precise proofs for specific issues that we consider of particular significance, and we illustrate both methods and results with a single dataset analyzed by means of the R software.
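As a minimal illustration of the least squares estimates mentioned above, the slope and intercept of a simple linear regression have closed-form expressions in terms of the sample means and the sums of squares. The sketch below uses made-up toy data, not the article's dataset, and is written in Python for self-containment (the article itself uses R):

```python
import math

# Hypothetical toy data (not the article's dataset): x = regressor, y = response.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

xbar = sum(x) / n
ybar = sum(y) / n

# Least squares estimates of slope and intercept:
#   b1 = Sxy / Sxx,   b0 = ybar - b1 * xbar
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = ybar - b1 * xbar

# Residual variance estimate (n - 2 degrees of freedom in simple regression).
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
s2 = sum(r ** 2 for r in residuals) / (n - 2)

# Standard error of the slope; under the Gaussian error assumption, a 95%
# confidence interval is b1 +/- t_{n-2, 0.975} * se_b1 (the t quantile comes
# from tables or a statistics library).
se_b1 = math.sqrt(s2 / sxx)
```

In R, the equivalent fit is obtained with `lm(y ~ x)`, which also reports the standard errors and significance tests discussed in the article.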
Received: December 2016
Published online: February 2017
Gérard Grégoire, Laboratoire Jean Kuntzmann, Université Grenoble Alpes,