Linear Regression Model
The linear regression model is a regression technique of predictive analytics (Elkan, 2013). Let x be a vector of real numbers of fixed length p, where p is called the dimension (or dimensionality) of x. Let y be its real-valued label.
The linear regression model is given by:

y = b0 + b1x1 + b2x2 + ... + bpxp

The right-hand side is called a linear function of x, defined by the coefficients b0 to bp. These coefficients are given as output by data mining algorithms.
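The coefficients b0 to bp can be fit by ordinary least squares; a minimal sketch using numpy and a made-up toy dataset (the feature matrix and labels below are hypothetical, chosen so that y = 1 + 2x1 + 3x2 exactly):

```python
import numpy as np

# Hypothetical toy data: n = 5 training examples, p = 2 features.
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])
y = np.array([9.0, 8.0, 19.0, 18.0, 26.0])  # y = 1 + 2*x1 + 3*x2

# Prepend a column of ones so the intercept b0 is learned as well.
X1 = np.column_stack([np.ones(len(X)), X])

# Ordinary least squares: recovers the coefficients b0, b1, b2.
b, *_ = np.linalg.lstsq(X1, y, rcond=None)

# Predictions from the fitted linear function.
y_hat = X1 @ b
```

Here `np.linalg.lstsq` returns the coefficient vector minimizing the squared error, so `b` comes out approximately [1, 2, 3].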
In the above model, if xi = 0 for all i = 1, 2, ..., p, then y = b0, which is called the intercept. However, it may not be possible for x to have all its values equal to 0. Coefficient bi represents the amount by which y increases if xi increases by 1 while the values of all other features remain unchanged.
The objective of this model is to find optimal values of the coefficients b0 to bp. By optimality, it is meant that the sum of squared errors on the training set must be minimized, where the squared error on training example i is (ŷi − yi)². The algorithm finds:

b* = argmin over b of Σ from i=1 to n of (f(xi; b) − yi)²

The objective function Σ (f(xi; b) − yi)² is called the sum of squared errors, or SSE for short. If the number of training tuples n is less than the number of features p, then the optimal values of the coefficients are not unique. Even if n is greater than p, the optimal coefficients have multiple equivalent values if some of the input features are linearly related. Here, "equivalent" means that the different sets of coefficients achieve the same minimum SSE.
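The non-uniqueness caused by linearly related features can be checked numerically; a small sketch with made-up data in which the second feature is exactly twice the first, so two different coefficient vectors reach the same minimum SSE:

```python
import numpy as np

# Hypothetical data: the third column is exactly 2x the second,
# so the input features are linearly related.
x1 = np.array([1.0, 2.0, 3.0, 4.0])
X = np.column_stack([np.ones(4), x1, 2.0 * x1])
y = np.array([3.0, 5.0, 7.0, 9.0])  # y = 1 + 2*x1

def sse(b):
    """Sum of squared errors for coefficient vector b."""
    residuals = X @ b - y
    return float(residuals @ residuals)

# Two different "equivalent" coefficient vectors:
# y = 1 + 2*x1 + 0*(2*x1)  and  y = 1 + 0*x1 + 1*(2*x1).
print(sse(np.array([1.0, 2.0, 0.0])))  # 0.0
print(sse(np.array([1.0, 0.0, 1.0])))  # 0.0
```

Both coefficient vectors achieve the minimum SSE of zero, illustrating why the optimum is not unique under linear dependence.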
The fitted regression equation is well suited for prediction. There are two types of predictions:
- 1. Predicting the value of y, which is called the response variable. The response variable corresponds to a given value of the predictor variable x.
- 2. Computing the mean response μ0 when x = x0. In this case, the mean response μ0 is estimated by μ̂0 = b0 + b1x0.
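The two prediction types yield the same point estimate from a fitted simple regression; a minimal sketch with assumed (hypothetical) coefficient values:

```python
# Hypothetical fitted simple regression: y_hat = b0 + b1 * x.
b0, b1 = 2.0, 0.5

x0 = 10.0

# (1) Predicted response for a new observation at x = x0.
y_pred = b0 + b1 * x0    # 7.0

# (2) Estimated mean response at x = x0: the same point estimate,
# though its confidence interval is narrower than the prediction
# interval for a single new response.
mu_hat0 = b0 + b1 * x0   # 7.0
```

The point estimates coincide; the two prediction types differ in the width of their uncertainty intervals, not in the estimate itself.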
Various other techniques come under the category of regression models, e.g., the Logistic Regression Model, the Discrete Choice Model, and Probit Regression.