# Linear Regression

In linear regression, given features and labels (X, Y), where Y is real-valued, we try to learn a function f(x) that predicts Y from x. Figure 2 outlines this function:

$$
\hat{y} = w_0 + w_1x_1 + w_2x_2 + \dots + w_mx_m = \mathbf{w}^T\mathbf{x}
$$

_Figure 2: Learning Function_

where $\mathbf{x} = (1, x_1, \dots, x_m)$ holds the $m$ feature values, with a leading 1 so that $w_0$ acts as the intercept, and $\mathbf{w} = (w_0, \dots, w_m)$ can be seen as weights. Each weight determines how strongly the corresponding feature affects the predicted value. Thus, our task is to find appropriate values of **w**.

**Cost function:** The cost function helps us figure out the best possible values for **w**. For the cost function, we use the Mean Squared Error (MSE), shown in Figure 3.

$$
MSE(\mathbf{w}) = \frac{1}{n}\sum_{i=1}^{n}{\left(\hat{y}_i - y_i\right)^2}
$$

_Figure 3: MSE_

where $n$ is the number of training samples, $y_i$ is the true label of sample $i$, and $\hat{y}_i$ is its prediction.

Using this MSE function, we update the values of **w** so that the MSE settles at its minimum. The method of updating **w** to minimize the cost function (MSE) is called gradient descent. We initialize the values of **w** and then update them iteratively to reduce the cost. In general, a cost function can be non-convex, in which case gradient descent may settle at a local minimum, but for linear regression with MSE the cost is always convex.

To update **w**, we take gradients of the cost function, that is, its partial derivatives with respect to each $w_i$. Figure 4 outlines this 'update rule'.

- Initialize $w_i$
- Repeat until convergence, for all $i$ simultaneously: $w_i := w_i - \alpha \times \frac{\partial MSE(\mathbf{w})}{\partial w_i}$

The parameter $\alpha$ is called the learning rate.

_Figure 4: Update Rule_

**Code:** To perform linear regression, we are going to use a Python module called scikit-learn. In the following example, we will use the California Housing data set. The data set contains information about housing values in California, with one row per census block group. There are 8 attributes for each **X**.
Examples of these attributes include:

- MedInc - Median income in the block group
- HouseAge - Median age of a house in the block group, in years
- AveRooms - Average number of rooms per household
- Population - Block group population

The target value **Y** is:

- MedHouseVal - Median house value for the block group, in units of 100,000 US dollars

Next, we split the data into training and testing sets. We train the model with 80% of the samples and test with the remaining 20%. Finally, we evaluate the model using MSE on the test set.

```py
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Load the features (X) and the target (Y); the data is downloaded on first use
housing = fetch_california_housing()
X = housing['data']
Y = housing['target']

# Hold out 20% of the samples for testing
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=5)

# Fit the model on the training set
lr = LinearRegression()
lr.fit(X_train, Y_train)

# Evaluate on the held-out test set
Y_pred = lr.predict(X_test)
mse = mean_squared_error(Y_test, Y_pred)
print('Mean squared error for test set:', mse)
```
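
The update rule from Figure 4 can also be sketched directly in NumPy, as a rough illustration of what gradient descent does. For the MSE, the partial derivative with respect to a weight is twice the mean, over the training samples, of the prediction error $(\hat{y}_i - y_i)$ times the matching feature value. The function names `gradient_descent` and `mse_cost`, the synthetic data, the learning rate of 0.1, and the fixed iteration count are all assumptions made for this example. Note also that scikit-learn's `LinearRegression` solves the least-squares problem directly rather than by gradient descent.

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, n_iters=2000):
    """Fit weights w by batch gradient descent on the MSE (illustrative sketch)."""
    n_samples = X.shape[0]
    # Prepend a column of ones so that w[0] plays the role of the intercept w_0.
    Xb = np.hstack([np.ones((n_samples, 1)), X])
    w = np.zeros(Xb.shape[1])                          # initialize every w_i to 0
    for _ in range(n_iters):                           # fixed count stands in for "until convergence"
        residual = Xb @ w - y                          # y_hat - y for every sample
        grad = (2.0 / n_samples) * (Xb.T @ residual)   # partial derivatives of the MSE
        w -= alpha * grad                              # update rule with learning rate alpha
    return w

def mse_cost(X, y, w):
    """Mean squared error of the linear model with weights w."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    return float(np.mean((Xb @ w - y) ** 2))

# Hypothetical noise-free data generated from y = 1 + 2*x1 - 3*x2
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 2))
y_demo = 1.0 + 2.0 * X_demo[:, 0] - 3.0 * X_demo[:, 1]

w_demo = gradient_descent(X_demo, y_demo)
print('Learned weights:', w_demo)
print('Final MSE:', mse_cost(X_demo, y_demo, w_demo))
```

Because the synthetic features are on a similar scale, a fixed learning rate works here. On real data such as the California Housing features, whose columns differ widely in scale, the features would likely need to be standardized first for the same learning rate to converge.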