notes/math_notes/machine_learning/linear_regression.md
2026-04-16 18:35:03 -07:00

2.9 KiB

Linear Regression

In linear regression, given features and labels (X, Y), where Y is real-valued, we try to learn a function f(x) to predict Y given x. Figure 2 outlines this function:

 \hat{y} = w_0 + w_1x_1 + w_2x_2 + \dots + w_mx_m = \mathbf{w}^T\mathbf{X} 

Figure 2: Learning Function

where \mathbf{X} = x_1\text{, } \dots \text{, } x_m are the feature values and \mathbf{w} = w_0 \text{, } \dots \text{, } w_n can be seen as weights.

The weights determine how the corresponding feature affects the predicted value. Thus, our task is to find the appropriate values of w.

Cost function: The cost function helps us to figure out the best possible values for w. For the cost function, we use the Mean Squared Error (MSE), Figure 3.

 MSE(\mathbf{w}) = \frac{1}{m}\sum_{i=1}^{m}{\left(\hat{y_i} - y_i\right)^2} 

Figure 3: MSE

Using this MSE function we are going to update the values of w, such that the MSE value settles at the minimum. The method of updating w to minimize the cost function (MSE) is called gradient descent. We initialize the values of w and then update these values iteratively to minimize the cost. Sometimes the cost function can be a non-convex function where you can settle at a local minimum, but for linear regression, it is always a convex function. To update w, we take gradients from the cost function. To find these gradients, we take partial derivatives with respect to w. Figure 4 outlines this 'update rule'.

  • Initialize w_i
  • Repeat until convergence \{w_i := w_i - \alpha \times \frac{\partial MSE(\mathbf{w})}{\partial w_i}\} Parameter \alpha is called learning rate.

Figure 4: Update Rule

Code: In order to perform linear regression, we are going to use a Python module called scikit learn. In the following example, we will use the California Housing Data Set. The data set contains information about the housing values in the suburbs of Boston.

There are 14 attributes for each X. Examples of these attributes include:

  • MedInc per capita crime rate by town
  • HouseAge Average age of a house in years
  • AveRooms Average Rooms in a home
  • Population City population

The target value Y is:

  • MedHouseVal - Median value of owner-occupied homes in $1000's

Next, we split the data into training and testing sets. We train the model with 80% of the samples and test with the remaining 20%. Finally, we will evaluate our model using MSE.

import sklearn
from sklearn.linear_model import LinearRegression
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.datasets import fetch_california_housing

housing = fetch_california_housing()
X = housing ['data']
Y = housing ['target']

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.2, random_state=5)

lr = LinearRegression()
lr.fit(X_train, Y_train)
Y_pred = lr.predict(X_test)

mse = sklearn.metrics.mean_squared_error(Y_test, Y_pred)

print('Mean squared error for test set:', mse)