Today we will be going over simple linear regression.
- A linear regression fits a line of best fit to the data.
- The quality of the fit is measured by the correlation coefficient R; its square, R^2, is the coefficient of determination (see the quick sketch below).
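As a quick illustration (a minimal sketch with made-up toy data, not part of the assignment data), we can fit a line with NumPy and compute R as the correlation between the inputs and outputs:
import numpy as np

# Hypothetical toy data: y is roughly 2x + 1 plus noise
x_toy = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_toy = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

m, b = np.polyfit(x_toy, y_toy, deg=1)   # slope and intercept of the best-fit line
r = np.corrcoef(x_toy, y_toy)[0, 1]      # Pearson correlation coefficient R
print(f'line: y = {m:.2f}x + {b:.2f}, R^2 = {r**2:.3f}')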
1) Packages
Let’s first import all of the packages we need for this assignment.
- numpy helps us make our training and test set arrays
- sklearn helps us use the logistic regression function
- tensorflow is what we will use to build our neural networks
- keras helps us to make our neural networks
# Import TensorFlow and Keras
import tensorflow as tf
from tensorflow import keras

# Helper libraries
import numpy as np
import matplotlib.pyplot as plt
import sklearn
2) Import Data
- Today we will be working with the Boston Housing dataset
- We will use inputs like the crime rate, location, etc. to predict the target
- There are 13 features per house
- The target is the median home price, in thousands of dollars ($k)
from keras.datasets import boston_housing
(train_data, train_price) , (test_data, test_price) = boston_housing.load_data()
Using TensorFlow backend.
3) Take a look at the data
We have 404 houses in our training set and 102 houses in our test set.
print(f'Training Data : {train_data.shape}')
print(f'Test Data: {test_data.shape}')
Training Data : (404, 13)
Test Data: (102, 13)
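As a quick peek at the raw values (before we normalize them):
print(train_data[0])   # the 13 raw feature values for the first house
print(train_price[0])  # its median price, in $k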
4) Normalize the data by computing the z-score
Compute the mean and standard deviation from the training data only, then use those same statistics to scale both the training and test sets, so no information leaks from the test set.
mean = train_data.mean(axis = 0)
train_data -= mean
sigma = train_data.std(axis = 0)
train_data /= sigma
test_data -= mean
test_data /= sigma
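A quick sanity check (optional): after z-scoring, every training feature should have mean ~0 and standard deviation ~1. The test set will only be approximately standardized, since it was scaled with the training statistics.
print(train_data.mean(axis=0).round(6))  # ~0 for all 13 features
print(train_data.std(axis=0).round(6))   # ~1 for all 13 features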
5) Build the multiple linear regression model
This is a multiple regression: one output (the price) is predicted from all 13 input features.
from keras import models
from keras import layers
def buildLinearRegression():
    # One Dense unit with a linear activation computes y = w.x + b
    model = models.Sequential([
        layers.Dense(1, activation='linear', input_shape=(train_data.shape[1],))
    ])
    model.compile(optimizer = 'rmsprop', loss = 'mse', metrics = ['mae'])
    return model
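A single Dense unit with a linear activation is exactly a multiple linear regression: 13 weights plus 1 bias. You can confirm the parameter count (an optional check):
model = buildLinearRegression()
model.summary()  # should report 14 trainable parameters (13 weights + 1 bias)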
6) Train the model
linearModel = buildLinearRegression()
n_epochs = 10000
linearModel.fit(train_data, train_price, epochs = n_epochs, batch_size = 32, verbose = 0)
test_mse, test_mae = linearModel.evaluate(test_data, test_price, verbose = 0)
print(f'Model Mean Squared Error: {test_mse}')
print(f'Model Mean Absolute Error: {test_mae}')
Model Mean Squared Error: 23.02216728060853
Model Mean Absolute Error: 3.449336290359497
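If you want to watch convergence (an optional sketch, separate from the run above), fit() returns a History object whose loss you can plot:
m = buildLinearRegression()
history = m.fit(train_data, train_price, epochs=100, batch_size=32, verbose=0)
plt.plot(history.history['loss'])
plt.xlabel('epoch')
plt.ylabel('training MSE')
plt.show()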
7) Test the model
predictions = linearModel.predict(test_data)
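A quick look at the first few predictions against the true prices (both in $k):
for pred, actual in zip(predictions[:5].flatten(), test_price[:5]):
    print(f'predicted: {pred:6.1f}   actual: {actual:6.1f}')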
8) Get weights and bias
Weights = linearModel.layers[0].get_weights()[0]
Bias = linearModel.layers[0].get_weights()[1]
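Weights has shape (13, 1) and Bias has shape (1,), so the model's output is just the matrix product of the inputs with the weights, plus the bias. A quick check that this matches predict() (allowing a small floating-point tolerance):
manual = test_data @ Weights + Bias                 # (102, 13) @ (13, 1) -> (102, 1)
print(np.allclose(manual, predictions, atol=1e-3))  # should print True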
9) Plotting one feature against the target
idx = 5  # feature index 5 is RM, the average number of rooms per dwelling

x = test_data[:, np.newaxis, idx]
y = test_price

def bestFitLine(m, b):
    axes = plt.gca()
    x_vals = np.array(axes.get_xlim())
    y_vals = m * x_vals + b
    plt.plot(x_vals, y_vals, '--', color = 'red')

plt.figure()
plt.scatter(x, y, marker = '.', color = 'blue')
# Note: this uses one weight plus the full bias and ignores the other 12
# features, so the line is only an approximate 1-D view of the model
bestFitLine(Weights[idx], Bias)
10) Now it’s your turn!
We will compare linear and logistic regression on the same data set.
#We have to modify the data to make it categorical
maxtrain = np.amax(train_price)
maxtest = np.amax(test_price)

for i in range(len(train_price)):
    if train_price[i] < maxtrain / 2:
        train_price[i] = 0
    else:
        train_price[i] = 1

for i in range(len(test_price)):
    if test_price[i] < maxtest / 2:
        test_price[i] = 0
    else:
        test_price[i] = 1
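For reference, the same binarization can be written in one vectorized step per array (equivalent to the loops above; run one version or the other, not both):
train_price = (train_price >= maxtrain / 2).astype(float)  # 1 if price >= half the max, else 0
test_price = (test_price >= maxtest / 2).astype(float)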
# Define the logistic model here
from sklearn.linear_model import LogisticRegression
# all parameters not specified are set to their defaults
logisticRegr = LogisticRegression()
# Keep in mind the scoring metric we will be using: .score() reports mean classification accuracy.
11) Train the model
# Train the model here
logisticRegr.fit(train_data, train_price)
C:\Users\coder\Anaconda3\envs\tf_gpu\lib\site-packages\sklearn\linear_model\logistic.py:432: FutureWarning: Default solver will be changed to 'lbfgs' in 0.22. Specify a solver to silence this warning.
FutureWarning)
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
intercept_scaling=1, l1_ratio=None, max_iter=100,
multi_class='warn', n_jobs=None, penalty='l2',
random_state=None, solver='warn', tol=0.0001, verbose=0,
warm_start=False)
12) Test the model on the data
# Test model here
logistic_predictions = logisticRegr.predict(test_data)
logistic_score = logisticRegr.score(test_data, test_price)
print(logistic_score)
0.8921568627450981
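Accuracy alone can hide how the errors are distributed across the two classes; a confusion matrix (an optional check using sklearn.metrics) shows that:
from sklearn.metrics import confusion_matrix
print(confusion_matrix(test_price, logistic_predictions))  # rows: true class, columns: predicted class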
13) Get weights and compare them with those of the linear regression
# Get weights
logisticRegr.coef_
# Compare linear and logistic regression by printing the weight arrays.
# Note: the linear weights predict price in $k while the logistic weights are
# log-odds of the upper class, so compare signs and relative magnitudes, not raw values.
print("Linear weights:")
print(Weights)
print("Logistic weights:")
print(logisticRegr.coef_)
Linear weights:
[[-1.1109653 ]
[ 1.3438065 ]
[ 0.03124908]
[ 0.9601239 ]
[-2.372176 ]
[ 2.3962164 ]
[ 0.21242434]
[-3.453772 ]
[ 2.8981006 ]
[-1.9676172 ]
[-1.9720079 ]
[ 0.83178866]
[-4.0405097 ]]
Logistic weights:
[[-0.15257395 0.30051083 -0.46654762 0.31413753 -0.20023763 1.33839839
-0.30362151 -1.10942773 1.53482915 -0.71457512 -0.41799184 0.16843306
-2.47917009]]
14) Plot the line of best fit of a logistic regression on one feature
# Plot line here
# Draw the log-odds line m*x + b for the chosen feature, using the logistic
# model's coefficient and intercept
def bestFitLine(m, b):
    axes = plt.gca()
    x_vals = np.array(axes.get_xlim())
    y_vals = m * x_vals + b
    plt.plot(x_vals, y_vals, '--', color = 'red')

plt.figure()
plt.scatter(x, y, marker = '.', color = 'blue')  # y now holds the 0/1 labels
bestFitLine(logisticRegr.coef_[0, idx], logisticRegr.intercept_[0])
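A straight line is not really what a logistic regression fits: the model passes w.x + b through a sigmoid to get P(y = 1). As a final sketch (assuming we hold the other 12 standardized features at 0, i.e. at their training means), we can draw the fitted probability curve for the same feature:
# Sweep one standardized feature, holding the other 12 at 0 (their mean)
xs = np.linspace(x.min(), x.max(), 200)
grid = np.zeros((200, train_data.shape[1]))
grid[:, idx] = xs
probs = logisticRegr.predict_proba(grid)[:, 1]  # P(price in the upper class)

plt.figure()
plt.scatter(x, y, marker='.', color='blue')  # the 0/1 labels
plt.plot(xs, probs, '--', color='red')
plt.xlabel(f'standardized feature {idx} (RM)')
plt.ylabel('P(price >= half of max)')
plt.show()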
Resources
https://www.kaggle.com/shanekonaung/boston-housing-price-dataset-with-keras