CSS 300 Module 4 Activity Worksheet
Use this worksheet to complete your lab activity. Submit it to the applicable assignment
submission folder when complete.
Deliverable:
– A Word document showing and explaining the results of the linear regression model
Using the Weather.csv dataset
1. Start by exploring the Weather.csv data using the describe() function from the last
module. The data must be loaded into a DataFrame named df first (see step 2).
2. You will need to import the Weather.csv dataset and the following libraries:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as seabornInstance
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn import metrics
%matplotlib inline
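The steps above name the dataset but do not show the loading call. A minimal sketch, assuming the file is read with pandas read_csv into a DataFrame named df (the name the later steps use). The inline CSV and its values are made up so the example runs without the real file; only the MinTemp and MaxTemp column names come from this worksheet.

```python
import io
import pandas as pd

# Hypothetical stand-in for Weather.csv: made-up values, same two columns used later.
sample_csv = io.StringIO(
    "MinTemp,MaxTemp\n"
    "10.0,20.0\n"
    "12.0,23.0\n"
    "15.0,27.0\n"
    "18.0,30.0\n"
)

# With the real file this would be: df = pd.read_csv('Weather.csv')
# (assumes the file sits in the notebook's working directory).
df = pd.read_csv(sample_csv)

# describe() summarizes each numeric column: count, mean, std, min, quartiles, max.
summary = df.describe()
print(summary)
```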
3. Next plot data points on a 2D graph to check for a relationship manually using the code
below:
df.plot(x='MinTemp', y='MaxTemp', style='o')
plt.title('MinTemp vs MaxTemp')
plt.xlabel('MinTemp')
plt.ylabel('MaxTemp')
plt.show()
4. Plot the MaxTemp data using the code below:
plt.figure(figsize=(15,10))
plt.tight_layout()
seabornInstance.distplot(df['MaxTemp'])
5. Divide the data into attributes and labels by using the following:
X = df['MinTemp'].values.reshape(-1,1)
y = df['MaxTemp'].values.reshape(-1,1)
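The reshape(-1,1) call is there because scikit-learn expects the attributes X as a 2D array with one row per sample and one column per feature. A small sketch of what the call does, using a made-up array in place of the real column:

```python
import numpy as np

# Made-up temperatures standing in for df['MinTemp'].values
min_temp = np.array([10.0, 12.0, 15.0, 18.0])

# reshape(-1, 1) turns a flat array of n values into n rows and 1 column;
# -1 asks NumPy to infer the number of rows from the data.
X = min_temp.reshape(-1, 1)

print(min_temp.shape)  # (4,)
print(X.shape)         # (4, 1)
```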
6. Split the data into training and testing sets, with 80% for training and 20% for
testing, using the code below:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
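To see what the split produces, here is a small sketch on made-up data (10 points on the line y = 2x + 1, not the weather data). test_size=0.2 holds out 20% of the rows, and random_state=0 fixes the shuffle so the same split comes back on every run.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 10 samples, one feature, labels on the line y = 2x + 1
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 2.0 * X + 1.0

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Rows of X and y are shuffled together, so each label still matches its attribute.
print(len(X_train), len(X_test))  # 8 2
```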
7. Train the algorithm using linear regression following the code below:
regressor = LinearRegression()
regressor.fit(X_train, y_train) #training the algorithm
8. Print the intercept and slope using the following:
# Retrieve the intercept:
print(regressor.intercept_)
# Retrieve the slope:
print(regressor.coef_)
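For a single attribute, the ordinary least squares line that LinearRegression fits has a closed form, which can help when explaining these two numbers in the write-up. The sketch below computes it by hand on made-up points that lie exactly on y = 1 + 2x; the variable names are illustrative and not part of the worksheet. Because y was reshaped to a column in step 5, regressor.intercept_ and regressor.coef_ print as small arrays rather than plain numbers.

```python
import numpy as np

# Made-up points lying exactly on y = 1 + 2x
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])

# Slope: covariance of x and y divided by the variance of x
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

# Intercept: the fitted line passes through the point (mean of x, mean of y)
intercept = y.mean() - slope * x.mean()

print(intercept, slope)
```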
9. Use the test data to see how accurately the model predicts using the following:
y_pred = regressor.predict(X_test)
10. Compare the actual output values to the predicted values using the following:
compare = pd.DataFrame({'Actual': y_test.flatten(),
                        'Predicted': y_pred.flatten()})
compare
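The worksheet imports metrics in step 2 but does not use it. One possible use, sketched here on made-up numbers rather than the real predictions, is to summarize the gap between actual and predicted values with the mean absolute error, mean squared error, and root mean squared error.

```python
import numpy as np
from sklearn import metrics

# Made-up actual and predicted values standing in for y_test and y_pred
y_test = np.array([20.0, 25.0, 30.0, 35.0])
y_pred = np.array([22.0, 24.0, 27.0, 35.0])

# Mean absolute error: average size of the prediction errors
mae = metrics.mean_absolute_error(y_test, y_pred)

# Mean squared error: average of the squared errors (penalizes large misses more)
mse = metrics.mean_squared_error(y_test, y_pred)

# Root mean squared error: square root of MSE, in the same units as the labels
rmse = np.sqrt(mse)

print(mae, mse, rmse)
```

Lower values mean the predictions sit closer to the actual temperatures; comparing RMSE against the typical MaxTemp value gives a sense of scale.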
11. Create a scatter plot with a line portraying the model to evaluate the model visually
using the following:
plt.scatter(X_test, y_test, color='gray')
plt.plot(X_test, y_pred, color='red', linewidth=2)
plt.show()