Linear Regression
Regression techniques are used in machine learning to predict continuous values, for example predicting salaries, ages or even profits.


Deep Dive
- Importing the libraries.
- Importing the data set.
- Classifying dependent and independent variables.
- Creating training and test sets
- Creating a Simple Linear Regressor.
- Training the regressor with training data.
- Predicting the salary for a test set.
- Calculating the accuracy of the predictions.
- Comparing Actual and Predicted Salaries for the test set.
Step 1. Data Preprocessing
Step 2. Simple Linear Regression
"""# I. Preparing the dataset """
#1 Importing essential libraries
import pandas as pd
#2 Importing the dataset
#https://drive.google.com/file/d/13_kwGjkC1z7lA0w9FQIEz5huzDKsAOc5/view?usp=sharing
dataset = pd.read_csv('Salary_Data.csv')
#3 classify dependent and independent variables
X = dataset.iloc[:,:-1].values #independent variable YearsofExperience
y = dataset.iloc[:,-1].values #dependent variable salary
print("\nIdependent Variable (Experience):\n", X)
print("\nDependent Variable (Salary):\n", y)
#4 Creating training set and testing set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X ,y, test_size = 1/3,random_state = 0)
print("\n\nTraining Set :\n----------------\n")
print("X = \n", X_train)
print("y = \n", y_train)
print("\n\nTest Set :\n----------------\n")
print("X = \n",X_test)
print("y = \n", y_test)"""# II. Simple Linear Regressor """
#5 import SLR library
from sklearn.linear_model import LinearRegression
#6 Train the Regressor with training set
regressor = LinearRegression()
regressor.fit(X_train, y_train)
#7 predict the outcome of test sets
y_Pred = regressor.predict(X_test)
print("\n\nPredictions = ", y_Pred)
#8 Claculating the Accuracy of the predictions
from sklearn import metrics
print("Prediction Accuracy = ", metrics.r2_score(y_test, y_Pred))
#9 Comparing Actual and Predicted Salaries for he test set
print("\nActual vs Predicted Salaries \n-------------------------\n")
print("Actual :\n ", y_test)
print("Predicted :\n ", y_Pred)Multiple Linear Regression

#1 Importing the libraries
import numpy as np
import pandas as pd
#2 Importing the data set
#https://drive.google.com/file/d/1iXn2HmzPYeH2p-ZTHa3jUyEpStoDb8CG/view?usp=sharing
dataset = pd.read_csv('beer_data.csv')
#Printing first 10 rows of the dataset
print("\n----------------------------\n",dataset.head(10))
#3 Dealing with the categorical data
#spliting Cellar Temperature into Maximum and Minimum based on the given data and converting the type from str to int
dataset['Minimum_Cellar_Temp'] = dataset['Cellar Temperature'].apply(lambda x : int(x.split('-')[0].strip()))
dataset['Maximum_Cellar_Temp'] = dataset['Cellar Temperature'].apply(lambda x : int(x.split('-')[1].strip()))
#New dataset with selected features
dataset = dataset[['ABV', 'Ratings','Minimum_Cellar_Temp','Maximum_Cellar_Temp', 'Score']]
#Printing first 10 rows of the dataset
print("\n----------------------------\n",dataset.head(10))
#Printing the summary of the dataset
print("\n----------------------------\n")
print(dataset.info())#4 Classifying dependent and independent variables
#All columns except the last column are independent features- (Selecting every column except Score)
X = dataset.iloc[:,:-1].values
#Only the last column is the dependent feature or the target variable(Score)
y = dataset.iloc[:,-1].values
##5 Creating training and test sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size = 0.2,random_state = 0)
#################Data Preprocessing Ends #################################
""" Multiple Linear regression """
#6 Creating the Regressor and training it with the training set
from sklearn.linear_model import LinearRegression
regressor = LinearRegression(normalize = True)
#7 Feeding the data and training the model
regressor.fit(X_train,y_train)
##8 Predicting the Score for test set observations
y_pred = regressor.predict(X_test)
#printing the predictions
print("\n----------------------------\nPredictions = \n",y_pred)
#9 Calculating score from Root Mean Log Squared Error
def rmlse(y_test, y_pred):
error = np.square(np.log10(y_pred +1) - np.log10(y_test +1)).mean() ** 0.5
score = 1 - error
return score
print("\n----------------------------\nRMLSE Score = ", rmlse(y_test, y_pred))Linear Regression
By Data Science Portal
Linear Regression
Simiple & Multiple Linear Regression Techniques
- 71