Build and Train Linear Regression Model

Business Scenario

Welcome back!

Today is your fourth day as a Junior Data Scientist at AutoVision Analytics.

Yesterday, you improved the quality of the automobile dataset by analyzing feature relationships, applying scaling techniques, engineering new features, and selecting the most relevant variables for prediction.

The management team is satisfied with the prepared dataset and has now approved the development of the first Machine Learning model for the project.

Your manager has assigned you the responsibility of building a Vehicle Price Prediction Model using Linear Regression. By the end of the day, you should have a trained model capable of estimating vehicle prices for unseen vehicles.

Pre-Lab Preparation

Topic : Supervised Learning – Regression

1) Simple Linear Regression
2) Multiple Linear Regression
3) Polynomial Regression

Git Pull

git pull origin branchName

Task 1: Prepare Data for Model Building

Before training a Machine Learning model, the dataset must be cleaned and divided into input features and the target variable.

The Data Science team has already identified and prepared the most relevant features. Your responsibility is to prepare the data for training and testing.

Create a new file in Google Colab and upload the given dataset

1

Click to download dataset : Automobile_dataset.xls

import numpy as np
import pandas as pd

df = pd.read_csv("Automobile_data.csv")

Verify and understand the Dataset

2

df.head()

Replace the "?" placeholders in the normalized-losses column with NaN

3

df["normalized-losses"] = df["normalized-losses"].replace("?", np.nan)

Check for Null Values

4

df.isnull().sum()

Fill the missing Values

5

df["normalized-losses"] = pd.to_numeric(
    df["normalized-losses"],
    errors="coerce"
)

m = df["normalized-losses"].mean()
print("Mean of normalized-losses:", m)
df["normalized-losses"] = df["normalized-losses"].fillna(m)

Verify

6

df.isnull().sum()
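
The clean-up steps above can be tried on a tiny column first. The following is a minimal sketch, assuming a hypothetical three-value toy column that stands in for normalized-losses; the names toy and col_mean are illustrative and not part of the lab dataset.

```python
import pandas as pd

# Hypothetical toy column standing in for "normalized-losses".
toy = pd.DataFrame({"normalized-losses": ["164", "?", "158", "?", "192"]})

# "?" cannot be parsed as a number, so errors="coerce" turns it into NaN.
toy["normalized-losses"] = pd.to_numeric(toy["normalized-losses"], errors="coerce")

# mean() skips NaN by default, so only the three real values are averaged.
col_mean = toy["normalized-losses"].mean()

# Assign the filled column back rather than calling fillna(..., inplace=True)
# on a selected column.
toy["normalized-losses"] = toy["normalized-losses"].fillna(col_mean)

print("Mean used for filling:", col_mean)
print("Missing values left:", toy["normalized-losses"].isnull().sum())
```

Assigning the result back to the column keeps the behaviour the same across pandas versions, where in-place changes on a selected column may not reach the original DataFrame.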

Do the same for the horsepower column

7

# Replace "?" with NaN in the "horsepower" column
df["horsepower"] = df["horsepower"].replace("?", np.nan)
# Convert the "horsepower" column from object to float
df["horsepower"] = df["horsepower"].astype("float")
# Fill the missing values with the column mean
m = df["horsepower"].mean()
print("Mean of horsepower:", m)
df["horsepower"] = df["horsepower"].fillna(m)
# Confirm that no missing values remain
df.isnull().sum()

Separate the numerical columns and the object-type columns and store them in two new datasets

8

df_num = df.select_dtypes(["int64", "float64"])  # holds the int and float columns
df_cat = df.select_dtypes(object).copy()  # holds the object columns; copied so they can be encoded safely

Encode the Categorical Columns

9

from sklearn.preprocessing import LabelEncoder
for col in df_cat:
    le = LabelEncoder()  # create a fresh encoder for each column
    df_cat[col] = le.fit_transform(df_cat[col])
df_cat.head()
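
To see what the encoder does to one column, here is a small sketch on a hypothetical list of fuel types (the values are made up for illustration, not taken from the dataset).

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical categorical values for illustration only.
fuel = ["gas", "diesel", "gas", "gas", "diesel"]

le = LabelEncoder()
codes = le.fit_transform(fuel)

# classes_ holds the distinct labels in sorted order; each label's
# position in that array is the integer code it receives.
print(list(le.classes_))
print(codes.tolist())

# inverse_transform maps the integer codes back to the original labels.
print(le.inverse_transform(codes).tolist())
```

Keep in mind that a linear model treats these integer codes as ordered quantities, so the numbering affects the fitted coefficients. One-hot encoding is a common alternative when the categories have no natural order.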

Verify

10

Concatenate df_num and df_cat into a new dataset, df_new

11

df_new = pd.concat([df_num, df_cat], axis=1)  # axis=1 joins the two datasets column-wise
df_new.head()
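
The split-and-rejoin pattern can be checked on a small frame. This is a sketch on hypothetical data; it selects columns with include="number" and exclude="number", which is a variant of the dtype lists used above, chosen here only to keep the toy example independent of how strings are stored.

```python
import pandas as pd

# Hypothetical three-row frame for illustration only.
toy = pd.DataFrame({
    "make": ["audi", "bmw", "audi"],
    "horsepower": [102.0, 101.0, 115.0],
    "price": [13950, 16430, 17450],
})

num = toy.select_dtypes(include="number")  # numeric columns
cat = toy.select_dtypes(exclude="number")  # everything else

# axis=1 places the two frames side by side, aligning rows on the index.
combined = pd.concat([num, cat], axis=1)

print(list(combined.columns))
print(combined.shape)
```

Because both pieces keep the original row index, each row of the combined frame still describes the same vehicle.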

Select the independent (input) variables X and the dependent (target) variable Y

12

X = df_new.drop("price", axis=1)  # every column except price is an input feature
Y = df_new["price"]  # target variable

Split the data into training (70%) and testing (30%) sets

13

from sklearn.model_selection import train_test_split
X_train,X_test,Y_train,Y_test=train_test_split(X,Y,test_size=0.3,
                                              random_state=1)

Check the shape of test and train data

14

X_train.shape,X_test.shape
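
As a quick check of what to expect from the split, the sketch below uses a hypothetical 10-row array instead of the automobile data. The names X_toy and y_toy are assumptions made for the example.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: row i is [2*i, 2*i + 1] and its target is i.
X_toy = np.arange(20).reshape(10, 2)
y_toy = np.arange(10)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.3, random_state=1
)

# With 10 rows and test_size=0.3, 7 rows go to training and 3 to testing.
print(X_tr.shape, X_te.shape)

# Rows are shuffled, but each feature row stays paired with its own target.
print(bool(np.all(X_tr[:, 0] == 2 * y_tr)))
```

Fixing random_state makes the shuffle repeatable, so the same rows land in the training and testing sets on every run.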

Apply StandardScaler on X_train and X_test

15

from sklearn.preprocessing import StandardScaler
ss=StandardScaler()
X_train=ss.fit_transform(X_train)
X_test=ss.transform(X_test)

Verify

16

X_train
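
The reason for calling fit_transform on the training data but only transform on the testing data can be seen in a small sketch. The numbers below are hypothetical and chosen so the statistics are easy to check by hand.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical single-feature data for illustration only.
train = np.array([[1.0], [2.0], [3.0], [4.0]])
test = np.array([[5.0]])

ss = StandardScaler()
train_scaled = ss.fit_transform(train)  # learns mean and std from train only
test_scaled = ss.transform(test)        # reuses the training mean and std

print("Training mean learned by the scaler:", ss.mean_[0])
print("Scaled training mean and std:", train_scaled.mean(), train_scaled.std())
print("Scaled test value:", test_scaled[0, 0])
```

The scaled training column has mean 0 and standard deviation 1, while the test value is expressed on that same scale. Fitting the scaler on the testing data as well would let information from unseen rows leak into the preprocessing.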

Task 2: Build and Train Linear Regression Model

The management team now wants the first Machine Learning model to be developed.

You have been asked to build a Linear Regression model that can learn pricing patterns from historical automobile data.

What is Linear Regression?

Linear Regression is a supervised Machine Learning algorithm used to predict continuous numerical values by finding the relationship between input features and a target variable.

Types of Linear Regression:

  • Simple Linear Regression: Uses one independent variable.
  • Multiple Linear Regression: Uses two or more independent variables. (Used in this project.)
  • Polynomial Regression: Used when the relationship between variables is non-linear and follows a curved pattern.
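
The three types can be written as prediction equations. The notation here is an assumption introduced for illustration: \hat{y} is the predicted value, x or x_1, ..., x_p are the input features, and the \beta terms are the coefficients the model learns.

```latex
\begin{align*}
\text{Simple:} \quad & \hat{y} = \beta_0 + \beta_1 x \\
\text{Multiple:} \quad & \hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p \\
\text{Polynomial (degree } d\text{):} \quad & \hat{y} = \beta_0 + \beta_1 x + \beta_2 x^2 + \dots + \beta_d x^d
\end{align*}
```

Polynomial regression is still linear in the coefficients; only the inputs are raised to powers.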

In this lab, we will use Multiple Linear Regression to predict vehicle prices using multiple automobile features.

Import Linear Regression

1

from sklearn.linear_model import LinearRegression

Create Model Object

2

lr=LinearRegression()

Train the Model on the 70% Training Data

3

lr.fit(X_train,Y_train)

What does .fit() do?

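The .fit() method trains the Machine Learning model using historical data.

For Linear Regression, training means estimating one coefficient per input feature plus an intercept, using ordinary least squares. These values capture the relationship between vehicle features and vehicle prices.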
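
A small sketch can make this concrete. It assumes hypothetical noise-free data generated from a known rule (y = 3*x1 + 2*x2 + 5), so the values learned by .fit() can be compared with the rule that produced the data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data built from a known linear rule, with no noise.
rng = np.random.default_rng(0)
X_toy = rng.uniform(0, 10, size=(50, 2))
y_toy = 3.0 * X_toy[:, 0] + 2.0 * X_toy[:, 1] + 5.0

model = LinearRegression()
model.fit(X_toy, y_toy)

# coef_ holds one learned weight per feature; intercept_ is the constant term.
print("Coefficients:", np.round(model.coef_, 3))
print("Intercept:", round(float(model.intercept_), 3))
```

On this toy data the learned coefficients and intercept match the generating rule up to floating-point error. On real data such as vehicle prices, the fit is only the best linear approximation.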

Task 3: Predict Vehicle Prices

Now that the model has been trained, it is time to test its prediction capabilities.

The management team wants to know whether the model can estimate prices for vehicles it has never seen before.

Your task is to generate predictions using the testing dataset.

Generate Predictions

1

y_pred = lr.predict(X_test)

View Predicted Prices

2

y_pred[:10]
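
Predictions are easier to read next to the actual values. The following is a minimal sketch on hypothetical synthetic data (one made-up feature and a noisy linear target), not on the lab dataset; the comparison table and its column names are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical data: one feature and a noisy linear target.
rng = np.random.default_rng(1)
X_toy = rng.uniform(50, 250, size=(40, 1))
y_toy = 100.0 * X_toy[:, 0] + rng.normal(0, 500, size=40)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.3, random_state=1
)

model = LinearRegression().fit(X_tr, y_tr)
pred = model.predict(X_te)  # one prediction per held-out row

# Put actual and predicted values side by side for a quick visual check.
comparison = pd.DataFrame({"actual": y_te, "predicted": pred})
comparison["error"] = comparison["predicted"] - comparison["actual"]
print(comparison.head())
```

The same pattern should apply in the lab by pairing Y_test with y_pred.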

Great job!

You have successfully completed Lab 4: Build and Train Linear Regression Model.

In this lab, you prepared the dataset for model building, separated features and target variables, performed Train-Test Split, built a Linear Regression model, trained it using historical automobile data, and generated predictions for unseen vehicles.

Your first Machine Learning model is now ready for evaluation and improvement.

You are now ready to move to the next stage of the Automobile Intelligence Project.

Checkpoint

Git Push

git push origin branchName

Next-Lab Preparation

Topic : Supervised Learning – Regression

1) Regularisation techniques (Ridge & Lasso)

ML Lab 4 : Build and Train Linear Regression Model

By Content ITV
