Linear Regression

 Linear Regression 

#101daysofcode
#day-3 


Linear Regression is predicting the unknown values by looking at the known values. Example predicting someone's height by his weight and vice versa 

There are many libraries that you could use for performing Linear Regression, but here I will be using SciKit Learn.

Before applying linear regression to your data make sure that it is cleaned- it means that it should not contain any missing values and your data must contain numerical values only if it has categorical values, you can use categorical encoding for it 

Practical 

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.impute import SimpleImputer

############## DATA CLEANING #####################################################
df=pd.read_csv(r'D:\Coding\Python\Machine Learning\Algorithms\Bengaluru_House_Data.csv')
# Numerical
num_var=df.select_dtypes(include=['int64','float64']).columns
print(df[num_var])
im=SimpleImputer(strategy='mean')
im.fit(df[num_var])
df[num_var]=im.transform(df[num_var])
print(df[num_var].isnull().sum())

# Categorical
cat_var=df.select_dtypes(include='O').columns
imp=SimpleImputer(strategy='most_frequent')
imp.fit(df[cat_var])
df[cat_var]=imp.transform(df[cat_var])

print(df.isnull().sum().sum())

################# DATA PREPROCESSING ###########################################
df2=df.drop(columns=df[cat_var])
# print(df2) #this is not very effective way of data preprocessing I recommend, to use
#different method
################## DATA SPlITING ################################################
X=df2.drop(columns='price', axis=1)
y=df2['price']

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test=train_test_split(X, y, test_size=0.2, random_state=69)

################## FEATURE SCALING ####################################################
from sklearn.preprocessing import StandardScaler
sc=StandardScaler()
sc.fit(X_train)
X_train=sc.transform(X_train)
X_test=sc.transform(X_test)

################### TRAINING ####################################################
from sklearn.linear_model import LinearRegression
lr=LinearRegression()
lr.fit(X_train, y_train)

# print(lr.coef_) # used to  print feature coefeciant that our model has learned


print(lr.intercept_)

################# PREDICTION ###############################################
pre=lr.predict(X_test)
print(pre) # the predicted values
print(y_test) # the original values

score=lr.score(X_test, y_test) # shows you the accuracy percentage of your model
print(score*100)
print(pre)

Comments

Popular Posts