Root Mean Square Error

   Root Mean Square Error 

Before looking into root mean square error, let's see what error means. 

what is error 

Error is a frequently used measure of the differences between values predicted by a model, or an estimator and the values observed.

Types of Errors 

1.MSE (Mean Squared Error)
2.RMSE (Root Mean Squared Error)

MSE (Mean Squared Error)

MSE (Mean Squared Error) represents the difference between the original and predicted values which are extracted by squaring the average difference over the data set. It is a measure of how close a fitted line is to actual data points. The lesser the Mean Squared Error, the closer the fit is to the data set. The MSE has the units squared of whatever is plotted on the vertical axis.

RMSE (Root Mean Squared Error)

RMSE (Root Mean Squared Error) is the error rate by the square root of MSE. RMSE is the most easily interpreted statistic, as it has the same units as the quantity plotted on the vertical axis or Y-axis. RMSE can be directly interpreted in terms of measurement units, and hence it is a better measure of fit than a correlation coefficient.

Practical 

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.impute import SimpleImputer

############## DATA CLEANING #####################################################
df=pd.read_csv(r'D:\Coding\Python\Machine Learning\Algorithms\Bengaluru_House_Data.csv')
# print(df.isnull().sum())
a=df.isnull().sum()/df.shape[0]*100
n=a[a>17].keys()
df=df.drop(columns=n)

# Numerical
num_var=df.select_dtypes(include=['int64','float64']).columns
print(df[num_var])
im=SimpleImputer(strategy='mean')
im.fit(df[num_var])
df[num_var]=im.transform(df[num_var])
print(df[num_var].isnull().sum())

# Categorical
cat_var=df.select_dtypes(include='O').columns
imp=SimpleImputer(strategy='most_frequent')
imp.fit(df[cat_var])
df[cat_var]=imp.transform(df[cat_var])

print(df.isnull().sum().sum())

################# DATA PREPROCESSING ###########################################
df2=df.drop(columns=df[cat_var])
# print(df2)
################## DATA SPlITING ################################################
X=df2.drop(columns='price', axis=1)
y=df2['price']

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test=train_test_split(X, y, test_size=0.2, random_state=69)

################## FEATURE SCALING ####################################################
from sklearn.preprocessing import StandardScaler
sc=StandardScaler()
sc.fit(X_train)
X_train=sc.transform(X_train)
X_test=sc.transform(X_test)

################### TRAINING ####################################################
from sklearn.linear_model import LinearRegression
lr=LinearRegression()
lr.fit(X_train, y_train)

# print(lr.coef_) # used to  print feature coefeciant that our model has learned


print(lr.intercept_)

################# PREDICTION ###############################################
pre=lr.predict(X_test)
print(pre) # the predicted values
print(y_test) # the original values

score=lr.score(X_test, y_test) # shows you the accuracy percentage of your model
print(score*100)
 
############### Model Evaluation #######################################################
from sklearn.metrics import mean_squared_error
import numpy as np
y_pre=lr.predict(X_test)
print(y_pre)
# print(y_test)

mse=mean_squared_error(y_test, y_pre)
rmse=np.sqrt(mse)

print('MSE', mse)
print('RMSE', rmse)


I would appreciate if you could make it better 😊

Comments

Post a Comment

Popular Posts