House Prices: Advanced Regression Techniques

Build a machine learning model using linear regression

Khushipatidar
3 min read · Oct 27, 2020

Here we are going to learn how to build a model for house price prediction. This Kaggle competition is my first step towards data science. You can download the data from here. It has 2 CSV files (train.csv, test.csv).

Here I am going to explain the approach used to build the model. You can refer to and download the code from here.

These are a few simple steps to solve any machine learning problem:

1. Get data: We have already solved this problem, as we gathered the data from Kaggle.

2. Data Preprocessing (Clean, Prepare and Manipulate Data):

Here comes the most time-consuming part 😣 -> finding missing values, outliers, bad encoding, wrongly labeled and biased data... Don't worry, we are going to solve this problem with Python's libraries. 😀

Checking for missing/null values using the isnull function:
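A minimal sketch of this check, assuming pandas. The small DataFrame below is a toy stand-in for train.csv with made-up values (only the column names are borrowed from the dataset), so the percentages it prints are illustrative, not the real ones.

```python
import numpy as np
import pandas as pd

# Toy stand-in for train.csv (hypothetical values, real column names).
train = pd.DataFrame({
    "LotFrontage": [65.0, np.nan, 68.0, 60.0],
    "PoolQC": [np.nan, np.nan, np.nan, "Gd"],
    "SalePrice": [208500, 181500, 223500, 140000],
})

# isnull() marks each missing cell as True; the column mean of those
# booleans is the share of missing values, shown here as a percentage.
missing_pct = train.isnull().mean().mul(100).sort_values(ascending=False)

# Show only the columns that actually have missing values.
print(missing_pct[missing_pct > 0])
```

On the real data you would load the file with pd.read_csv("train.csv") and run the same two lines on it, then repeat for test.csv.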

Now we can clearly see that more than 30% of the values are missing/null in the columns 'PoolQC', 'MiscFeature', 'Alley' and 'Fence', in both the train and test data. These columns carry very little information, so deleting them should give a better result.
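One way to drop such columns could look like the sketch below. The DataFrames are toy data, and the THRESHOLD name is my own. The choice to decide on the train data and then drop the same columns from test is an assumption, made so that both frames keep identical features.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for train.csv and test.csv (hypothetical values).
train = pd.DataFrame({
    "Alley": [np.nan, np.nan, "Pave", np.nan],
    "LotArea": [8450, 9600, 11250, 9550],
    "SalePrice": [208500, 181500, 223500, 140000],
})
test = pd.DataFrame({
    "Alley": [np.nan, "Grvl", np.nan, np.nan],
    "LotArea": [11622, 14267, 13830, 9978],
})

THRESHOLD = 0.30  # drop columns with more than 30% missing values

# Share of missing values per column in the train data.
missing_share = train.isnull().mean()
cols_to_drop = missing_share[missing_share > THRESHOLD].index.tolist()

# Drop the same columns from both frames so their features stay aligned.
train = train.drop(columns=cols_to_drop)
test = test.drop(columns=cols_to_drop, errors="ignore")

print(cols_to_drop)
```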

Here we will use the mode/median method to fill null values in the other columns:

-> mode method for categorical columns.

-> median method for numerical columns.
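The fill rule above can be sketched as follows, assuming pandas and a toy DataFrame with made-up values. Using pd.api.types.is_numeric_dtype to tell numerical columns from categorical ones is my choice for the illustration, not necessarily what the original notebook does.

```python
import numpy as np
import pandas as pd

# Toy data with one categorical and one numerical column (hypothetical values).
df = pd.DataFrame({
    "MSZoning": ["RL", np.nan, "RL", "RM"],
    "LotFrontage": [65.0, np.nan, 68.0, 80.0],
})

for col in df.columns:
    if pd.api.types.is_numeric_dtype(df[col]):
        # Numerical column: fill with the median, which is robust to outliers.
        df[col] = df[col].fillna(df[col].median())
    else:
        # Categorical column: fill with the most frequent value (the mode).
        df[col] = df[col].fillna(df[col].mode()[0])

print(df)
```

A common refinement is to compute the fill values on the train data only and reuse them for the test data, so that nothing from the test set leaks into preprocessing.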

The model doesn't understand categorical columns

Now create dummies for all the categorical columns (from both train and test data) using one-hot encoding. (Note: There are several other methods to convert categorical values into numerical ones, which you can explore on the internet.)
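A small sketch of one-hot encoding with pandas.get_dummies, on toy data. Encoding train and test together and then splitting them again is an assumption on my part: it guarantees that both frames end up with the same dummy columns, even when a category appears in only one of them.

```python
import pandas as pd

# Toy stand-ins for the cleaned train and test data (hypothetical values).
train = pd.DataFrame({"Street": ["Pave", "Grvl", "Pave"],
                      "LotArea": [8450, 9600, 11250]})
test = pd.DataFrame({"Street": ["Pave", "Pave"],
                     "LotArea": [11622, 14267]})

# Stack both frames, labelling the rows so they can be split apart later.
combined = pd.concat([train, test], keys=["train", "test"])

# One-hot encode the categorical column: one new 0/1 column per category.
encoded = pd.get_dummies(combined, columns=["Street"])

# Split back into train and test using the outer index labels.
train_encoded = encoded.loc["train"]
test_encoded = encoded.loc["test"]

print(list(train_encoded.columns))
```

Here "Grvl" never occurs in the toy test data, yet test_encoded still gets a Street_Grvl column (all zeros), so a model fitted on train_encoded can be applied to it directly.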

3. Train our model: We are using a linear regression algorithm to train our model (as price is a continuous value), with all the columns as features except price, which is our target column.

from sklearn import datasets, linear_model, metrics
# columns = feature column names, y = target prices
reg = linear_model.LinearRegression()
reg.fit(data_concate[columns], y)

4. Test Data: Use the predict function to predict prices for our test data.

predicted_prices=reg.predict(test_concate)

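5. Accuracy: We check how well the model fits the training data using the score function. For a regression model, score returns the R² coefficient of determination (closer to 1 means a better fit), not a classification accuracy.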

reg.score(data_concate[columns],y)
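To make the meaning of this number concrete, here is a small sketch on made-up data (not the house price data). It shows that the value returned by score for LinearRegression matches R² computed by hand as 1 - SS_res / SS_tot, and also matches sklearn.metrics.r2_score.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Made-up data: one feature and a roughly linear target.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1])

reg = LinearRegression().fit(X, y)
pred = reg.predict(X)

# R^2 by hand: 1 - (residual sum of squares / total sum of squares).
ss_res = np.sum((y - pred) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

# All three values should agree.
print(reg.score(X, y), r2_score(y, pred), r2_manual)
```

Since this is measured on the training data, it tells us how well the model fits data it has already seen. A held-out validation split would give a more honest estimate.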

Here I took all the columns while building the model and got 0.23293% error. Model accuracy can be increased further by precise feature selection, tuning the model, removing biased data and using different machine learning algorithms. I will try using different algorithms. If you have any suggestions, please feel free to write and give feedback.

You can download or refer to the detailed code from here.

Thank you for reading.

Written by Khushipatidar

I am an undergraduate student exploring data science through different projects and platforms.
