[__working draft__ ]
What is the Goal of this Project?
This project focuses on developing an ML model to predict how a basement renovation project will affect the value of a house.
This project uses the classic housing dataset from Ames, Iowa.
It describes individual residential properties in Ames, Iowa from 2006 to 2010, with a total of 2930 observations and a large number of explanatory variables (23 nominal, 23 ordinal, 14 discrete, and 20 continuous). The data came directly from the Assessor’s Office in the form of a data dump from their records system.
After removal of extraneous variables, 80 variables remained that were directly related to property sales.
The data comprises many attributes of a property that a potential buyer could factor into their decision to buy a house and how much they are willing to pay for it.
Who is it For?
This information would be valuable to several categories of potential stakeholders:
- Current homeowner: trying to be pragmatic about how much to spend on the renovation.
- Prospective buyer: intends to flip the house by buying it, changing or upgrading aspects of the home, and then reselling it.
- Contractor or renovation company: wants to understand the project and home context when providing project bids and estimates.
Exploratory Data Analysis
Before developing the ML model, we want to understand the details of the original dataset.
There are many details about the raw data that could be interesting to explore; however, we look at the data here with the goal of finding information helpful in developing the ML model.
Some features of note:
Neighborhood has a clear relationship to Sale Price.
Exterior Quality also has a clear positive relationship with Sale Price.
However, ‘Exterior Condition’ displays a surprising plateau for anything above a Typical/Average rating.
We do a simple correlation analysis to check how strongly each of the original features correlates with the Sale Price of the house.
In developing the predictive model, the first goal is to accurately predict the price of a house, so we are interested in all the features that we can use. Second, because we are investigating basement renovations, we look specifically at the importance of features related to basement renovations. Basement Finished Square Feet is shown in green, and Year Remodeled in light blue.
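The correlation check described above can be sketched as follows. This is a minimal illustration on synthetic data, not the project's actual analysis: the column names (`GrLivArea`, `BsmtFinSF1`, `Noise`, `SalePrice`) and all values are assumptions made up for the example.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the housing data; names and values are illustrative only.
rng = np.random.default_rng(0)
n = 200
living_area = rng.normal(1500, 400, n)
bsmt_finished = rng.normal(450, 200, n)
noise_feature = rng.normal(0, 1, n)
sale_price = 100 * living_area + 40 * bsmt_finished + rng.normal(0, 20000, n)

df = pd.DataFrame({
    "GrLivArea": living_area,
    "BsmtFinSF1": bsmt_finished,
    "Noise": noise_feature,
    "SalePrice": sale_price,
})

# Pearson correlation of every numeric feature with the target,
# ordered by absolute strength (strongest first).
corr_with_price = (
    df.select_dtypes("number")
    .corr()["SalePrice"]
    .drop("SalePrice")
    .sort_values(key=lambda s: s.abs(), ascending=False)
)
print(corr_with_price)
```

Sorting by absolute value keeps strongly negative correlations near the top, which a plain descending sort would push to the bottom.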
The predictive ML model is based on an ensembling method known as a voting regressor. It takes the predictions from several model types and linearly combines them to produce a final predicted Sale Price.
The data preprocessing and model structure are summarized in the following two graphics.
More details about the preprocessing, feature engineering, and model training can be found in the technical appendix section.
The following graph describes the accuracy of our model for different test observations.
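To make the idea of linearly combining predictions concrete, here is a minimal sketch using scikit-learn's `VotingRegressor`. The base models, the weights, and the synthetic data are all assumptions chosen for illustration; the write-up does not state which model types or weights the project actually uses.

```python
import numpy as np
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    VotingRegressor,
)
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Synthetic regression data (illustrative only).
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Hypothetical choice of base models and weights.
weights = [2, 1, 1]
voter = VotingRegressor(
    estimators=[
        ("ridge", Ridge(alpha=1.0)),
        ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
        ("gbr", GradientBoostingRegressor(random_state=0)),
    ],
    weights=weights,
)
voter.fit(X_train, y_train)
ensemble_pred = voter.predict(X_test)

# The ensemble output is the weighted average of the fitted base models' predictions.
base_preds = np.column_stack([est.predict(X_test) for est in voter.estimators_])
manual_pred = np.average(base_preds, axis=1, weights=weights)
```

Recomputing the average by hand from `voter.estimators_` shows that the ensemble adds no learned parameters beyond the base models and the fixed weights.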
We also evaluated the performance of our model on the training and test sets. Although the stacked model slightly outperforms the Voting Regressor on the RMSLE, this difference is small and the Voting Regressor has a lower bias. So, the final implementation of our prediction model uses the Voting Regressor.
Now we can look at a simple visualization comparing the predictions of our model against the actual training data Sale Price. On first inspection of this graph, our approach looks to be an accurate model for the entire collection of training data.
One final verification of the integrity of our model is the normal q-q plot, which indicates that the residuals for the Voting Regressor model are approximately normally distributed, the desired outcome.
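For reference, RMSLE can be computed as in the sketch below. It assumes the common log(1 + x) definition of the metric; the function name `rmsle` and the example prices are hypothetical and are not results from this project.

```python
import math


def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error for non-negative values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_log_errors = [
        (math.log1p(pred) - math.log1p(true)) ** 2
        for true, pred in zip(y_true, y_pred)
    ]
    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))


# Made-up sale prices and predictions from two hypothetical models.
actual = [200000, 150000, 320000]
model_a = [210000, 145000, 300000]
model_b = [260000, 110000, 250000]

print(rmsle(actual, model_a), rmsle(actual, model_b))
```

Because the error is taken on the log scale, the metric penalizes relative rather than absolute differences, so a $10,000 miss matters more on a cheap house than on an expensive one.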
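A normal q-q check of this kind can be sketched with `scipy.stats.probplot`, which pairs the ordered residuals with the quantiles of a normal distribution. The residuals below are synthetic stand-ins generated for illustration, not the project's actual residuals.

```python
import numpy as np
from scipy import stats

# Synthetic residuals (illustrative only); in practice these would be
# the actual target values minus the model's predictions.
rng = np.random.default_rng(1)
residuals = rng.normal(loc=0.0, scale=0.1, size=500)

# probplot returns the theoretical quantiles, the ordered sample values,
# and a least-squares line fitted through the q-q points.
(theoretical_q, ordered_resid), (slope, intercept, r) = stats.probplot(
    residuals, dist="norm"
)

print(f"q-q line fit correlation: {r:.4f}")
```

Points lying close to the fitted line, with `r` near 1, suggest the residuals are consistent with a normal distribution; systematic curvature at the ends of the plot would suggest heavy or light tails.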