Predictive Modeling of Housing Prices

Charlotte, North Carolina | Spatial Data Analysis

Overview

This project developed a predictive model for housing prices in Charlotte, NC by integrating spatial data on property attributes, ZIP-code-level crime counts, park proximity, and other key features. The model aimed to identify the factors that influence home values and to uncover spatial patterns in property pricing.

Objective

To build a robust predictive model that estimates home prices from spatial and non-spatial predictors, and to test the model residuals for spatial autocorrelation as a guide to improving accuracy.

Methodology

Data Collection and Cleaning

Collected geospatial data on:

  • Housing attributes (price, size, year built, etc.)
  • Crime incidents (local crime counts by ZIP code)
  • Park locations (distance from each property)

Cleaned data by:

  • Removing extreme outliers using interquartile range (IQR) filtering
  • Filtering records with incomplete or invalid property characteristics
  • Aligning all datasets to Charlotte ZIP code boundaries
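
The IQR filtering step can be illustrated with a minimal sketch. The project itself was carried out in R; Python is used here only for illustration. The 1.5 x IQR multiplier, the `price` field name, and the sample values are assumptions, not details taken from the project.

```python
from statistics import quantiles

def iqr_filter(records, key, k=1.5):
    """Keep records whose `key` value lies within [Q1 - k*IQR, Q3 + k*IQR]."""
    values = [r[key] for r in records]
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [r for r in records if lower <= r[key] <= upper]

# Hypothetical sale prices; the last one is an extreme outlier.
homes = [{"price": p} for p in
         (180_000, 210_000, 250_000, 265_000, 300_000, 320_000, 4_500_000)]
kept = iqr_filter(homes, "price")
print(len(homes), "->", len(kept), "records after filtering")
```

A larger `k` keeps more of the high-value tail, which matters here because the model is later reported to struggle with expensive homes.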

Feature Engineering

Created engineered variables for improved model performance:

  • Log-transformed home price and home size to address skewness
  • Interaction terms such as bedrooms x full baths to capture joint effects
  • Distance to nearest park calculated using spatial distance metrics in R
  • Encoded categorical variables for heating type, property type, and building grade
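
The engineered variables listed above could be built roughly as follows. This is a sketch in Python for illustration (the project used R), and it makes several assumptions: homes and parks are given as latitude/longitude pairs, distance is measured with the haversine formula rather than whatever projected metric the original analysis used, and the field names `bedrooms`, `lat`, `lon`, and `heating` as well as the heating levels are hypothetical.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def one_hot(prefix, value, levels):
    """Indicator columns for every level except the first (the reference level)."""
    return {f"{prefix}_{lvl}": int(value == lvl) for lvl in levels[1:]}

def engineer(home, parks, heating_levels=("electric", "gas", "oil")):
    """Return a copy of `home` with log, interaction, distance, and dummy features."""
    out = dict(home)
    out["log_price"] = math.log(home["price"])        # reduce right skew
    out["log_area"] = math.log(home["shape_Area"])
    out["bed_x_bath"] = home["bedrooms"] * home["fullbaths"]  # interaction term
    out["distance_to_nearest_park"] = min(
        haversine_km(home["lat"], home["lon"], plat, plon) for plat, plon in parks
    )
    out.update(one_hot("heating", home["heating"], list(heating_levels)))
    return out

# Hypothetical home and park coordinates near central Charlotte.
home = {"price": 300_000, "shape_Area": 2_000, "bedrooms": 3, "fullbaths": 2,
        "lat": 35.2271, "lon": -80.8431, "heating": "gas"}
parks = [(35.2400, -80.8300), (35.1500, -80.9000)]
features = engineer(home, parks)
```

Dropping the first level of each categorical variable avoids perfect collinearity with the intercept when the dummies enter a regression.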

Exploratory Data Analysis

Conducted correlation analysis to identify key predictors:

Figure: Correlation Matrix of Housing Variables (relationships between housing variables)

Figure: Price vs Crime Count (relationship between property prices and neighborhood crime counts)

Figure: Price vs Distance to Nearest Park (relationship between property prices and distance to nearest park)

Figure: Price vs Year Built (relationship between property prices and year built)

Figure: Price vs Log of Area (relationship between property prices and logarithm of property area)
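
As a rough illustration of the correlation analysis behind these plots, the sketch below computes a Pearson correlation matrix in Python (the project used R). The column names echo the predictors named in this write-up, but the numbers are made up for the example.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_matrix(columns):
    """Nested dict of pairwise correlations for a dict of named columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical values for four homes.
columns = {
    "log_price": [12.1, 12.4, 12.6, 13.0],
    "log_area": [7.2, 7.4, 7.6, 7.9],
    "crime_count": [40, 35, 22, 10],
}
corr = correlation_matrix(columns)
for name, row in corr.items():
    print(name, {k: round(v, 2) for k, v in row.items()})
```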

Model Building and Evaluation

Trained an OLS regression model with selected features:

  • Key predictors: shape_Area, yearbuilt, fullbaths, crime_count, distance_to_nearest_park, and engineered interaction terms
  • Performed 10-fold cross-validation for model validation
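
To make the validation step concrete, here is a minimal sketch of OLS fitted through the normal equations and scored with k-fold cross-validation. It is written in plain Python for illustration; the project used R with caret, whose internals this does not reproduce. The synthetic data, the RMSE metric, and the fold assignment are all assumptions.

```python
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.
    Assumes `a` is square and non-singular (no perfectly collinear predictors)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    Xd = [[1.0] + list(row) for row in X]
    k = len(Xd[0])
    xtx = [[sum(r[i] * r[j] for r in Xd) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(Xd, y)) for i in range(k)]
    return solve(xtx, xty)

def predict(beta, row):
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))

def kfold_rmse(X, y, k=10, seed=0):
    """Mean held-out RMSE over k folds. Assumes len(y) >= k."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held = set(fold)
        train = [i for i in idx if i not in held]
        beta = fit_ols([X[i] for i in train], [y[i] for i in train])
        sq = [(y[i] - predict(beta, X[i])) ** 2 for i in fold]
        scores.append((sum(sq) / len(sq)) ** 0.5)
    return sum(scores) / k

# Synthetic example: log price driven by log area and crime count, plus noise.
rng = random.Random(42)
X = [[rng.uniform(6.5, 8.5), rng.uniform(0, 50)] for _ in range(200)]
y = [8.0 + 0.6 * a - 0.005 * c + rng.gauss(0, 0.1) for a, c in X]
print("coefficients:", [round(b, 3) for b in fit_ols(X, y)])
print("10-fold RMSE:", round(kfold_rmse(X, y), 3))
```

Each fold is held out once while the model is refit on the other nine, so the reported error reflects performance on homes the model has not seen.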

Figure: Observed vs Predicted Values (observed vs predicted housing prices showing model fit)

Figure: Log-Transformed Observed vs Predicted Values (log-transformed observed vs predicted values showing improved model fit)

Figure: Model Diagnostic Plots (diagnostic plots for the regression model showing residual patterns)

Residual Analysis and Spatial Autocorrelation

Figure: Mean Residuals by Neighborhood (spatial distribution of model residuals showing areas of under- and overprediction)

Figure: Observed and Permuted Moran's I (Moran's I test results showing significant spatial autocorrelation in residuals)

  • Visualized residuals using spatial mapping to identify clustering patterns
  • Conducted Moran's I test to assess spatial dependence in residuals
  • Moran's I = 0.214 (p < 0.05), indicating moderate positive spatial autocorrelation: neighboring homes tend to have similar residuals
  • Identified underpredicted homes in central Charlotte and overpredicted values in peripheral areas
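
For readers unfamiliar with the statistic, Moran's I for residuals e_i with spatial weights w_ij is I = (n / S0) * sum_ij w_ij (e_i - ē)(e_j - ē) / sum_i (e_i - ē)^2, where S0 is the sum of all weights. The sketch below computes it and a one-sided permutation pseudo p-value in plain Python (the project used spdep in R). The six-location chain, its binary adjacency weights, and the residual values are toy assumptions, not project data.

```python
import random

def morans_i(values, weights):
    """Global Moran's I for `values` under a square spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * (num / den)

def permutation_test(values, weights, n_perm=999, seed=0):
    """Observed I and a one-sided pseudo p-value from random relabelling."""
    rng = random.Random(seed)
    observed = morans_i(values, weights)
    shuffled = list(values)
    at_least = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_i(shuffled, weights) >= observed:
            at_least += 1
    return observed, (at_least + 1) / (n_perm + 1)

def chain_weights(n):
    """Binary adjacency for n locations in a line (each touches its neighbours)."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]

# Toy residuals: positive at one end of the chain, negative at the other.
residuals = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]
w = chain_weights(len(residuals))
observed, p_value = permutation_test(residuals, w)
print("Moran's I:", round(observed, 3), "pseudo p-value:", p_value)
```

Values near zero suggest no spatial structure in the residuals, while clearly positive values, such as the 0.214 reported here, suggest that the model is missing something that varies smoothly across neighborhoods.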

Key Findings

  • Home Size and Building Grade: These were the strongest predictors of higher property values, confirming the importance of property characteristics.
  • Crime: Areas with higher crime counts showed lower property values, consistent with safety playing a role in home pricing.
  • Park Proximity: Being closer to a park had a slight positive effect on price, but the effect was weaker than anticipated, suggesting that other neighborhood features matter more.
  • Year Built: Newer homes generally commanded higher prices, with a clear upward trend for properties built after 1950.
  • Spatial Bias: Central neighborhoods were frequently underpredicted, while some suburban areas were overestimated.

Spatial Insights

Figure: Home Prices in Charlotte, NC (spatial distribution of home prices across Charlotte neighborhoods)

Figure: Size of Homes in Charlotte, NC (spatial distribution of home sizes in square feet across Charlotte)

Figure: Crime Count in Charlotte, NC (spatial distribution of crime counts across Charlotte neighborhoods)

Figure: Park Locations in Charlotte, NC (distance to nearest park across Charlotte neighborhoods)

  • Higher home prices clustered in suburban regions, where larger properties and newer builds are common.
  • Lower property values aligned with areas reporting higher crime counts.
  • The presence of residual spatial patterns suggests additional neighborhood-level characteristics may impact pricing.
  • Property size (area) showed a strong logarithmic relationship with price, indicating diminishing returns on very large properties.

Challenges and Limitations

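  • Multicollinearity: Variables such as home size, bedrooms, and bathrooms had high variance inflation factor (VIF) scores, indicating that they carry overlapping information and that their individual coefficients may be unstable.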
  • Outlier Sensitivity: The model struggled to predict high-value homes accurately, likely due to non-linear effects not captured in the model.
  • Unaccounted Neighborhood Factors: Unobserved variables such as school quality, walkability, and transit access may contribute to residual spatial autocorrelation.
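
The multicollinearity check mentioned above can be sketched as follows. The VIF of predictor j is 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the other predictors; equivalently, it is the j-th diagonal entry of the inverse of the predictors' correlation matrix, which is the route taken here. This is plain Python for illustration (the project used R), and the sample values are made up.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting. Assumes a non-singular matrix."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vif(columns):
    """Variance inflation factor for each named predictor column."""
    names = list(columns)
    corr = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    inv = invert(corr)
    return {name: inv[i][i] for i, name in enumerate(names)}

# Hypothetical predictors: size and bedroom count move together.
predictors = {
    "home_size": [1.0, 2.0, 3.0, 4.0, 5.0],
    "bedrooms": [2.0, 1.0, 4.0, 3.0, 5.0],
}
print({k: round(v, 2) for k, v in vif(predictors).items()})
```

A common rule of thumb treats VIF values above roughly 5 to 10 as a sign that a predictor is largely redundant with the others; the thresholds used in the project are not stated.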

Urban Planning Implications

The model's insights can guide:

  • Targeted investment strategies in underpredicted central Charlotte neighborhoods where additional amenities may improve property values.
  • Crime prevention initiatives in areas with historically suppressed property values.
  • Park improvement efforts in residential areas where green space has stronger correlations with home pricing.
  • Housing development policies that consider the strong relationship between property age, size, and market value.

Conclusion

This project demonstrates the power of integrating spatial data and OLS regression in housing price prediction. The combination of engineered features, distance metrics, and spatial diagnostics provided valuable insights into Charlotte's housing market, informing planners and policymakers about factors driving property values. The correlation analysis revealed complex relationships between housing attributes and prices, while the spatial analysis highlighted neighborhood-level patterns that can guide targeted urban development strategies.

Project Details

Location

Charlotte, North Carolina

Tools Used

  • R (Version 4.4.2)
  • sf for spatial data handling
  • caret for model training
  • spdep for Moran's I test
  • ggplot2 for visualization

Model Performance

0.599
Moran's I: 0.214

Key Variables

  • Building Grade: strong positive
  • Home Size: strong positive
  • Full Baths: moderate positive
  • Year Built: moderate positive
  • Crime Count: moderate negative
  • Park Distance: weak negative

Research Highlights

Data Points

Analyzed over 10,000 property records across the Charlotte metropolitan area

Time Period

Housing data from 2018-2022, providing recent market insights

Key Innovation

Integration of crime data and park proximity metrics with traditional housing variables