Predictive Modeling of Housing Prices
Charlotte, North Carolina | Spatial Data Analysis

Overview
This project developed a predictive model for housing prices in Charlotte, NC by integrating spatial data on property attributes, neighborhood crime rates, park proximity, and other key features. The model aimed to identify factors influencing home values and uncover spatial patterns that impact property pricing.
Objective
To build a robust predictive model that estimates home prices from spatial and non-spatial predictors, and to evaluate spatial autocorrelation in the model residuals as a guide to improving accuracy.
Methodology
Data Collection and Cleaning
Collected geospatial data on:
- Housing attributes (price, size, year built, etc.)
- Crime incidents (local crime counts by ZIP code)
- Park locations (distance from each property)
Cleaned data by:
- Removing extreme outliers using interquartile range (IQR) filtering
- Filtering out records with incomplete or invalid property characteristics
- Aligning the datasets using Charlotte ZIP code boundaries
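
As a rough illustration of the IQR rule named in the cleaning steps, the sketch below drops prices that fall outside Q1 - 1.5 x IQR and Q3 + 1.5 x IQR. It is written in plain Python rather than the project's R. The 1.5 multiplier, the `price` field name, and the sample prices are all assumptions, since the write-up does not state them.

```python
import statistics

def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences at Q1 - k*IQR and Q3 + k*IQR."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def filter_outliers(records, key, k=1.5):
    """Keep only the records whose `key` value lies inside the IQR fences."""
    lower, upper = iqr_bounds([r[key] for r in records], k)
    return [r for r in records if lower <= r[key] <= upper]

# Hypothetical sale prices; the last one is an obvious extreme value.
homes = [{"price": p} for p in
         [185_000, 210_000, 240_000, 265_000, 290_000, 315_000, 340_000, 2_500_000]]
kept = filter_outliers(homes, "price")
print(len(homes), "records before,", len(kept), "after")
```

A larger multiplier (for example 3) keeps more of the upper tail, which matters for a price distribution that is right-skewed to begin with.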
Feature Engineering
Engineered variables to improve model performance:
- Log-transformed home price and home size to address skewness
- Interaction terms such as bedrooms × full baths to capture joint effects
- Distance to nearest park calculated using spatial distance metrics in R
- Encoded categorical variables for heating type, property type, and building grade
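
The derived variables above can be sketched as follows. This is an illustrative plain-Python version, not the project's R code: the haversine great-circle formula stands in for whatever spatial distance function was actually used, and the field names `price`, `area_sqft`, `bedrooms`, `bed_bath`, `log_price`, and `log_area`, as well as all coordinates and values, are hypothetical.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def engineer_features(home, parks):
    """Return a copy of `home` with the derived columns added."""
    out = dict(home)
    out["log_price"] = math.log(home["price"])       # log transform to reduce skew
    out["log_area"] = math.log(home["area_sqft"])
    out["bed_bath"] = home["bedrooms"] * home["fullbaths"]  # interaction term
    out["distance_to_nearest_park"] = min(
        haversine_km(home["lat"], home["lon"], lat, lon) for lat, lon in parks
    )
    return out

# Hypothetical park coordinates (lat, lon) and one hypothetical property.
parks = [(35.2371, -80.8431), (35.3000, -80.7500)]
home = {"price": 350_000, "area_sqft": 1_800, "bedrooms": 3, "fullbaths": 2,
        "lat": 35.2271, "lon": -80.8431}
features = engineer_features(home, parks)
print(round(features["distance_to_nearest_park"], 2), "km to nearest park")
```

Scanning every park for every home is fine for a sketch; on a dataset of this size a spatial index or a projected coordinate system would be the more usual choice.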
Exploratory Data Analysis
Conducted correlation analysis to identify key predictors:

Correlation matrix showing relationships between housing variables

Relationship between property prices and neighborhood crime counts

Relationship between property prices and distance to nearest park

Relationship between property prices and year built

Relationship between property prices and logarithm of property area
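
For readers who want to see what sits behind a correlation matrix like the one captioned above, here is a minimal plain-Python sketch of pairwise Pearson correlations. The project itself worked in R; the column names `log_price` and `log_area` and all of the numbers are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def correlation_matrix(columns):
    """Nested dict: matrix[a][b] is the correlation between columns a and b."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical values for five properties.
columns = {
    "log_price": [12.0, 12.3, 12.4, 12.8, 13.0],
    "log_area": [7.0, 7.2, 7.4, 7.6, 7.8],
    "crime_count": [40, 35, 30, 20, 10],
}
matrix = correlation_matrix(columns)
for name, row in matrix.items():
    print(name, {k: round(v, 2) for k, v in row.items()})
```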
Model Building and Evaluation
Trained an OLS regression model with selected features:
- Key predictors: shape_Area, yearbuilt, fullbaths, crime_count, distance_to_nearest_park, and engineered interaction terms
- Validated the model with 10-fold cross-validation
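
The validation idea can be sketched in a few lines. This is a deliberately simplified plain-Python stand-in for the project's R workflow: it fits OLS with a single predictor (log area) instead of the full feature set, assigns folds deterministically by row index rather than at random, and runs on synthetic data. None of these choices come from the write-up.

```python
import math

def fit_ols(xs, ys):
    """Closed-form simple OLS; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def kfold_rmse(xs, ys, k=10):
    """Mean held-out RMSE over k folds; row i belongs to fold i % k."""
    rmses = []
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        test = [i for i in range(len(xs)) if i % k == fold]
        b0, b1 = fit_ols([xs[i] for i in train], [ys[i] for i in train])
        sq_err = [(ys[i] - (b0 + b1 * xs[i])) ** 2 for i in test]
        rmses.append(math.sqrt(sum(sq_err) / len(sq_err)))
    return sum(rmses) / k

# Synthetic log-log data: a linear trend plus a small deterministic wobble.
log_area = [6.5 + 0.02 * i for i in range(100)]
log_price = [5.0 + 0.9 * x + 0.05 * math.sin(7 * i) for i, x in enumerate(log_area)]
cv_rmse = kfold_rmse(log_area, log_price, k=10)
print("10-fold CV RMSE (log scale):", round(cv_rmse, 4))
```

Because each fold is scored on rows the fit never saw, the averaged RMSE is a fairer estimate of out-of-sample error than the in-sample residuals.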

Observed vs predicted housing prices showing model fit

Log-transformed observed vs predicted values showing improved model fit

Diagnostic plots for the regression model showing residual patterns
Residual Analysis and Spatial Autocorrelation

Spatial distribution of model residuals showing areas of under and over prediction

Moran's I test results showing significant spatial autocorrelation in residuals
- Visualized residuals using spatial mapping to identify clustering patterns
- Conducted Moran's I test to assess spatial dependence in residuals
- Moran's I = 0.214 (p < 0.05), indicating statistically significant, moderate positive spatial autocorrelation
- Identified underpredicted homes in central Charlotte and overpredicted values in peripheral areas
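
To make the statistic concrete, the sketch below computes Moran's I as (n / S0) x sum_ij(w_ij z_i z_j) / sum_i(z_i^2), where z is the mean-centred residual and S0 is the sum of all weights. It is a plain-Python toy, not the project's spdep workflow: residuals sit on a hypothetical 4 x 4 grid with row-standardised rook-contiguity weights, whereas real property points would need a neighbour definition such as k-nearest neighbours or a distance band. It returns only the statistic, with no significance test.

```python
def rook_weights(nrows, ncols):
    """Row-standardised rook-contiguity weights for cells on a regular grid."""
    n = nrows * ncols
    w = [[0.0] * n for _ in range(n)]
    for r in range(nrows):
        for c in range(ncols):
            nbrs = [(r + dr, c + dc)
                    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                    if 0 <= r + dr < nrows and 0 <= c + dc < ncols]
            for rr, cc in nbrs:
                w[r * ncols + c][rr * ncols + cc] = 1.0 / len(nbrs)
    return w

def morans_i(values, weights):
    """Global Moran's I for `values` under a square spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(zi * zi for zi in z)
    return (n / s0) * num / den

w = rook_weights(4, 4)
# Clustered residuals: left two columns underpredicted (+1), right two overpredicted (-1).
clustered = [1.0 if c < 2 else -1.0 for r in range(4) for c in range(4)]
# Checkerboard residuals: every neighbour has the opposite sign.
checkerboard = [1.0 if (r + c) % 2 == 0 else -1.0 for r in range(4) for c in range(4)]
i_clustered = morans_i(clustered, w)
i_checker = morans_i(checkerboard, w)
print("clustered:", round(i_clustered, 3), "checkerboard:", round(i_checker, 3))
```

Values near zero indicate no spatial pattern, positive values indicate that similar residuals sit next to each other, and negative values indicate alternation.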
Key Findings
- Home Size and Building Grade: These were the strongest predictors of higher property values, confirming the importance of property characteristics.
- Crime Counts: Areas with higher crime counts showed lower property values, consistent with safety playing a role in home pricing.
- Park Proximity: Homes closer to parks showed a slight price premium, but the effect was weaker than anticipated, suggesting that other neighborhood features matter more.
- Year Built: Newer homes generally commanded higher prices, with a clear upward trend for properties built after 1950.
- Spatial Bias: Central neighborhoods were frequently underpredicted, while some suburban areas were overpredicted.
Spatial Insights

Spatial distribution of home prices across Charlotte neighborhoods

Spatial distribution of home sizes (square footage) across Charlotte

Spatial distribution of crime counts across Charlotte neighborhoods

Distance to nearest park across Charlotte neighborhoods
- Higher home prices clustered in suburban regions, where larger properties and newer builds are common.
- Lower property values aligned with areas reporting higher crime counts.
- The presence of residual spatial patterns suggests additional neighborhood-level characteristics may impact pricing.
- Property size (area) showed a strong logarithmic relationship with price, indicating diminishing returns on very large properties.
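
One way to read the diminishing-returns point, assuming the fitted model is log-log in price P and area A (both were log-transformed during feature engineering) with a size coefficient \beta_1 whose value is not reported here:

```latex
\[
\ln P = \beta_0 + \beta_1 \ln A + \dots
\quad\Longrightarrow\quad
\frac{\partial P}{\partial A} = \beta_1 \frac{P}{A},
\qquad
\frac{P}{A} \propto A^{\beta_1 - 1}.
\]
```

Holding the other predictors fixed, any 0 < \beta_1 < 1 means the price per unit of area falls as A grows. With a purely hypothetical \beta_1 = 0.5, doubling the area multiplies the predicted price by 2^{0.5}, roughly 1.41, rather than by 2.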
Challenges and Limitations
- Multicollinearity: Variables such as home size, bedrooms, and bathrooms had high variance inflation factor (VIF) scores, indicating that they carry overlapping information.
- Outlier Sensitivity: The model struggled to predict high-value homes accurately, likely due to non-linear effects not captured in the model.
- Unaccounted Neighborhood Factors: Unobserved variables such as school quality, walkability, and transit access may contribute to residual spatial autocorrelation.
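
The multicollinearity check can be illustrated with a small plain-Python sketch. It relies on the identity that each predictor's VIF, 1 / (1 - R^2) from regressing it on the other predictors, equals the matching diagonal entry of the inverse correlation matrix. The project itself used R; the column names `area_sqft` and `bedrooms` and all values below are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def invert(matrix):
    """Gauss-Jordan inverse of a small square matrix, with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vif(columns):
    """VIF per predictor: the diagonal of the inverse correlation matrix."""
    names = list(columns)
    corr = [[pearson_r(columns[a], columns[b]) for b in names] for a in names]
    inv = invert(corr)
    return {name: inv[i][i] for i, name in enumerate(names)}

# Hypothetical predictors for six properties.
predictors = {
    "area_sqft": [1200, 1500, 1800, 2400, 3000, 3600],
    "bedrooms": [2, 3, 3, 4, 4, 5],
    "fullbaths": [1, 2, 2, 2, 3, 4],
    "crime_count": [30, 12, 25, 18, 28, 15],
}
vifs = vif(predictors)
for name, value in vifs.items():
    print(name, round(value, 1))
```

Common rules of thumb treat values above roughly 5 to 10 as a sign that a predictor is largely explained by the others; dropping or combining such predictors is one possible response.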
Urban Planning Implications
The model's insights can guide:
- Targeted investment strategies in underpredicted central Charlotte neighborhoods where additional amenities may improve property values.
- Crime prevention initiatives in areas with historically suppressed property values.
- Park improvement efforts in residential areas where green space has stronger correlations with home pricing.
- Housing development policies that consider the strong relationship between property age, size, and market value.
Conclusion
This project demonstrates the power of integrating spatial data and OLS regression in housing price prediction. The combination of engineered features, distance metrics, and spatial diagnostics provided valuable insights into Charlotte's housing market, informing planners and policymakers about factors driving property values. The correlation analysis revealed complex relationships between housing attributes and prices, while the spatial analysis highlighted neighborhood-level patterns that can guide targeted urban development strategies.
Project Details
Location
Charlotte, North Carolina
Tools Used
- R (Version 4.4.2)
- sf for spatial data handling
- caret for model training
- spdep for Moran's I test
- ggplot2 for visualization
Model Performance
Key Variables
- Building Grade: Strong +
- Home Size: Strong +
- Full Baths: Moderate +
- Year Built: Moderate +
- Crime Count: Moderate -
- Park Distance: Weak -
Research Highlights
Data Points
Analyzed over 10,000 property records across the Charlotte metropolitan area
Time Period
Housing data from 2018-2022, providing recent market insights
Key Innovation
Integration of crime data and park proximity metrics with traditional housing variables