Predictive Modeling of Housing Prices

Charlotte, North Carolina

Tools & Libraries
R (v4.4.2), sf, caret, spdep, ggplot2

Section 01

Project Objective

Build a robust predictive model that estimates home prices using spatial and non-spatial predictors, and evaluate residual autocorrelation patterns to understand where systematic bias remains.

Explanatory Power: 0.599 (R², share of variance explained)
Spatial Bias: 0.214 (Moran's I, p < 0.05)
Sample Density: 10k+ validated records
Temporal Scale: 4yr (2018-2022 window)

Section 02

Methodological Framework

01. FEATURE ENGINEERING

Spatial Transformations

Sale prices were log-transformed to reduce right skew. Interaction terms and distance-to-park metrics were then added, which improved explanatory power across the varied housing submarkets.
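
As a minimal sketch of these transformations, the base R snippet below builds a log price, a log area, a nearest-park distance, and one interaction term. Everything in it is an assumption made for illustration: the simulated `homes` and `parks` data frames, the column names, the helper `dist_to_nearest()`, and the choice of a `log_area * grade` interaction are hypothetical and are not taken from the project's code or data.

```r
set.seed(42)

# Simulated stand-in data (hypothetical columns, not the project's records)
n <- 200
homes <- data.frame(
  x     = runif(n, 0, 10000),   # projected easting, metres
  y     = runif(n, 0, 10000),   # projected northing, metres
  sqft  = runif(n, 800, 4000),
  grade = sample(1:5, n, replace = TRUE)
)
homes$price <- exp(10 + 0.0004 * homes$sqft + 0.1 * homes$grade +
                   rnorm(n, sd = 0.3))
parks <- data.frame(x = runif(8, 0, 10000), y = runif(8, 0, 10000))

# Log-transform the right-skewed price and area variables
homes$log_price <- log(homes$price)
homes$log_area  <- log(homes$sqft)

# Euclidean distance from each home to its nearest park (projected coords)
dist_to_nearest <- function(hx, hy, px, py) {
  dx <- outer(hx, px, "-")
  dy <- outer(hy, py, "-")
  apply(sqrt(dx^2 + dy^2), 1, min)
}
homes$dist_park <- dist_to_nearest(homes$x, homes$y, parks$x, parks$y)

# OLS with an interaction between size and building grade
fit <- lm(log_price ~ log_area * grade + dist_park, data = homes)
summary(fit)$r.squared
```

With real geometries, `sf::st_distance()` on projected point layers returns the same kind of home-by-park distance matrix, so the row-wise minimum step would carry over unchanged.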

02. VALIDATION

Regression Analysis

A global OLS regression was evaluated with 10-fold cross-validation. Performance assessment focused on out-of-sample RMSE and on predictive stability across central and peripheral neighborhoods.
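
The cross-validation step can be sketched in base R as follows. This is an illustrative version under stated assumptions: the data are simulated, the predictors `log_area` and `crime` are hypothetical stand-ins, and the manual fold loop is only one way to run 10-fold CV (the project lists caret, whose `trainControl(method = "cv", number = 10)` does the equivalent).

```r
set.seed(1)

# Simulated stand-in data (hypothetical predictors)
n <- 500
d <- data.frame(log_area = rnorm(n, 7.5, 0.4), crime = rpois(n, 5))
d$log_price <- 5 + 0.9 * d$log_area - 0.03 * d$crime + rnorm(n, sd = 0.25)

# Randomly assign each row to one of 10 folds of equal size
k <- 10
fold <- sample(rep(1:k, length.out = n))

# Fit on k - 1 folds, score RMSE on the held-out fold
rmse <- sapply(1:k, function(i) {
  fit  <- lm(log_price ~ log_area + crime, data = d[fold != i, ])
  pred <- predict(fit, newdata = d[fold == i, ])
  sqrt(mean((d$log_price[fold == i] - pred)^2))
})

mean(rmse)  # average out-of-fold RMSE on the log scale
sd(rmse)    # spread across folds, a rough stability check
```

Comparing per-fold RMSE for central versus peripheral subsets of each held-out fold would be one way to approximate the stability check described above.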

03. SPATIAL ANALYSIS

Residual Autocorrelation

Global and local Moran's I tests on the model residuals revealed where the model failed to capture neighborhood-specific drivers of value, especially in the contrast between urban core and suburban tracts.
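
To make the two statistics concrete, here is a small base R sketch of global and local Moran's I on a residual vector. The simulated residuals, the 5-nearest-neighbour weighting, and the helper names `knn_weights()`, `moran_i()`, and `local_moran()` are assumptions for illustration; the project's actual neighbour definition is not stated. In spdep, `moran.test()` and `localmoran()` provide tested implementations.

```r
set.seed(7)

# Simulated residuals at random locations (illustrative only)
n   <- 150
xy  <- cbind(runif(n), runif(n))
res <- sin(4 * xy[, 1]) + rnorm(n, sd = 0.5)  # spatially patterned

# Row-standardised k-nearest-neighbour weights matrix
knn_weights <- function(coords, k = 5) {
  d <- as.matrix(dist(coords))
  diag(d) <- Inf                       # a point is not its own neighbour
  w <- matrix(0, nrow(d), ncol(d))
  for (i in seq_len(nrow(d))) {
    w[i, order(d[i, ])[1:k]] <- 1 / k
  }
  w
}

# Global Moran's I: (n / S0) * (z' W z) / (z' z)
moran_i <- function(x, w) {
  z <- x - mean(x)
  (length(x) / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

# Local Moran's I for each observation: (z_i / m2) * sum_j w_ij z_j
local_moran <- function(x, w) {
  z  <- x - mean(x)
  m2 <- sum(z^2) / length(x)
  (z / m2) * as.vector(w %*% z)
}

w <- knn_weights(xy)
moran_i(res, w)
head(local_moran(res, w))
```

With row-standardised weights the global statistic equals the mean of the local values, which is why mapping the local values shows which areas drive the global result.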

Section 03

Analysis Results & Visualizations

This section assembles the core analytical outputs from the Charlotte housing model, from exploratory relationships to predictive fit and residual diagnostics. Together, these visualizations show both what the model explains well and where spatial bias persists.

[Figure: Correlation matrix of housing variables]

Correlation Matrix of Numeric Predictors

This heat map shows strong positive relationships among square footage, building grade, and sale price. It also flags where predictors begin to overlap, which informed later variable selection and pruning decisions.
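
A correlation screen of this kind can be sketched in a few lines of base R. The simulated columns and the 0.7 cutoff for flagging overlapping predictors are assumptions chosen for illustration, not values reported by the project.

```r
set.seed(3)

# Simulated stand-in data (hypothetical columns)
n     <- 300
sqft  <- runif(n, 800, 4000)
beds  <- round(sqft / 800 + rnorm(n, sd = 0.6))
grade <- round(pmin(pmax(sqft / 1000 + rnorm(n), 1), 5))
price <- 50000 + 120 * sqft + 20000 * grade + rnorm(n, sd = 40000)
d <- data.frame(price, sqft, beds, grade)

# Pairwise Pearson correlations among numeric columns
cm <- cor(d[sapply(d, is.numeric)])
round(cm, 2)

# Flag predictor pairs whose |r| exceeds the assumed 0.7 threshold
preds <- setdiff(colnames(cm), "price")
pc    <- cm[preds, preds]
high  <- which(abs(pc) > 0.7 & upper.tri(pc), arr.ind = TRUE)
data.frame(var1 = preds[high[, "row"]],
           var2 = preds[high[, "col"]],
           r    = round(pc[high], 2))
```

Pairs flagged here are candidates for pruning rather than automatic removals; which variable to keep remains a modeling judgment.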

[Figure: Scatter plot of housing price versus crime count]

Price vs. Crime Count

The scatter suggests a negative relationship between local crime incidence and home value. Even with substantial variation, higher crime counts generally align with lower-priced properties.

[Figure: Scatter plot of price versus log area]

Price vs. Log Area

Log-transforming area produces one of the clearest positive associations in the dataset. This made square footage a core structural predictor in the housing price model.

[Figure: Observed versus predicted housing prices on log scale]

Log-Transformed Observed vs. Predicted

After log transformation, predicted and observed values align more tightly. The improved fit suggests the transformation reduced skewness and stabilized variance across the sample.

[Figure: Observed and permuted Moran's I distribution plot]

Observed and Permuted Moran's I

The observed Moran's I sits away from the permutation baseline, indicating statistically meaningful residual clustering. In practice, that means spatial bias remains even after accounting for major housing attributes.
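
The permutation logic behind this plot can be sketched as follows: shuffle the residuals across locations many times, recompute Moran's I each time, and compare the observed value against that null distribution. The simulated residuals, the 5-nearest-neighbour weights, and the 999 permutations are assumptions for illustration; spdep's `moran.mc()` performs this kind of test directly.

```r
set.seed(11)

# Simulated residuals with a spatial trend (illustrative only)
n   <- 120
xy  <- cbind(runif(n), runif(n))
res <- xy[, 1] + xy[, 2] + rnorm(n, sd = 0.3)

# Row-standardised weights from the 5 nearest neighbours
d <- as.matrix(dist(xy))
diag(d) <- Inf
w <- t(apply(d, 1, function(r) {
  as.numeric(rank(r, ties.method = "first") <= 5) / 5
}))

moran_i <- function(x, w) {
  z <- x - mean(x)
  (length(x) / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

obs <- moran_i(res, w)

# Shuffle residuals across locations to build the null distribution
perm <- replicate(999, moran_i(sample(res), w))

# One-sided pseudo p-value for positive autocorrelation
p_val <- (sum(perm >= obs) + 1) / (length(perm) + 1)
c(observed = obs, p_value = p_val)
```

A histogram of `perm` with a vertical line at `obs` reproduces the general shape of the figure described above.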

Key Findings & Spatial Bias

Variable Category | Observational Narrative | Statistical Significance
Physical Attributes | Total home size combined with professional building grade classification emerged as the primary drivers of market valuation. | p < 0.001
Socio-Economic | Elevated regional crime rates systematically depressed predicted property values across the metro area. | Strongly Negative
Environmental Proxies | Park proximity demonstrated a comparatively weak effect on valuation, challenging the assumption of a uniform green-space premium. | Weak Correlation
Systematic Bias | Critical discovery: the model consistently underpredicts central urban neighborhood values while overpredicting peripheral suburban properties. | High Magnitude

Section 04

Spatial Maps

[Figure: Map of Charlotte housing prices]

Home Prices Map

This map shows the uneven geography of housing values across the metro area. Higher-value clusters concentrate in select central and southern neighborhoods, while lower values dominate more peripheral areas.

[Figure: Map of crime count distribution in Charlotte]

Crime Count Map

Crime intensity is spatially concentrated rather than evenly distributed. These concentrations overlap with some of the model's lower predicted-value zones, reinforcing crime as a meaningful neighborhood-level signal.

[Figure: Map of mean residuals by neighborhood]

Mean Residuals by Neighborhood

Residual clusters reveal where the model systematically underpredicts and overpredicts local values. Central neighborhoods tend to contain positive residual hotspots, while some outer areas show overprediction.
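
The aggregation behind a map like this can be sketched in base R. The neighborhood labels, the simulated location premium, and the deliberately incomplete model are assumptions for illustration only; residuals are taken as observed minus predicted, so a positive mean indicates underprediction.

```r
set.seed(5)

# Simulated stand-in data with three hypothetical neighborhood groups
n <- 400
d <- data.frame(
  nbhd     = sample(c("Core", "Inner", "Outer"), n, replace = TRUE),
  log_area = rnorm(n, 7.5, 0.4)
)

# Simulate a location premium that the model below omits
premium <- c(Core = 0.3, Inner = 0, Outer = -0.2)
d$log_price <- 5 + 0.9 * d$log_area + premium[d$nbhd] + rnorm(n, sd = 0.2)

fit <- lm(log_price ~ log_area, data = d)
d$resid <- d$log_price - fitted(fit)   # observed minus predicted

# Positive mean = underprediction, negative mean = overprediction
nbhd_resid <- aggregate(resid ~ nbhd, data = d, FUN = mean)
nbhd_resid[order(-nbhd_resid$resid), ]
```

Joining a table like `nbhd_resid` to neighborhood polygons would give the choropleth; that join is not shown here because the project's boundary layer is not specified.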

Section 05 - Challenges

01.

Multicollinearity

VIF testing revealed high redundancy between bedroom counts and total square footage, requiring variable pruning.
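
For reference, a variance inflation factor can be computed by hand as 1 / (1 - R²), where R² comes from regressing one predictor on all the others. The sketch below uses simulated data in which bedroom count is deliberately tied to square footage; the column names and values are hypothetical. The `vif()` function in the car package reports the same quantity for numeric predictors.

```r
set.seed(9)

# Simulated predictors (hypothetical); beds is built to be collinear with sqft
n    <- 300
sqft <- runif(n, 800, 4000)
beds <- sqft / 800 + rnorm(n, sd = 0.4)
age  <- runif(n, 0, 60)
X <- data.frame(sqft, beds, age)

# VIF_j = 1 / (1 - R^2_j), with R^2_j from regressing
# predictor j on all the other predictors
vif <- sapply(names(X), function(v) {
  f  <- reformulate(setdiff(names(X), v), response = v)
  r2 <- summary(lm(f, data = X))$r.squared
  1 / (1 - r2)
})
round(vif, 2)
```

In this simulation `sqft` and `beds` both show inflated values while `age` stays near 1, which is the pattern that would motivate dropping one of the two size variables.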

02.

Outlier Sensitivity

Exceptional high-value luxury properties in South Charlotte disproportionately skewed global regression coefficients.

03.

Missing Variables

Absence of high-granularity school district performance and transit proximity limits the model's localized predictive power.

Section 06 - Urban Planning

Recommendations for Housing Policy

  • Prioritize targeted infrastructure investment in central underpredicted zones to capture latent value.

  • Integrate spatial lag variables into official tax assessment workflows to reduce systematic valuation bias.

  • Deploy localized crime prevention strategies as a direct mechanism for property value stabilization.
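
As an illustration of what a spatial lag variable is, the sketch below computes, for each home, the mean log price of its five nearest neighbours and adds it to a simple regression. The simulated data, the choice of k = 5, and the helper name `spatial_lag()` are assumptions; this is not the project's model or an assessment-ready specification.

```r
set.seed(21)

# Simulated stand-in data with a west-to-east price gradient
n  <- 200
xy <- cbind(runif(n), runif(n))
log_price <- 12 + 0.8 * xy[, 1] + rnorm(n, sd = 0.2)
log_area  <- rnorm(n, 7.5, 0.4)

# Spatial lag: mean value of x among each point's k nearest neighbours
spatial_lag <- function(coords, x, k = 5) {
  d <- as.matrix(dist(coords))
  diag(d) <- Inf                      # exclude the home itself
  apply(d, 1, function(r) mean(x[order(r)[1:k]]))
}
lag_price <- spatial_lag(xy, log_price)

# Compare fit with and without the lag term
base   <- lm(log_price ~ log_area)
lagged <- lm(log_price ~ log_area + lag_price)
c(base = summary(base)$r.squared, lagged = summary(lagged)$r.squared)
```

A lag of the outcome itself is endogenous under OLS, so this comparison is best read as a diagnostic. An assessment workflow would more plausibly use neighbours' prior sale prices or a dedicated spatial lag model.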