There are many models you could estimate to account for spatial autocorrelation, and a vast body of literature on which one suits which application. What matters most for us is theory-grounded reasoning about the underlying data-generating process.
Getting this wrong has real consequences:
Misspecifying SAR as OLS → biased β (omitted variable: Wy)
Misspecifying SEM as OLS → unbiased but inefficient β, wrong inference (deflated SEs)
Misspecifying SEM as SAR → spurious diffusion and spillovers (ρ picks up dependence that actually sits in the errors)
Misspecifying SAR as SLX → underestimates spillovers (captures only first-order neighbour effects, not the feedback that spreads through Wy)
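For reference, here are the four specifications compared above in their standard textbook form (y: vector of CDU shares, X: covariates, W: spatial weights matrix, ε: i.i.d. errors). The notation is the common one in the literature, not taken from a specific package.

```latex
\begin{aligned}
\text{OLS:}\quad & y = X\beta + \varepsilon \\
\text{SAR:}\quad & y = \rho W y + X\beta + \varepsilon
  \;\Longleftrightarrow\; y = (I - \rho W)^{-1}(X\beta + \varepsilon) \\
\text{SEM:}\quad & y = X\beta + u, \qquad u = \lambda W u + \varepsilon \\
\text{SLX:}\quad & y = X\beta + W X \theta + \varepsilon
\end{aligned}
```

The reduced form of SAR shows why it implies global spillovers: for |ρ| < 1 and row-standardised W, (I − ρW)⁻¹ = I + ρW + ρ²W² + …, so a change in one district's X also reaches neighbours of neighbours. SLX only includes first-order neighbours through WX.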
For social scientists in particular, we recommend the work of Tobias Rüttenauer. His workshop materials are excellent.
‘Research’ question and data
We use the same example as in the previous session. This time, however, we test whether a spatial regression model helps us learn more about the data-generating process. We may ask:
Do immigrant shares affect CDU voting shares within voting districts?
Do immigrant shares in one district affect CDU voting shares in neighbouring districts? (= spillover)
Do CDU voting shares in one district affect CDU voting shares in neighbouring districts? (= diffusion)
It is probably also a good idea to control for the number of inhabitants in each voting district.
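The three questions map onto different terms of a spatial regression. As an illustration only (this combined model, usually called the spatial Durbin model, is not estimated in this session), a specification containing all three terms, with X holding immigrant_share and inhabitants, is:

```latex
y = \underbrace{\rho W y}_{\text{diffusion}}
  + \underbrace{X \beta}_{\text{within districts}}
  + \underbrace{W X \theta}_{\text{spillover}}
  + \varepsilon
```

The lag model estimated below keeps only the ρWy and Xβ terms.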
Linear regression
linear_regression <- lm(cdu_share ~ immigrant_share + inhabitants, data = election_results)
summary(linear_regression)
Call:
lm(formula = cdu_share ~ immigrant_share + inhabitants, data = election_results)
Residuals:
Min 1Q Median 3Q Max
-15.0379 -3.3415 -0.3242 3.2834 24.7445
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 28.066680 1.050168 26.726 < 2e-16 ***
immigrant_share -0.077070 0.014311 -5.385 1.08e-07 ***
inhabitants -0.083491 0.004932 -16.928 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 5.174 on 540 degrees of freedom
Multiple R-squared: 0.409, Adjusted R-squared: 0.4068
F-statistic: 186.8 on 2 and 540 DF, p-value: < 2.2e-16
Now we need spatial weights
To estimate a spatial regression, we again need a spatial weights matrix, just as in the analysis of spatial autocorrelation. We reuse the weights from before (queen_W) and estimate a spatial lag model with spatialreg::lagsarlm().
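The weights object queen_W used in the model output below was built in the previous session and is not rebuilt here. For intuition, this sketch builds a row-standardised queen-contiguity matrix by hand for a hypothetical 3 × 3 grid of square districts. It assumes (the earlier code is not shown) that queen_W uses queen contiguity with row standardisation. For real polygons, that is what spdep::poly2nb(..., queen = TRUE) followed by spdep::nb2listw(..., style = "W") would produce. queen_grid_W() is a hypothetical helper, not a package function.

```r
# Hypothetical helper: row-standardised queen-contiguity weights for a
# toy nrow x ncol grid of square cells (cells numbered row by row).
queen_grid_W <- function(nrow, ncol) {
  n <- nrow * ncol
  W <- matrix(0, n, n)
  for (r in seq_len(nrow)) {
    for (c in seq_len(ncol)) {
      i <- (r - 1) * ncol + c
      # Queen contiguity: every surrounding cell sharing an edge or a corner
      for (dr in -1:1) {
        for (dc in -1:1) {
          rr <- r + dr
          cc <- c + dc
          if ((dr != 0 || dc != 0) &&
              rr >= 1 && rr <= nrow && cc >= 1 && cc <= ncol) {
            W[i, (rr - 1) * ncol + cc] <- 1
          }
        }
      }
    }
  }
  # Row-standardise: divide each row by its number of neighbours
  W / rowSums(W)
}

W <- queen_grid_W(3, 3)
rowSums(W)       # each row sums to 1
sum(W[5, ] > 0)  # neighbours of the centre cell
```

Corner cells get three neighbours, edge cells five and the centre cell eight. Row standardisation makes the spatial lag Wy the plain average of each district's neighbours.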
Call:
spatialreg::lagsarlm(formula = cdu_share ~ immigrant_share + 
    inhabitants, data = election_results, listw = queen_W)
Residuals:
Min 1Q Median 3Q Max
-10.48876 -2.36875 -0.21506 1.93747 23.78484
Type: lag
Coefficients: (asymptotic standard errors)
Estimate Std. Error z value Pr(>|z|)
(Intercept) 9.0224178 1.0909437 8.2703 2.22e-16
immigrant_share -0.0295856 0.0101910 -2.9031 0.003695
inhabitants -0.0323508 0.0038451 -8.4135 < 2.2e-16
Rho: 0.71173, LR test value: 311.67, p-value: < 2.22e-16
Asymptotic standard error: 0.033348
z-value: 21.342, p-value: < 2.22e-16
Wald statistic: 455.49, p-value: < 2.22e-16
Log likelihood: -1505.609 for lag model
ML residual variance (sigma squared): 13.319, (sigma: 3.6495)
Number of observations: 543
Number of parameters estimated: 5
AIC: 3021.2, (AIC for lm: 3330.9)
LM test for residual autocorrelation
test value: 29.403, p-value: 5.8781e-08
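The test statistics in this summary are linked, and recomputing them shows how to read them. Here is a minimal check using only numbers copied from the output above. It assumes R's usual parameter counting: k = 4 for the OLS model (three coefficients plus σ²) and k = 5 for the lag model, as reported.

```r
# Values copied from the lagsarlm() summary above
rho        <- 0.71173
rho_se     <- 0.033348
loglik_sar <- -1505.609
aic_sar    <- 3021.2
aic_lm     <- 3330.9
k_sar <- 5  # 3 coefficients + rho + sigma^2
k_lm  <- 4  # 3 coefficients + sigma^2

# z-value and Wald statistic for rho
z_rho <- rho / rho_se
wald  <- z_rho^2

# AIC = 2k - 2 logLik, so recover the OLS log-likelihood from its AIC
loglik_lm <- (2 * k_lm - aic_lm) / 2

# Likelihood-ratio test of H0: rho = 0 (one restriction)
lr   <- 2 * (loglik_sar - loglik_lm)
p_lr <- pchisq(lr, df = 1, lower.tail = FALSE)

# AIC of the lag model from its log-likelihood
aic_sar_check <- 2 * k_sar - 2 * loglik_sar

round(c(z = z_rho, wald = wald, lr = lr, aic_sar = aic_sar_check), 2)
```

The recomputed values agree with the reported z-value, Wald statistic, LR test value and AIC up to rounding. The large LR statistic (about 312 on one degree of freedom) and the lower AIC (3021.2 vs. 3330.9) both favour the lag model over OLS.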
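On average, a 1 pp increase in a district's immigrant share decreases that district's CDU vote share by 0.0341 pp (direct effect, including feedback via its neighbours). It also decreases CDU vote shares in all other districts by a further 0.0685 pp in total (indirect effect). Both differ from the raw coefficient (-0.0296) because ρ = 0.71 propagates the effect through the neighbourhood system. Note that the LM test still detects residual spatial autocorrelation (p < 0.001), so the lag model does not capture all of the spatial dependence.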
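Direct and indirect effects cannot be read off the coefficient table. For a lag model, they are summaries of the impact matrix S = (I − ρW)⁻¹β. The text does not show how the numbers were obtained. Presumably spatialreg::impacts() was applied to the fitted model with listw = queen_W, but that is an assumption. This base-R sketch computes the same summary measures for a hypothetical ring of 10 districts in which each district's two neighbours get weight 0.5. It plugs in β and ρ from the output above. sar_impacts() is a hypothetical helper.

```r
# Hypothetical helper: summary impacts of a spatial lag (SAR) model.
# For one covariate with coefficient beta, cell [i, j] of
# S = (I - rho * W)^(-1) * beta is the effect of x_j on y_i.
sar_impacts <- function(W, rho, beta) {
  n <- nrow(W)
  S <- solve(diag(n) - rho * W) * beta
  direct <- mean(diag(S))     # average own-district effect, incl. feedback
  total  <- mean(rowSums(S))  # average total effect
  c(direct = direct, indirect = total - direct, total = total)
}

# Hypothetical toy weights: 10 districts on a ring, two neighbours each,
# row-standardised (weight 0.5 per neighbour)
n <- 10
W <- matrix(0, n, n)
for (i in seq_len(n)) {
  W[i, (i %% n) + 1] <- 0.5        # next district on the ring
  W[i, ((i - 2) %% n) + 1] <- 0.5  # previous district on the ring
}

# Coefficient and rho copied from the lagsarlm() output above
imp <- sar_impacts(W, rho = 0.71173, beta = -0.0295856)
round(imp, 4)

# With row-standardised W, the total effect is beta / (1 - rho)
-0.0295856 / (1 - 0.71173)  # about -0.1026
```

With row-standardised weights, the total effect equals β/(1 − ρ) ≈ -0.1026 for any neighbourhood structure. This matches 0.0341 + 0.0685 in the text, which suggests queen_W is row-standardised. How the total splits into direct and indirect parts depends on W, so the toy ring will not reproduce 0.0341 exactly.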