Spatial Econometrics & Outlook

Stefan Jünger, Anne-Kathrin Stroppe, Dennis Abel

2025-04-24

Now

Day       Time         Title
April 09  10:00-11:30  Introduction
April 09  11:30-11:45  Coffee Break
April 09  11:45-13:00  Data Formats
April 09  13:00-14:00  Lunch Break
April 09  14:00-15:30  Mapping
April 09  15:30-15:45  Coffee Break
April 09  15:45-17:00  Spatial Wrangling
April 10  09:00-10:30  Spatial Wrangling
April 10  10:30-10:45  Coffee Break
April 10  10:45-12:00  Applied Spatial Linking
April 10  12:00-13:00  Lunch Break
April 10  13:00-14:30  Spatial Analysis
April 10  14:30-14:45  Coffee Break
April 10  14:45-16:00  Spatial Econometrics & Outlook

What is spatial econometrics?

Classic econometrics:

  • Using statistics to model (complex) theories, esp. causal thinking
  • As default, we think about regression analysis

One core assumption: Observations are independent of each other. However, we just learnt that this is often not the case.

Where do spatial dependence and spatial processes enter our models and affect our outcome of interest?

Is it meaningful or just a nuisance?

    Space can be important in our analysis in two ways.

    • It’s meaningful in our theory, and we thus interpret it accordingly after estimation
    • It can distort our empirical estimates, producing bias, inconsistency, and inefficiency

    We can address these different perspectives in our analysis with spatial econometric methods.

Spatial Diffusion

  • \(y_i\) affects \(y_j\) through \(w_{ji}\)
  • \(y_j\) affects \(y_i\) through \(w_{ij}\)
  • endogenous by design!
  • Examples:
    • tax competition: if a state cuts corporate tax, neighbours respond by cutting theirs too
    • civil war onset: conflict in one country raises the probability of onset in neighbours

Spatial Spill-Over

  • \(x_i\) affects \(y_j\) through \(w_{ji}\)
  • \(x_j\) affects \(y_i\) through \(w_{ij}\)
  • Examples:
    • trade and export: a neighbour region’s GDP (their X) raises your export volumes or wages (your Y)
    • crime displacement: increased policing in one neighbourhood raises crime in your neighbourhood, as criminals relocate

Formulas…

    Linear Regression: \[\small Y = X\beta + \epsilon\]

    Spatial Lag Y / Spatial Autoregressive Model (SAR, Diffusion): \[\small Y = \rho WY + X\beta + \epsilon\]

    Spatial Lag X Model (SLX, Spillover): \[\small Y = X\beta + WX\theta + \epsilon\]

    Spatial Error Model (SEM): \[\small Y = X\beta + u\] \[\small u = \lambda Wu + \epsilon\]
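
    A short derivation may help to see why the SAR model produces feedback while the SLX model does not. It assumes that \(I - \rho W\) is invertible, which holds for a row-standardised \(W\) and \(|\rho| < 1\); the identity matrix \(I\) is notation added here and does not appear in the formulas above.

    \[\small Y = \rho WY + X\beta + \epsilon \;\Rightarrow\; Y = (I - \rho W)^{-1}(X\beta + \epsilon) = (I + \rho W + \rho^2 W^2 + \dots)(X\beta + \epsilon)\]

    Every outcome thus depends on the covariates and errors of its neighbours, of its neighbours' neighbours, and so on, with weights shrinking in powers of \(\rho\). In the SLX model only \(WX\) enters, so the spillover stops at the direct neighbours. Solving the SEM in the same way gives \(Y = X\beta + (I - \lambda W)^{-1}\epsilon\): the spatial process sits in the error term only.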

Flavors and extensions

But what if …

  • … you have interdependence and spillovers in covariates?
  • … spillovers and clustering in errors?
  • … interdependence and clustering in errors?
  • … everything is related to everything?

Flavors and extensions

Spatial Durbin Model

\[Y = \rho WY + X\beta + WX\theta + \epsilon\]

Spatial Durbin Error Model

\[Y = X\beta + WX\theta + u\] \[u = \lambda Wu + \epsilon\]

Combined Spatial Autocorrelation Model

\[Y = \rho WY + X\beta + u\] \[u = \lambda Wu + \epsilon\]

Manski Model

\[Y = \rho WY + X\beta + WX\theta + u\] \[u = \lambda Wu + \epsilon\]
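
Read side by side, all models on these slides are special cases of the Manski model, obtained by setting some of its spatial parameters to zero. The restrictions below follow directly from comparing the formulas; they are a summary added for orientation, not a model-selection rule.

\[\begin{aligned}
\text{SDM:} &\quad \lambda = 0 \\
\text{SDEM:} &\quad \rho = 0 \\
\text{SAC:} &\quad \theta = 0 \\
\text{SAR:} &\quad \theta = 0,\ \lambda = 0 \\
\text{SLX:} &\quad \rho = 0,\ \lambda = 0 \\
\text{SEM:} &\quad \rho = 0,\ \theta = 0 \\
\text{Linear regression:} &\quad \rho = 0,\ \theta = 0,\ \lambda = 0
\end{aligned}\]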

Which model to choose?

Intermediate summary

There are a lot of models you could estimate to explain spatial autocorrelation, and there’s a vast body of literature on the best choice for each application. Important for us: theory-grounded reasoning about the underlying data-generating process.

Getting this wrong has real consequences:

  • Misspecifying SAR as OLS → biased β (omitted variable: Wy)
  • Misspecifying SEM as OLS → correct β, wrong inference (deflated SEs)
  • Misspecifying SEM as SAR → spurious diffusion, inefficient β
  • Misspecifying SAR as SLX → underestimates spillovers (captures only the first-order neighbour effect)
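
A small simulation can make the first bullet concrete. The sketch below is not based on the workshop data: it assumes a hypothetical ring of 1,000 units with two neighbours each, made-up true values (rho = 0.8, beta = 1), and uses base R only. The data are generated from a SAR process and then fitted with plain OLS.

```r
# Hypothetical toy setup: n units on a ring, each with a left and a right neighbour
set.seed(42)
n <- 1000
idx <- seq_len(n)
left  <- ifelse(idx == 1, n, idx - 1)
right <- ifelse(idx == n, 1, idx + 1)

# Row-standardised weights matrix: every row sums to 1
W <- matrix(0, n, n)
W[cbind(idx, left)]  <- 0.5
W[cbind(idx, right)] <- 0.5

rho  <- 0.8   # assumed true diffusion parameter
beta <- 1     # assumed true slope
x    <- rnorm(n)
eps  <- rnorm(n, sd = 0.5)

# SAR data-generating process: y = (I - rho W)^{-1} (x beta + eps)
y <- as.vector(solve(diag(n) - rho * W, x * beta + eps))

# OLS leaves out rho * W y, which ends up in the error term
ols_slope <- unname(coef(lm(y ~ x))["x"])
ols_slope
```

In this setup the OLS slope lands well above the true value of 1, because part of the feedback running through \(Wy\) is attributed to \(x\).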

We’d explicitly like to recommend the work of Tobias Rüttenauer for us social scientists. Here are some really nice workshop materials.

‘Research’ question and data

We will use the same example as in the previous session. But this time, we will test whether one of our spatial regression models helps us investigate the data-generating process further. We may ask:

  1. Do immigrant shares affect CDU voting shares within voting districts?
  2. Do immigrant shares affect CDU voting shares between neighborhoods? (=spillover)
  3. Do CDU voting shares affect CDU voting shares between neighborhoods? (=diffusion)

Controlling for the number of inhabitants within the voting districts might also be a good idea.

Linear regression

linear_regression <-
  lm(cdu_share ~ immigrant_share + inhabitants, data = election_results)

summary(linear_regression)

Call:
lm(formula = cdu_share ~ immigrant_share + inhabitants, data = election_results)

Residuals:
     Min       1Q   Median       3Q      Max 
-15.0379  -3.3415  -0.3242   3.2834  24.7445 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)     28.066680   1.050168  26.726  < 2e-16 ***
immigrant_share -0.077070   0.014311  -5.385 1.08e-07 ***
inhabitants     -0.083491   0.004932 -16.928  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 5.174 on 540 degrees of freedom
Multiple R-squared:  0.409, Adjusted R-squared:  0.4068 
F-statistic: 186.8 on 2 and 540 DF,  p-value: < 2.2e-16

Now we need spatial weights

Once again, we have to construct spatial weights, as in the analysis of spatial autocorrelation, to estimate a spatial regression. In fact, we’ll use the same approach as before.

queen_neighborhoods <- spdep::poly2nb(election_results, queen = TRUE)

queen_W <- spdep::nb2listw(queen_neighborhoods, style = "W")
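
To see what the row-standardised style ("W") amounts to, here is a minimal base-R sketch on a made-up example of four units lying on a line (1-2-3-4). The matrix A, the vector x, and their values are hypothetical; the actual workflow keeps using the spdep objects created above.

```r
# Hypothetical binary contiguity matrix for four units on a line: 1-2-3-4
A <- matrix(c(0, 1, 0, 0,
              1, 0, 1, 0,
              0, 1, 0, 1,
              0, 0, 1, 0), nrow = 4, byrow = TRUE)

# Row-standardise: divide each row by that unit's number of neighbours
W <- A / rowSums(A)

# Made-up covariate values for the four units
x <- c(10, 20, 30, 40)

# Spatial lag of x: the average of each unit's neighbours
Wx <- as.vector(W %*% x)
Wx
```

With row-standardised weights, the spatial lag is simply the neighbours' mean, which is the kind of lagged covariate that a Spatial Lag X Model adds to the regression.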

Spatial Error Model

spatial_error_model <-
  spatialreg::errorsarlm(
    cdu_share ~ immigrant_share + inhabitants,
    data = election_results, listw = queen_W)

summary(spatial_error_model)

Call:spatialreg::errorsarlm(formula = cdu_share ~ immigrant_share + 
    inhabitants, data = election_results, listw = queen_W)

Residuals:
      Min        1Q    Median        3Q       Max 
-10.09970  -2.33331  -0.33229   1.99941  24.40641 

Type: error 
Coefficients: (asymptotic standard errors) 
                  Estimate Std. Error z value  Pr(>|z|)
(Intercept)     21.8137216  1.1589897 18.8213 < 2.2e-16
immigrant_share -0.0307508  0.0117756 -2.6114  0.009017
inhabitants     -0.0293662  0.0047947 -6.1248 9.082e-10

Lambda: 0.7907, LR test value: 270.73, p-value: < 2.22e-16
Asymptotic standard error: 0.030988
    z-value: 25.516, p-value: < 2.22e-16
Wald statistic: 651.08, p-value: < 2.22e-16

Log likelihood: -1526.079 for error model
ML residual variance (sigma squared): 13.821, (sigma: 3.7177)
Number of observations: 543 
Number of parameters estimated: 5 
AIC: 3062.2, (AIC for lm: 3330.9)

 

Spatial Lag X Model

spatial_lag_x_model <-
  spatialreg::lmSLX(
    cdu_share ~ immigrant_share + inhabitants,
    data = election_results, listw = queen_W
  )

summary(spatial_lag_x_model)

Call:
lm(formula = formula(paste("y ~ ", paste(colnames(x)[-1], collapse = "+"))), 
    data = as.data.frame(x), weights = weights)

Coefficients:
                     Estimate    Std. Error  t value     Pr(>|t|)  
(Intercept)           3.343e+01   1.626e+00   2.055e+01   1.044e-69
immigrant_share      -3.502e-02   1.448e-02  -2.419e+00   1.590e-02
inhabitants          -2.993e-02   6.086e-03  -4.918e+00   1.164e-06
lag.immigrant_share  -8.482e-02   2.561e-02  -3.312e+00   9.879e-04
lag.inhabitants      -1.038e-01   9.502e-03  -1.092e+01   3.165e-25

 

Spatial Lag Y Model

spatial_lag_y_model <-
  spatialreg::lagsarlm(
    cdu_share ~ immigrant_share + inhabitants,
    data = election_results, listw = queen_W)

summary(spatial_lag_y_model)

Call:spatialreg::lagsarlm(formula = cdu_share ~ immigrant_share + 
    inhabitants, data = election_results, listw = queen_W)

Residuals:
      Min        1Q    Median        3Q       Max 
-10.48876  -2.36875  -0.21506   1.93747  23.78484 

Type: lag 
Coefficients: (asymptotic standard errors) 
                  Estimate Std. Error z value  Pr(>|z|)
(Intercept)      9.0224178  1.0909437  8.2703  2.22e-16
immigrant_share -0.0295856  0.0101910 -2.9031  0.003695
inhabitants     -0.0323508  0.0038451 -8.4135 < 2.2e-16

Rho: 0.71173, LR test value: 311.67, p-value: < 2.22e-16
Asymptotic standard error: 0.033348
    z-value: 21.342, p-value: < 2.22e-16
Wald statistic: 455.49, p-value: < 2.22e-16

Log likelihood: -1505.609 for lag model
ML residual variance (sigma squared): 13.319, (sigma: 3.6495)
Number of observations: 543 
Number of parameters estimated: 5 
AIC: 3021.2, (AIC for lm: 3330.9)
LM test for residual autocorrelation
test value: 29.403, p-value: 5.8781e-08

Comparison: What’s ‘better’?

AIC(spatial_error_model, spatial_lag_x_model, spatial_lag_y_model)
                    df      AIC
spatial_error_model  5 3062.159
spatial_lag_x_model  6 3187.867
spatial_lag_y_model  5 3021.218

spdep::lm.LMtests(linear_regression, queen_W, test = c("LMerr", "LMlag"))

    Rao's score (a.k.a Lagrange multiplier) diagnostics for spatial dependence

data:  
model: lm(formula = cdu_share ~ immigrant_share + inhabitants, data = election_results)
test weights: listw

RSerr = 245.52, df = 1, p-value < 2.2e-16


    Rao's score (a.k.a Lagrange multiplier) diagnostics for spatial dependence

data:  
model: lm(formula = cdu_share ~ immigrant_share + inhabitants, data = election_results)
test weights: listw

RSlag = 372.45, df = 1, p-value < 2.2e-16
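
The AIC values in the comparison can be traced back to the model summaries by hand, using AIC = 2k - 2 log-likelihood. The sketch below copies the log-likelihoods and the number of estimated parameters from the printed output above; the variable names are made up for illustration.

```r
# Log-likelihoods copied from the model summaries above
loglik_sem <- -1526.079   # Spatial Error Model
loglik_sar <- -1505.609   # Spatial Lag Y Model

# Both summaries report 5 estimated parameters:
# intercept, two slopes, spatial parameter, residual variance
k <- 5

# AIC = 2k - 2 * log-likelihood; lower values indicate a better trade-off
aic_sem <- 2 * k - 2 * loglik_sem
aic_sar <- 2 * k - 2 * loglik_sar
round(c(SEM = aic_sem, SAR = aic_sar), 1)
```

Since both models use the same number of parameters, the AIC ranking here is driven entirely by the log-likelihood.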

Comparison: What’s ‘better’?

Test   Estimate         Points to
RSerr  245.5, p < .001  SEM (\(\lambda \neq 0\))
RSlag  372.5, p < .001  SAR (\(\rho \neq 0\))

    RSlag > RSerr → SAR preferred … but the SAR residuals still show autocorrelation (LM = 29.4, p < .001).

Let’s stick to our theory, shall we?

Of higher importance: interpretation

Unfortunately, in a Spatial Lag Y Model, the spatial parameter \(\rho\) only tells us whether the spatial dependence is (statistically) significant, and the \(\beta\) coefficients can no longer be read as marginal effects.

  • Remember: these models are endogenous by design
    • We have effects of \(y_j\) on \(y_i\) and vice versa
    • What a mess

Luckily, there’s a method to decompose the spatial effects into direct, indirect, and total effects: estimating impacts

Impact estimation in R

This time, let’s start with the Spatial Lag Y Model:

spatialreg::impacts(spatial_lag_y_model, listw = queen_W)
Impact measures (lag, exact):
                           Direct    Indirect      Total
immigrant_share dy/dx -0.03408500 -0.06854653 -0.1026315
inhabitants dy/dx     -0.03727076 -0.07495324 -0.1122240

Compare it to the ‘simple’ regression output:

coef(spatial_lag_y_model)
            rho     (Intercept) immigrant_share     inhabitants 
     0.71172970      9.02241777     -0.02958562     -0.03235085 

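On average, a 1pp increase in a unit’s immigrant share decreases the CDU vote share in that unit by 0.0341pp (direct) and, summed over all other units in the neighbourhood system, by a further 0.0685pp (indirect), for a total of 0.1026pp. The direct impact is larger in magnitude than the raw coefficient (-0.0296) because it includes the feedback running through the neighbours.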
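
The decomposition can be sketched by hand. For the Spatial Lag Y Model, the matrix of partial effects is \((I - \rho W)^{-1}\beta\): the direct impact is the mean of its diagonal, the total impact the mean of its row sums, and the indirect impact the difference. With row-standardised weights the total impact reduces to \(\beta / (1 - \rho)\). The code below copies the estimates from the output above; the four-unit weights matrix is a hypothetical stand-in, so only the total impact reproduces the reported number.

```r
# Estimates copied from coef(spatial_lag_y_model) above
rho  <- 0.71172970
beta <- -0.02958562   # coefficient of immigrant_share

# Total impact under row-standardised weights
total <- beta / (1 - rho)

# Hypothetical toy weights (four units on a line) to show the full decomposition
A <- matrix(c(0, 1, 0, 0,
              1, 0, 1, 0,
              0, 1, 0, 1,
              0, 0, 1, 0), nrow = 4, byrow = TRUE)
W <- A / rowSums(A)

# S[i, j] is the effect of a change in x_j on y_i
S <- solve(diag(4) - rho * W) * beta

direct_toy   <- mean(diag(S))       # own effect, including feedback
total_toy    <- mean(rowSums(S))    # equals beta / (1 - rho)
indirect_toy <- total_toy - direct_toy

round(total, 7)
```

The value of total matches the Total column for immigrant_share up to rounding. The direct and indirect columns depend on the actual queen weights, so the toy values differ from the reported ones.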

Spatial Lag X impacts

spatialreg::impacts(spatial_lag_x_model, listw = queen_W)
Impact measures (SlX, glht):
                           Direct    Indirect      Total
immigrant_share dy/dx -0.03501709 -0.08481509 -0.1198322
inhabitants dy/dx     -0.02992990 -0.10379397 -0.1337239

Compare it to the ‘simple’ regression output:

coef(spatial_lag_x_model)
        (Intercept)     immigrant_share         inhabitants lag.immigrant_share     lag.inhabitants 
        33.42607569         -0.03501709         -0.02992990         -0.08481509         -0.10379397 

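Nothing is really gained: in a Spatial Lag X Model, the direct impacts are simply the coefficients of the covariates, and the indirect impacts are the coefficients of their spatial lags.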

If you need p-values and stuff

spatialreg::impacts(spatial_lag_y_model, listw = queen_W, R = 500) |> 
  summary(zstats = TRUE, short = TRUE)
Impact measures (lag, exact):
                           Direct    Indirect      Total
immigrant_share dy/dx -0.03408500 -0.06854653 -0.1026315
inhabitants dy/dx     -0.03727076 -0.07495324 -0.1122240
========================================================
Simulation results ( variance matrix):
========================================================
Simulated standard errors
                           Direct   Indirect      Total
immigrant_share dy/dx 0.011803537 0.02557806 0.03668881
inhabitants dy/dx     0.004100508 0.01183029 0.01412443

Simulated z-values:
                         Direct  Indirect     Total
immigrant_share dy/dx -2.813177 -2.604825 -2.721042
inhabitants dy/dx     -9.062308 -6.295083 -7.903523

Simulated p-values:
                      Direct     Indirect   Total     
immigrant_share dy/dx 0.0049055  0.0091921  0.0065076 
inhabitants dy/dx     < 2.22e-16 3.0724e-10 2.6645e-15
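
The simulated p-values appear to be ordinary two-sided p-values from the standard normal distribution, applied to the simulated z-values. The following lines check this for immigrant_share with numbers copied from the output above; the variable names are made up for illustration.

```r
# Simulated z-values for immigrant_share, copied from the output above
z_direct   <- -2.813177
z_indirect <- -2.604825

# Two-sided p-value under a standard normal distribution
p_direct   <- 2 * pnorm(-abs(z_direct))
p_indirect <- 2 * pnorm(-abs(z_indirect))

signif(c(direct = p_direct, indirect = p_indirect), 3)
```

Because the standard errors come from simulation draws, rerunning the impacts call without a fixed seed will give slightly different z-values and p-values.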

Exercise: Spatial Regression 💪

🖱 Click here for the exercise

Outlook


What else is out there?

Modelling:

  • Geographically Weighted Regression
  • Multilevel models with spatial autocorrelation
  • Dynamic spatial models (lagged y in both space and time)
  • Small area estimation/MRP: estimating quantities for areas where surveys have too few observations

What else is out there?

Causal inference:

  • Spatial regression discontinuity design (borders as cutoffs)
  • Using geographic features as instruments (distance to historical roads)
  • Difference-in-differences with spatial spillovers (account for SUTVA violation)

What else is out there?

GIS techniques:

  • Routing & network analysis
  • Cluster analysis
  • Point pattern analysis
  • Areal interpolation and imputation

Data Sources

    General note

    • Geospatial data are interdisciplinary
    • The amount of data feels unlimited and keeps growing
    • Thanks to AI, there are more possibilities to extract identifiers

    Some examples

    • More remote sensing/satellite imagery
    • Geotagged social media data
    • Retrieving place references from unstructured texts
    • Mobile surveys with GPS tracking
    • Crowdsourced / citizen science data

Advanced Geospatial Data Processing for Social Scientists

09-20 June 2026 Online

  • Register online
  • With Stefan and Dennis
  • Expand your knowledge of geospatial data wrangling
  • Focus on raster data and complex datacubes
  • Remote sensing and Earth observation APIs

Source: R-Spatial

Geodata and Spatial Regression Analysis

30 June-2 July 2026 On-Site Mannheim

  • Register online
  • With Tobias Rüttenauer
  • Analyzing spatial research questions
  • Various spatial regression techniques

The End