Spatial Econometrics & Outlook

Stefan Jünger, Anne-Kathrin Stroppe, Dennis Abel

2025-04-24

Now

Day	Time	Title
April 09	10:00-11:30	Introduction
April 09	11:30-11:45	Coffee Break
April 09	11:45-13:00	Data Formats
April 09	13:00-14:00	Lunch Break
April 09	14:00-15:30	Mapping
April 09	15:30-15:45	Coffee Break
April 09	15:45-17:00	Spatial Wrangling
April 10	09:00-10:30	Spatial Wrangling
April 10	10:30-10:45	Coffee Break
April 10	10:45-12:00	Applied Spatial Linking
April 10	12:00-13:00	Lunch Break
April 10	13:00-14:30	Spatial Analysis
April 10	14:30-14:45	Coffee Break
April 10	14:45-16:00	Spatial Econometrics & Outlook

What is spatial econometrics?

Classic econometrics:

Using statistics to model (complex) theories, esp. causal thinking
By default, we think of regression analysis

One core assumption: observations are independent of each other. However, as we just learnt, this is often not the case.

Where do spatial dependence and spatial processes enter our models and affect our outcome of interest?

Is it meaningful or just a nuisance?

Space can be important in our analysis in two ways.

It’s meaningful in our theory, and we thus interpret it accordingly after estimation

It can distort our empirical estimates, producing bias, inconsistency, and inefficiency

We can address these different perspectives in our analysis with spatial econometric methods.

Spatial Diffusion

\(y_i\) affects \(y_j\) through \(w_{ij}\)
\(y_j\) affects \(y_i\) through \(w_{ji}\)
endogenous by design!

Examples:
- tax competition: if a state cuts corporate tax, neighbours respond by cutting theirs too
- civil war onset: conflict in one country raises the probability of onset in neighbours

Spatial Spill-Over

\(x_i\) affects \(y_j\) through \(w_{ij}\)
\(x_j\) affects \(y_i\) through \(w_{ji}\)

Examples:
- trade and export: a neighbour region’s GDP (their X) raises your export volumes or wages (your Y)
- crime displacement: increased policing in one neighbourhood raises crime in your neighbourhood, as criminals relocate
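To make the two terms concrete: with a row-standardised weights matrix, the spatial lag of a variable is simply the average of that variable over each unit's neighbours. A minimal base-R sketch on a hypothetical four-unit toy map (made-up numbers, not the election data):

```r
# Hypothetical toy map: four units in a row (1-2-3-4), binary contiguity
B <- matrix(c(0, 1, 0, 0,
              1, 0, 1, 0,
              0, 1, 0, 1,
              0, 0, 1, 0), nrow = 4, byrow = TRUE)
W <- B / rowSums(B)  # row-standardise: each row sums to 1

y <- c(10, 20, 30, 40)  # hypothetical outcome
x <- c(1, 2, 3, 4)      # hypothetical covariate

Wy <- as.vector(W %*% y)  # diffusion term: neighbours' average outcome (multiplied by rho)
Wx <- as.vector(W %*% x)  # spillover term: neighbours' average covariate (multiplied by theta)
Wy  # 20 20 30 30
Wx  # 2 2 3 3
```

Unit 2, for example, borders units 1 and 3, so its lagged outcome is (10 + 30) / 2 = 20.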

Formulas…

Linear Regression: \[\small Y = X\beta + \epsilon\]

Spatial Lag Y / Spatial Autoregressive Model (SAR, Diffusion): \[\small Y = \rho WY + X\beta + \epsilon\]

Spatial Lag X Model (SLX, Spillover): \[\small Y = X\beta + WX\theta + \epsilon\]

Spatial Error Model (SEM): \[\small Y = X\beta + u\] \[\small u = \lambda Wu + \epsilon\]
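Because \(Y\) appears on both sides of the SAR equation, simulating from it requires the reduced form \(Y = (I - \rho W)^{-1}(X\beta + \epsilon)\); the SEM error term \(u\) is generated the same way. A minimal base-R sketch of all four data-generating processes, assuming a hypothetical ring of 50 units and made-up parameter values:

```r
set.seed(42)

# Hypothetical ring of n units: each unit borders the previous and the next one
n <- 50
B <- matrix(0, n, n)
B[cbind(1:n, c(n, 1:(n - 1)))] <- 1  # previous unit (unit 1 wraps to unit n)
B[cbind(1:n, c(2:n, 1))] <- 1        # next unit (unit n wraps to unit 1)
W <- B / rowSums(B)                  # row-standardised, as with style = "W"
I_n <- diag(n)

# Illustrative parameter values (not estimates from the election data)
beta <- 2; rho <- 0.5; theta <- 1; lambda <- 0.5
x   <- rnorm(n)
eps <- rnorm(n)

y_ols <- x * beta + eps                               # linear regression
y_sar <- solve(I_n - rho * W, x * beta + eps)         # SAR via its reduced form
y_slx <- x * beta + as.vector(W %*% x) * theta + eps  # SLX: no feedback loop
u     <- solve(I_n - lambda * W, eps)                 # SEM: u = lambda W u + eps
y_sem <- x * beta + u
```

Only SAR and SEM need the matrix inverse; SLX just adds the lagged covariate as an extra regressor.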

Flavors and extensions

But what if…

… you have interdependence and spillovers in covariates?
… spillovers and clustering in errors?
… interdependence and clustering in errors?
… everything is related with everything?

Flavors and extensions

Spatial Durbin Model

\[Y = \rho WY + X\beta + WX\theta + \epsilon\]

Spatial Durbin Error Model

\[Y = X\beta + WX\theta + u\] \[u = \lambda Wu + \epsilon\]

Combined Spatial Autocorrelation Model

\[Y = \rho WY + X\beta + u\] \[u = \lambda Wu + \epsilon\]

Manski Model

\[Y = \rho WY + X\beta + WX\theta + u\] \[u = \lambda Wu + \epsilon\]
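One reason the Spatial Durbin Model is popular is that it nests several simpler models. Setting \(\theta = 0\) or \(\rho = 0\) gives SAR or SLX; imposing the so-called common factor restriction \(\theta = -\rho\beta\) gives an SEM, as multiplying out the last line shows (define \(u = Y - X\beta\), then \(u = \rho Wu + \epsilon\)):

\[
\begin{aligned}
\text{SDM:}\quad & Y = \rho WY + X\beta + WX\theta + \epsilon \\
\theta = 0 \;\Rightarrow\quad & Y = \rho WY + X\beta + \epsilon && \text{(SAR)} \\
\rho = 0 \;\Rightarrow\quad & Y = X\beta + WX\theta + \epsilon && \text{(SLX)} \\
\theta = -\rho\beta \;\Rightarrow\quad & (I - \rho W)Y = (I - \rho W)X\beta + \epsilon && \text{(SEM with } \lambda = \rho\text{)}
\end{aligned}
\]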

Which model to choose?

Intermediate summary

There are many models you could estimate to account for spatial autocorrelation, and there's a vast body of literature on which one best suits which application. What matters most for us: theory-grounded reasoning about the underlying data-generating process.

Getting this wrong has real consequences:

Misspecifying SAR as OLS → biased β (omitted variable: Wy)
Misspecifying SEM as OLS → unbiased but inefficient β, wrong inference (usually deflated SEs)
Misspecifying SEM as SAR → spurious diffusion, inefficient β
Misspecifying SAR as SLX → underestimates spillovers (captures only first-order neighbour effects, no higher-order feedback)

For us social scientists, we'd especially like to recommend the work of Tobias Rüttenauer; his workshop materials are really nice.

‘Research’ question and data

We will use the same example as in the previous session. This time, however, we will test whether one of our spatial regression models helps us further investigate the data-generating process. We may ask:

Do immigrant shares affect CDU voting shares within voting districts?

Do immigrant shares affect CDU voting shares between neighborhoods? (=spillover)

Do CDU voting shares affect CDU voting shares between neighborhoods? (=diffusion)

Controlling for the number of inhabitants in each voting district might also be a good idea.

Linear regression

linear_regression <-
  lm(cdu_share ~ immigrant_share + inhabitants, data = election_results)

summary(linear_regression)


Call:
lm(formula = cdu_share ~ immigrant_share + inhabitants, data = election_results)

Residuals:
     Min       1Q   Median       3Q      Max 
-15.0379  -3.3415  -0.3242   3.2834  24.7445 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)     28.066680   1.050168  26.726  < 2e-16 ***
immigrant_share -0.077070   0.014311  -5.385 1.08e-07 ***
inhabitants     -0.083491   0.004932 -16.928  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 5.174 on 540 degrees of freedom
Multiple R-squared:  0.409, Adjusted R-squared:  0.4068 
F-statistic: 186.8 on 2 and 540 DF,  p-value: < 2.2e-16

Now we need a spatial weight

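To estimate a spatial regression, we once again need a spatial weights matrix, just as in the analysis of spatial autocorrelation. In fact, we'll use the same approach as before: queen contiguity with row-standardised weights.

# Queen contiguity: districts sharing at least one boundary point are neighbours
queen_neighborhoods <- spdep::poly2nb(election_results, queen = TRUE)

# style = "W" row-standardises the weights so each district's weights sum to 1
queen_W <- spdep::nb2listw(queen_neighborhoods, style = "W")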
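spdep::poly2nb() derives contiguity from the actual polygon boundaries. To show what queen contiguity and style = "W" amount to, here is a base-R sketch on a hypothetical 3 × 3 grid of square cells, where "sharing at least one boundary point" reduces to rows and columns differing by at most one:

```r
# Hypothetical 3 x 3 grid of square cells, numbered row by row
n_row <- 3; n_col <- 3
cells <- expand.grid(col = seq_len(n_col), row = seq_len(n_row))
n <- nrow(cells)

# Queen contiguity: cells touching at an edge OR a corner are neighbours
B <- matrix(0, n, n)
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    d_row <- abs(cells$row[i] - cells$row[j])
    d_col <- abs(cells$col[i] - cells$col[j])
    if (i != j && d_row <= 1 && d_col <= 1) B[i, j] <- 1
  }
}

# Analogue of style = "W": row-standardise so each cell's weights sum to 1
W <- B / rowSums(B)

rowSums(B)  # neighbours per cell: 3 5 3 5 8 5 3 5 3
```

Corner cells get 3 neighbours, edge cells 5, and the centre cell 8; after row-standardisation, each neighbour of the centre cell gets a weight of 1/8.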

Spatial Error Model

spatial_error_model <-
  spatialreg::errorsarlm(
    cdu_share ~ immigrant_share + inhabitants,
    data = election_results, listw = queen_W)

summary(spatial_error_model)


Call:spatialreg::errorsarlm(formula = cdu_share ~ immigrant_share + 
    inhabitants, data = election_results, listw = queen_W)

Residuals:
      Min        1Q    Median        3Q       Max 
-10.09970  -2.33331  -0.33229   1.99941  24.40641 

Type: error 
Coefficients: (asymptotic standard errors) 
                  Estimate Std. Error z value  Pr(>|z|)
(Intercept)     21.8137216  1.1589897 18.8213 < 2.2e-16
immigrant_share -0.0307508  0.0117756 -2.6114  0.009017
inhabitants     -0.0293662  0.0047947 -6.1248 9.082e-10

Lambda: 0.7907, LR test value: 270.73, p-value: < 2.22e-16
Asymptotic standard error: 0.030988
    z-value: 25.516, p-value: < 2.22e-16
Wald statistic: 651.08, p-value: < 2.22e-16

Log likelihood: -1526.079 for error model
ML residual variance (sigma squared): 13.821, (sigma: 3.7177)
Number of observations: 543 
Number of parameters estimated: 5 
AIC: 3062.2, (AIC for lm: 3330.9)

Spatial Lag X Model

spatial_lag_x_model <-
  spatialreg::lmSLX(
    cdu_share ~ immigrant_share + inhabitants,
    data = election_results, listw = queen_W
  )

summary(spatial_lag_x_model)


Call:
lm(formula = formula(paste("y ~ ", paste(colnames(x)[-1], collapse = "+"))), 
    data = as.data.frame(x), weights = weights)

Coefficients:
                     Estimate    Std. Error  t value     Pr(>|t|)  
(Intercept)           3.343e+01   1.626e+00   2.055e+01   1.044e-69
immigrant_share      -3.502e-02   1.448e-02  -2.419e+00   1.590e-02
inhabitants          -2.993e-02   6.086e-03  -4.918e+00   1.164e-06
lag.immigrant_share  -8.482e-02   2.561e-02  -3.312e+00   9.879e-04
lag.inhabitants      -1.038e-01   9.502e-03  -1.092e+01   3.165e-25

Spatial Lag Y Model

spatial_lag_y_model <-
  spatialreg::lagsarlm(
    cdu_share ~ immigrant_share + inhabitants,
    data = election_results, listw = queen_W)

summary(spatial_lag_y_model)


Call:spatialreg::lagsarlm(formula = cdu_share ~ immigrant_share + 
    inhabitants, data = election_results, listw = queen_W)

Residuals:
      Min        1Q    Median        3Q       Max 
-10.48876  -2.36875  -0.21506   1.93747  23.78484 

Type: lag 
Coefficients: (asymptotic standard errors) 
                  Estimate Std. Error z value  Pr(>|z|)
(Intercept)      9.0224178  1.0909437  8.2703  2.22e-16
immigrant_share -0.0295856  0.0101910 -2.9031  0.003695
inhabitants     -0.0323508  0.0038451 -8.4135 < 2.2e-16

Rho: 0.71173, LR test value: 311.67, p-value: < 2.22e-16
Asymptotic standard error: 0.033348
    z-value: 21.342, p-value: < 2.22e-16
Wald statistic: 455.49, p-value: < 2.22e-16

Log likelihood: -1505.609 for lag model
ML residual variance (sigma squared): 13.319, (sigma: 3.6495)
Number of observations: 543 
Number of parameters estimated: 5 
AIC: 3021.2, (AIC for lm: 3330.9)
LM test for residual autocorrelation
test value: 29.403, p-value: 5.8781e-08

Comparison: What’s ‘better’?

AIC(spatial_error_model, spatial_lag_x_model, spatial_lag_y_model)

                    df      AIC
spatial_error_model  5 3062.159
spatial_lag_x_model  6 3187.867
spatial_lag_y_model  5 3021.218
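The AIC values can be reproduced by hand from the log likelihoods printed in the model summaries, using AIC = 2k - 2 log L, where k = 5 for both ML models (three coefficients, the spatial parameter, and the residual variance). A small sketch; lower values are better:

```r
# AIC = 2k - 2 * logLik, with k = number of estimated parameters
aic <- function(log_lik, k) 2 * k - 2 * log_lik

aic(-1526.079, k = 5)  # spatial error model: 3062.158
aic(-1505.609, k = 5)  # spatial lag y model: 3021.218
```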

spdep::lm.LMtests(linear_regression, queen_W, test = c("LMerr", "LMlag"))


    Rao's score (a.k.a Lagrange multiplier) diagnostics for spatial dependence

data:  
model: lm(formula = cdu_share ~ immigrant_share + inhabitants, data = election_results)
test weights: listw

RSerr = 245.52, df = 1, p-value < 2.2e-16


    Rao's score (a.k.a Lagrange multiplier) diagnostics for spatial dependence

data:  
model: lm(formula = cdu_share ~ immigrant_share + inhabitants, data = election_results)
test weights: listw

RSlag = 372.45, df = 1, p-value < 2.2e-16

Comparison: What’s ‘better’?

Test	Statistic	Points to
RSerr	245.5, p < .001	SEM (\(\lambda \neq 0\))
RSlag	372.5, p < .001	SAR (\(\rho \neq 0\))

RSlag > RSerr → SAR preferred … but the SAR residuals still show autocorrelation (LM = 29.4, p < .001).

Let’s stick to our theory, shall we?

Of higher importance: interpretation

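Unfortunately, in a Spatial Lag Y Model, the spatial parameter \(\rho\) only tells us whether spatial dependence in the outcome is (statistically) significant, and in which direction; neither \(\rho\) nor the \(\beta\) coefficients can be read directly as marginal effects.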

Remember: these models are endogenous by design
- We have effects of \(y_j\) on \(y_i\) and vice versa
- What a mess

Luckily, there's a method to decompose the spatial effects into direct, indirect, and total effects: estimating impacts.
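Formally, writing the SAR in its reduced form, the effect of covariate \(x_r\) on all outcomes is an \(n \times n\) matrix, and the impact measures average over it. A sketch of the standard summary definitions (following LeSage and Pace), where \(\iota\) is an \(n\)-vector of ones:

\[
\begin{aligned}
Y &= (I - \rho W)^{-1}(X\beta + \epsilon) \\
S_r(W) &= \frac{\partial Y}{\partial x_r^{\top}} = (I - \rho W)^{-1}\beta_r \\
\text{Direct}_r &= \tfrac{1}{n}\operatorname{tr}\big(S_r(W)\big) \\
\text{Total}_r &= \tfrac{1}{n}\,\iota^{\top} S_r(W)\,\iota \\
\text{Indirect}_r &= \text{Total}_r - \text{Direct}_r
\end{aligned}
\]

With a row-standardised \(W\), every row of \(S_r(W)\) sums to \(\beta_r / (1 - \rho)\), so the total effect simplifies to \(\beta_r / (1 - \rho)\).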

Impact estimation in `R`

This time, let’s start with the Spatial Lag Y Model:

spatialreg::impacts(spatial_lag_y_model, listw = queen_W)

Impact measures (lag, exact):
                           Direct    Indirect      Total
immigrant_share dy/dx -0.03408500 -0.06854653 -0.1026315
inhabitants dy/dx     -0.03727076 -0.07495324 -0.1122240

Compare it to the ‘simple’ regression output:

coef(spatial_lag_y_model)

            rho     (Intercept) immigrant_share     inhabitants 
     0.71172970      9.02241777     -0.02958562     -0.03235085

On average, a 1pp increase in a district's immigrant share decreases that district's CDU vote share by 0.0341pp (direct effect, including feedback via its neighbours), and decreases CDU vote shares in all other districts by a further 0.0685pp in total (indirect effect).
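To see where these numbers come from, the decomposition can be computed by hand. A minimal base-R sketch: the function impacts_sar() is hypothetical (not part of spatialreg), and because the real weights matrix is not reproduced here, it uses a toy ring of six districts. The direct/indirect split therefore differs from the output above, but the total does not, since it equals \(\beta / (1 - \rho)\) for any row-standardised \(W\):

```r
# Hypothetical ring of 6 districts, row-standardised
n <- 6
B <- matrix(0, n, n)
B[cbind(1:n, c(n, 1:(n - 1)))] <- 1  # previous district
B[cbind(1:n, c(2:n, 1))] <- 1        # next district
W <- B / rowSums(B)

impacts_sar <- function(rho, beta, W) {
  S <- solve(diag(nrow(W)) - rho * W) * beta  # S_r(W) = (I - rho W)^{-1} beta_r
  direct <- mean(diag(S))                     # average own-district effect
  total  <- mean(rowSums(S))                  # average row sum
  c(direct = direct, indirect = total - direct, total = total)
}

# rho and the immigrant_share coefficient from coef(spatial_lag_y_model)
toy <- impacts_sar(rho = 0.71172970, beta = -0.02958562, W = W)
toy  # total is about -0.1026315, matching impacts() above
```

The direct effect exceeds \(\beta\) in absolute size because part of each district's own change comes back to it through its neighbours.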

Spatial Lag X impacts

spatialreg::impacts(spatial_lag_x_model, listw = queen_W)

Impact measures (SlX, glht):
                           Direct    Indirect      Total
immigrant_share dy/dx -0.03501709 -0.08481509 -0.1198322
inhabitants dy/dx     -0.02992990 -0.10379397 -0.1337239

Compare it to the ‘simple’ regression output:

coef(spatial_lag_x_model)

        (Intercept)     immigrant_share         inhabitants lag.immigrant_share     lag.inhabitants 
        33.42607569         -0.03501709         -0.02992990         -0.08481509         -0.10379397

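Nothing is really gained: without a \(WY\) term there is no feedback, so the direct effects are just the \(\beta\) coefficients and the indirect effects are just the coefficients of the lag. terms (\(\theta\)).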

If you need p-values and stuff

spatialreg::impacts(spatial_lag_y_model, listw = queen_W, R = 500) |> 
  summary(zstats = TRUE, short = TRUE)

Impact measures (lag, exact):
                           Direct    Indirect      Total
immigrant_share dy/dx -0.03408500 -0.06854653 -0.1026315
inhabitants dy/dx     -0.03727076 -0.07495324 -0.1122240
========================================================
Simulation results ( variance matrix):
========================================================
Simulated standard errors
                           Direct   Indirect      Total
immigrant_share dy/dx 0.011803537 0.02557806 0.03668881
inhabitants dy/dx     0.004100508 0.01183029 0.01412443

Simulated z-values:
                         Direct  Indirect     Total
immigrant_share dy/dx -2.813177 -2.604825 -2.721042
inhabitants dy/dx     -9.062308 -6.295083 -7.903523

Simulated p-values:
                      Direct     Indirect   Total     
immigrant_share dy/dx 0.0049055  0.0091921  0.0065076 
inhabitants dy/dx     < 2.22e-16 3.0724e-10 2.6645e-15
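The simulated standard errors come from drawing the model parameters many times (here R = 500) and recomputing the impacts for each draw. A rough base-R sketch of that idea for the total effect of immigrant_share. This is not spatialreg's internal code, and it assumes zero covariance between \(\rho\) and \(\beta\) because that covariance is not printed above, so it will not exactly reproduce the printed standard errors:

```r
set.seed(1)
R <- 500

# Point estimates and asymptotic SEs from summary(spatial_lag_y_model)
est <- c(rho = 0.71172970, beta = -0.02958562)
se  <- c(rho = 0.033348, beta = 0.0101910)

# ASSUMPTION: zero covariance between rho and beta (the real variance matrix is not shown)
Sigma <- diag(se^2)

# Multivariate normal draws via the Cholesky factor, shifted to the point estimates
Z     <- matrix(rnorm(R * 2), nrow = R)
draws <- sweep(Z %*% chol(Sigma), 2, est, "+")

# Total impact under row-standardised W: beta / (1 - rho), once per draw
total_draws <- draws[, 2] / (1 - draws[, 1])

c(mean = mean(total_draws), se = sd(total_draws))
```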

Exercise: Spatial Regression 💪

🖱 Click here for the exercise

Outlook


What else is out there?

Modelling:

Geographically Weighted Regression
Multilevel models with spatial autocorrelation
Dynamic spatial models (lagged y in both space and time)
Small area estimation / MRP: estimating quantities for areas where surveys have too few observations

What else is out there?

Causal inference:

Spatial regression discontinuity design (borders as cutoffs)
Using geographic features as instruments (e.g., distance to historical roads)
Difference-in-differences with spatial spillovers (account for SUTVA violation)

What else is out there?

GIS techniques

Routing & network analysis
Cluster analysis
Point pattern analysis
Areal interpolation and imputation

Data Sources

General note

Geospatial data are interdisciplinary
The amount of data feels unlimited and keeps growing
Thanks to AI, there are more possibilities to extract identifiers

Some examples

More remote sensing/satellite imagery
Geotagged social media data
Retrieving place references from unstructured texts
Mobile surveys with GPS tracking
Crowdsourced / citizen science data

Geodata and Spatial Regression Analysis

30 June to 2 July 2026, on-site in Mannheim

Register online
With Tobias Rüttenauer
Analyzing spatial research questions
Various spatial regression techniques

Source: Geodata_Spatial_Regression
