Exercise 8: Spatial Regression

Introduction to Geospatial Techniques for Social Scientists in R

Author

Stefan Jünger, Anne Stroppe & Dennis Abel

Exercises

Let’s see how different styles of spatial weights matrices can affect the estimates of spatial regression models. Run the code below to load and prepare the data for this exercise. (You can ignore any warning messages.)

voting_districts <-
  sf::st_read("./data/Stimmbezirk.shp") |> 
  dplyr::mutate(
    district_id = as.numeric(nummer)
    ) |> 
  dplyr::select(district_id, Shape_Area, geometry) 

btw21_votes <-
  glue::glue(
    "https://www.stadt-koeln.de/wahlen/bundestagswahl/09-2021/praesentation/\\
    Open-Data-Bundestagswahl476.csv"
  ) |> 
  readr::read_csv2() |>
  dplyr::mutate(
    district_id = as.numeric(`gebiet-nr`),
    valid_votes = `F`,
    cdu_share = (F1 / valid_votes) * 100,
    spd_share = (F2 / valid_votes) * 100,
    fdp_share = (F3 / valid_votes) * 100,
    afd_share = (F4 / valid_votes) * 100,
    greens_share = (F5 / valid_votes) * 100,
    linke_share = (F6 / valid_votes) * 100,
    .keep = "none"
  )

election_results <-
  dplyr::left_join(
    voting_districts,
    btw21_votes,
    by = "district_id"
  )

immigrants_cologne <- terra::rast("./data/immigrants_cologne.tif")
inhabitants_cologne <- terra::rast("./data/inhabitants_cologne.tif")

immigrants_cologne  <- terra::subst(immigrants_cologne,  from = -9, to = NA)
inhabitants_cologne <- terra::subst(inhabitants_cologne, from = -9, to = NA)

immigrant_share_cologne <- (immigrants_cologne / inhabitants_cologne) * 100

age_rast <- terra::rast("./data/census22_age_avg.tif")
rent_rast <- terra::rast("./data/census22_rent_avg.tif")

election_results <-
  election_results |>
  dplyr::mutate(
    immigrant_share = 
      exactextractr::exact_extract(
        immigrant_share_cologne, election_results, 'mean', progress = FALSE
      ),
    inhabitants = 
      exactextractr::exact_extract(
        inhabitants_cologne, election_results, 'mean', progress = FALSE
      ),
    age_avg = 
      exactextractr::exact_extract(
        age_rast, election_results, 'mean', progress = FALSE
      ),
    rent_avg = 
      exactextractr::exact_extract(
        rent_rast, election_results, 'mean', progress = FALSE
      )
  )
Reading layer `Stimmbezirk' from data source 
  `C:\Users\stroppan\Documents\gesis-workshop-geospatial-techniques-R-2026\data\Stimmbezirk.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 543 features and 14 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 343914.7 ymin: 5632759 xmax: 370674.3 ymax: 5661475
Projected CRS: ETRS89 / UTM zone 32N
🏋 Exercise 1

Re-use the code from the previous exercise for the Queen neighborhoods. This time, however, create one weights matrix with row normalization and another with min-max normalization. Insert them into two otherwise identical spatial lag models. You can choose the variables and the type of lag model yourself (e.g., afd_share ~ immigrant_share + inhabitants as a Spatial Lag Y model).

For min-max normalization, use the option style = "minmax" in the spdep::nb2listw() function.
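To see what the two styles do to the weights themselves, here is a minimal sketch on a hypothetical 3 × 3 grid of square polygons (toy data, not the Cologne voting districts), passed through the same `spdep::poly2nb()` and `spdep::nb2listw()` steps as in the solution. With `style = "W"` every row of weights sums to 1. With `style = "minmax"` the binary weights are divided by the smaller of the maximum row sum and maximum column sum; for a symmetric contiguity matrix this is the largest number of neighbors (8 for the center cell here), so every weight becomes 1/8 and the row sums vary with the number of neighbors.

```r
library(sf)
library(spdep)

# Hypothetical toy data: a 3 x 3 grid of 1 x 1 squares
outline  <- sf::st_polygon(list(rbind(c(0, 0), c(3, 0), c(3, 3), c(0, 3), c(0, 0))))
toy_grid <- sf::st_make_grid(sf::st_sfc(outline), n = c(3, 3))

# Queen contiguity: polygons sharing an edge or a corner are neighbors
toy_nb <- spdep::poly2nb(toy_grid, queen = TRUE)

# Number of neighbors per cell: 3 (corners), 5 (edges), 8 (center)
n_neighbors <- spdep::card(toy_nb)

# Row-normalized vs. min-max-normalized weights
toy_W      <- spdep::nb2listw(toy_nb, style = "W")
toy_minmax <- spdep::nb2listw(toy_nb, style = "minmax")

# Compare the row sums of the two weights matrices cell by cell
row_sums <- rbind(
  neighbors = n_neighbors,
  W         = sapply(toy_W$weights, sum),
  minmax    = sapply(toy_minmax$weights, sum)
)
print(round(row_sums, 3))
```

On the workshop data, the same check would be `sapply(queen_W$weights, sum)` and `sapply(queen_minmax$weights, sum)`.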

🏋 Exercise 2

Calculate the impacts (direct, indirect, and total effects) of both models. What do you observe?
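In a spatial lag model, a change in a regressor in one area also shifts the outcome in neighboring areas through ρ. The usual average impact measures are built from the matrix S = (I - ρW)^(-1) β: the direct impact is the mean of the diagonal of S, the total impact is the mean row sum, and the indirect impact is their difference. The sketch below computes these by hand on a hypothetical 5 × 5 grid with made-up values ρ = 0.7 and β = 0.1 (chosen only to be in the range of the models here). It illustrates the definitions and is not meant to reproduce `spatialreg::impacts()` exactly.

```r
library(spdep)

# Hypothetical toy neighborhood: 5 x 5 grid of cells, queen contiguity
toy_nb <- spdep::cell2nb(5, 5, type = "queen")

# Hypothetical parameter values (not estimates from the Cologne models)
rho  <- 0.7
beta <- 0.1

# Average impacts of one regressor in a spatial lag model,
# based on S = (I - rho * W)^(-1) * beta
lag_impacts <- function(W, rho, beta) {
  S <- solve(diag(nrow(W)) - rho * W) * beta
  direct <- mean(diag(S))     # average effect on the area itself
  total  <- mean(rowSums(S))  # average effect summed over all areas
  c(direct = direct, indirect = total - direct, total = total)
}

# Dense weights matrices for both normalization styles
W_row    <- spdep::listw2mat(spdep::nb2listw(toy_nb, style = "W"))
W_minmax <- spdep::listw2mat(spdep::nb2listw(toy_nb, style = "minmax"))

impacts_row    <- lag_impacts(W_row, rho, beta)
impacts_minmax <- lag_impacts(W_minmax, rho, beta)

print(round(rbind(W = impacts_row, minmax = impacts_minmax), 4))
```

With row normalization every row of W sums to 1, so the total impact reduces to β / (1 - ρ). For the fitted queen_W model this gives 0.1112 / (1 - 0.7233) ≈ 0.402, in line with the total impact for immigrant_share reported in the solutions. With min-max weights, the rows of cells with fewer neighbors sum to less than 1, so the same ρ and β yield a smaller total impact.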

Solutions

# Queen contiguity neighbors and two weights styles (spdep)
queen_neighborhood <-
  spdep::poly2nb(
    election_results,
    queen = TRUE
  )

queen_W <- spdep::nb2listw(queen_neighborhood, style = "W")

queen_minmax <- spdep::nb2listw(queen_neighborhood, style = "minmax")

# run regressions
spatial_lag_y_W <-
  spatialreg::lagsarlm(
    afd_share ~ immigrant_share + inhabitants,
    data = election_results,
    listw = queen_W
    )

summary(spatial_lag_y_W)

Call:spatialreg::lagsarlm(formula = afd_share ~ immigrant_share + 
    inhabitants, data = election_results, listw = queen_W)

Residuals:
     Min       1Q   Median       3Q      Max 
-6.49207 -1.39265 -0.27635  1.12012  9.50687 

Type: lag 
Coefficients: (asymptotic standard errors) 
                  Estimate Std. Error z value  Pr(>|z|)
(Intercept)     -0.6134172  0.3722575 -1.6478   0.09939
immigrant_share  0.1111987  0.0136151  8.1673 2.220e-16
inhabitants     -0.0073967  0.0018639 -3.9684 7.234e-05

Rho: 0.72325, LR test value: 289.22, p-value: < 2.22e-16
Asymptotic standard error: 0.032832
    z-value: 22.029, p-value: < 2.22e-16
Wald statistic: 485.27, p-value: < 2.22e-16

Log likelihood: -1225.744 for lag model
ML residual variance (sigma squared): 4.7274, (sigma: 2.1743)
Number of observations: 543 
Number of parameters estimated: 5 
AIC: 2461.5, (AIC for lm: 2748.7)
LM test for residual autocorrelation
test value: 10.194, p-value: 0.0014093
spatial_lag_y_minmax <-
  spatialreg::lagsarlm(
    afd_share ~ immigrant_share + inhabitants,
    data = election_results,
    listw = queen_minmax
    )


summary(spatial_lag_y_minmax)

Call:spatialreg::lagsarlm(formula = afd_share ~ immigrant_share + 
    inhabitants, data = election_results, listw = queen_minmax)

Residuals:
     Min       1Q   Median       3Q      Max 
-8.01961 -1.86539 -0.20147  1.53174 10.89683 

Type: lag 
Coefficients: (asymptotic standard errors) 
                  Estimate Std. Error z value  Pr(>|z|)
(Intercept)      0.5734959  0.4946409  1.1594    0.2463
immigrant_share  0.2097864  0.0163578 12.8249 < 2.2e-16
inhabitants     -0.0181542  0.0024539 -7.3981 1.381e-13

Rho: 0.72408, LR test value: 63.322, p-value: 1.7764e-15
Asymptotic standard error: 0.091033
    z-value: 7.954, p-value: 1.7764e-15
Wald statistic: 63.266, p-value: 1.7764e-15

Log likelihood: -1338.692 for lag model
ML residual variance (sigma squared): 7.9906, (sigma: 2.8268)
Number of observations: 543 
Number of parameters estimated: 5 
AIC: 2687.4, (AIC for lm: 2748.7)
LM test for residual autocorrelation
test value: 133.46, p-value: < 2.22e-16
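# ρ = 0.723 (queen_W) vs. 0.724 (queen_minmax): nearly identical, clearly positive and below 1,
# so the strong spatial clustering of AfD vote shares is robust to the weights specification.
# The evidence is much stronger with queen_W, though (LR test 289.22 vs. 63.32; SE of ρ 0.033 vs. 0.091)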

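# AIC 2461.5 (queen_W) vs. 2687.4 (queen_minmax), log likelihood -1225.7 vs. -1338.7:
# queen_W fits clearly better (an AIC difference of about 226)
# LM test for residual autocorrelation: 10.19 (p = .001, queen_W) vs. 133.46 (p < .001, queen_minmax):
# both models leave residual spatial autocorrelation, but far more under queen_minmax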
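The fit statistics compared above are read off the printed summaries. They can also be extracted programmatically with the generic `AIC()` and `logLik()` functions, which accept fitted `lagsarlm` objects. Below is a minimal sketch on simulated data: a hypothetical 10 × 10 grid where y is generated from a spatial lag process with row-normalized weights and ρ = 0.6 (all values made up for illustration). With the workshop objects, the analogous call would be `AIC(spatial_lag_y_W, spatial_lag_y_minmax)`.

```r
library(spdep)
library(spatialreg)

set.seed(2021)

# Hypothetical data: 10 x 10 grid with queen contiguity
grid_nb   <- spdep::cell2nb(10, 10, type = "queen")
lw_W      <- spdep::nb2listw(grid_nb, style = "W")
lw_minmax <- spdep::nb2listw(grid_nb, style = "minmax")

# Simulate y = (I - rho W)^(-1) (1 + 0.5 x + e) with row-normalized W, rho = 0.6
n <- length(grid_nb)
x <- rnorm(n)
W <- spdep::listw2mat(lw_W)
y <- as.vector(solve(diag(n) - 0.6 * W, 1 + 0.5 * x + rnorm(n)))
sim_data <- data.frame(y = y, x = x)

# Same Spatial Lag Y model, two weights styles
fit_W      <- spatialreg::lagsarlm(y ~ x, data = sim_data, listw = lw_W)
fit_minmax <- spatialreg::lagsarlm(y ~ x, data = sim_data, listw = lw_minmax)

# Lower AIC (higher log likelihood) indicates the better-fitting specification
fit_comparison <- AIC(fit_W, fit_minmax)
fit_comparison$logLik <- c(as.numeric(logLik(fit_W)), as.numeric(logLik(fit_minmax)))
print(fit_comparison)
```

Because both models use the same data and the same number of parameters, the AIC comparison here is equivalent to comparing log likelihoods.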
spatialreg::impacts(spatial_lag_y_W, listw = queen_W)
Impact measures (lag, exact):
                            Direct    Indirect       Total
immigrant_share dy/dx  0.129008335  0.27278707  0.40179541
inhabitants dy/dx     -0.008581316 -0.01814512 -0.02672644
spatialreg::impacts(spatial_lag_y_minmax, listw = queen_minmax)
Impact measures (lag, exact):
                           Direct    Indirect       Total
immigrant_share dy/dx  0.21301001  0.54729596  0.76030596
inhabitants dy/dx     -0.01843319 -0.04736122 -0.06579441
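# immigrant_share: coefficient positive and significant in both models (p < .001)
# direct impact 0.129 (queen_W) vs. 0.213 (queen_minmax), about 1.7 times larger;
# indirect (0.273 vs. 0.547) and total (0.402 vs. 0.760) impacts roughly double
# the direction of the effect stays the same, but its magnitude depends on the weights style
# (likewise for inhabitants: negative in both, total -0.027 vs. -0.066)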
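The p-values in the model summaries refer to the coefficients, not to the impacts. To attach uncertainty to the direct, indirect, and total impacts, `spatialreg::impacts()` can simulate draws from the fitted model when given the number of draws `R`, and `summary(..., zstats = TRUE)` then reports simulated z-values and p-values. The minimal sketch below uses simulated data on a hypothetical 10 × 10 grid and the trace-based approximation (`spatialreg::trW()`); grid size, ρ = 0.6, and R = 500 are illustrative assumptions. With the workshop objects, the same pattern would apply to `spatial_lag_y_W` and `queen_W`.

```r
library(spdep)
library(spatialreg)
library(Matrix)

set.seed(2021)

# Hypothetical data: 10 x 10 grid, queen contiguity, row-normalized weights
grid_nb <- spdep::cell2nb(10, 10, type = "queen")
lw_W    <- spdep::nb2listw(grid_nb, style = "W")

# Simulate y from a spatial lag process with rho = 0.6
n <- length(grid_nb)
x <- rnorm(n)
W <- spdep::listw2mat(lw_W)
y <- as.vector(solve(diag(n) - 0.6 * W, 1 + 0.5 * x + rnorm(n)))
sim_data <- data.frame(y = y, x = x)

fit_W <- spatialreg::lagsarlm(y ~ x, data = sim_data, listw = lw_W)

# Traces of powers of W, used to approximate the impacts
tr_W <- spatialreg::trW(as(lw_W, "CsparseMatrix"), type = "mult")

# R = 500 simulated draws give a distribution for each impact measure
imp_sim <- spatialreg::impacts(fit_W, tr = tr_W, R = 500)

# Simulated z-values and p-values for direct, indirect and total impacts
imp_summary <- summary(imp_sim, zstats = TRUE, short = TRUE)
print(imp_summary)
```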