Introduction to Geospatial Techniques for Social Scientists in R

Spatial Econometrics & Outlook

Stefan Jünger & Anne-Kathrin Stroppe

GESIS Workshop

April 24, 2024


Now

Day	Time	Title
April 23	10:00-11:30	Introduction to GIS
April 23	11:45-13:00	Vector Data
April 23	13:00-14:00	Lunch Break
April 23	14:00-15:30	Mapping
April 23	15:45-17:00	Raster Data
April 24	09:00-10:30	Advanced Data Import & Processing
April 24	10:45-12:00	Applied Data Wrangling & Linking
April 24	12:00-13:00	Lunch Break
April 24	13:00-14:30	Investigating Spatial Autocorrelation
April 24	14:45-16:00	Spatial Econometrics & Outlook

What are spatial econometrics?

Econometrics can be reduced to using statistics to model (complex) theories:

it is useful for causal inference and causal thinking
by default, we think of regression analysis

Spatial econometrics, then, combines spatial analysis and econometrics:

the study of why spatial relationships (i.e., autocorrelation) exist
and of how spatial autocorrelation affects our outcome of interest

What is the data generation process?


Spatial diffusion vs. spatial spillover

There are at least two common mechanisms we are interested in when doing spatial econometrics:

Diffusion

$y_i$ affects $y_j$ through $w_{ij}$
$y_j$ affects $y_i$ through $w_{ji}$
that's a feedback effect
- endogenous by design!
Examples:
- pandemic and policy measures to contain the pandemic
- diffusion of violence in a war

Spillover

$x_i$ affects $y_j$ through $w_{ij}$
$x_j$ affects $y_i$ through $w_{ji}$
Examples:
- spillover of economic strength and trade
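To make the two mechanisms concrete, here is a minimal simulation sketch in base R. It assumes a hypothetical ring of regions in which each region borders its two neighbors, and it uses made-up values for $\beta$, $\theta$, and $\rho$. Under spillover, the neighbors' $x$ enters $y_i$ directly; under diffusion, $y$ has to be solved jointly because every $y_i$ depends on the other $y_j$.

```r
set.seed(42)

# Hypothetical setup: n regions arranged in a ring, each bordering two neighbors
n <- 200
W <- matrix(0, n, n)
for (i in seq_len(n)) {
  W[i, if (i == 1) n else i - 1] <- 1
  W[i, if (i == n) 1 else i + 1] <- 1
}
W <- W / rowSums(W)  # row-standardize so that each row sums to 1

x   <- rnorm(n)
eps <- rnorm(n, sd = 0.5)

# Made-up parameter values, for illustration only
beta  <- 1
theta <- 0.5
rho   <- 0.6

# Spillover (SLX): the neighbors' x affects y_i directly, no feedback
y_spillover <- beta * x + theta * as.vector(W %*% x) + eps

# Diffusion (SAR): y appears on both sides, so solve (I - rho W) y = beta x + eps
y_diffusion <- as.vector(solve(diag(n) - rho * W, beta * x + eps))
```

Because `y_diffusion` satisfies $y = \rho W y + \beta x + \epsilon$, a shock in one region travels to its neighbors and back again, which is the feedback effect described above.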


Let's have another look at our chessboard

We have to think about theories and mechanisms and how they translate into spatial effects and the data generation process.

That said, there are tests to check for the specific data generation process at hand, but they should not be applied naively.


Is it meaningful or just a nuisance?

Space can be important in our analysis in two ways.

it's meaningful in our theory and we thus interpret it accordingly after estimation
it can distort our empirical estimates, producing bias, inconsistency, and inefficiency

We can address both perspectives in our analysis with spatial econometric methods.

Formulas... models, models, models

Linear Regression:

$Y = X\beta + \epsilon$

Spatial Lag Y / Spatial Autoregressive Model (SAR, Diffusion):

$Y = \rho WY + X\beta + \epsilon$

Spatial Lag X Model (SLX, Spillover):

$Y = X\beta + WX\theta + \epsilon$

Spatial Error Model (SEM):

$Y = X\beta + u$
$u = \lambda Wu + \epsilon$

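Solving the SAR and SEM equations for $Y$ (assuming $I - \rho W$ and $I - \lambda W$ are invertible) shows where the difference lies: in the SAR, the spatial multiplier acts on both $X\beta$ and $\epsilon$, while in the SEM it only acts on the error term.

```latex
\begin{aligned}
\text{SAR:}\quad & Y = \rho WY + X\beta + \epsilon
  \;\Rightarrow\; (I - \rho W)Y = X\beta + \epsilon
  \;\Rightarrow\; Y = (I - \rho W)^{-1}(X\beta + \epsilon) \\
\text{SEM:}\quad & u = \lambda Wu + \epsilon
  \;\Rightarrow\; u = (I - \lambda W)^{-1}\epsilon
  \;\Rightarrow\; Y = X\beta + (I - \lambda W)^{-1}\epsilon
\end{aligned}
```

This is also why the SAR is endogenous by design: $WY$ on the right-hand side depends on $\epsilon$ through $(I - \rho W)^{-1}$.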

Flavors and extensions

Spatial Durbin Model:

$Y = \rho WY + X\beta + WX\theta + \epsilon$

Spatial Durbin Error Model:

$Y = X\beta + WX\theta + u$
$u = \lambda Wu + \epsilon$

Combined Spatial Autocorrelation Model:

$Y = \rho WY + X\beta + u$
$u = \lambda Wu + \epsilon$

Manski Model:

$Y = \rho WY + X\beta + WX\theta + u$
$u = \lambda Wu + \epsilon$

Source: Tenor

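The Spatial Durbin Model nests several of the simpler models as special cases. Setting a parameter to zero, or imposing the so-called common factor restriction, recovers the earlier models (notation as in the formulas above):

```latex
\begin{aligned}
\theta = 0 &:\quad Y = \rho WY + X\beta + \epsilon && \text{(SAR)} \\
\rho = 0 &:\quad Y = X\beta + WX\theta + \epsilon && \text{(SLX)} \\
\rho = \lambda,\ \theta = -\lambda\beta &:\quad Y = \lambda WY + X\beta - \lambda WX\beta + \epsilon && \text{(SEM)}
\end{aligned}
```

The last line follows from multiplying the SEM, $Y = X\beta + u$ with $u = \lambda Wu + \epsilon$, by $(I - \lambda W)$ and moving $\lambda WY$ to the right-hand side.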

Intermediate summary

There are many models you could estimate to explain spatial autocorrelation, and there is a vast body of literature on which one is the best choice for which application.

For us social scientists, we explicitly recommend the work of Tobias Rüttenauer, who also provides very nice workshop materials.

In this session, we only estimate Spatial Lag Y, Spatial Lag X, and Spatial Error Models.


'Research' question and data

We use the same example as in the previous session. This time, however, we actually test whether our spatial regression models help us investigate the data generation process further. We may ask:

Do immigrant shares have an effect on AfD voting shares within voting districts?
Do immigrant shares have an effect on AfD voting shares between neighborhoods? (=spillover)
Do AfD voting shares have an effect on AfD voting shares between neighborhoods? (=diffusion)

It might also be a good idea to control for the number of inhabitants within the voting districts.


Linear regression

linear_regression <-
  lm(afd_share ~ immigrant_share + inhabitants, data = election_results)
summary(linear_regression)

## 
## Call:
## lm(formula = afd_share ~ immigrant_share + inhabitants, data = election_results)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -15.010  -3.397  -0.232   2.790  25.032 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     27.737242   0.579582  47.857  < 2e-16 ***
## immigrant_share -0.097675   0.026150  -3.735 0.000207 ***
## inhabitants     -0.079595   0.003812 -20.879  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.843 on 540 degrees of freedom
## Multiple R-squared:  0.4822,    Adjusted R-squared:  0.4803 
## F-statistic: 251.4 on 2 and 540 DF,  p-value: < 2.2e-16


Now we need a spatial weights matrix

To estimate a spatial regression, we once again have to construct a spatial weights matrix, as in the analysis of spatial autocorrelation. In fact, we use the same approach as before.

queen_neighborhoods <- spdep::poly2nb(election_results, queen = TRUE)
queen_W <- spdep::nb2listw(queen_neighborhoods, style = "W")
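Row standardization (`style = "W"`) divides each region's neighbor indicators by its number of neighbors, so that $WX$ becomes the average of the neighbors' values. A base-R sketch of that idea, using a hypothetical binary contiguity matrix for four regions instead of `queen_neighborhoods`:

```r
# Hypothetical binary contiguity matrix for 4 regions (1 = shares a border)
C <- matrix(c(
  0, 1, 1, 0,
  1, 0, 1, 1,
  1, 1, 0, 1,
  0, 1, 1, 0
), nrow = 4, byrow = TRUE)

# Row standardization: divide each row by its number of neighbors
W <- C / rowSums(C)

# Hypothetical immigrant shares (in percent) for the four regions
immigrant_share <- c(10, 20, 30, 40)

# Spatial lag: for each region, the mean immigrant share of its neighbors
lag_immigrant_share <- as.vector(W %*% immigrant_share)
lag_immigrant_share
```

With the real data, `spdep::listw2mat(queen_W)` should give the dense counterpart of such a matrix and `spdep::lag.listw(queen_W, x)` the corresponding spatial lag of a variable `x`.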


Spatial Error Model: if we want to control for nuisance

spatial_error_model <-
  spatialreg::errorsarlm(
    afd_share ~ immigrant_share + inhabitants,
    data = election_results,
    listw = queen_W
    )
summary(spatial_error_model)

## 
## Call:
## spatialreg::errorsarlm(formula = afd_share ~ immigrant_share + 
##     inhabitants, data = election_results, listw = queen_W)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -9.60213 -2.38063 -0.40782  1.97417 25.55441 
## 
## Type: error 
## Coefficients: (asymptotic standard errors) 
##                   Estimate Std. Error z value  Pr(>|z|)
## (Intercept)     22.8185498  0.9398113 24.2799 < 2.2e-16
## immigrant_share -0.0806095  0.0281025 -2.8684  0.004125
## inhabitants     -0.0337644  0.0045643 -7.3974 1.388e-13
## 
## Lambda: 0.75749, LR test value: 216.39, p-value: < 2.22e-16
## Asymptotic standard error: 0.033094
##     z-value: 22.889, p-value: < 2.22e-16
## Wald statistic: 523.9, p-value: < 2.22e-16
## 
## Log likelihood: -1517.349 for error model
## ML residual variance (sigma squared): 13.532, (sigma: 3.6785)
## Number of observations: 543 
## Number of parameters estimated: 5 
## AIC: NA (not available for weighted model), (AIC for lm: 3259.1)
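The LR test compares the log likelihood of the Spatial Error Model with that of the plain linear regression, $LR = 2(\ell_{SEM} - \ell_{OLS})$. The output only reports the AIC of the `lm` model, but since $AIC = 2k - 2\ell$ with $k = 4$ estimated parameters (intercept, two slopes, and the residual variance), we can back out $\ell_{OLS}$ and reproduce the reported value:

```latex
\begin{aligned}
\ell_{OLS} &= \frac{2k - AIC_{OLS}}{2} = \frac{8 - 3259.1}{2} \approx -1625.55 \\
LR &= 2\,(\ell_{SEM} - \ell_{OLS}) \approx 2\,(-1517.349 + 1625.55) \approx 216.4
\end{aligned}
```

This matches the reported LR test value of 216.39 up to the rounding of the printed AIC.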


Spatial Lag X Model: estimating spillovers

spatial_lag_x_model <-
  spatialreg::lmSLX(
    afd_share ~ immigrant_share + inhabitants,
    data = election_results,
    listw = queen_W
  )
summary(spatial_lag_x_model)

## 
## Call:
## lm(formula = formula(paste("y ~ ", paste(colnames(x)[-1], collapse = "+"))), 
##     data = as.data.frame(x), weights = weights)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10.4243  -3.0311  -0.1935   2.4388  25.0694 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         30.649157   0.671665  45.632  < 2e-16 ***
## immigrant_share     -0.069702   0.034623  -2.013   0.0446 *  
## inhabitants         -0.026439   0.005841  -4.526  7.4e-06 ***
## lag.immigrant_share -0.026168   0.048127  -0.544   0.5869    
## lag.inhabitants     -0.085389   0.007656 -11.153  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.364 on 538 degrees of freedom
## Multiple R-squared:  0.5811,    Adjusted R-squared:  0.578 
## F-statistic: 186.6 on 4 and 538 DF,  p-value: < 2.2e-16
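The `Call:` line above shows that `lmSLX()` fits an ordinary `lm()` after adding spatially lagged covariates. The following base-R sketch mimics that on simulated data (a hypothetical ring of regions, made-up $\beta = -0.1$ and $\theta = -0.05$), computing the lag by hand and recovering both parameters with `lm()`:

```r
set.seed(1)

# Hypothetical ring of n regions, each bordering its two neighbors
n <- 500
W <- matrix(0, n, n)
for (i in seq_len(n)) {
  W[i, if (i == 1) n else i - 1] <- 1
  W[i, if (i == n) 1 else i + 1] <- 1
}
W <- W / rowSums(W)

# Simulate an SLX data generation process with made-up parameters
x     <- runif(n, 0, 40)
beta  <- -0.1
theta <- -0.05
y     <- 25 + beta * x + theta * as.vector(W %*% x) + rnorm(n, sd = 0.5)

# SLX = OLS with the spatially lagged covariate added as a regressor
sim <- data.frame(y = y, x = x, lag.x = as.vector(W %*% x))
slx_fit <- lm(y ~ x + lag.x, data = sim)
coef(slx_fit)
```

Because $WX$ only contains exogenous covariates, no special estimator is needed; this is what makes the SLX model comparatively easy to estimate and interpret.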


Spatial Lag Y Model: estimating diffusion

spatial_lag_y_model <-
  spatialreg::lagsarlm(
    afd_share ~ immigrant_share + inhabitants,
    data = election_results,
    listw = queen_W)
summary(spatial_lag_y_model)

## 
## Call:
## spatialreg::lagsarlm(formula = afd_share ~ immigrant_share + 
##     inhabitants, data = election_results, listw = queen_W)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -10.17786  -2.27359  -0.29956   1.98212  24.26683 
## 
## Type: lag 
## Coefficients: (asymptotic standard errors) 
##                   Estimate Std. Error z value  Pr(>|z|)
## (Intercept)     10.2465884  0.9782773 10.4741 < 2.2e-16
## immigrant_share -0.0527021  0.0196904 -2.6765  0.007439
## inhabitants     -0.0330830  0.0034265 -9.6551 < 2.2e-16
## 
## Rho: 0.66446, LR test value: 261.11, p-value: < 2.22e-16
## Asymptotic standard error: 0.03489
##     z-value: 19.045, p-value: < 2.22e-16
## Wald statistic: 362.69, p-value: < 2.22e-16
## 
## Log likelihood: -1494.985 for lag model
## ML residual variance (sigma squared): 12.992, (sigma: 3.6045)
## Number of observations: 543 
## Number of parameters estimated: 5 
## AIC: 3000, (AIC for lm: 3259.1)
## LM test for residual autocorrelation
## test value: 21.043, p-value: 4.4919e-06


Comparison: What's 'better'?

AIC(spatial_error_model, spatial_lag_x_model, spatial_lag_y_model)

##                     df      AIC
## spatial_error_model  5 3044.697
## spatial_lag_x_model  6 3147.995
## spatial_lag_y_model  5 2999.971
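The AIC values follow from the log likelihoods in the model summaries via $AIC = 2k - 2\ell$, where $k$ is the `df` column (for the two ML models: intercept, two slopes, the spatial parameter, and the residual variance):

```latex
\begin{aligned}
AIC_{SEM} &= 2 \cdot 5 - 2 \cdot (-1517.349) = 3044.698 \\
AIC_{SAR} &= 2 \cdot 5 - 2 \cdot (-1494.985) = 2999.970
\end{aligned}
```

Both match the table up to rounding; lower values indicate a better trade-off between fit and complexity.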

spdep::lm.LMtests(linear_regression, queen_W, test = c("LMerr", "LMlag"))

## 
##     Lagrange multiplier diagnostics for spatial dependence
## 
## data:  
## model: lm(formula = afd_share ~ immigrant_share +
## inhabitants, data = election_results)
## weights: queen_W
## 
## LMerr = 198.29, df = 1, p-value < 2.2e-16
## 
## 
##     Lagrange multiplier diagnostics for spatial dependence
## 
## data:  
## model: lm(formula = afd_share ~ immigrant_share +
## inhabitants, data = election_results)
## weights: queen_W
## 
## LMlag = 299.73, df = 1, p-value < 2.2e-16

Let's stick to our theory, shall we?


Of higher importance: interpretation

Unfortunately, in a Spatial Lag Y Model, the spatial parameter $\rho$ only tells us whether the spatial effect is (statistically) significant or not; the coefficients cannot be read as marginal effects as in a linear regression.

remember: these models are endogenous by design
- we have effects of $y_j$ on $y_i$ and vice versa
- what a mess

Luckily, there is a method to decompose the spatial effects into direct, indirect, and total effects: estimating impacts.

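In the standard approach associated with LeSage and Pace, the impacts of covariate $k$ in a Spatial Lag Y Model summarize the matrix $S_k = (I - \rho W)^{-1}\beta_k$: the direct effect is the average of its diagonal, the total effect the average row sum, and the indirect effect their difference. A minimal base-R sketch with a hypothetical four-region weights matrix and made-up $\rho$ and $\beta$:

```r
# Hypothetical 4 regions on a line: 1 - 2 - 3 - 4
C <- matrix(c(
  0, 1, 0, 0,
  1, 0, 1, 0,
  0, 1, 0, 1,
  0, 0, 1, 0
), nrow = 4, byrow = TRUE)
W <- C / rowSums(C)  # row-standardized, as with style = "W"

rho  <- 0.5   # made-up spatial autoregressive parameter
beta <- -0.1  # made-up coefficient of one covariate

# S[i, j]: effect of a change in x_j on y_i, including feedback loops
S <- solve(diag(nrow(W)) - rho * W) * beta

direct   <- mean(diag(S))     # average effect of x_i on its own y_i
total    <- mean(rowSums(S))  # average effect on y_i of changing x in all regions
indirect <- total - direct    # average effect passing through the neighbors
c(direct = direct, indirect = indirect, total = total)
```

The direct effect is larger in magnitude than $\beta$ because it includes feedback: a change in $x_i$ affects the neighbors' $y_j$, which in turn affect $y_i$.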

Impact estimation in `R`

This time, let's start with the Spatial Lag Y Model:

spatialreg::impacts(spatial_lag_y_model, listw = queen_W)

## Impact measures (lag, exact):
##                      Direct    Indirect       Total
## immigrant_share -0.05948993 -0.09757580 -0.15706572
## inhabitants     -0.03734396 -0.06125182 -0.09859578

Compare it to the 'simple' regression output:

coef(spatial_lag_y_model)

##             rho     (Intercept) immigrant_share     inhabitants 
##      0.66445817     10.24658839     -0.05270212     -0.03308301
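With a row-standardized $W$, every row of $(I - \rho W)^{-1}$ sums to $1/(1 - \rho)$, so the total impact reduces to $\beta_k / (1 - \rho)$. Plugging in the coefficients above reproduces the total impacts reported by `impacts()`:

```latex
\begin{aligned}
W\mathbf{1} = \mathbf{1} \;&\Rightarrow\; (I - \rho W)\mathbf{1} = (1 - \rho)\mathbf{1}
  \;\Rightarrow\; (I - \rho W)^{-1}\mathbf{1} = \tfrac{1}{1 - \rho}\mathbf{1} \\
\text{immigrant\_share:}\quad & \frac{-0.05270212}{1 - 0.66445817} = \frac{-0.05270212}{0.33554183} \approx -0.15707 \\
\text{inhabitants:}\quad & \frac{-0.03308301}{1 - 0.66445817} \approx -0.09860
\end{aligned}
```

The split into direct and indirect effects, by contrast, depends on the diagonal of $(I - \rho W)^{-1}$ and hence on the specific neighborhood structure.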


Spatial Lag X impacts

spatialreg::impacts(spatial_lag_x_model, listw = queen_W)

## Impact measures (SlX, glht):
##                      Direct    Indirect       Total
## immigrant_share -0.06970227 -0.02616764 -0.09586991
## inhabitants     -0.02643886 -0.08538884 -0.11182770

Compare it to the 'simple' regression output:

coef(spatial_lag_x_model)

##         (Intercept)     immigrant_share         inhabitants 
##         30.64915652         -0.06970227         -0.02643886 
## lag.immigrant_share     lag.inhabitants 
##         -0.02616764         -0.08538884
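For the SLX model, no spatial multiplier is involved: with a row-standardized $W$, the direct impact is simply $\beta$, the indirect impact is $\theta$ (the `lag.` coefficient), and the total impact is their sum. A quick check in R with the numbers copied from the output above:

```r
# Coefficients copied from coef(spatial_lag_x_model) above
beta  <- c(immigrant_share = -0.06970227, inhabitants = -0.02643886)
theta <- c(immigrant_share = -0.02616764, inhabitants = -0.08538884)

# SLX impacts with row-standardized W: direct = beta, indirect = theta
total <- beta + theta
total
```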


If you need p-values and stuff

spatialreg::impacts(spatial_lag_y_model, listw = queen_W, R = 500) %>% 
  summary(zstats = TRUE, short = TRUE)

## Impact measures (lag, exact):
##                      Direct    Indirect       Total
## immigrant_share -0.05948993 -0.09757580 -0.15706572
## inhabitants     -0.03734396 -0.06125182 -0.09859578
## ========================================================
## Simulation results ( variance matrix):
## ========================================================
## Simulated standard errors
##                      Direct    Indirect       Total
## immigrant_share 0.022390179 0.038273065 0.059658509
## inhabitants     0.003535946 0.008409926 0.009944576
## 
## Simulated z-values:
##                     Direct  Indirect      Total
## immigrant_share  -2.661032 -2.575303  -2.650849
## inhabitants     -10.629243 -7.405056 -10.041695
## 
## Simulated p-values:
##                 Direct     Indirect   Total   
## immigrant_share 0.0077901  0.010015   0.008029
## inhabitants     < 2.22e-16 1.3101e-13 < 2e-16
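The simulated p-values appear to correspond to two-sided tests of the simulated z-values against a standard normal distribution. Assuming that, the first entry can be reproduced with `pnorm()`:

```r
# Simulated z-value for the direct impact of immigrant_share (copied from above)
z <- -2.661032

# Two-sided p-value under a standard normal reference distribution
p <- 2 * pnorm(-abs(z))
p
```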


Exercise 2_3_2: Spatial Regression

Exercise

Solution


Outlook

Introduction

Main Messages

Geospatial data are relevant in the social sciences
- and have been for a long time
R can serve as a full-blown Geographic Information System (GIS)

Vector Data

Main Messages

Most common geospatial data type
Vector data come as points, lines or polygons
Information on the geometries is stored in the geometry column
Attributes can be assigned to each geometric object
Attribute tables are treated as data frames

Mapping

Main Messages

the basis of each map is the geometry of the geospatial data
the spatial distribution of attributes becomes visible when defining an attribute and adding color scales
layer several shapefiles to add more information or for aesthetic reasons

Raster Data

Main Messages

a data format for efficient and fast analysis of geospatial data
flexible in application
can get rather involved
however, extracting values is straightforward

Advanced Data Import

Main Messages

Geospatial data tend to be large
Often distributed over the internet
APIs help in downloading these data
can get pretty involved

Applied Data Wrangling

Main Messages

Georeferenced survey data are sensitive and require careful handling
we showed one example of wrangling and linking data, but applications vary
spatial joins are the perfect tool to add geospatial information to other georeferenced data

Investigating Spatial Autocorrelation

Main Messages

spatial autocorrelation is something that must be detected
we need a model for that
- spatial weight matrices!
there are global and local indicators of spatial autocorrelation

Spatial Econometrics

Main Messages

spatial autocorrelation can also be explained by
- diffusion
- or spillover effects
these effects can be tested but should always be interpreted in light of theory


What's left

Other map types such as

cartograms
hexagon maps
animated maps
(more) network graphs

GIS techniques, such as

geocoding
routing
cluster analysis

More Advanced Spatial(-temporal) Modeling

More data sources...

Check out gganimate

Data Sources

Some more information:

geospatial data are interdisciplinary
amount of data feels unlimited
data providers and data portals are often specific to the area and/or the information they cover

Some random examples:


The End

Add-on slides: Missing Values in Spatial Econometrics

What if you have missing values?

Missing values cause problems in spatial regression models similar to those in ordinary regression analysis:

they can yield biased estimates
they reduce statistical power

However, the issue is more severe because the observations are interdependent:

we lose more information
even missing values that occur completely at random might become problematic

Thus, it might be a good idea to think of methods to navigate this bias.
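One reason listwise deletion is more problematic here: dropping a region does not just remove one row of data, it also changes the weights of its neighbors, because row standardization redistributes their weights over the remaining neighbors. A small base-R illustration with a hypothetical chain of four regions:

```r
# Hypothetical chain of regions 1 - 2 - 3 - 4 (binary contiguity)
C <- matrix(c(
  0, 1, 0, 0,
  1, 0, 1, 0,
  0, 1, 0, 1,
  0, 0, 1, 0
), nrow = 4, byrow = TRUE)
W_full <- C / rowSums(C)

# Drop region 3 (e.g., because its outcome is missing) and re-standardize
keep <- c(1, 2, 4)
C_missing <- C[keep, keep]
n_neighbors <- rowSums(C_missing)
n_neighbors  # region 4 has lost its only neighbor

# Rows without neighbors stay all zero, similar in spirit to zero.policy = TRUE
W_missing <- C_missing / ifelse(n_neighbors > 0, n_neighbors, 1)
W_missing
```

Region 2 now gives its whole weight to region 1 instead of splitting it between regions 1 and 3, so its spatial lag changes even though its own data are complete. Regions left without any neighbors are also why `zero.policy = TRUE` becomes necessary when building weights from incomplete data.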


Let's produce a dataset with missing data

# randomly pick ~10% of the rows
# (a different sample is drawn on each run unless set.seed() is called first)
missing_index <- 
  sample(
    1:nrow(election_results), 
    round(nrow(election_results) * .1, 0)
    )
# set the AfD share of the sampled districts to NA
election_results_missing <- 
  election_results
election_results_missing$afd_share[missing_index] <- 
  NA
# list-wise deletion
election_results_missing <- 
  na.omit(election_results_missing)
# map the remaining districts (uses tmap)
tm_shape(election_results_missing) +
  tm_fill("afd_share", palette = "viridis")


How does a Spatial Lag Y Model perform?

queen_neighborhoods_missing <- 
  spdep::poly2nb(election_results_missing, queen = TRUE)
queen_W_missing <- 
  spdep::nb2listw(queen_neighborhoods_missing, style = "W", zero.policy = TRUE)
spatial_lag_y_model_missing <-
  spatialreg::lagsarlm(
    afd_share ~ immigrant_share + inhabitants,
    data = election_results_missing,
    listw = queen_W_missing,
    zero.policy = TRUE
    )


Model comparison

spatialreg::impacts(spatial_lag_y_model_missing, listw = queen_W_missing)

## Impact measures (lag, exact):
##                      Direct    Indirect       Total
## immigrant_share -0.05450334 -0.07883944 -0.13334278
## inhabitants     -0.03918632 -0.05668327 -0.09586959

spatialreg::impacts(spatial_lag_y_model, listw = queen_W)

## Impact measures (lag, exact):
##                      Direct    Indirect       Total
## immigrant_share -0.05948993 -0.09757580 -0.15706572
## inhabitants     -0.03734396 -0.06125182 -0.09859578


What to do now?

The way how to deal with missing data in geospatial data depends on their general geometric structure. For points, there are established methods, such as interpolation.

Often these are somewhat ways of aggregating data, which does not help in our case. I'd say that good old imputation techniques might also help:

good for multivariate cases
yet, they are no spatial techniques and cannot create plausible values for spatial relationships
- but imputing spatial relationships would be a matter of contingency anyway


Simplest case of imputation

# ~10% missing values
missing_index <- 
  sample(
    1:nrow(election_results), 
    round(nrow(election_results) * .1, 0)
    )
election_results_missing <- 
  election_results
election_results_missing$afd_share[missing_index] <- NA
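election_results_missing <-
  election_results_missing %>% 
  # mice works on plain data frames, so drop the geometry first
  sf::st_drop_geometry() %>% 
  # single (m = 1) deterministic regression imputation
  mice::mice(method = "norm.predict", m = 1) %>% 
  mice::complete() %>% 
  # re-attach the geometry by joining on the columns both tables share
  dplyr::left_join(
    election_results_missing %>% 
      dplyr::select(-afd_share, -immigrant_share, -inhabitants)
  ) %>% 
  sf::st_as_sf()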

## 
##  iter imp variable
##   1   1  afd_share
##   2   1  afd_share
##   3   1  afd_share
##   4   1  afd_share
##   5   1  afd_share
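The `norm.predict` method in `mice` performs deterministic regression imputation: it regresses the incomplete variable on the other variables using the complete cases and fills in the predicted values without adding noise. A base-R sketch of that idea on simulated data (variable names mirror the example, all values are made up):

```r
set.seed(123)

# Made-up data mirroring the example's variables
n <- 100
d <- data.frame(
  immigrant_share = runif(n, 0, 40),
  inhabitants     = runif(n, 50, 200)
)
d$afd_share <- 27 - 0.1 * d$immigrant_share - 0.08 * d$inhabitants + rnorm(n, sd = 2)

# Set ~10% of the outcome to missing
missing_index <- sample(seq_len(n), round(n * 0.1))
d$afd_share[missing_index] <- NA

# Regression imputation: fit on complete cases (lm drops NA rows by default),
# then replace the missing values with the model's predictions
fit <- lm(afd_share ~ immigrant_share + inhabitants, data = d)
d_imputed <- d
d_imputed$afd_share[missing_index] <- predict(fit, newdata = d[missing_index, ])
```

Because every imputed value lies exactly on the regression line, this approach understates uncertainty, and with `m = 1` there is no between-imputation variance to account for it either.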


And run the model again

queen_neighborhoods_missing <- 
  spdep::poly2nb(election_results_missing, queen = TRUE)
queen_W_missing <- 
  spdep::nb2listw(queen_neighborhoods_missing, style = "W")
spatial_lag_y_model_missing <-
  spatialreg::lagsarlm(
    afd_share ~ immigrant_share + inhabitants,
    data = election_results_missing,
    listw = queen_W_missing
    )


...and compare it with the original one

spatialreg::impacts(spatial_lag_y_model_missing, listw = queen_W_missing)

## Impact measures (lag, exact):
##                      Direct    Indirect      Total
## immigrant_share -0.04834610 -0.06855918 -0.1169053
## inhabitants     -0.04148773 -0.05883338 -0.1003211

spatialreg::impacts(spatial_lag_y_model, listw = queen_W)

## Impact measures (lag, exact):
##                      Direct    Indirect       Total
## immigrant_share -0.05948993 -0.09757580 -0.15706572
## inhabitants     -0.03734396 -0.06125182 -0.09859578


anne-kathrin.stroppe@gesis.org

@astroppe

stroppann

NA

56 / 56

Now

Day	Time	Title
April 23	10:00-11:30	Introduction to GIS
April 23	11:45-13:00	Vector Data
April 23	13:00-14:00	Lunch Break
April 23	14:00-15:30	Mapping
April 23	15:45-17:00	Raster Data
April 24	09:00-10:30	Advanced Data Import & Processing
April 24	10:45-12:00	Applied Data Wrangling & Linking
April 24	12:00-13:00	Lunch Break
April 24	13:00-14:30	Investigating Spatial Autocorrelation
April 24	14:45-16:00	Spatial Econometrics & Outlook

Day

Time

Title

April 23

10:00-11:30

Introduction to GIS

April 23

11:45-13:00

Vector Data

April 23

13:00-14:00

Lunch Break

April 23

14:00-15:30

Mapping

April 23

15:45-17:00

Raster Data

April 24

09:00-10:30

Advanced Data Import & Processing

April 24

10:45-12:00

Applied Data Wrangling & Linking

April 24

12:00-13:00

Lunch Break

April 24

13:00-14:30

Investigating Spatial Autocorrelation

April 24

14:45-16:00

Spatial Econometrics & Outlook

2 / 56

Help

Keyboard shortcuts

↑, ←, Pg Up, k

Go to previous slide

↓, →, Pg Dn, Space, j

Go to next slide

Home

Go to first slide

End

Go to last slide

Number + Return

Go to specific slide

b / m / f

Toggle blackout / mirrored / fullscreen mode

Clone slideshow

Toggle presenter mode

Restart the presentation timer

?, h

Toggle this help

Tile View: Overview of Slides

Spatial Econometrics & Outlook

Stefan Jünger & Anne-Kathrin Stroppe

GESIS Workshop

April 24, 2024

1 / 56

Now

Day	Time	Title
April 23	10:00-11:30	Introduction to GIS
April 23	11:45-13:00	Vector Data
April 23	13:00-14:00	Lunch Break
April 23	14:00-15:30	Mapping
April 23	15:45-17:00	Raster Data
April 24	09:00-10:30	Advanced Data Import & Processing
April 24	10:45-12:00	Applied Data Wrangling & Linking
April 24	12:00-13:00	Lunch Break
April 24	13:00-14:30	Investigating Spatial Autocorrelation
April 24	14:45-16:00	Spatial Econometrics & Outlook

2 / 56

What is spatial econometrics?

In a nutshell, econometrics means using statistics to model (complex) theories ...

it is interesting for causal inference and causal thinking
by default, we think of regression analysis

Spatial econometrics therefore combines spatial analysis and econometrics:

the study of why spatial relationships (i.e., autocorrelation) exist
how spatial autocorrelation affects our outcome of interest

What is the data-generating process?

3 / 56

Spatial diffusion vs. spatial spillover

There are at least two common mechanisms we are interested in within spatial econometrics.

Diffusion

$y_j$ affects $y_i$ through $w_{ij}$
$y_i$ affects $y_j$ through $w_{ji}$
that's a feedback effect
- endogenous by design!
Examples:
- a pandemic and the policy measures to contain it
- diffusion of violence in a war

Spillover

$x_j$ affects $y_i$ through $w_{ij}$
$x_i$ affects $y_j$ through $w_{ji}$
Examples:
- spillover of economic strength and trade
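To make the difference between the two mechanisms concrete, here is a minimal base-R sketch on a hypothetical four-unit line (1-2-3-4) with made-up parameter values. It raises $x$ in unit 1 only and compares how $y$ responds under spillover (SLX) and under diffusion (SAR, via its reduced form $(I - \rho W)^{-1}$):

```r
# Hypothetical toy example: 4 units on a line (1-2-3-4), made-up parameters
n <- 4
W <- matrix(0, n, n)
for (i in 1:(n - 1)) {
  W[i, i + 1] <- 1
  W[i + 1, i] <- 1
}
W <- W / rowSums(W)  # row-standardize so each row sums to 1

beta  <- 1    # effect of a unit's own x
theta <- 0.5  # spillover coefficient (SLX)
rho   <- 0.5  # diffusion coefficient (SAR)

dx <- c(1, 0, 0, 0)  # x increases by one unit in unit 1 only

# Spillover (SLX): dy = dx * beta + (W dx) * theta
dy_slx <- dx * beta + as.vector(W %*% dx) * theta

# Diffusion (SAR): dy = (I - rho W)^{-1} (dx * beta)
dy_sar <- as.vector(solve(diag(n) - rho * W, dx * beta))

round(rbind(SLX = dy_slx, SAR = dy_sar), 3)
# SLX row: 1.00 0.25 0.00 0.00
```

Under spillover, only unit 2 (the direct neighbor of unit 1) reacts. Under diffusion, the change feeds back and reaches every unit, including unit 1 itself.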

4 / 56

Let's have another look at our chessboard

We have to think about theories and mechanisms and how they translate into spatial effects and the data-generating process.

That said, there are tests for identifying the data-generating process at hand, but they should not be applied naively.

5 / 56

Is it meaningful or just a nuisance?

Space can be important in our analysis in two ways:

it is meaningful in our theory, and we therefore interpret it after estimation
it can distort our empirical estimates, producing bias, inconsistency, and inefficiency

We can address both of these different perspectives in our analysis with spatial econometric methods.

6 / 56

Formulas... models, models, models

Linear Regression:

$Y = X\beta + \epsilon$

Spatial Lag Y / Spatial Autoregressive Model (SAR, Diffusion):

$Y = \rho WY + X\beta + \epsilon$

Spatial Lag X Model (SLX, Spillover):

$Y = X\beta + WX\theta + \epsilon$

Spatial Error Model (SEM):

$Y = X\beta + u$, with $u = \lambda Wu + \epsilon$
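Each formula describes a different data-generating process. As a sketch under assumed parameter values (a hypothetical ring of 50 units, one covariate, no intercept), we can simulate data from the three spatial models in base R. Note that the SAR and SEM versions first have to be solved for $Y$ and $u$:

```r
# Hypothetical setup: 50 units on a ring, each bordering the previous and next unit
set.seed(42)
n <- 50
W <- matrix(0, n, n)
W[cbind(1:n, c(2:n, 1))] <- 1        # next unit on the ring
W[cbind(1:n, c(n, 1:(n - 1)))] <- 1  # previous unit on the ring
W <- W / rowSums(W)                  # row-standardize
I_n <- diag(n)

# Assumed parameter values and random inputs
beta <- 1; rho <- 0.6; theta <- 0.5; lambda <- 0.6
x   <- rnorm(n)
eps <- rnorm(n)

# SAR: Y = rho W Y + X beta + eps, solved as Y = (I - rho W)^{-1} (X beta + eps)
y_sar <- solve(I_n - rho * W, x * beta + eps)

# SLX: Y = X beta + W X theta + eps (no inversion needed)
y_slx <- x * beta + as.vector(W %*% x) * theta + eps

# SEM: Y = X beta + u, where u = lambda W u + eps is solved as u = (I - lambda W)^{-1} eps
u     <- solve(I_n - lambda * W, eps)
y_sem <- x * beta + u
```

Only the SLX process can be written down without inverting a matrix, which is one reason it can be estimated with plain OLS.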

10 / 56

Flavors and extensions

Spatial Durbin Model:

$Y = \rho WY + X\beta + WX\theta + \epsilon$

Spatial Durbin Error Model:

$Y = X\beta + WX\theta + u$, with $u = \lambda Wu + \epsilon$

Combined Spatial Autocorrelation Model:

$Y = \rho WY + X\beta + u$, with $u = \lambda Wu + \epsilon$

Manski Model:

$Y = \rho WY + X\beta + WX\theta + u$, with $u = \lambda Wu + \epsilon$

Source: Tenor
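The flavors nest inside each other. Using standard results from the spatial econometrics literature (not shown on the original slide), each model is the Manski model with some parameters restricted to zero:

$$
\begin{aligned}
\lambda = 0 &: \text{Spatial Durbin Model} \\
\rho = 0 &: \text{Spatial Durbin Error Model} \\
\theta = 0 &: \text{Combined Spatial Autocorrelation Model} \\
\lambda = \theta = 0 &: \text{Spatial Lag Y (SAR)} \\
\rho = \lambda = 0 &: \text{Spatial Lag X (SLX)} \\
\rho = \theta = 0 &: \text{Spatial Error Model (SEM)}
\end{aligned}
$$

Multiplying the Spatial Error Model by $(I - \lambda W)$ also shows that it is a restricted Spatial Durbin Model with $\rho = \lambda$ and $\theta = -\lambda\beta$:

$$
(I - \lambda W)Y = (I - \lambda W)X\beta + \epsilon
\;\Longleftrightarrow\;
Y = \lambda WY + X\beta - WX(\lambda\beta) + \epsilon
$$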

11 / 56

Intermediate summary

There are many models you could estimate to explain spatial autocorrelation, and there is a vast body of literature on which model is the best choice for which application.

We explicitly recommend the work of Tobias Rüttenauer to fellow social scientists; his workshop materials are particularly helpful.

In this session, we will only estimate Spatial Lag Y, Spatial Lag X, and Spatial Error Models.

12 / 56

'Research' question and data

Do immigrant shares have an effect on AfD voting shares within voting districts?
Do immigrant shares have an effect on AfD voting shares between neighborhoods? (=spillover)
Do AfD voting shares have an effect on AfD voting shares between neighborhoods? (=diffusion)

It might also be a good idea to control for the number of inhabitants in each voting district.

13 / 56

Linear regression

linear_regression <-
  lm(afd_share ~ immigrant_share + inhabitants, data = election_results)
summary(linear_regression)

## 
## Call:
## lm(formula = afd_share ~ immigrant_share + inhabitants, data = election_results)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -15.010  -3.397  -0.232   2.790  25.032 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     27.737242   0.579582  47.857  < 2e-16 ***
## immigrant_share -0.097675   0.026150  -3.735 0.000207 ***
## inhabitants     -0.079595   0.003812 -20.879  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.843 on 540 degrees of freedom
## Multiple R-squared:  0.4822,    Adjusted R-squared:  0.4803 
## F-statistic: 251.4 on 2 and 540 DF,  p-value: < 2.2e-16

14 / 56

Now we need a spatial weight

To estimate a spatial regression, we once again have to construct a spatial weights matrix, as in the analysis of spatial autocorrelation. In fact, we'll use the same approach as before.

queen_neighborhoods <- spdep::poly2nb(election_results, queen = TRUE)
queen_W <- spdep::nb2listw(queen_neighborhoods, style = "W")
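The argument `style = "W"` row-standardizes the weights: each row of the binary neighborhood matrix is divided by the number of neighbors, so that $WY$ becomes the average of the neighbors' values. A base-R sketch with a hypothetical four-district contiguity matrix (for the real weights, `spdep::listw2mat(queen_W)` returns the corresponding dense matrix):

```r
# Hypothetical binary contiguity matrix for 4 districts (1 = shares a border)
B <- matrix(c(0, 1, 1, 0,
              1, 0, 1, 1,
              1, 1, 0, 1,
              0, 1, 1, 0), nrow = 4, byrow = TRUE)

# Row-standardization: divide each row by its number of neighbors
W <- B / rowSums(B)
rowSums(W)  # every row now sums to 1

# W %*% y is the average value of each district's neighbors (the spatial lag)
y <- c(10, 20, 30, 40)
as.vector(W %*% y)  # 25.00000 26.66667 23.33333 25.00000
```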

15 / 56

Spatial Error Model: If we want to control for nuisance

spatial_error_model <-
  spatialreg::errorsarlm(
    afd_share ~ immigrant_share + inhabitants,
    data = election_results,
    listw = queen_W
    )
summary(spatial_error_model)

## 
## Call:
## spatialreg::errorsarlm(formula = afd_share ~ immigrant_share + 
##     inhabitants, data = election_results, listw = queen_W)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -9.60213 -2.38063 -0.40782  1.97417 25.55441 
## 
## Type: error 
## Coefficients: (asymptotic standard errors) 
##                   Estimate Std. Error z value  Pr(>|z|)
## (Intercept)     22.8185498  0.9398113 24.2799 < 2.2e-16
## immigrant_share -0.0806095  0.0281025 -2.8684  0.004125
## inhabitants     -0.0337644  0.0045643 -7.3974 1.388e-13
## 
## Lambda: 0.75749, LR test value: 216.39, p-value: < 2.22e-16
## Asymptotic standard error: 0.033094
##     z-value: 22.889, p-value: < 2.22e-16
## Wald statistic: 523.9, p-value: < 2.22e-16
## 
## Log likelihood: -1517.349 for error model
## ML residual variance (sigma squared): 13.532, (sigma: 3.6785)
## Number of observations: 543 
## Number of parameters estimated: 5 
## AIC: NA (not available for weighted model), (AIC for lm: 3259.1)
16 / 56

Spatial Lag X Model: estimating spillovers

spatial_lag_x_model <-
  spatialreg::lmSLX(
    afd_share ~ immigrant_share + inhabitants,
    data = election_results,
    listw = queen_W
  )
summary(spatial_lag_x_model)

## 
## Call:
## lm(formula = formula(paste("y ~ ", paste(colnames(x)[-1], collapse = "+"))), 
##     data = as.data.frame(x), weights = weights)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10.4243  -3.0311  -0.1935   2.4388  25.0694 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         30.649157   0.671665  45.632  < 2e-16 ***
## immigrant_share     -0.069702   0.034623  -2.013   0.0446 *  
## inhabitants         -0.026439   0.005841  -4.526  7.4e-06 ***
## lag.immigrant_share -0.026168   0.048127  -0.544   0.5869    
## lag.inhabitants     -0.085389   0.007656 -11.153  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.364 on 538 degrees of freedom
## Multiple R-squared:  0.5811,    Adjusted R-squared:  0.578 
## F-statistic: 186.6 on 4 and 538 DF,  p-value: < 2.2e-16
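The `Call:` line in the output shows that the SLX model is an ordinary `lm()` fit once the spatially lagged covariates $WX$ have been added as columns. A sketch with simulated data (hypothetical ring neighborhood, assumed coefficients of 1 for $x$ and 0.5 for $Wx$) reproduces this by hand:

```r
# Hypothetical ring neighborhood for 200 simulated units
set.seed(1)
n <- 200
W <- matrix(0, n, n)
W[cbind(1:n, c(2:n, 1))] <- 1
W[cbind(1:n, c(n, 1:(n - 1)))] <- 1
W <- W / rowSums(W)

# Simulate from an SLX process with assumed coefficients 1 (x) and 0.5 (Wx)
x     <- rnorm(n)
lag_x <- as.vector(W %*% x)  # spatially lagged covariate, like lag.immigrant_share
y     <- 2 + 1 * x + 0.5 * lag_x + rnorm(n, sd = 0.1)

# Once Wx is a column, the SLX model is ordinary least squares
slx_by_hand <- lm(y ~ x + lag_x)
round(coef(slx_by_hand), 2)
```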
17 / 56

Spatial Lag Y Model: estimating diffusion

spatial_lag_y_model <-
  spatialreg::lagsarlm(
    afd_share ~ immigrant_share + inhabitants,
    data = election_results,
    listw = queen_W)
summary(spatial_lag_y_model)

## 
## Call:
## spatialreg::lagsarlm(formula = afd_share ~ immigrant_share + 
##     inhabitants, data = election_results, listw = queen_W)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -10.17786  -2.27359  -0.29956   1.98212  24.26683 
## 
## Type: lag 
## Coefficients: (asymptotic standard errors) 
##                   Estimate Std. Error z value  Pr(>|z|)
## (Intercept)     10.2465884  0.9782773 10.4741 < 2.2e-16
## immigrant_share -0.0527021  0.0196904 -2.6765  0.007439
## inhabitants     -0.0330830  0.0034265 -9.6551 < 2.2e-16
## 
## Rho: 0.66446, LR test value: 261.11, p-value: < 2.22e-16
## Asymptotic standard error: 0.03489
##     z-value: 19.045, p-value: < 2.22e-16
## Wald statistic: 362.69, p-value: < 2.22e-16
## 
## Log likelihood: -1494.985 for lag model
## ML residual variance (sigma squared): 12.992, (sigma: 3.6045)
## Number of observations: 543 
## Number of parameters estimated: 5 
## AIC: 3000, (AIC for lm: 3259.1)
## LM test for residual autocorrelation
## test value: 21.043, p-value: 4.4919e-06
18 / 56

Comparison: What's 'better'?

AIC(spatial_error_model, spatial_lag_x_model, spatial_lag_y_model)

##                     df      AIC
## spatial_error_model  5 3044.697
## spatial_lag_x_model  6 3147.995
## spatial_lag_y_model  5 2999.971
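The AIC values follow from the log-likelihoods reported in the summaries above via $\text{AIC} = -2\log L + 2k$, where $k$ counts the estimated parameters (three coefficients, $\lambda$ or $\rho$, and $\sigma^2$, hence `df = 5`). A quick check with the rounded log-likelihoods:

```r
# Log-likelihoods copied from the summary() outputs; k = number of estimated parameters
loglik <- c(spatial_error_model = -1517.349, spatial_lag_y_model = -1494.985)
k      <- c(spatial_error_model = 5,         spatial_lag_y_model = 5)

aic <- -2 * loglik + 2 * k
round(aic, 2)  # 3044.70 and 2999.97, matching AIC() up to rounding
```

Lower values indicate a better fit, but fit alone does not tell us which mechanism is theoretically plausible.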

spdep::lm.LMtests(linear_regression, queen_W, test = c("LMerr", "LMlag"))

## 
##     Lagrange multiplier diagnostics for spatial dependence
## 
## data:  
## model: lm(formula = afd_share ~ immigrant_share +
## inhabitants, data = election_results)
## weights: queen_W
## 
## LMerr = 198.29, df = 1, p-value < 2.2e-16
## 
## 
##     Lagrange multiplier diagnostics for spatial dependence
## 
## data:  
## model: lm(formula = afd_share ~ immigrant_share +
## inhabitants, data = election_results)
## weights: queen_W
## 
## LMlag = 299.73, df = 1, p-value < 2.2e-16

Let's stick to our theory, shall we?

19 / 56

Of higher importance: interpretation

Unfortunately, in a Spatial Lag Y Model, the coefficients cannot be read as marginal effects, and the spatial parameter $\rho$ only tells us whether the spatial dependence is statistically significant.

remember: these models are endogenous by design
- we have effects of $y_j$ on $y_i$ and vice versa
- what a mess

Luckily, there's a method to decompose the spatial effects into direct, indirect and total effects: estimating impacts
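Why the coefficients alone are not enough becomes visible in the reduced form of the Spatial Lag Y Model. A sketch of the usual summary measures for a covariate $x_k$, which we assume is what `impacts()` reports:

$$
\begin{aligned}
Y &= \rho WY + X\beta + \epsilon
\;\Longleftrightarrow\;
Y = (I_n - \rho W)^{-1}(X\beta + \epsilon) \\
S_k &= \frac{\partial Y}{\partial x_k^{\top}} = (I_n - \rho W)^{-1}\beta_k \\
\text{Direct}_k &= \frac{1}{n}\operatorname{tr}(S_k), \qquad
\text{Total}_k = \frac{1}{n}\mathbf{1}_n^{\top} S_k \mathbf{1}_n, \qquad
\text{Indirect}_k = \text{Total}_k - \text{Direct}_k
\end{aligned}
$$

With a row-standardized $W$, we have $W\mathbf{1}_n = \mathbf{1}_n$, so every row of $(I_n - \rho W)^{-1}$ sums to $1/(1-\rho)$ and $\text{Total}_k = \beta_k / (1 - \rho)$.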

20 / 56

Impact estimation in `R`

This time, let's start with the Spatial Lag Y Model:

spatialreg::impacts(spatial_lag_y_model, listw = queen_W)

## Impact measures (lag, exact):
##                      Direct    Indirect       Total
## immigrant_share -0.05948993 -0.09757580 -0.15706572
## inhabitants     -0.03734396 -0.06125182 -0.09859578

Compare it to the 'simple' regression output:

coef(spatial_lag_y_model)

##             rho     (Intercept) immigrant_share     inhabitants 
##      0.66445817     10.24658839     -0.05270212     -0.03308301
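The Total column can be checked directly against the coefficients: with row-standardized weights, it equals $\beta_k / (1 - \rho)$. The direct/indirect split additionally needs the full weights matrix; the sketch below uses a hypothetical four-district matrix `W_toy` instead. Replacing it with `spdep::listw2mat(queen_W)` should reproduce the Direct and Indirect columns above, assuming `impacts()` uses the exact inverse, as the `(lag, exact)` header suggests.

```r
# 1) Total impacts from the coefficients printed above
rho  <- 0.66445817
beta <- c(immigrant_share = -0.05270212, inhabitants = -0.03308301)
total <- beta / (1 - rho)
round(total, 5)  # about -0.15707 and -0.09860, as in the Total column

# 2) Direct/indirect split via S_k = (I - rho W)^{-1} beta_k
impacts_by_hand <- function(W, rho, beta_k) {
  S <- solve(diag(nrow(W)) - rho * W) * beta_k
  direct <- mean(diag(S))     # average own-unit effect, including feedback
  total  <- mean(rowSums(S))  # average effect of a change in x in all units
  c(Direct = direct, Indirect = total - direct, Total = total)
}

# Hypothetical 4-district contiguity matrix, row-standardized
B <- matrix(c(0, 1, 1, 0,
              1, 0, 1, 1,
              1, 1, 0, 1,
              0, 1, 1, 0), nrow = 4, byrow = TRUE)
W_toy <- B / rowSums(B)
impacts_by_hand(W_toy, rho = rho, beta_k = beta[["immigrant_share"]])
```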

21 / 56

Spatial Lag X impacts

spatialreg::impacts(spatial_lag_x_model, listw = queen_W)

## Impact measures (SlX, glht):
##                      Direct    Indirect       Total
## immigrant_share -0.06970227 -0.02616764 -0.09586991
## inhabitants     -0.02643886 -0.08538884 -0.11182770

Compare it to the 'simple' regression output:

coef(spatial_lag_x_model)

##         (Intercept)     immigrant_share         inhabitants 
##         30.64915652         -0.06970227         -0.02643886 
## lag.immigrant_share     lag.inhabitants 
##         -0.02616764         -0.08538884
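For the SLX model, there is no feedback, so the impacts are simply the coefficients: the direct impact is $\beta$, the indirect impact is $\theta$ (the `lag.` coefficient), and the total impact is their sum. Checking this with the values printed above:

```r
# Coefficients copied from coef(spatial_lag_x_model) above
beta  <- c(immigrant_share = -0.06970227, inhabitants = -0.02643886)
theta <- c(immigrant_share = -0.02616764, inhabitants = -0.08538884)

# Direct = beta, Indirect = theta, Total = beta + theta
cbind(Direct = beta, Indirect = theta, Total = beta + theta)
```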

22 / 56

If you need p-values and stuff

spatialreg::impacts(spatial_lag_y_model, listw = queen_W, R = 500) %>% 
  summary(zstats = TRUE, short = TRUE)

## Impact measures (lag, exact):
##                      Direct    Indirect       Total
## immigrant_share -0.05948993 -0.09757580 -0.15706572
## inhabitants     -0.03734396 -0.06125182 -0.09859578
## ========================================================
## Simulation results ( variance matrix):
## ========================================================
## Simulated standard errors
##                      Direct    Indirect       Total
## immigrant_share 0.022390179 0.038273065 0.059658509
## inhabitants     0.003535946 0.008409926 0.009944576
## 
## Simulated z-values:
##                     Direct  Indirect      Total
## immigrant_share  -2.661032 -2.575303  -2.650849
## inhabitants     -10.629243 -7.405056 -10.041695
## 
## Simulated p-values:
##                 Direct     Indirect   Total   
## immigrant_share 0.0077901  0.010015   0.008029
## inhabitants     < 2.22e-16 1.3101e-13 < 2e-16

23 / 56

Exercise 2_3_2: Spatial Regression

Exercise

Solution

24 / 56

Outlook

25 / 56

This week

Day	Time	Title
April 23	10:00-11:30	Introduction to GIS
April 23	11:45-13:00	Vector Data
April 23	13:00-14:00	Lunch Break
April 23	14:00-15:30	Mapping
April 23	15:45-17:00	Raster Data
April 24	09:00-10:30	Advanced Data Import & Processing
April 24	10:45-12:00	Applied Data Wrangling & Linking
April 24	12:00-13:00	Lunch Break
April 24	13:00-14:30	Investigating Spatial Autocorrelation
April 24	14:45-16:00	Spatial Econometrics & Outlook

26 / 56

Introduction

Day	Time	Title
April 23	10:00-11:30	Introduction to GIS
April 23	11:45-13:00	Vector Data
April 23	13:00-14:00	Lunch Break
April 23	14:00-15:30	Mapping
April 23	15:45-17:00	Raster Data
April 24	09:00-10:30	Advanced Data Import & Processing
April 24	10:45-12:00	Applied Data Wrangling & Linking
April 24	12:00-13:00	Lunch Break
April 24	13:00-14:30	Investigating Spatial Autocorrelation
April 24	14:45-16:00	Spatial Econometrics & Outlook

27 / 56

Introduction

Main Messages

Geospatial data are relevant in the social sciences
- and have been for a long time
R can serve as a full-blown Geographic Information System (GIS)

28 / 56

Vector Data

Day	Time	Title
April 23	10:00-11:30	Introduction to GIS
April 23	11:45-13:00	Vector Data
April 23	13:00-14:00	Lunch Break
April 23	14:00-15:30	Mapping
April 23	15:45-17:00	Raster Data
April 24	09:00-10:30	Advanced Data Import & Processing
April 24	10:45-12:00	Applied Data Wrangling & Linking
April 24	12:00-13:00	Lunch Break
April 24	13:00-14:30	Investigating Spatial Autocorrelation
April 24	14:45-16:00	Spatial Econometrics & Outlook

29 / 56

Vector Data

Main Messages

Most common geospatial data type
Vector data come as points, lines or polygons
Information on the geometries is stored in the geometry column
Attributes can be assigned to each geometric object
Attribute tables are treated as data frames

30 / 56

Mapping

Day	Time	Title
April 23	10:00-11:30	Introduction to GIS
April 23	11:45-13:00	Vector Data
April 23	13:00-14:00	Lunch Break
April 23	14:00-15:30	Mapping
April 23	15:45-17:00	Raster Data
April 24	09:00-10:30	Advanced Data Import & Processing
April 24	10:45-12:00	Applied Data Wrangling & Linking
April 24	12:00-13:00	Lunch Break
April 24	13:00-14:30	Investigating Spatial Autocorrelation
April 24	14:45-16:00	Spatial Econometrics & Outlook

31 / 56

Mapping

Main Messages

the geometries of geospatial data are the basis of each map
spatial distribution of attributes becomes visible when defining an attribute and adding color scales
layer shapefiles to add more information or for aesthetic reasons

32 / 56

Raster Data

Day	Time	Title
April 23	10:00-11:30	Introduction to GIS
April 23	11:45-13:00	Vector Data
April 23	13:00-14:00	Lunch Break
April 23	14:00-15:30	Mapping
April 23	15:45-17:00	Raster Data
April 24	09:00-10:30	Advanced Data Import & Processing
April 24	10:45-12:00	Applied Data Wrangling & Linking
April 24	12:00-13:00	Lunch Break
April 24	13:00-14:30	Investigating Spatial Autocorrelation
April 24	14:45-16:00	Spatial Econometrics & Outlook

33 / 56

Raster Data

Main Messages

Data format for efficient and fast analysis of geospatial data
flexible in their application
can get rather involved
however, extracting values is straightforward

34 / 56

Advanced Data Import

Day	Time	Title
April 23	10:00-11:30	Introduction to GIS
April 23	11:45-13:00	Vector Data
April 23	13:00-14:00	Lunch Break
April 23	14:00-15:30	Mapping
April 23	15:45-17:00	Raster Data
April 24	09:00-10:30	Advanced Data Import & Processing
April 24	10:45-12:00	Applied Data Wrangling & Linking
April 24	12:00-13:00	Lunch Break
April 24	13:00-14:30	Investigating Spatial Autocorrelation
April 24	14:45-16:00	Spatial Econometrics & Outlook

35 / 56

Advanced Data Import

Main Messages

Geospatial data tend to be large
Often distributed over the internet
APIs help in downloading these data
can get pretty involved

36 / 56

Applied Data Wrangling

Day	Time	Title
April 23	10:00-11:30	Introduction to GIS
April 23	11:45-13:00	Vector Data
April 23	13:00-14:00	Lunch Break
April 23	14:00-15:30	Mapping
April 23	15:45-17:00	Raster Data
April 24	09:00-10:30	Advanced Data Import & Processing
April 24	10:45-12:00	Applied Data Wrangling & Linking
April 24	12:00-13:00	Lunch Break
April 24	13:00-14:30	Investigating Spatial Autocorrelation
April 24	14:45-16:00	Spatial Econometrics & Outlook

37 / 56

Applied Data Wrangling

Main Messages

Georeferenced survey data require careful handling of sensitive information
we showed one example of wrangling and linking data; applications may vary
spatial joins are the perfect tool to add geospatial information to other georeferenced data

38 / 56

Investigating Spatial Autocorrelation

Day	Time	Title
April 23	10:00-11:30	Introduction to GIS
April 23	11:45-13:00	Vector Data
April 23	13:00-14:00	Lunch Break
April 23	14:00-15:30	Mapping
April 23	15:45-17:00	Raster Data
April 24	09:00-10:30	Advanced Data Import & Processing
April 24	10:45-12:00	Applied Data Wrangling & Linking
April 24	12:00-13:00	Lunch Break
April 24	13:00-14:30	Investigating Spatial Autocorrelation
April 24	14:45-16:00	Spatial Econometrics & Outlook

39 / 56

Investigating Spatial Autocorrelation

Main Messages

spatial autocorrelation is something that must be detected
we need a model for that
- spatial weight matrices!
there are global and local indicators of spatial autocorrelation

40 / 56

Spatial Econometrics

Day	Time	Title
April 23	10:00-11:30	Introduction to GIS
April 23	11:45-13:00	Vector Data
April 23	13:00-14:00	Lunch Break
April 23	14:00-15:30	Mapping
April 23	15:45-17:00	Raster Data
April 24	09:00-10:30	Advanced Data Import & Processing
April 24	10:45-12:00	Applied Data Wrangling & Linking
April 24	12:00-13:00	Lunch Break
April 24	13:00-14:30	Investigating Spatial Autocorrelation
April 24	14:45-16:00	Spatial Econometrics & Outlook

41 / 56

Spatial Econometrics

Main Messages

spatial autocorrelation can also be explained by
- diffusion
- or spillover effects
these effects can be tested for but should always be assessed in light of theory

42 / 56

What's left

Other map types such as

cartograms
hexagon maps
animated maps
(more) network graphs

GIS techniques, such as

geocoding
routing
cluster analysis

More Advanced Spatial(-temporal) Modeling

More data sources...

Check out gganimate

43 / 56

Data Sources

Some more information:

geospatial data are interdisciplinary
amount of data feels unlimited
data providers and data portals are often specific to the area and/or the type of information they cover

Some random examples:

45 / 56

The End

46 / 56

Add-on slides: Missing Values in Spatial Econometrics

47 / 56

What if you have missing values?

Missing values in spatial regression models cause problems similar to those in ordinary regression analysis:

biased estimates
reduced statistical power

However, the issue becomes more severe because the observations are interdependent:

we lose more information, since a missing unit also drops out of its neighbors' spatial lags
even values that are missing completely at random can become problematic

Thus, it might be a good idea to think of methods to navigate this bias.
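A minimal base-R sketch of why list-wise deletion hurts more in spatial models, using four hypothetical districts on a line (1-2-3-4) with made-up AfD shares. Dropping district 3 changes the spatial lag of its neighbors, and district 4 ends up with no neighbor at all (which is why `zero.policy = TRUE` is needed in the code below):

```r
# Hypothetical AfD shares for four districts on a line (1-2-3-4)
y <- c(10, 20, 30, 40)
B <- matrix(0, 4, 4)
B[cbind(1:3, 2:4)] <- 1  # each district borders the next one
B[cbind(2:4, 1:3)] <- 1  # ... and the previous one

# Spatial lag with all districts observed
lag_full <- as.vector((B / rowSums(B)) %*% y)
lag_full  # 20 20 35 30

# List-wise deletion of district 3, then rebuild the weights on the rest
keep  <- c(1, 2, 4)
B_del <- B[keep, keep]
rs    <- rowSums(B_del)
# district 4 loses its only neighbor; give it an all-zero row (cf. zero.policy = TRUE)
W_del <- B_del / ifelse(rs == 0, 1, rs)
lag_del <- as.vector(W_del %*% y[keep])
lag_del   # 20 10 0
```

Although only one value is missing, district 2's neighborhood average drops from 20 to 10 and district 4 becomes an island.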

48 / 56

Let's produce a dataset with missing data

# ~10% missing values
missing_index <- 
  sample(
    1:nrow(election_results), 
    round(nrow(election_results) * .1, 0)
    )
election_results_missing <- 
  election_results
election_results_missing$afd_share[missing_index] <- 
  NA
# list-wise deletion
election_results_missing <- 
  na.omit(election_results_missing)
tm_shape(election_results_missing) +
  tm_fill("afd_share", palette = "viridis")

49 / 56

How does a Spatial Lag Y Model perform?

queen_neighborhoods_missing <- 
  spdep::poly2nb(election_results_missing, queen = TRUE)
queen_W_missing <- 
  spdep::nb2listw(queen_neighborhoods_missing, style = "W", zero.policy = TRUE)
spatial_lag_y_model_missing <-
  spatialreg::lagsarlm(
    afd_share ~ immigrant_share + inhabitants,
    data = election_results_missing,
    listw = queen_W_missing,
    zero.policy = TRUE
    )

50 / 56

Model comparison

spatialreg::impacts(spatial_lag_y_model_missing, listw = queen_W_missing)

## Impact measures (lag, exact):
##                      Direct    Indirect       Total
## immigrant_share -0.05450334 -0.07883944 -0.13334278
## inhabitants     -0.03918632 -0.05668327 -0.09586959

spatialreg::impacts(spatial_lag_y_model, listw = queen_W)

## Impact measures (lag, exact):
##                      Direct    Indirect       Total
## immigrant_share -0.05948993 -0.09757580 -0.15706572
## inhabitants     -0.03734396 -0.06125182 -0.09859578

51 / 56

What to do now?

How to deal with missing values in geospatial data depends on their geometric structure. For points, there are established methods, such as interpolation.

These methods often amount to some form of aggregation, which does not help in our case. Good old (non-spatial) imputation techniques might help instead:

good for multivariate cases
yet, they are not spatial techniques and cannot create plausible values for spatial relationships
- but imputing spatial relationships would be somewhat arbitrary anyway

52 / 56

Simplest case of imputation

# ~10% missing values
missing_index <- 
  sample(
    1:nrow(election_results), 
    round(nrow(election_results) * .1, 0)
    )
election_results_missing <- 
  election_results
election_results_missing$afd_share[missing_index] <- NA
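# regression imputation: each missing afd_share is replaced by its predicted value
election_results_missing <-
  election_results_missing %>% 
  # mice works on plain data frames, so drop the geometry first
  sf::st_drop_geometry() %>% 
  # one imputed data set (m = 1) based on linear-regression predictions
  mice::mice(method = "norm.predict", m = 1) %>% 
  # extract the completed data
  mice::complete() %>% 
  # re-attach the geometries, joining on all columns both tables share
  dplyr::left_join(
    election_results_missing %>% 
      dplyr::select(-afd_share, -immigrant_share, -inhabitants)
  ) %>% 
  sf::st_as_sf()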

## 
##  iter imp variable
##   1   1  afd_share
##   2   1  afd_share
##   3   1  afd_share
##   4   1  afd_share
##   5   1  afd_share

53 / 56

And again run the model

queen_neighborhoods_missing <- 
  spdep::poly2nb(election_results_missing, queen = TRUE)
queen_W_missing <- 
  spdep::nb2listw(queen_neighborhoods_missing, style = "W")
spatial_lag_y_model_missing <-
  spatialreg::lagsarlm(
    afd_share ~ immigrant_share + inhabitants,
    data = election_results_missing,
    listw = queen_W_missing
    )

54 / 56

...and compare it with the original one

spatialreg::impacts(spatial_lag_y_model_missing, listw = queen_W_missing)

## Impact measures (lag, exact):
##                      Direct    Indirect      Total
## immigrant_share -0.04834610 -0.06855918 -0.1169053
## inhabitants     -0.04148773 -0.05883338 -0.1003211

spatialreg::impacts(spatial_lag_y_model, listw = queen_W)

## Impact measures (lag, exact):
##                      Direct    Indirect       Total
## immigrant_share -0.05948993 -0.09757580 -0.15706572
## inhabitants     -0.03734396 -0.06125182 -0.09859578

55 / 56

anne-kathrin.stroppe@gesis.org

@astroppe

stroppann


56 / 56
