GESIS Workshop
Day | Time | Title |
---|---|---|
April 23 | 10:00-11:30 | Introduction to GIS |
April 23 | 11:45-13:00 | Vector Data |
April 23 | 13:00-14:00 | Lunch Break |
April 23 | 14:00-15:30 | Mapping |
April 23 | 15:45-17:00 | Raster Data |
April 24 | 09:00-10:30 | Advanced Data Import & Processing |
April 24 | 10:45-12:00 | Applied Data Wrangling & Linking |
April 24 | 12:00-13:00 | Lunch Break |
April 24 | 13:00-14:30 | Investigating Spatial Autocorrelation |
April 24 | 14:45-16:00 | Spatial Econometrics & Outlook |
Econometrics could be reduced to using statistics to model (complex) theories ...
Therefore, spatial econometrics combines spatial analysis and econometrics.
What is the data generation process?
There are at least two common mechanisms we are interested in when doing spatial econometrics:
Diffusion
Spillover
We have to think about theories and mechanisms and how they translate into spatial effects and the data generation process.
That said, there are tests to check for the specific data generation process at hand, but using them naively is not recommended.
Space can be important in our analysis in two ways.
We can address both of these different perspectives in our analysis with spatial econometric methods.
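Linear Regression:
$$Y = X\beta + \epsilon$$
Spatial Lag Y / Spatial Autoregressive Model (SAR, Diffusion):
$$Y = \rho W Y + X\beta + \epsilon$$
Spatial Lag X Model (SLX, Spillover):
$$Y = X\beta + W X\theta + \epsilon$$
Spatial Error Model (SEM):
$$Y = X\beta + u, \quad u = \lambda W u + \epsilon$$
Spatial Durbin Model:
$$Y = \rho W Y + X\beta + W X\theta + \epsilon$$
Spatial Durbin Error Model:
$$Y = X\beta + W X\theta + u, \quad u = \lambda W u + \epsilon$$
Combined Spatial Autocorrelation Model:
$$Y = \rho W Y + X\beta + u, \quad u = \lambda W u + \epsilon$$
Manski Model:
$$Y = \rho W Y + W X\theta + X\beta + u, \quad u = \lambda W u + \epsilon$$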
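To make the data generation process behind the Spatial Lag Y model a bit more tangible, here is a minimal base-R sketch that simulates from its reduced form, Y = (I − ρW)⁻¹(Xβ + ϵ). The four-unit chain of neighbours, the parameter values, and all object names are assumptions made up for illustration; they are not part of the workshop data.

```r
# Toy example: four units on a line (1-2-3-4); all values are made up.
set.seed(42)

# Binary adjacency matrix: 1 if two units are neighbours.
adjacency <- matrix(
  c(0, 1, 0, 0,
    1, 0, 1, 0,
    0, 1, 0, 1,
    0, 0, 1, 0),
  nrow = 4, byrow = TRUE
)

# Row-standardise so that each row sums to 1.
W <- adjacency / rowSums(adjacency)

rho  <- 0.5   # assumed strength of the spatial lag
beta <- 2     # assumed effect of x
x    <- rnorm(4)
eps  <- rnorm(4)

# Reduced form of Y = rho * W Y + x * beta + eps
y <- solve(diag(4) - rho * W, x * beta + eps)

# The simulated y satisfies the structural equation again.
all.equal(as.vector(y), as.vector(rho * W %*% y + x * beta + eps))
```

Because y appears on both sides of the structural equation, a shock in one unit travels to its neighbours, their neighbours, and so on, which is what the diffusion label refers to.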
There are a lot of models you could estimate to explain spatial autocorrelation. And there's a vast body of literature on what's the best choice for which application.
We'd explicitly like to recommend the work of Tobias Rüttenauer for us social scientists. Here are some really nice workshop materials.
In this session, we will only estimate the Spatial Lag Y, Spatial Lag X, and Spatial Error Models.
We will use the same example as in the previous session. But this time, we will actually test whether one of our spatial regression models helps us investigate the data generation process any further. We may ask:
It might also be a good idea to control for inhabitant numbers within the voting districts.
linear_regression <- lm(afd_share ~ immigrant_share + inhabitants, data = election_results)

summary(linear_regression)

## 
## Call:
## lm(formula = afd_share ~ immigrant_share + inhabitants, data = election_results)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -15.010  -3.397  -0.232   2.790  25.032 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     27.737242   0.579582  47.857  < 2e-16 ***
## immigrant_share -0.097675   0.026150  -3.735 0.000207 ***
## inhabitants     -0.079595   0.003812 -20.879  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.843 on 540 degrees of freedom
## Multiple R-squared:  0.4822, Adjusted R-squared:  0.4803 
## F-statistic: 251.4 on 2 and 540 DF,  p-value: < 2.2e-16
To estimate a spatial regression, we once again have to construct spatial weights, as in the analysis of spatial autocorrelation. In fact, we'll use the same approach as before.
queen_neighborhoods <- spdep::poly2nb(election_results, queen = TRUE)

queen_W <- spdep::nb2listw(queen_neighborhoods, style = "W")
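As a rough illustration of what row-standardised weights (`style = "W"`) do, the following base-R sketch builds a tiny weights matrix by hand and computes a spatial lag with it. The three-unit chain and the values in `toy_share` are made up for this example.

```r
# Toy example: three units on a line (1-2-3); values are made up.
adjacency <- matrix(
  c(0, 1, 0,
    1, 0, 1,
    0, 1, 0),
  nrow = 3, byrow = TRUE
)

# Row-standardisation: divide each row by its number of neighbours.
W <- adjacency / rowSums(adjacency)

toy_share <- c(10, 20, 40)

# Spatial lag: for each unit, the mean of its neighbours' values.
spatial_lag <- as.vector(W %*% toy_share)
spatial_lag  # 20 25 20
```

With this style of weights, the spatially lagged variables in the models can be read as neighbourhood averages.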
spatial_error_model <-
  spatialreg::errorsarlm(
    afd_share ~ immigrant_share + inhabitants,
    data = election_results,
    listw = queen_W
  )

summary(spatial_error_model)

## 
## Call:
## spatialreg::errorsarlm(formula = afd_share ~ immigrant_share + 
##     inhabitants, data = election_results, listw = queen_W)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -9.60213 -2.38063 -0.40782  1.97417 25.55441 
## 
## Type: error 
## Coefficients: (asymptotic standard errors) 
##                   Estimate Std. Error z value  Pr(>|z|)
## (Intercept)     22.8185498  0.9398113 24.2799 < 2.2e-16
## immigrant_share -0.0806095  0.0281025 -2.8684  0.004125
## inhabitants     -0.0337644  0.0045643 -7.3974 1.388e-13
## 
## Lambda: 0.75749, LR test value: 216.39, p-value: < 2.22e-16
## Asymptotic standard error: 0.033094
##     z-value: 22.889, p-value: < 2.22e-16
## Wald statistic: 523.9, p-value: < 2.22e-16
## 
## Log likelihood: -1517.349 for error model
## ML residual variance (sigma squared): 13.532, (sigma: 3.6785)
## Number of observations: 543 
## Number of parameters estimated: 5 
## AIC: NA (not available for weighted model), (AIC for lm: 3259.1)

spatial_lag_x_model <-
  spatialreg::lmSLX(
    afd_share ~ immigrant_share + inhabitants,
    data = election_results,
    listw = queen_W
  )

summary(spatial_lag_x_model)

## 
## Call:
## lm(formula = formula(paste("y ~ ", paste(colnames(x)[-1], collapse = "+"))), 
##     data = as.data.frame(x), weights = weights)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10.4243  -3.0311  -0.1935   2.4388  25.0694 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         30.649157   0.671665  45.632  < 2e-16 ***
## immigrant_share     -0.069702   0.034623  -2.013   0.0446 *  
## inhabitants         -0.026439   0.005841  -4.526  7.4e-06 ***
## lag.immigrant_share -0.026168   0.048127  -0.544   0.5869    
## lag.inhabitants     -0.085389   0.007656 -11.153  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.364 on 538 degrees of freedom
## Multiple R-squared:  0.5811, Adjusted R-squared:  0.578 
## F-statistic: 186.6 on 4 and 538 DF,  p-value: < 2.2e-16

spatial_lag_y_model <-
  spatialreg::lagsarlm(
    afd_share ~ immigrant_share + inhabitants,
    data = election_results,
    listw = queen_W
  )

summary(spatial_lag_y_model)

## 
## Call:
## spatialreg::lagsarlm(formula = afd_share ~ immigrant_share + 
##     inhabitants, data = election_results, listw = queen_W)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -10.17786  -2.27359  -0.29956   1.98212  24.26683 
## 
## Type: lag 
## Coefficients: (asymptotic standard errors) 
##                   Estimate Std. Error z value  Pr(>|z|)
## (Intercept)     10.2465884  0.9782773 10.4741 < 2.2e-16
## immigrant_share -0.0527021  0.0196904 -2.6765  0.007439
## inhabitants     -0.0330830  0.0034265 -9.6551 < 2.2e-16
## 
## Rho: 0.66446, LR test value: 261.11, p-value: < 2.22e-16
## Asymptotic standard error: 0.03489
##     z-value: 19.045, p-value: < 2.22e-16
## Wald statistic: 362.69, p-value: < 2.22e-16
## 
## Log likelihood: -1494.985 for lag model
## ML residual variance (sigma squared): 12.992, (sigma: 3.6045)
## Number of observations: 543 
## Number of parameters estimated: 5 
## AIC: 3000, (AIC for lm: 3259.1)
## LM test for residual autocorrelation
## test value: 21.043, p-value: 4.4919e-06
AIC(spatial_error_model, spatial_lag_x_model, spatial_lag_y_model)
##                     df      AIC
## spatial_error_model  5 3044.697
## spatial_lag_x_model  6 3147.995
## spatial_lag_y_model  5 2999.971
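The AIC values in this table can be reproduced by hand from the log-likelihoods in the model summaries, using AIC = 2k − 2 log L with k estimated parameters. The helper `aic_from_loglik()` below is a hypothetical name introduced only for this sketch; the numbers are copied from the printed summaries, so the results agree only up to rounding.

```r
# AIC = 2 * k - 2 * logLik, with k the number of estimated parameters.
aic_from_loglik <- function(log_lik, n_params) {
  2 * n_params - 2 * log_lik
}

# Log-likelihoods and parameter counts copied from the printed summaries.
aic_lag   <- aic_from_loglik(log_lik = -1494.985, n_params = 5)
aic_error <- aic_from_loglik(log_lik = -1517.349, n_params = 5)

round(c(lag = aic_lag, error = aic_error), 2)
```

Since both models estimate five parameters, the lower AIC of the Spatial Lag Y Model simply reflects its higher log-likelihood.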
spdep::lm.LMtests(linear_regression, queen_W, test = c("LMerr", "LMlag"))
## 
##  Lagrange multiplier diagnostics for spatial dependence
## 
## data:  
## model: lm(formula = afd_share ~ immigrant_share +
## inhabitants, data = election_results)
## weights: queen_W
## 
## LMerr = 198.29, df = 1, p-value < 2.2e-16
## 
## 
##  Lagrange multiplier diagnostics for spatial dependence
## 
## data:  
## model: lm(formula = afd_share ~ immigrant_share +
## inhabitants, data = election_results)
## weights: queen_W
## 
## LMlag = 299.73, df = 1, p-value < 2.2e-16
Let's stick to our theory, shall we?
Unfortunately, in the case of a Spatial Lag Y Model, the spatial parameter ρ only tells us whether the effect is (statistically) significant or not.
Luckily, there's a method to decompose the spatial effects into direct, indirect, and total effects: estimating impacts.
This time, let's start with the Spatial Lag Y Model:
spatialreg::impacts(spatial_lag_y_model, listw = queen_W)
## Impact measures (lag, exact):
##                      Direct    Indirect       Total
## immigrant_share -0.05948993 -0.09757580 -0.15706572
## inhabitants     -0.03734396 -0.06125182 -0.09859578
Compare it to the 'simple' regression output:
coef(spatial_lag_y_model)
##             rho     (Intercept) immigrant_share     inhabitants 
##      0.66445817     10.24658839     -0.05270212     -0.03308301
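One way to see where the gap between coefficients and impacts comes from: in a Spatial Lag Y Model, a change in X is passed through the multiplier (I − ρW)⁻¹. With row-standardised weights, every row of that multiplier sums to 1 / (1 − ρ), so the total impact is β / (1 − ρ). The sketch below checks this with the coefficients printed above; the three-unit weights matrix is a made-up toy, so only the total impacts (not the direct and indirect ones) are comparable to the workshop output.

```r
# Coefficients copied from coef(spatial_lag_y_model).
rho  <- 0.66445817
beta <- c(immigrant_share = -0.05270212, inhabitants = -0.03308301)

# With row-standardised weights: total impact = beta / (1 - rho).
total_impact <- beta / (1 - rho)
round(total_impact, 6)

# The same logic on a toy, row-standardised 3-unit weights matrix.
W <- matrix(
  c(0,   1, 0,
    0.5, 0, 0.5,
    0,   1, 0),
  nrow = 3, byrow = TRUE
)
multiplier <- solve(diag(3) - rho * W)  # (I - rho W)^{-1}

direct   <- mean(diag(multiplier)) * beta     # average own-unit effect
total    <- mean(rowSums(multiplier)) * beta  # average effect of changing x everywhere
indirect <- total - direct                    # part that runs through neighbours
```

The totals computed this way agree with the `Total` column of the impacts output, while the split into direct and indirect effects depends on the actual neighbourhood structure.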
spatialreg::impacts(spatial_lag_x_model, listw = queen_W)
## Impact measures (SlX, glht):
##                      Direct    Indirect       Total
## immigrant_share -0.06970227 -0.02616764 -0.09586991
## inhabitants     -0.02643886 -0.08538884 -0.11182770
Compare it to the 'simple' regression output:
coef(spatial_lag_x_model)
##         (Intercept)     immigrant_share         inhabitants 
##         30.64915652         -0.06970227         -0.02643886 
## lag.immigrant_share     lag.inhabitants 
##         -0.02616764         -0.08538884
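For the Spatial Lag X Model, the comparison is straightforward: the direct impacts are the coefficients on X, the indirect impacts are the coefficients on the lagged X, and the total is their sum. A small sketch using the numbers printed above (the object names are chosen just for this example):

```r
# Coefficients copied from coef(spatial_lag_x_model).
beta  <- c(immigrant_share = -0.06970227, inhabitants = -0.02643886)
theta <- c(immigrant_share = -0.02616764, inhabitants = -0.08538884)

# Direct = beta, Indirect = theta, Total = beta + theta.
slx_impacts <- cbind(Direct = beta, Indirect = theta, Total = beta + theta)
slx_impacts
```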
spatialreg::impacts(spatial_lag_y_model, listw = queen_W, R = 500) %>%
  summary(zstats = TRUE, short = TRUE)

## Impact measures (lag, exact):
##                      Direct    Indirect       Total
## immigrant_share -0.05948993 -0.09757580 -0.15706572
## inhabitants     -0.03734396 -0.06125182 -0.09859578
## ========================================================
## Simulation results ( variance matrix):
## ========================================================
## Simulated standard errors
##                      Direct    Indirect       Total
## immigrant_share 0.022390179 0.038273065 0.059658509
## inhabitants     0.003535946 0.008409926 0.009944576
## 
## Simulated z-values:
##                     Direct  Indirect      Total
## immigrant_share  -2.661032 -2.575303  -2.650849
## inhabitants     -10.629243 -7.405056 -10.041695
## 
## Simulated p-values:
##                 Direct     Indirect   Total   
## immigrant_share 0.0077901  0.010015   0.008029
## inhabitants     < 2.22e-16 1.3101e-13 < 2e-16
Main Messages
R can serve as a full-blown Geographic Information System (GIS)
Other map types such as
GIS techniques, such as
More Advanced Spatial(-temporal) Modeling
More data sources...
Check out gganimate
Some more information:
Some more information:
Some random examples:
Missing values in spatial regression models produce problems similar to those in ordinary regression analysis.
However, the issue gets a bit more severe because the observations are interdependent.
Thus, it might be a good idea to think of methods to mitigate this bias.
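A minimal base-R sketch of why interdependence makes list-wise deletion more delicate: dropping one unit also removes it from its neighbours' neighbourhoods and can leave units without any neighbours at all. The five-unit chain below is a made-up toy example.

```r
# Five units on a line: 1-2-3-4-5 (toy example).
n <- 5
adjacency <- matrix(0, n, n)
for (i in 1:(n - 1)) {
  adjacency[i, i + 1] <- 1
  adjacency[i + 1, i] <- 1
}

# Number of neighbours per unit in the complete data.
neighbours_full <- rowSums(adjacency)

# List-wise deletion of unit 2 (say, its outcome is missing).
keep <- c(1, 3, 4, 5)
neighbours_after <- rowSums(adjacency[keep, keep])

neighbours_full   # 1 2 2 2 1
neighbours_after  # 0 1 2 1: unit 1 no longer has any neighbour
```

Units without neighbours are presumably the reason the deletion example passes `zero.policy = TRUE` when building the weights and fitting the model.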
# ~10% missing values
missing_index <- sample(
  1:nrow(election_results),
  round(nrow(election_results) * .1, 0)
)

election_results_missing <- election_results
election_results_missing$afd_share[missing_index] <- NA

# list-wise deletion
election_results_missing <- na.omit(election_results_missing)

tm_shape(election_results_missing) +
  tm_fill("afd_share", palette = "viridis")

queen_neighborhoods_missing <-
  spdep::poly2nb(election_results_missing, queen = TRUE)

queen_W_missing <-
  spdep::nb2listw(queen_neighborhoods_missing, style = "W", zero.policy = TRUE)

spatial_lag_y_model_missing <-
  spatialreg::lagsarlm(
    afd_share ~ immigrant_share + inhabitants,
    data = election_results_missing,
    listw = queen_W_missing,
    zero.policy = TRUE
  )

spatialreg::impacts(spatial_lag_y_model_missing, listw = queen_W_missing)

## Impact measures (lag, exact):
##                      Direct    Indirect       Total
## immigrant_share -0.05450334 -0.07883944 -0.13334278
## inhabitants     -0.03918632 -0.05668327 -0.09586959

spatialreg::impacts(spatial_lag_y_model, listw = queen_W)

## Impact measures (lag, exact):
##                      Direct    Indirect       Total
## immigrant_share -0.05948993 -0.09757580 -0.15706572
## inhabitants     -0.03734396 -0.06125182 -0.09859578
How to deal with missing values in geospatial data depends on the geometry type. For points, there are established methods, such as interpolation.
Often, these methods amount to some form of data aggregation, which does not help in our case. I'd say that good old imputation techniques might also help:
# ~10% missing values
missing_index <- sample(
  1:nrow(election_results),
  round(nrow(election_results) * .1, 0)
)

election_results_missing <- election_results
election_results_missing$afd_share[missing_index] <- NA

# impute the missing values on the attribute table, then re-attach the geometries
election_results_missing <-
  election_results_missing %>%
  sf::st_drop_geometry() %>%
  mice::mice(method = "norm.predict", m = 1) %>%
  mice::complete() %>%
  dplyr::left_join(
    election_results_missing %>%
      dplyr::select(-afd_share, -immigrant_share, -inhabitants)
  ) %>%
  sf::st_as_sf()

## 
##  iter imp variable
##   1   1  afd_share
##   2   1  afd_share
##   3   1  afd_share
##   4   1  afd_share
##   5   1  afd_share

queen_neighborhoods_missing <-
  spdep::poly2nb(election_results_missing, queen = TRUE)

queen_W_missing <-
  spdep::nb2listw(queen_neighborhoods_missing, style = "W")

spatial_lag_y_model_missing <-
  spatialreg::lagsarlm(
    afd_share ~ immigrant_share + inhabitants,
    data = election_results_missing,
    listw = queen_W_missing
  )

spatialreg::impacts(spatial_lag_y_model_missing, listw = queen_W_missing)

## Impact measures (lag, exact):
##                      Direct    Indirect      Total
## immigrant_share -0.04834610 -0.06855918 -0.1169053
## inhabitants     -0.04148773 -0.05883338 -0.1003211

spatialreg::impacts(spatial_lag_y_model, listw = queen_W)

## Impact measures (lag, exact):
##                      Direct    Indirect       Total
## immigrant_share -0.05948993 -0.09757580 -0.15706572
## inhabitants     -0.03734396 -0.06125182 -0.09859578
Day | Time | Title |
---|---|---|
April 23 | 10:00-11:30 | Introduction to GIS |
April 23 | 11:45-13:00 | Vector Data |
April 23 | 13:00-14:00 | Lunch Break |
April 23 | 14:00-15:30 | Mapping |
April 23 | 15:45-17:00 | Raster Data |
April 24 | 09:00-10:30 | Advanced Data Import & Processing |
April 24 | 10:45-12:00 | Applied Data Wrangling & Linking |
April 24 | 12:00-13:00 | Lunch Break |
April 24 | 13:00-14:30 | Investigating Spatial Autocorrelation |
April 24 | 14:45-16:00 | Spatial Econometrics & Outlook |
Keyboard shortcuts
↑, ←, Pg Up, k | Go to previous slide |
↓, →, Pg Dn, Space, j | Go to next slide |
Home | Go to first slide |
End | Go to last slide |
Number + Return | Go to specific slide |
b / m / f | Toggle blackout / mirrored / fullscreen mode |
c | Clone slideshow |
p | Toggle presenter mode |
t | Restart the presentation timer |
?, h | Toggle this help |
o | Tile View: Overview of Slides |
Esc | Back to slideshow |
GESIS Workshop
Day | Time | Title |
---|---|---|
April 23 | 10:00-11:30 | Introduction to GIS |
April 23 | 11:45-13:00 | Vector Data |
April 23 | 13:00-14:00 | Lunch Break |
April 23 | 14:00-15:30 | Mapping |
April 23 | 15:45-17:00 | Raster Data |
April 24 | 09:00-10:30 | Advanced Data Import & Processing |
April 24 | 10:45-12:00 | Applied Data Wrangling & Linking |
April 24 | 12:00-13:00 | Lunch Break |
April 24 | 13:00-14:30 | Investigating Spatial Autocorrelation |
April 24 | 14:45-16:00 | Spatial Econometrics & Outlook |
Econometrics could be reduced to using statistics to model (complex) theories ...
Therefore, spatial econometrics combine spatial analysis and econometrics
What is the data generation process?
There are at least two common mechanisms we are interested in spatial econometrics
Diffusion
Spillover
We have to think about theories and mechanisms and how they translate into spatial effects and the data generation process.
That said, there are tests to check for the specific data generation process at hand, but they are not recommended to be used naively.
Space can be important in our analysis in two ways.
We can address both of these different perspectives in our analysis with spatial econometric methods.
Linear Regression:
Y=Xβ+ϵ
Linear Regression:
Y=Xβ+ϵ
Spatial Lag Y / Spatial Autoregressive Model (SAR, Diffusion):
Y=ρWY+Xβ+ϵ
Linear Regression:
Y=Xβ+ϵ
Spatial Lag Y / Spatial Autoregressive Model (SAR, Diffusion):
Y=ρWY+Xβ+ϵ
Spatial Lag X Model (SLX, Spillover):
Y=Xβ+WXθ+ϵ
Linear Regression:
Y=Xβ+ϵ
Spatial Lag Y / Spatial Autoregressive Model (SAR, Diffusion):
Y=ρWY+Xβ+ϵ
Spatial Lag X Model (SLX, Spillover):
Y=Xβ+WXθ+ϵ
Spatial Error Model (SEM):
Y=Xβ+u u=λWu+ϵ
Spatial Durbin Model:
Y=ρWY+Xβ+WXθ+ϵ
Spatial Durbin Error Model:
Y=Xβ+WXθ+u u=λWu+ϵ
Combined Spatial Autocorrelation Model:
Y=ρWY+Xβ+u u=λWu+ϵ
Manski Model:
Y=ρWY+WXθ+Xβ+u u=λWu+ϵ
Source:Tenor
There are a lot of models you could estimate to explain spatial autocorrelation. And there's a vast body of literature on what's the best choice for which application.
We'd explicitly like to recommend the work of Tobias Rüttenauer for us social scientists. Here are some really nice workshop materials.
In this session, we will only estimate Spatial Lag Y and X and Spatial Error Models.
We will use the same example as in the previous session. But this time, we will actually test if one of our spatial regression models helps investigating the data generation process any further. We may ask:
It might also be a good idea to control for inhabitant numbers within the voting districts.
linear_regression <- lm(afd_share ~ immigrant_share + inhabitants, data = election_results)summary(linear_regression)
## ## Call:## lm(formula = afd_share ~ immigrant_share + inhabitants, data = election_results)## ## Residuals:## Min 1Q Median 3Q Max ## -15.010 -3.397 -0.232 2.790 25.032 ## ## Coefficients:## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 27.737242 0.579582 47.857 < 2e-16 ***## immigrant_share -0.097675 0.026150 -3.735 0.000207 ***## inhabitants -0.079595 0.003812 -20.879 < 2e-16 ***## ---## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1## ## Residual standard error: 4.843 on 540 degrees of freedom## Multiple R-squared: 0.4822, Adjusted R-squared: 0.4803 ## F-statistic: 251.4 on 2 and 540 DF, p-value: < 2.2e-16
To estimate a spatial regression we, once again, have to construct a spatial weight as in the analysis of spatial autocorrelation. In fact, we'll use the same approach as before.
queen_neighborhoods <- spdep::poly2nb(election_results, queen = TRUE)queen_W <- spdep::nb2listw(queen_neighborhoods, style = "W")
spatial_error_model <- spatialreg::errorsarlm( afd_share ~ immigrant_share + inhabitants, data = election_results, listw = queen_W )summary(spatial_error_model)
## ## Call:## spatialreg::errorsarlm(formula = afd_share ~ immigrant_share + ## inhabitants, data = election_results, listw = queen_W)## ## Residuals:## Min 1Q Median 3Q Max ## -9.60213 -2.38063 -0.40782 1.97417 25.55441 ## ## Type: error ## Coefficients: (asymptotic standard errors) ## Estimate Std. Error z value Pr(>|z|)## (Intercept) 22.8185498 0.9398113 24.2799 < 2.2e-16## immigrant_share -0.0806095 0.0281025 -2.8684 0.004125## inhabitants -0.0337644 0.0045643 -7.3974 1.388e-13## ## Lambda: 0.75749, LR test value: 216.39, p-value: < 2.22e-16## Asymptotic standard error: 0.033094## z-value: 22.889, p-value: < 2.22e-16## Wald statistic: 523.9, p-value: < 2.22e-16## ## Log likelihood: -1517.349 for error model## ML residual variance (sigma squared): 13.532, (sigma: 3.6785)## Number of observations: 543 ## Number of parameters estimated: 5 ## AIC: NA (not available for weighted model), (AIC for lm: 3259.1)
spatial_lag_x_model <- spatialreg::lmSLX( afd_share ~ immigrant_share + inhabitants, data = election_results, listw = queen_W )summary(spatial_lag_x_model)
## ## Call:## lm(formula = formula(paste("y ~ ", paste(colnames(x)[-1], collapse = "+"))), ## data = as.data.frame(x), weights = weights)## ## Residuals:## Min 1Q Median 3Q Max ## -10.4243 -3.0311 -0.1935 2.4388 25.0694 ## ## Coefficients:## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 30.649157 0.671665 45.632 < 2e-16 ***## immigrant_share -0.069702 0.034623 -2.013 0.0446 * ## inhabitants -0.026439 0.005841 -4.526 7.4e-06 ***## lag.immigrant_share -0.026168 0.048127 -0.544 0.5869 ## lag.inhabitants -0.085389 0.007656 -11.153 < 2e-16 ***## ---## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1## ## Residual standard error: 4.364 on 538 degrees of freedom## Multiple R-squared: 0.5811, Adjusted R-squared: 0.578 ## F-statistic: 186.6 on 4 and 538 DF, p-value: < 2.2e-16
spatial_lag_y_model <- spatialreg::lagsarlm( afd_share ~ immigrant_share + inhabitants, data = election_results, listw = queen_W)summary(spatial_lag_y_model)
## ## Call:## spatialreg::lagsarlm(formula = afd_share ~ immigrant_share + ## inhabitants, data = election_results, listw = queen_W)## ## Residuals:## Min 1Q Median 3Q Max ## -10.17786 -2.27359 -0.29956 1.98212 24.26683 ## ## Type: lag ## Coefficients: (asymptotic standard errors) ## Estimate Std. Error z value Pr(>|z|)## (Intercept) 10.2465884 0.9782773 10.4741 < 2.2e-16## immigrant_share -0.0527021 0.0196904 -2.6765 0.007439## inhabitants -0.0330830 0.0034265 -9.6551 < 2.2e-16## ## Rho: 0.66446, LR test value: 261.11, p-value: < 2.22e-16## Asymptotic standard error: 0.03489## z-value: 19.045, p-value: < 2.22e-16## Wald statistic: 362.69, p-value: < 2.22e-16## ## Log likelihood: -1494.985 for lag model## ML residual variance (sigma squared): 12.992, (sigma: 3.6045)## Number of observations: 543 ## Number of parameters estimated: 5 ## AIC: 3000, (AIC for lm: 3259.1)## LM test for residual autocorrelation## test value: 21.043, p-value: 4.4919e-06
AIC(spatial_error_model, spatial_lag_x_model, spatial_lag_y_model)
## df AIC## spatial_error_model 5 3044.697## spatial_lag_x_model 6 3147.995## spatial_lag_y_model 5 2999.971
spdep::lm.LMtests(linear_regression, queen_W, test = c("LMerr", "LMlag"))
## ## Lagrange multiplier diagnostics for spatial dependence## ## data: ## model: lm(formula = afd_share ~ immigrant_share +## inhabitants, data = election_results)## weights: queen_W## ## LMerr = 198.29, df = 1, p-value < 2.2e-16## ## ## Lagrange multiplier diagnostics for spatial dependence## ## data: ## model: lm(formula = afd_share ~ immigrant_share +## inhabitants, data = election_results)## weights: queen_W## ## LMlag = 299.73, df = 1, p-value < 2.2e-16
Let's stick to our theory, shall we?
Unfortunately, in case of a Spatial Lag Y Model the spatial parameter ρ only tells us that the effect is (statistically) significant -- or not.
Luckily, there's a method to decompose the spatial effects into direct, indirect and total effects: estimating impacts
R
This time, let's start with the Spatial Lag Y Model:
spatialreg::impacts(spatial_lag_y_model, listw = queen_W)
## Impact measures (lag, exact):## Direct Indirect Total## immigrant_share -0.05948993 -0.09757580 -0.15706572## inhabitants -0.03734396 -0.06125182 -0.09859578
Compare it to the 'simple' regression output:
coef(spatial_lag_y_model)
## rho (Intercept) immigrant_share inhabitants ## 0.66445817 10.24658839 -0.05270212 -0.03308301
spatialreg::impacts(spatial_lag_x_model, listw = queen_W)
## Impact measures (SlX, glht):## Direct Indirect Total## immigrant_share -0.06970227 -0.02616764 -0.09586991## inhabitants -0.02643886 -0.08538884 -0.11182770
Compare it to the 'simple' regression output:
coef(spatial_lag_x_model)
## (Intercept) immigrant_share inhabitants ## 30.64915652 -0.06970227 -0.02643886 ## lag.immigrant_share lag.inhabitants ## -0.02616764 -0.08538884
spatialreg::impacts(spatial_lag_y_model, listw = queen_W, R = 500) %>% summary(zstats = TRUE, short = TRUE)
## Impact measures (lag, exact):## Direct Indirect Total## immigrant_share -0.05948993 -0.09757580 -0.15706572## inhabitants -0.03734396 -0.06125182 -0.09859578## ========================================================## Simulation results ( variance matrix):## ========================================================## Simulated standard errors## Direct Indirect Total## immigrant_share 0.022390179 0.038273065 0.059658509## inhabitants 0.003535946 0.008409926 0.009944576## ## Simulated z-values:## Direct Indirect Total## immigrant_share -2.661032 -2.575303 -2.650849## inhabitants -10.629243 -7.405056 -10.041695## ## Simulated p-values:## Direct Indirect Total ## immigrant_share 0.0077901 0.010015 0.008029## inhabitants < 2.22e-16 1.3101e-13 < 2e-16
Day | Time | Title |
---|---|---|
April 23 | 10:00-11:30 | Introduction to GIS |
April 23 | 11:45-13:00 | Vector Data |
April 23 | 13:00-14:00 | Lunch Break |
April 23 | 14:00-15:30 | Mapping |
April 23 | 15:45-17:00 | Raster Data |
April 24 | 09:00-10:30 | Advanced Data Import & Processing |
April 24 | 10:45-12:00 | Applied Data Wrangling & Linking |
April 24 | 12:00-13:00 | Lunch Break |
April 24 | 13:00-14:30 | Investigating Spatial Autocorrelation |
April 24 | 14:45-16:00 | Spatial Econometrics & Outlook |
Day | Time | Title |
---|---|---|
April 23 | 10:00-11:30 | Introduction to GIS |
April 23 | 11:45-13:00 | Vector Data |
April 23 | 13:00-14:00 | Lunch Break |
April 23 | 14:00-15:30 | Mapping |
April 23 | 15:45-17:00 | Raster Data |
April 24 | 09:00-10:30 | Advanced Data Import & Processing |
April 24 | 10:45-12:00 | Applied Data Wrangling & Linking |
April 24 | 12:00-13:00 | Lunch Break |
April 24 | 13:00-14:30 | Investigating Spatial Autocorrelation |
April 24 | 14:45-16:00 | Spatial Econometrics & Outlook |
Main Messages
R
can serve as a full-blown Geographic Information System (GIS)Day | Time | Title |
---|---|---|
April 23 | 10:00-11:30 | Introduction to GIS |
April 23 | 11:45-13:00 | Vector Data |
April 23 | 13:00-14:00 | Lunch Break |
April 23 | 14:00-15:30 | Mapping |
April 23 | 15:45-17:00 | Raster Data |
April 24 | 09:00-10:30 | Advanced Data Import & Processing |
April 24 | 10:45-12:00 | Applied Data Wrangling & Linking |
April 24 | 12:00-13:00 | Lunch Break |
April 24 | 13:00-14:30 | Investigating Spatial Autocorrelation |
April 24 | 14:45-16:00 | Spatial Econometrics & Outlook |
Main Messages
Day | Time | Title |
---|---|---|
April 23 | 10:00-11:30 | Introduction to GIS |
April 23 | 11:45-13:00 | Vector Data |
April 23 | 13:00-14:00 | Lunch Break |
April 23 | 14:00-15:30 | Mapping |
April 23 | 15:45-17:00 | Raster Data |
April 24 | 09:00-10:30 | Advanced Data Import & Processing |
April 24 | 10:45-12:00 | Applied Data Wrangling & Linking |
April 24 | 12:00-13:00 | Lunch Break |
April 24 | 13:00-14:30 | Investigating Spatial Autocorrelation |
April 24 | 14:45-16:00 | Spatial Econometrics & Outlook |
Main Messages
Other map types such as
GIS techniques, such as
More Advanced Spatial(-temporal) Modeling
More data sources...
Check out gganimate
Some more information:
Some random examples:
Missing values in spatial regression models produce problems similar to those in ordinary regression analysis.
However, the issue is more severe because the observations are interdependent: dropping a unit also removes it from its neighbors' spatial lags.
Thus, it is a good idea to think of methods to mitigate the resulting bias.
# ~10% missing values
missing_index <-
  sample(
    1:nrow(election_results),
    round(nrow(election_results) * .1, 0)
  )

election_results_missing <- election_results
election_results_missing$afd_share[missing_index] <- NA

# list-wise deletion
election_results_missing <- na.omit(election_results_missing)

tm_shape(election_results_missing) +
  tm_fill("afd_share", palette = "viridis")

# rebuild neighborhoods and weights on the reduced data;
# zero.policy = TRUE tolerates units left without neighbors
queen_neighborhoods_missing <-
  spdep::poly2nb(election_results_missing, queen = TRUE)

queen_W_missing <-
  spdep::nb2listw(
    queen_neighborhoods_missing,
    style = "W",
    zero.policy = TRUE
  )

spatial_lag_y_model_missing <-
  spatialreg::lagsarlm(
    afd_share ~ immigrant_share + inhabitants,
    data = election_results_missing,
    listw = queen_W_missing,
    zero.policy = TRUE
  )
spatialreg::impacts(spatial_lag_y_model_missing, listw = queen_W_missing)
## Impact measures (lag, exact):
##                      Direct    Indirect       Total
## immigrant_share -0.05450334 -0.07883944 -0.13334278
## inhabitants     -0.03918632 -0.05668327 -0.09586959
spatialreg::impacts(spatial_lag_y_model, listw = queen_W)
## Impact measures (lag, exact):
##                      Direct    Indirect       Total
## immigrant_share -0.05948993 -0.09757580 -0.15706572
## inhabitants     -0.03734396 -0.06125182 -0.09859578
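One reason list-wise deletion changes the estimates is that it alters the spatial weights themselves, not just the sample size. The following is a minimal base R sketch on a hypothetical 3 x 3 grid (not the workshop data; all object names are made up for illustration) that shows how a unit's row-standardized weights shift once two of its queen neighbors are dropped.

```r
# Hypothetical 3 x 3 grid of areal units, numbered row by row
n_side <- 3
coords <- expand.grid(col = 1:n_side, row = 1:n_side)

# Queen contiguity: units are neighbours if they touch at an edge or a corner,
# i.e. their Chebyshev (maximum) distance on the grid equals 1
chebyshev <- as.matrix(dist(coords, method = "maximum"))
adjacency <- (chebyshev == 1) * 1

# Row-standardise (comparable to style = "W"): each non-empty row sums to 1
row_standardise <- function(a) {
  rs <- rowSums(a)
  a / ifelse(rs == 0, 1, rs)  # units without neighbours keep a zero row
}

W_full <- row_standardise(adjacency)

# List-wise deletion of one corner (unit 1) and the centre (unit 5)
keep <- setdiff(seq_len(nrow(coords)), c(1, 5))
W_deleted <- row_standardise(adjacency[keep, keep])

# Unit 2 loses two of its neighbours, so each remaining neighbour
# carries more weight in unit 2's spatial lag
neighbours_full <- rowSums(adjacency)
neighbours_deleted <- rowSums(adjacency[keep, keep])
unit2 <- match(2, keep)
unit3 <- match(3, keep)

c(
  neighbours_before = unname(neighbours_full[2]),
  neighbours_after  = unname(neighbours_deleted[unit2]),
  weight_before     = unname(W_full[2, 3]),
  weight_after      = unname(W_deleted[unit2, unit3])
)
```

With real polygons the same mechanism applies: every deleted unit changes the spatial lag of all its former neighbors, and a unit whose neighbors are all missing ends up with no neighbors at all, which is what `zero.policy = TRUE` has to accommodate.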
How to deal with missing values in geospatial data depends on the geometry type. For points, there are established methods, such as interpolation.
Often these methods amount to aggregating data, which does not help in our case. Good old imputation techniques might also help:
# ~10% missing values
missing_index <-
  sample(
    1:nrow(election_results),
    round(nrow(election_results) * .1, 0)
  )

election_results_missing <- election_results
election_results_missing$afd_share[missing_index] <- NA

# impute on the attribute table only, then join the geometries back
election_results_missing <-
  election_results_missing %>%
  sf::st_drop_geometry() %>%
  mice::mice(method = "norm.predict", m = 1) %>%
  mice::complete() %>%
  dplyr::left_join(
    election_results_missing %>%
      dplyr::select(-afd_share, -immigrant_share, -inhabitants)
  ) %>%
  sf::st_as_sf()
##
##  iter imp variable
##   1   1  afd_share
##   2   1  afd_share
##   3   1  afd_share
##   4   1  afd_share
##   5   1  afd_share
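The `norm.predict` method requested in the `mice::mice()` call is, in essence, regression imputation: each missing value is replaced by the prediction of a linear regression on the other variables. A minimal base R sketch of that idea, on simulated stand-in data (the variable names mirror the slides, but the values and coefficients are made up), could look like this:

```r
set.seed(42)

# Simulated stand-in data; not the workshop's election_results
n <- 200
toy <- data.frame(
  immigrant_share = runif(n, 0, 30),
  inhabitants = rnorm(n, 50, 10)
)
toy$afd_share <-
  15 - 0.2 * toy$immigrant_share + 0.05 * toy$inhabitants + rnorm(n)

# Set ~10% of the outcome to missing
missing_index <- sample(seq_len(n), round(n * .1))
toy$afd_share[missing_index] <- NA

# Fit on the complete cases (lm drops rows with NA by default) ...
fit <- lm(afd_share ~ immigrant_share + inhabitants, data = toy)

# ... and replace each NA with its fitted value
toy_imputed <- toy
toy_imputed$afd_share[missing_index] <-
  predict(fit, newdata = toy[missing_index, ])

summary(toy_imputed$afd_share)
```

Because the imputed values lie exactly on the fitted regression plane, this kind of imputation tends to understate the variability of the imputed variable. It is a quick fix for illustration rather than a full multiple-imputation workflow.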
queen_neighborhoods_missing <-
  spdep::poly2nb(election_results_missing, queen = TRUE)

queen_W_missing <-
  spdep::nb2listw(queen_neighborhoods_missing, style = "W")

spatial_lag_y_model_missing <-
  spatialreg::lagsarlm(
    afd_share ~ immigrant_share + inhabitants,
    data = election_results_missing,
    listw = queen_W_missing
  )
spatialreg::impacts(spatial_lag_y_model_missing, listw = queen_W_missing)
## Impact measures (lag, exact):
##                      Direct    Indirect      Total
## immigrant_share -0.04834610 -0.06855918 -0.1169053
## inhabitants     -0.04148773 -0.05883338 -0.1003211
spatialreg::impacts(spatial_lag_y_model, listw = queen_W)
## Impact measures (lag, exact):
##                      Direct    Indirect       Total
## immigrant_share -0.05948993 -0.09757580 -0.15706572
## inhabitants     -0.03734396 -0.06125182 -0.09859578
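To make the printed impact tables easier to compare, the total impacts can be put side by side. The sketch below simply copies the totals from the outputs shown above into a data frame (the object and column names are made up for illustration) and expresses the two missing-data variants as percentage deviations from the complete-data model.

```r
# Total impacts copied from the printed outputs above
total_impacts <- data.frame(
  variable = c("immigrant_share", "inhabitants"),
  complete = c(-0.15706572, -0.09859578),
  listwise = c(-0.13334278, -0.09586959),
  imputed  = c(-0.1169053, -0.1003211)
)

# Relative deviation from the complete-data estimate, in percent
total_impacts$listwise_dev <-
  100 * (total_impacts$listwise / total_impacts$complete - 1)
total_impacts$imputed_dev <-
  100 * (total_impacts$imputed / total_impacts$complete - 1)

print(total_impacts, digits = 3)
```

In these particular runs, the total impact of `immigrant_share` is roughly 15% smaller in magnitude under list-wise deletion and roughly 26% smaller under imputation, while `inhabitants` moves by only a few percent. Both variants draw their missing cells with a separate `sample()` call and no seed is shown, so this is a single-draw comparison and not evidence that one strategy is generally better.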