Day | Time | Title |
---|---|---|
April 23 | 10:00-11:30 | Introduction to GIS |
April 23 | 11:45-13:00 | Vector Data |
April 23 | 13:00-14:00 | Lunch Break |
April 23 | 14:00-15:30 | Mapping |
April 23 | 15:45-17:00 | Raster Data |
April 24 | 09:00-10:30 | Advanced Data Import & Processing |
April 24 | 10:45-12:00 | Applied Data Wrangling & Linking |
April 24 | 12:00-13:00 | Lunch Break |
April 24 | 13:00-14:30 | Investigating Spatial Autocorrelation |
April 24 | 14:45-16:00 | Spatial Econometrics & Outlook |
We've done some
We've seen that geospatial data are
However, geospatial data can be interesting on their own for social science studies!
[E]verything is related to everything else, but near things are more related than distant things (Tobler 1970, p. 236)
This means nearby geographical regions, institutions, or people are more similar to each other or do have a stronger influence on each other.
What we get is an interdependent system.
Tobler, W. R. (1970). A Computer Movie Simulating Urban Growth in the Detroit Region. Economic Geography, 46, 234–240. https://doi.org/10.2307/143141
Tobler's law is the fundamental principle of doing spatial analysis. We want to know
Say, we are interested in AfD voting outcomes in relation to ethnic compositions of neighborhoods
voting_districts <- sf::st_read("./data/Stimmbezirk.shp") |> sf::st_transform(3035) |> dplyr::transmute(Stimmbezirk = as.numeric(nummer))
## Reading layer `Stimmbezirk' from data source ## `C:\Users\mueller2\a_talks_presentations\gesis-workshop-geospatial-techniques-R-2024\data\Stimmbezirk.shp' using driver `ESRI Shapefile'## Simple feature collection with 543 features and 14 fields## Geometry type: MULTIPOLYGON## Dimension: XY## Bounding box: xmin: 343914.7 ymin: 5632759 xmax: 370674.3 ymax: 5661475## Projected CRS: ETRS89 / UTM zone 32N
head(voting_districts, 2)
## Simple feature collection with 2 features and 1 field## Geometry type: MULTIPOLYGON## Dimension: XY## Bounding box: xmin: 4104887 ymin: 3094313 xmax: 4107089 ymax: 3096356## Projected CRS: ETRS89-extended / LAEA Europe## Stimmbezirk geometry## 1 10205 MULTIPOLYGON (((4105588 309...## 2 10213 MULTIPOLYGON (((4106748 309...
afd_votes <- glue::glue( "https://www.stadt-koeln.de/wahlen/bundestagswahl/09-2021/praesentation/\\ Open-Data-Bundestagswahl476.csv" ) |> readr::read_csv2() |> dplyr::transmute(Stimmbezirk = `gebiet-nr`, afd_share = (F1 / F) * 100)head(afd_votes, 2)
## # A tibble: 2 × 2## Stimmbezirk afd_share## <dbl> <dbl>## 1 10101 13.1## 2 10102 11.4
election_results <- dplyr::left_join( voting_districts, afd_votes, by = "Stimmbezirk" )head(election_results, 2)
## Simple feature collection with 2 features and 2 fields## Geometry type: MULTIPOLYGON## Dimension: XY## Bounding box: xmin: 4104887 ymin: 3094313 xmax: 4107089 ymax: 3096356## Projected CRS: ETRS89-extended / LAEA Europe## Stimmbezirk afd_share geometry## 1 10205 9.74026 MULTIPOLYGON (((4105588 309...## 2 10213 11.76471 MULTIPOLYGON (((4106748 309...
tm_shape(election_results) + tm_polygons( "afd_share", palette = "viridis" )
immigrants_cologne <- z11::z11_get_100m_attribute(STAATSANGE_KURZ_2) |> terra::crop(election_results) |> terra::mask(election_results)inhabitants_cologne <- z11::z11_get_100m_attribute(Einwohner) |> terra::crop(election_results) |> terra::mask(election_results)immigrant_share_cologne <- (immigrants_cologne / inhabitants_cologne) * 100
tm_shape(immigrant_share_cologne) + tm_raster(palette = "-viridis")
As the voting (vector) data is different to the Census raster data we cannot use simple ID matching like before
terra::extract()
exactextractr::exact_extract()
!election_results <- election_results |> dplyr::mutate( immigrant_share = exactextractr::exact_extract( immigrant_share_cologne, election_results, 'mean', progress = FALSE ), inhabitants = exactextractr::exact_extract( inhabitants_cologne, election_results, 'mean', progress = FALSE ) )head(election_results, 2)
## Simple feature collection with 2 features and 4 fields## Geometry type: MULTIPOLYGON## Dimension: XY## Bounding box: xmin: 4104887 ymin: 3094313 xmax: 4107089 ymax: 3096356## Projected CRS: ETRS89-extended / LAEA Europe## Stimmbezirk afd_share geometry immigrant_share inhabitants## 1 10205 9.74026 MULTIPOLYGON (((4105588 309... 15.00740 140.1017## 2 10213 11.76471 MULTIPOLYGON (((4106748 309... 10.66915 107.6079
tm_shape(election_results) + tm_polygons( "immigrant_share", palette = "-viridis" )
We now have to ask
queens_neighborhoods <- spdep::poly2nb( election_results, queen = TRUE )summary(queens_neighborhoods)
## Neighbour list object:## Number of regions: 543 ## Number of nonzero links: 2978 ## Percentage nonzero weights: 1.010009 ## Average number of links: 5.484346 ## Link number distribution:## ## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 ## 2 12 53 107 136 102 58 30 21 13 4 2 2 1 ## 2 least connected regions:## 69 196 with 1 link## 1 most connected region:## 249 with 14 links
queens_neighborhoods |> spdep::nb2lines( coords = sf::st_as_sfc(election_results), as_sf = TRUE ) |> tm_shape() + tm_dots() + tm_lines()
Unfortunately, we are yet done with creating the links between neighborhoods. What we receive is, in principle, a huge matrix with connected observations.
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]## 1 0 0 0 0 0 0 0 0 0 0## 2 0 0 0 0 0 0 0 1 1 0## 3 0 0 0 1 1 0 0 0 1 0## 4 0 0 1 0 1 1 0 0 0 0## 5 0 0 1 1 0 1 0 0 0 0## 6 0 0 0 1 1 0 1 1 0 0## 7 0 0 0 0 0 1 0 1 1 0## 8 0 1 0 0 0 1 1 0 1 0## 9 0 1 1 0 0 0 1 1 0 0## 10 0 0 0 0 0 0 0 0 0 0
That's nothing we could plug into a statistical model, such as a regression or the like (see next session).
Normalization is the process of creating actual spatial weights. There is a huge dispute on how to do it (Neumayer & Plümper, 2016). But nobody questions whether it should be done in the first place since, among others, it restricts the parameter space of the weights.
non-normalized
## [,1] [,2] [,3] [,4] [,5]## 1 0 0 0 0 0## 2 0 0 0 0 0## 3 0 0 0 1 1## 4 0 0 1 0 1## 5 0 0 1 1 0
## [1] 0 0 2 2 2
normalized
## [,1] [,2] [,3] [,4] [,5]## 1 0 0 0.0000000 0.000 0.0000000## 2 0 0 0.0000000 0.000 0.0000000## 3 0 0 0.0000000 0.125 0.1250000## 4 0 0 0.3333333 0.000 0.3333333## 5 0 0 0.2500000 0.250 0.0000000
## [1] 0.0000000 0.0000000 0.2500000 0.6666667 0.5000000
Neumayer, E., & Plümper, T. (2016). W. Political Science Research and Methods, 4(01), 175–193. https://doi.org/10.1017/psrm.2014.40
One of the disputed but at the same time standard procedures is row-normalization. It divides all individual weights (=connections between spatial units) wij by the row-wise sum of of all other weights:
W=∑jwij∑jwij
An alternative would be minmax-normalization:
W=∑jwij−min(wij)max(wij)−min(wij)
queens_W <- spdep::nb2listw(queens_neighborhoods, style = "W")summary(queens_W)
## Characteristics of weights list object:## Neighbour list object:## Number of regions: 543 ## Number of nonzero links: 2978 ## Percentage nonzero weights: 1.010009 ## Average number of links: 5.484346 ## Link number distribution:## ## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 ## 2 12 53 107 136 102 58 30 21 13 4 2 2 1 ## 2 least connected regions:## 69 196 with 1 link## 1 most connected region:## 249 with 14 links## ## Weights style: W ## Weights constants summary:## n nn S0 S1 S2## W 543 294849 543 211.1469 2261.598
I=N∑Ni=1∑Nj=1wij∑Ni=1∑Nj=1wij(xi−ˉx)(xj−ˉx)∑Ni=1(xi−ˉx)2
Most and foremost, Moran's I makes use of the previously created weights between all spatial units pairs wij. It weights deviations from an overall mean value of connected pairs according to the strength of the modeled spatial relations. Moran's I can be interpreted as some sort of a correlation coefficient (-1 = perfect negative spatial autocorrelation; +1 = perfect positive spatial autocorrelation).
spdep
spdep::moran.test( election_results$immigrant_share, listw = queens_W)
## ## Moran I test under randomisation## ## data: election_results$immigrant_share ## weights: queens_W ## ## Moran I statistic standard deviate = 20.428, p-value < 2.2e-16## alternative hypothesis: greater## sample estimates:## Moran I statistic Expectation Variance ## 0.5408375504 -0.0018450185 0.0007057415
Moran's I is a global statistic for spatial autocorrelation. It can produce issues when there are only local clusters of spatial interdependence in the data. An alternative is the use of Geary's C
:
C=(N−1)∑i∑jwij(xi−xj)22∑Ni=1∑Nj=1wij∑i(xi−ˉx)2
As you can see, in the numerator, the average value ˉx is not as prominent as in Moran's I. Geary's C only produces values between 0 and 2 (value near 0 = positive spatial autocorrelation; 1 = no spatial autocorrelation; values near 2 = negative spatial autocorrelation).
spdep
spdep::geary.test( election_results$immigrant_share, listw = queens_W)
## ## Geary C test under randomisation## ## data: election_results$immigrant_share ## weights: queens_W ## ## Geary C statistic standard deviate = 17.355, p-value < 2.2e-16## alternative hypothesis: Expectation greater than statistic## sample estimates:## Geary C statistic Expectation Variance ## 0.442840220 1.000000000 0.001030604
sfdep
packageThe sfdep
package provides a more tidyverse
-compliant syntax to spatial weights. See:
election_results <- election_results |> dplyr::mutate( neighbors = sfdep::st_contiguity(election_results), # queen neighborhoods by default weights = sfdep::st_weights(neighbors) )head(election_results, 2)
## Simple feature collection with 2 features and 6 fields## Geometry type: MULTIPOLYGON## Dimension: XY## Bounding box: xmin: 4104887 ymin: 3094313 xmax: 4107089 ymax: 3096356## Projected CRS: ETRS89-extended / LAEA Europe## Stimmbezirk afd_share geometry immigrant_share inhabitants neighbors## 1 10205 9.74026 MULTIPOLYGON (((4105588 309... 15.00740 140.1017 16, 18, 22, 25, 458, 459, 460, 462## 2 10213 11.76471 MULTIPOLYGON (((4106748 309... 10.66915 107.6079 8, 9, 11, 48, 49## weights## 1 0.125, 0.125, 0.125, 0.125, 0.125, 0.125, 0.125, 0.125## 2 0.2, 0.2, 0.2, 0.2, 0.2
library(magrittr)election_results %$% sfdep::global_moran_test(immigrant_share, neighbors, weights)
## ## Moran I test under randomisation## ## data: x ## weights: listw ## ## Moran I statistic standard deviate = 20.428, p-value < 2.2e-16## alternative hypothesis: greater## sample estimates:## Moran I statistic Expectation Variance ## 0.5408375504 -0.0018450185 0.0007057415
election_results %$% sfdep::global_c_test(immigrant_share, neighbors, weights)
## ## Geary C test under randomisation## ## data: x ## weights: listw ## ## Geary C statistic standard deviate = 17.355, p-value < 2.2e-16## alternative hypothesis: Expectation greater than statistic## sample estimates:## Geary C statistic Expectation Variance ## 0.442840220 1.000000000 0.001030604
The reason why we show you the sfdep
package is that it provides nice functions to calculate local measures of spatial autocorrelation. One popular choice are the estimation of Local Indicators of Spatial Autocorrelation (i.e., LISA clusters). In the most straightforward way they can be interpreted as case-specific indicators of spatial autocorrelation:
Ii=xi−ˉx∑Ni−1(xi−ˉx)2NN∑j=1wij(xj−ˉx)
sfdep
lisa <- election_results |> dplyr::mutate( lisa = sfdep::local_moran(afd_share, neighbors, weights) ) |> tidyr::unnest()head(lisa, 2)
## Simple feature collection with 2 features and 18 fields## Geometry type: MULTIPOLYGON## Dimension: XY## Bounding box: xmin: 4104887 ymin: 3095603 xmax: 4105588 ymax: 3096356## Projected CRS: ETRS89-extended / LAEA Europe## # A tibble: 2 × 19## Stimmbezirk afd_share geometry immigrant_share inhabitants neighbors weights ii eii var_ii z_ii p_ii## <dbl> <dbl> <MULTIPOLYGON [m]> <dbl> <dbl> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>## 1 10205 9.74 (((4105588 3096207, 4105571 3095… 15.0 140. 16 0.125 1.13 -0.00889 0.204 2.52 0.0116## 2 10205 9.74 (((4105588 3096207, 4105571 3095… 15.0 140. 18 0.125 1.13 -0.00889 0.204 2.52 0.0116## # ℹ 7 more variables: p_ii_sim <dbl>, p_folded_sim <dbl>, skewness <dbl>, kurtosis <dbl>, mean <fct>, median <fct>, pysal <fct>
tm_shape(lisa) + tm_fill("afd_share", midpoint = NA, palette = "viridis")
tm_shape(lisa) + tm_fill("ii", midpoint = NA, palette = "viridis")
lisa_bivariate <- election_results |> dplyr::mutate( lisa = sfdep::local_moran_bv( afd_share, immigrant_share, neighbors, weights ) ) |> tidyr::unnest()tm_shape(lisa_bivariate) + tm_fill( "Ib", midpoint = NA, palette = "viridis" )
You now know how to
There's way more, particularly when it comes to
Now we know our data are spatially autocorrelated. Let's try to find out why this is the case via some spatial econometrics
Day | Time | Title |
---|---|---|
April 23 | 10:00-11:30 | Introduction to GIS |
April 23 | 11:45-13:00 | Vector Data |
April 23 | 13:00-14:00 | Lunch Break |
April 23 | 14:00-15:30 | Mapping |
April 23 | 15:45-17:00 | Raster Data |
April 24 | 09:00-10:30 | Advanced Data Import & Processing |
April 24 | 10:45-12:00 | Applied Data Wrangling & Linking |
April 24 | 12:00-13:00 | Lunch Break |
April 24 | 13:00-14:30 | Investigating Spatial Autocorrelation |
April 24 | 14:45-16:00 | Spatial Econometrics & Outlook |
Keyboard shortcuts
↑, ←, Pg Up, k | Go to previous slide |
↓, →, Pg Dn, Space, j | Go to next slide |
Home | Go to first slide |
End | Go to last slide |
Number + Return | Go to specific slide |
b / m / f | Toggle blackout / mirrored / fullscreen mode |
c | Clone slideshow |
p | Toggle presenter mode |
t | Restart the presentation timer |
?, h | Toggle this help |
o | Tile View: Overview of Slides |
Esc | Back to slideshow |
Day | Time | Title |
---|---|---|
April 23 | 10:00-11:30 | Introduction to GIS |
April 23 | 11:45-13:00 | Vector Data |
April 23 | 13:00-14:00 | Lunch Break |
April 23 | 14:00-15:30 | Mapping |
April 23 | 15:45-17:00 | Raster Data |
April 24 | 09:00-10:30 | Advanced Data Import & Processing |
April 24 | 10:45-12:00 | Applied Data Wrangling & Linking |
April 24 | 12:00-13:00 | Lunch Break |
April 24 | 13:00-14:30 | Investigating Spatial Autocorrelation |
April 24 | 14:45-16:00 | Spatial Econometrics & Outlook |
We've done some
We've seen that geospatial data are
However, geospatial data can be interesting on their own for social science studies!
[E]verything is related to everything else, but near things are more related than distant things (Tobler 1970, p. 236)
This means nearby geographical regions, institutions, or people are more similar to each other or do have a stronger influence on each other.
What we get is an interdependent system.
Tobler, W. R. (1970). A Computer Movie Simulating Urban Growth in the Detroit Region. Economic Geography, 46, 234–240. https://doi.org/10.2307/143141
Tobler's law is the fundamental principle of doing spatial analysis. We want to know
Say, we are interested in AfD voting outcomes in relation to ethnic compositions of neighborhoods
voting_districts <- sf::st_read("./data/Stimmbezirk.shp") |> sf::st_transform(3035) |> dplyr::transmute(Stimmbezirk = as.numeric(nummer))
## Reading layer `Stimmbezirk' from data source ## `C:\Users\mueller2\a_talks_presentations\gesis-workshop-geospatial-techniques-R-2024\data\Stimmbezirk.shp' using driver `ESRI Shapefile'## Simple feature collection with 543 features and 14 fields## Geometry type: MULTIPOLYGON## Dimension: XY## Bounding box: xmin: 343914.7 ymin: 5632759 xmax: 370674.3 ymax: 5661475## Projected CRS: ETRS89 / UTM zone 32N
head(voting_districts, 2)
## Simple feature collection with 2 features and 1 field## Geometry type: MULTIPOLYGON## Dimension: XY## Bounding box: xmin: 4104887 ymin: 3094313 xmax: 4107089 ymax: 3096356## Projected CRS: ETRS89-extended / LAEA Europe## Stimmbezirk geometry## 1 10205 MULTIPOLYGON (((4105588 309...## 2 10213 MULTIPOLYGON (((4106748 309...
afd_votes <- glue::glue( "https://www.stadt-koeln.de/wahlen/bundestagswahl/09-2021/praesentation/\\ Open-Data-Bundestagswahl476.csv" ) |> readr::read_csv2() |> dplyr::transmute(Stimmbezirk = `gebiet-nr`, afd_share = (F1 / F) * 100)head(afd_votes, 2)
## # A tibble: 2 × 2## Stimmbezirk afd_share## <dbl> <dbl>## 1 10101 13.1## 2 10102 11.4
election_results <- dplyr::left_join( voting_districts, afd_votes, by = "Stimmbezirk" )head(election_results, 2)
## Simple feature collection with 2 features and 2 fields## Geometry type: MULTIPOLYGON## Dimension: XY## Bounding box: xmin: 4104887 ymin: 3094313 xmax: 4107089 ymax: 3096356## Projected CRS: ETRS89-extended / LAEA Europe## Stimmbezirk afd_share geometry## 1 10205 9.74026 MULTIPOLYGON (((4105588 309...## 2 10213 11.76471 MULTIPOLYGON (((4106748 309...
tm_shape(election_results) + tm_polygons( "afd_share", palette = "viridis" )
immigrants_cologne <- z11::z11_get_100m_attribute(STAATSANGE_KURZ_2) |> terra::crop(election_results) |> terra::mask(election_results)inhabitants_cologne <- z11::z11_get_100m_attribute(Einwohner) |> terra::crop(election_results) |> terra::mask(election_results)immigrant_share_cologne <- (immigrants_cologne / inhabitants_cologne) * 100
tm_shape(immigrant_share_cologne) + tm_raster(palette = "-viridis")
As the voting (vector) data is different to the Census raster data we cannot use simple ID matching like before
terra::extract()
exactextractr::exact_extract()
!election_results <- election_results |> dplyr::mutate( immigrant_share = exactextractr::exact_extract( immigrant_share_cologne, election_results, 'mean', progress = FALSE ), inhabitants = exactextractr::exact_extract( inhabitants_cologne, election_results, 'mean', progress = FALSE ) )head(election_results, 2)
## Simple feature collection with 2 features and 4 fields## Geometry type: MULTIPOLYGON## Dimension: XY## Bounding box: xmin: 4104887 ymin: 3094313 xmax: 4107089 ymax: 3096356## Projected CRS: ETRS89-extended / LAEA Europe## Stimmbezirk afd_share geometry immigrant_share inhabitants## 1 10205 9.74026 MULTIPOLYGON (((4105588 309... 15.00740 140.1017## 2 10213 11.76471 MULTIPOLYGON (((4106748 309... 10.66915 107.6079
tm_shape(election_results) + tm_polygons( "immigrant_share", palette = "-viridis" )
We now have to ask
queens_neighborhoods <- spdep::poly2nb( election_results, queen = TRUE )summary(queens_neighborhoods)
## Neighbour list object:## Number of regions: 543 ## Number of nonzero links: 2978 ## Percentage nonzero weights: 1.010009 ## Average number of links: 5.484346 ## Link number distribution:## ## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 ## 2 12 53 107 136 102 58 30 21 13 4 2 2 1 ## 2 least connected regions:## 69 196 with 1 link## 1 most connected region:## 249 with 14 links
queens_neighborhoods |> spdep::nb2lines( coords = sf::st_as_sfc(election_results), as_sf = TRUE ) |> tm_shape() + tm_dots() + tm_lines()
Unfortunately, we are yet done with creating the links between neighborhoods. What we receive is, in principle, a huge matrix with connected observations.
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]## 1 0 0 0 0 0 0 0 0 0 0## 2 0 0 0 0 0 0 0 1 1 0## 3 0 0 0 1 1 0 0 0 1 0## 4 0 0 1 0 1 1 0 0 0 0## 5 0 0 1 1 0 1 0 0 0 0## 6 0 0 0 1 1 0 1 1 0 0## 7 0 0 0 0 0 1 0 1 1 0## 8 0 1 0 0 0 1 1 0 1 0## 9 0 1 1 0 0 0 1 1 0 0## 10 0 0 0 0 0 0 0 0 0 0
That's nothing we could plug into a statistical model, such as a regression or the like (see next session).
Normalization is the process of creating actual spatial weights. There is a huge dispute on how to do it (Neumayer & Plümper, 2016). But nobody questions whether it should be done in the first place since, among others, it restricts the parameter space of the weights.
non-normalized
## [,1] [,2] [,3] [,4] [,5]## 1 0 0 0 0 0## 2 0 0 0 0 0## 3 0 0 0 1 1## 4 0 0 1 0 1## 5 0 0 1 1 0
## [1] 0 0 2 2 2
normalized
## [,1] [,2] [,3] [,4] [,5]## 1 0 0 0.0000000 0.000 0.0000000## 2 0 0 0.0000000 0.000 0.0000000## 3 0 0 0.0000000 0.125 0.1250000## 4 0 0 0.3333333 0.000 0.3333333## 5 0 0 0.2500000 0.250 0.0000000
## [1] 0.0000000 0.0000000 0.2500000 0.6666667 0.5000000
Neumayer, E., & Plümper, T. (2016). W. Political Science Research and Methods, 4(01), 175–193. https://doi.org/10.1017/psrm.2014.40
One of the disputed but at the same time standard procedures is row-normalization. It divides all individual weights (=connections between spatial units) wij by the row-wise sum of of all other weights:
W=∑jwij∑jwij
An alternative would be minmax-normalization:
W=∑jwij−min(wij)max(wij)−min(wij)
queens_W <- spdep::nb2listw(queens_neighborhoods, style = "W")summary(queens_W)
## Characteristics of weights list object:## Neighbour list object:## Number of regions: 543 ## Number of nonzero links: 2978 ## Percentage nonzero weights: 1.010009 ## Average number of links: 5.484346 ## Link number distribution:## ## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 ## 2 12 53 107 136 102 58 30 21 13 4 2 2 1 ## 2 least connected regions:## 69 196 with 1 link## 1 most connected region:## 249 with 14 links## ## Weights style: W ## Weights constants summary:## n nn S0 S1 S2## W 543 294849 543 211.1469 2261.598
I=N∑Ni=1∑Nj=1wij∑Ni=1∑Nj=1wij(xi−ˉx)(xj−ˉx)∑Ni=1(xi−ˉx)2
Most and foremost, Moran's I makes use of the previously created weights between all spatial units pairs wij. It weights deviations from an overall mean value of connected pairs according to the strength of the modeled spatial relations. Moran's I can be interpreted as some sort of a correlation coefficient (-1 = perfect negative spatial autocorrelation; +1 = perfect positive spatial autocorrelation).
spdep
spdep::moran.test( election_results$immigrant_share, listw = queens_W)
## ## Moran I test under randomisation## ## data: election_results$immigrant_share ## weights: queens_W ## ## Moran I statistic standard deviate = 20.428, p-value < 2.2e-16## alternative hypothesis: greater## sample estimates:## Moran I statistic Expectation Variance ## 0.5408375504 -0.0018450185 0.0007057415
Moran's I is a global statistic for spatial autocorrelation. It can produce issues when there are only local clusters of spatial interdependence in the data. An alternative is the use of Geary's C
:
C=(N−1)∑i∑jwij(xi−xj)22∑Ni=1∑Nj=1wij∑i(xi−ˉx)2
As you can see, in the numerator, the average value ˉx is not as prominent as in Moran's I. Geary's C only produces values between 0 and 2 (value near 0 = positive spatial autocorrelation; 1 = no spatial autocorrelation; values near 2 = negative spatial autocorrelation).
spdep
spdep::geary.test( election_results$immigrant_share, listw = queens_W)
## ## Geary C test under randomisation## ## data: election_results$immigrant_share ## weights: queens_W ## ## Geary C statistic standard deviate = 17.355, p-value < 2.2e-16## alternative hypothesis: Expectation greater than statistic## sample estimates:## Geary C statistic Expectation Variance ## 0.442840220 1.000000000 0.001030604
sfdep
packageThe sfdep
package provides a more tidyverse
-compliant syntax to spatial weights. See:
election_results <- election_results |> dplyr::mutate( neighbors = sfdep::st_contiguity(election_results), # queen neighborhoods by default weights = sfdep::st_weights(neighbors) )head(election_results, 2)
## Simple feature collection with 2 features and 6 fields## Geometry type: MULTIPOLYGON## Dimension: XY## Bounding box: xmin: 4104887 ymin: 3094313 xmax: 4107089 ymax: 3096356## Projected CRS: ETRS89-extended / LAEA Europe## Stimmbezirk afd_share geometry immigrant_share inhabitants neighbors## 1 10205 9.74026 MULTIPOLYGON (((4105588 309... 15.00740 140.1017 16, 18, 22, 25, 458, 459, 460, 462## 2 10213 11.76471 MULTIPOLYGON (((4106748 309... 10.66915 107.6079 8, 9, 11, 48, 49## weights## 1 0.125, 0.125, 0.125, 0.125, 0.125, 0.125, 0.125, 0.125## 2 0.2, 0.2, 0.2, 0.2, 0.2
library(magrittr)election_results %$% sfdep::global_moran_test(immigrant_share, neighbors, weights)
## ## Moran I test under randomisation## ## data: x ## weights: listw ## ## Moran I statistic standard deviate = 20.428, p-value < 2.2e-16## alternative hypothesis: greater## sample estimates:## Moran I statistic Expectation Variance ## 0.5408375504 -0.0018450185 0.0007057415
election_results %$% sfdep::global_c_test(immigrant_share, neighbors, weights)
## ## Geary C test under randomisation## ## data: x ## weights: listw ## ## Geary C statistic standard deviate = 17.355, p-value < 2.2e-16## alternative hypothesis: Expectation greater than statistic## sample estimates:## Geary C statistic Expectation Variance ## 0.442840220 1.000000000 0.001030604
The reason why we show you the sfdep
package is that it provides nice functions to calculate local measures of spatial autocorrelation. One popular choice are the estimation of Local Indicators of Spatial Autocorrelation (i.e., LISA clusters). In the most straightforward way they can be interpreted as case-specific indicators of spatial autocorrelation:
Ii=xi−ˉx∑Ni−1(xi−ˉx)2NN∑j=1wij(xj−ˉx)
sfdep
lisa <- election_results |> dplyr::mutate( lisa = sfdep::local_moran(afd_share, neighbors, weights) ) |> tidyr::unnest()head(lisa, 2)
## Simple feature collection with 2 features and 18 fields## Geometry type: MULTIPOLYGON## Dimension: XY## Bounding box: xmin: 4104887 ymin: 3095603 xmax: 4105588 ymax: 3096356## Projected CRS: ETRS89-extended / LAEA Europe## # A tibble: 2 × 19## Stimmbezirk afd_share geometry immigrant_share inhabitants neighbors weights ii eii var_ii z_ii p_ii## <dbl> <dbl> <MULTIPOLYGON [m]> <dbl> <dbl> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>## 1 10205 9.74 (((4105588 3096207, 4105571 3095… 15.0 140. 16 0.125 1.13 -0.00889 0.204 2.52 0.0116## 2 10205 9.74 (((4105588 3096207, 4105571 3095… 15.0 140. 18 0.125 1.13 -0.00889 0.204 2.52 0.0116## # ℹ 7 more variables: p_ii_sim <dbl>, p_folded_sim <dbl>, skewness <dbl>, kurtosis <dbl>, mean <fct>, median <fct>, pysal <fct>
tm_shape(lisa) + tm_fill("afd_share", midpoint = NA, palette = "viridis")
tm_shape(lisa) + tm_fill("ii", midpoint = NA, palette = "viridis")
lisa_bivariate <- election_results |> dplyr::mutate( lisa = sfdep::local_moran_bv( afd_share, immigrant_share, neighbors, weights ) ) |> tidyr::unnest()tm_shape(lisa_bivariate) + tm_fill( "Ib", midpoint = NA, palette = "viridis" )
You now know how to
There's way more, particularly when it comes to
Now we know our data are spatially autocorrelated. Let's try to find out why this is the case via some spatial econometrics