Exercise 2_2_1: Spatial Joins

1

In the add-on of the session, we counted the charging stations located within North Rhine-Westphalia (NRW). However, we did not show how to get a point layer of NRW charging stations (“charger_nrw”).

Subset the data file yourself by relying on the spatial information of the file charging_points_ger.csv and a polygon of NRW. There are two ways to achieve this. How many chargers are located within NRW?

Clues

You need two datasets for that: the point layer charging_points_ger.csv in the ./data folder (remember to adjust the CRS) and a polygon of NRW. For the latter, you can again use the osmdata syntax.

Clues

There are two functions you can explore: sf::st_join() and sf::st_intersection(). By default, sf::st_join() performs a ‘left join’ and returns a data object with all chargers, adding the attributes of the NRW polygon only to those that are located within NRW. You can change this option to perform an ‘inner join’ and keep only the observations that lie within the predefined area (sf::st_join(x, y, join = "", left = FALSE)).
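To make the difference between the join types concrete, here is a minimal sketch on toy data. The unit square and the three points are made up for illustration and merely stand in for the NRW polygon and the charger layer; only the sf functions named in the clue are used.

```r
library(sf)

# Toy polygon (hypothetical stand-in for the NRW boundary): a unit square
square <- st_sf(
  region = "toy_area",
  geometry = st_sfc(
    st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0)))),
    crs = 3035
  )
)

# Toy points (hypothetical stand-in for the chargers): two inside, one outside
pts <- st_sf(
  id = 1:3,
  geometry = st_sfc(
    st_point(c(0.2, 0.2)),
    st_point(c(0.8, 0.5)),
    st_point(c(2, 2)),
    crs = 3035
  )
)

# Left join (default): all points are kept, 'region' is NA for the outside point
left_joined <- st_join(pts, square, join = st_within)

# Inner join: only points that fall within the polygon are kept
inner_joined <- st_join(pts, square, join = st_within, left = FALSE)

# Intersection: clips the point layer to the polygon
clipped <- st_intersection(pts, square)

nrow(left_joined)   # 3
nrow(inner_joined)  # 2
nrow(clipped)       # 2
```

For point data, the inner join and the intersection keep the same observations, which is why both routes lead to the same count.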

Solution

# load charger
charger_ger <- 
  # Read charging station point data
  readr::read_delim("./data/charging_points_ger.csv", 
                                 delim = ";") %>%
  # Filter out rows with missing longitude or latitude
  dplyr::filter(!is.na(longitude) & !is.na(latitude)) %>%
  # Convert data frame to sf object
  sf::st_as_sf(coords = c("longitude", "latitude"), crs = 4326) %>%
  # Reproject the spatial data to the desired CRS (Coordinate Reference System)
  sf::st_transform(crs = 3035)

## Rows: 60560 Columns: 7
## ── Column specification ───────────────────
## Delimiter: ";"
## chr (3): operator, federal_state, type
## dbl (4): latitude, longitude, power_kw,...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

#  use the OSM function
nrw <-
  osmdata::getbb(
    "Nordrhein-Westfalen", 
    format_out = "sf_polygon"
  ) %>% 
  .$multipolygon %>% 
  sf::st_transform(3035)

# option 1
charger_nrw <- 
  charger_ger %>% 
  # Subset point data to sampling area
  sf::st_intersection(nrw)

# option 2
# spatial join
charger_nrw <-
  charger_ger %>%  
  sf::st_join(
    # polygon layer of NRW
    nrw, 
    # choose intersects or within
    join = sf::st_intersects,
    # left = FALSE performs an inner join
    # and keeps only the chargers 
    # which could be joined
    left = FALSE
  )
 
nrow(charger_nrw)

## [1] 11081

# 11081 chargers in NRW

2

Did the operationalization of train station accessibility convince you? The INKAR database offers another approach: Proportion of residents with max. 1000m linear distance to the nearest public transport stop in the district. We have everything needed to create this indicator on a smaller scale as well. What is the mean share of residents with max 1000m linear distance to the nearest train station in a 5km neighborhood of our fake respondents?

You can run the code below to load all the data you need.

nrw <-
  osmdata::getbb(
    "Nordrhein-Westfalen", 
    format_out = "sf_polygon"
  ) %>% 
  .$multipolygon %>% 
  sf::st_transform(3035) 

set.seed(1234)

fake_coordinates <-
  sf::st_sample(nrw, 1000) %>% 
  sf::st_sf() %>% 
  dplyr::mutate(
    id_2 = 
      stringi::stri_rand_strings(10000, 10) %>% 
      sample(1000, replace = FALSE)
  )

nrw_pt_trainstops <- sf::st_read("./data/nrw_pt_osmtrainstops.shp", crs = 3035)

inhabitants_ger <-
  z11::z11_get_100m_attribute(Einwohner)

Clues

As always, there are several ways to do this. Anne tried to keep the workflow as close to the functions taught in this course as possible and suggests the following steps:

Create a point layer with the centroids of all grids in NRW based on the z11 population layer.
Calculate the distance to the nearest train station for each grid cell.
Create a column that contains the number of inhabitants if the distance to the nearest train station is at most 1000m and equals 0 otherwise.
Rasterize the sf data object to obtain two raster objects: the number of inhabitants and the number of inhabitants within a max. 1000m distance to the nearest train station.
Calculate the mean for the 5km buffer for each “respondent” for each raster.
Calculate the share.
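The distance and threshold steps above can be sketched on toy data as follows. The three grid centroids, the two stops, and the column names are hypothetical and chosen only so that the numbers are easy to check by hand; the sketch skips the raster and buffer steps and computes one overall share.

```r
library(sf)

# Toy grid centroids with inhabitant counts (hypothetical values)
cells <- st_sf(
  inhabitants = c(10, 20, 30),
  geometry = st_sfc(
    st_point(c(0, 0)),
    st_point(c(500, 0)),
    st_point(c(5000, 0)),
    crs = 3035
  )
)

# Toy train stops (hypothetical)
stops <- st_sf(
  name = c("A", "B"),
  geometry = st_sfc(
    st_point(c(0, 300)),
    st_point(c(3000, 0)),
    crs = 3035
  )
)

# Index of the nearest stop for each centroid
nearest <- st_nearest_feature(cells, stops)

# Element-wise distance (in metres) between each centroid and its nearest stop
dist_m <- as.numeric(
  st_distance(cells, stops[nearest, ], by_element = TRUE)
)

# Inhabitants count only if the nearest stop is at most 1000m away
cells$inhabitants_access <- ifelse(dist_m <= 1000, cells$inhabitants, 0)

# Share of inhabitants with access: (10 + 20) / (10 + 20 + 30)
share <- sum(cells$inhabitants_access) / sum(cells$inhabitants)
share  # 0.5
```

Using by_element = TRUE avoids computing the full distance matrix between all centroids and all stops, which matters for a 100m grid covering a whole federal state.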

Clues

In the add-on slides of the raster session, Stefan showed how to transform a raster to points (and back). To get an sf point layer for the raster object you can use terra::as.points() %>% sf::st_as_sf(). To rasterize the object, you can use terra::rast(vals = .$colname, resolution = 100).
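The round trip from raster to points and back can be tried on a tiny raster first. The sketch below is an illustration on made-up data: the 3 x 3 raster, the layer name inhabitants, and the column doubled are hypothetical. Note that it swaps in terra::rasterize() with the original raster as a template for the way back, which is an alternative to the terra::rast() call from the clue, not the approach used in the slides.

```r
library(terra)
library(sf)

# Toy raster with 100m cells (hypothetical values 1 to 9)
r <- rast(
  nrows = 3, ncols = 3,
  xmin = 0, xmax = 300, ymin = 0, ymax = 300,
  crs = "EPSG:3035"
)
values(r) <- 1:9
names(r) <- "inhabitants"

# Raster -> points: one point per cell centre, converted to an sf object
pts <- st_as_sf(as.points(r))

# Any new column can be derived on the point layer
pts$doubled <- pts$inhabitants * 2

# Points -> raster: write the new column back onto the original grid
# (assumption: rasterize() with a template instead of rast(vals = ...))
r_back <- rasterize(vect(pts), r, field = "doubled")

nrow(pts)  # 9
```

Because the original raster serves as the template, the extent and resolution of the new layer match the input.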

Solution

# Extract inhabitants points for North Rhine-Westphalia (NRW)
nrw_pt_inhabitants <- 
  inhabitants_ger %>% 
  terra::crop(., nrw) %>% 
  # convert to points
  terra::as.points() %>%   
  # convert to sf object
  sf::st_as_sf() 

# Find the nearest train station for each grid centroid
nearest_station <- 
  sf::st_nearest_feature(nrw_pt_inhabitants, nrw_pt_trainstops)

# Calculate distances from inhabitants to the nearest train station
distances <- sf::st_distance(nrw_pt_inhabitants, 
                             nrw_pt_trainstops[nearest_station,], 
                             by_element = TRUE) 

# Create a raster representing population with train access within 1000m for NRW 
nrw_rast_inhabitants_access <- 
  nrw_pt_inhabitants %>% 
  dplyr::mutate(train_access = ifelse(as.numeric(distances) <= 1000, lyr.1, 0)) %>% 
  terra::rast(vals = .$train_access, resolution = 100) 

# Create a raster representing population for NRW
# NOTE: You could also use the 'old' raster. However, after the conversion
# its extent might no longer match the new raster layer and would need adjusting.
nrw_rast_inhabitants <- 
  nrw_pt_inhabitants %>% 
  terra::rast(vals = .$lyr.1, resolution = 100) 

# Extract population within 5km buffers around each respondent
population_buffers <- 
  terra::extract(
    nrw_rast_inhabitants, 
    fake_coordinates %>% 
      sf::st_buffer(5000) %>% 
      terra::vect(), 
    fun = mean,
    na.rm = TRUE
  )

# Extract population within 5km buffers considering train access within 1000m
population_access_buffers <- terra::extract(
  nrw_rast_inhabitants_access, 
  fake_coordinates %>% 
    sf::st_buffer(5000) %>% 
    terra::vect(), 
  fun = mean,
  na.rm = TRUE
)

# Combine population data with train access information
linked_df <- 
  fake_coordinates %>% 
  dplyr::mutate(
    population = population_buffers[[2]],
    population_trainaccess = population_access_buffers[[2]],
    share_access = population_access_buffers[[2]] / population_buffers[[2]]
  )

# Summary of the new data
summary(linked_df)

##           geometry        id_2          
##  POINT        :1000   Length:1000       
##  epsg:3035    :   0   Class :character  
##  +proj=laea...:   0   Mode  :character  
##                                         
##                                         
##                                         
##    population    population_trainaccess
##  Min.   :18.71   Min.   : 3.409        
##  1st Qu.:22.20   1st Qu.: 5.301        
##  Median :28.78   Median : 8.328        
##  Mean   :28.42   Mean   : 8.204        
##  3rd Qu.:33.67   3rd Qu.:10.704        
##  Max.   :41.75   Max.   :16.084        
##   share_access   
##  Min.   :0.1698  
##  1st Qu.:0.2404  
##  Median :0.2854  
##  Mean   :0.2795  
##  3rd Qu.:0.3174  
##  Max.   :0.3879

Stefan Jünger & Anne-Kathrin Stroppe

Introduction to Geospatial Techniques for Social Scientists in R
