Introduction to Geospatial Techniques for Social Scientists in R

.title[
# Introduction to Geospatial Techniques for Social Scientists in R
]
.subtitle[
## Applied Data Wrangling
]
.author[
### Stefan Jünger & Anne-Kathrin Stroppe
]
.institute[
### <p>GESIS Workshop</p>
]
.date[
### April 24, 2024
]

---

## Now

<table class="table" style="margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:left;"> Day </th>
   <th style="text-align:left;"> Time </th>
   <th style="text-align:left;"> Title </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;color: gray !important;"> April 23 </td>
   <td style="text-align:left;color: gray !important;"> 10:00-11:30 </td>
   <td style="text-align:left;font-weight: bold;"> Introduction to GIS </td>
  </tr>
  <tr>
   <td style="text-align:left;color: gray !important;"> April 23 </td>
   <td style="text-align:left;color: gray !important;"> 11:45-13:00 </td>
   <td style="text-align:left;font-weight: bold;"> Vector Data </td>
  </tr>
  <tr>
   <td style="text-align:left;color: gray !important;color: gray !important;"> April 23 </td>
   <td style="text-align:left;color: gray !important;color: gray !important;"> 13:00-14:00 </td>
   <td style="text-align:left;font-weight: bold;color: gray !important;"> Lunch Break </td>
  </tr>
  <tr>
   <td style="text-align:left;color: gray !important;"> April 23 </td>
   <td style="text-align:left;color: gray !important;"> 14:00-15:30 </td>
   <td style="text-align:left;font-weight: bold;"> Mapping </td>
  </tr>
  <tr>
   <td style="text-align:left;color: gray !important;border-bottom: 1px solid"> April 23 </td>
   <td style="text-align:left;color: gray !important;border-bottom: 1px solid"> 15:45-17:00 </td>
   <td style="text-align:left;font-weight: bold;border-bottom: 1px solid"> Raster Data </td>
  </tr>
  <tr>
   <td style="text-align:left;color: gray !important;"> April 24 </td>
   <td style="text-align:left;color: gray !important;"> 09:00-10:30 </td>
   <td style="text-align:left;font-weight: bold;"> Advanced Data Import &amp; Processing </td>
  </tr>
  <tr>
   <td style="text-align:left;color: gray !important;background-color: yellow !important;"> April 24 </td>
   <td style="text-align:left;color: gray !important;background-color: yellow !important;"> 10:45-12:00 </td>
   <td style="text-align:left;font-weight: bold;background-color: yellow !important;"> Applied Data Wrangling &amp; Linking </td>
  </tr>
  <tr>
   <td style="text-align:left;color: gray !important;color: gray !important;"> April 24 </td>
   <td style="text-align:left;color: gray !important;color: gray !important;"> 12:00-13:00 </td>
   <td style="text-align:left;font-weight: bold;color: gray !important;"> Lunch Break </td>
  </tr>
  <tr>
   <td style="text-align:left;color: gray !important;"> April 24 </td>
   <td style="text-align:left;color: gray !important;"> 13:00-14:30 </td>
   <td style="text-align:left;font-weight: bold;"> Investigating Spatial Autocorrelation </td>
  </tr>
  <tr>
   <td style="text-align:left;color: gray !important;"> April 24 </td>
   <td style="text-align:left;color: gray !important;"> 14:45-16:00 </td>
   <td style="text-align:left;font-weight: bold;"> Spatial Econometrics &amp; Outlook </td>
  </tr>
</tbody>
</table>

---

## What Are Georeferenced Data?

.pull-left[
</br>
Data with a direct spatial reference `$\rightarrow$` **geo-coordinates**
- Information about geometries
- Optional: Content in relation to the geometries
]

.pull-right[
<img src="data:image/png;base64,#../img/fig_geometries.png" width="85%" style="display: block; margin: auto;" />

.tinyisher[Sources: OpenStreetMap / GEOFABRIK (2018), City of Cologne (2014), and the Statistical Offices of the Federation and the Länder (2016) / Jünger, 2019]
]

---

## Georeferenced Survey Data

Survey data enriched with geo-coordinates (or other direct spatial references)

</br>

</br>

.center[**With georeferenced survey data, we can analyze interactions between individual behaviors and attitudes and the environment.**]

---

## An Example Workflow

.pull-left[
From addresses to analyses with georeferenced survey data, there are several steps and challenges along the way. We will talk about:

- Data Protection & Data Access
- Geocoding 
- Spatial Data Linking
- Applied Examples
]

.pull-right[
<img src="data:image/png;base64,#../img/varreport.png" width="75%" style="display: block; margin: auto;" />
]

---

## Data Protection

</br>
</br>
That's one of the biggest issues:
- Explicit spatial references increase the risk of re-identifying anonymized survey respondents
- Re-identification can occur during data processing but also during the analysis

</br>

---

## Data Availability

.pull-left[
Geospatial Data
- Often distributed in a decentralized way
- Fragmented data landscape, at least in Germany

Georeferenced Survey Data
- Primarily, survey data
- Depends on documentation
- Access difficult due to data protection restrictions
]

.pull-right[
<img src="data:image/png;base64,#../img/data_availability.png" width="75%" style="display: block; margin: auto;" />
.right[.tinyisher[
https://www.eea.europa.eu/data-and-maps
https://datasearch.gesis.org/
https://datasetsearch.research.google.com/
]]
]

---

## Distribution & Re-Identification Risk

Even without (in)direct spatial references, data may still be sensitive
- Geospatial attributes add new information to existing data
- May be part of general data privacy checks, but we may not distribute these data as is

.pull-right[
<img src="data:image/png;base64,#../img/safe_room.png" width="825" style="display: block; margin: auto;" />
.right[.tinyisher[https://www.gesis.org/en/services/processing-and-analyzing-data/guest-research-stays/secure-data-center-sdc]]
]

---

## Legal Regulations in Data Processing

.pull-left[
Storing personal information such as addresses in the same place as actual survey attributes is not allowed in Germany
- Projects keep them in separate locations
- Can only be matched with a correspondence table
- Necessary to conduct data linking
]

.pull-right[
<img src="data:image/png;base64,#../img/fig_linking_workflow_simple.png" width="949" style="display: block; margin: auto;" />
]

---

## Geocoding

Geocoding is the conversion of indirect spatial references (e.g., addresses) into direct spatial references (e.g., coordinates)

However, conducting this procedure is a bit tricky (not only in R). Many services are either

- expensive (at least they cost money or have other restrictions)
- probably not data protection friendly (Hey Google)
- or both

---

## OSM Is Your Friend

We can use the Nominatim API from OSM for geocoding of at least a couple of addresses

```r
library(tibble)
library(tidygeocoder)

leibniz_addresses <-
  tibble::tribble(
    ~street, ~housenumber, ~zip_code, ~place, ~institute,
    "B 2", "1", "68159", "Mannheim", "GESIS",
    "Unter Sachsenhausen", "6-8",  "50667", "Köln", "GESIS",
    "Kellnerweg", "4", "37077", "Göttingen", "DPZ",
    "Reichsstr.", "4-6", "04109",  "Leipzig", "GWZO",
    "Schöneckstraße", "6", "79104", "Freiburg", "KIS",
    "Albert-Einstein-Straße", "29a", "18059", "Rostock", "LIKAT",
    "L7", "1", "68161", "Mannheim", "ZEW",
    "Müggelseedamm", "310", "12587", "Berlin", "IGB",
    "Campus D2", "2", "66123", "Saarbrücken", "INM",
    "Eberswalder Straße", "84", "15374", "Müncheberg (Mark)", "ZALF"
  ) |> 
  dplyr::mutate(whole_address = paste(street, housenumber, zip_code, place))
```

---

## Run the Geocoding

```r
leibniz_addresses <-
  tidygeocoder::geocode(
    leibniz_addresses,
    address = whole_address
  )

leibniz_addresses
```

```
## # A tibble: 10 × 8
##    street         housenumber zip_code place institute
##    <chr>          <chr>       <chr>    <chr> <chr>    
##  1 B 2            1           68159    Mann… GESIS    
##  2 Unter Sachsen… 6-8         50667    Köln  GESIS    
##  3 Kellnerweg     4           37077    Gött… DPZ      
##  4 Reichsstr.     4-6         04109    Leip… GWZO     
##  5 Schöneckstraße 6           79104    Frei… KIS      
##  6 Albert-Einste… 29a         18059    Rost… LIKAT    
##  7 L7             1           68161    Mann… ZEW      
##  8 Müggelseedamm  310         12587    Berl… IGB      
##  9 Campus D2      2           66123    Saar… INM      
## 10 Eberswalder S… 84          15374    Münc… ZALF     
## # ℹ 3 more variables: whole_address <chr>, lat <dbl>,
## #   long <dbl>
```

---

## Convert to `sf` Object and Plot

.pull-left[

```r
leibniz_addresses_sf <-
  leibniz_addresses |> 
  dplyr::filter(!is.na(lat)) |> 
  sf::st_as_sf(coords = c("long", "lat"), crs = 4326)

tmaptools::read_osm(leibniz_addresses_sf, type = "esri-topo") |> 
  terra::rast() |> 
  tm_shape() +
  tm_rgb() +
  tm_shape(leibniz_addresses_sf) +
  tm_dots(size = 2, col = "red")
```
]

.pull-right[
<img src="data:image/png;base64,#2_2_Applied_Data_Wrangling_files/figure-html/conversion-and-plot-exec-1.png" style="display: block; margin: auto;" />
]
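
The conversion step can also be tried without any network access. The following is a minimal sketch, assuming a small hand-made table: `toy_addresses` and its coordinates are hypothetical, approximate values for illustration and not results from a geocoding service. It shows why rows without coordinates are filtered first and how the points can be projected afterwards.

```r
library(sf)

# Hypothetical, hand-typed coordinates (approximate, for illustration only)
toy_addresses <- data.frame(
  institute = c("A", "B", "C"),
  long = c(8.46, 6.95, NA),
  lat  = c(49.49, 50.94, NA)
)

# Drop failed geocodes first: st_as_sf() does not accept
# missing coordinates by default
toy_sf <-
  toy_addresses[!is.na(toy_addresses$lat), ] |>
  sf::st_as_sf(coords = c("long", "lat"), crs = 4326)

# Project to ETRS89-LAEA (EPSG:3035) so that coordinates are in meters
toy_sf_3035 <- sf::st_transform(toy_sf, 3035)

nrow(toy_sf_3035)
sf::st_crs(toy_sf_3035)$epsg
```

Note the order in `coords`: longitude (x) comes first, latitude (y) second.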

---

## Our Approach

We rely on a service offered by the Federal Agency for Cartography and Geodesy (BKG):

- Online interface and API for online geocoding
- Offline geocoding possible based on raw data
- But: Data and service are restricted

---

## `bkggeocoder`

.pull-left[
R package `bkggeocoder`, developed at GESIS by Stefan and Jonas Lieth for (offline) geocoding:

- Access via [GitHub](https://github.com/StefanJuenger/bkggeocoder)
- Introduction in the [Meet the Experts Talk](https://www.youtube.com/watch?v=ZnA21LyKK88&feature=youtu.be) by Stefan

]

.pull-right[
</br>
</br>
<img src="data:image/png;base64,#../img/bkggeocoder.png" width="65%" style="display: block; margin: auto;" />
]

---

## Spatial Linking

.pull-left[
The geocoding tool automatically retrieves point coordinates, administrative unit keys, and grid cell IDs.
For other units, we use spatial joins based on the coordinates:

- constituencies
- administrative units across time (e.g., harmonized territorial status)

]

.pull-right[
.tinyisher[Sources: OpenStreetMap / GEOFABRIK (2018), City of Cologne (2014), Leibniz Institute of Ecological Urban and Regional Development (2018), Statistical Offices of the Federation and the Länder (2016), and German Environmental Agency / EIONET Central Data Repository (2016) / Jünger, 2019]
]

---

## Data Linking

Linking via IDs is the most common approach, but it comes with its own challenges (e.g., changes in territorial status and land reforms, comparability of units, and heterogeneity within units).
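
A minimal sketch of ID-based linking, assuming two small made-up tables: `respondents`, `districts`, and all values in them are hypothetical. It joins district attributes via a shared key and then lists the keys that found no match, which is one simple way to spot problems such as outdated territorial codes.

```r
library(dplyr)

# Hypothetical respondents with a district key (all values made up)
respondents <- tibble::tibble(
  id = c("r1", "r2", "r3"),
  district_id = c(5315, 5111, 9999)  # 9999: outdated or mistyped key
)

# Hypothetical district attributes
districts <- tibble::tibble(
  district_id = c(5315, 5111),
  pop_dens = c(2600, 2800)
)

# Link attributes to respondents via the shared key
linked <- dplyr::left_join(respondents, districts, by = "district_id")

# Keys without a match hint at territorial changes or coding errors
unmatched <- dplyr::anti_join(respondents, districts, by = "district_id")

linked
unmatched
```

Checking the unmatched keys before analysis is cheap and shows how many respondents would silently end up with missing context data.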

---

## Spatial Linking Methods (Examples) I

.pull-left[
1:1
.tinyisher[sf::st_join]

<img src="data:image/png;base64,#../img/fig_linking_by_location_noise.png" width="75%" style="display: block; margin: auto;" />
]

.pull-right[
Distances
.tinyisher[sf::st_distance]

<img src="data:image/png;base64,#../img/fig_linking_distance_noise_appI.png" width="75%" style="display: block; margin: auto;" />
]

.tinyisher[Sources: German Environmental Agency / EIONET Central Data Repository (2016) and OpenStreetMap / GEOFABRIK (2018) / Jünger, 2019]
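
Both methods can be tried on toy data. The sketch below assumes two hypothetical square "districts", two respondents, and one point of interest in a projected CRS; the helper `make_square()` and all coordinates are invented for illustration.

```r
library(sf)

# Hypothetical helper: a square polygon with its lower-left corner at (xmin, ymin)
make_square <- function(xmin, ymin, size) {
  sf::st_polygon(list(rbind(
    c(xmin, ymin), c(xmin + size, ymin),
    c(xmin + size, ymin + size), c(xmin, ymin + size),
    c(xmin, ymin)
  )))
}

# Two hypothetical districts in EPSG:3035 (coordinates in meters)
districts <- sf::st_sf(
  district = c("west", "east"),
  geometry = sf::st_sfc(
    make_square(0, 0, 1000),
    make_square(1000, 0, 1000),
    crs = 3035
  )
)

# Two hypothetical respondents and one point of interest
respondents <- sf::st_sf(
  id = c("r1", "r2"),
  geometry = sf::st_sfc(
    sf::st_point(c(500, 500)),
    sf::st_point(c(1500, 500)),
    crs = 3035
  )
)
poi <- sf::st_sfc(sf::st_point(c(500, 900)), crs = 3035)

# 1:1 linking: each respondent receives the attributes of its district
joined <- sf::st_join(respondents, districts, join = sf::st_within)

# Distances: one row per respondent, one column per point of interest
dist_m <- sf::st_distance(respondents, poi)

joined$district
as.numeric(dist_m)
```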

---

## Spatial Linking Methods (Examples) II

.pull-left[
Filter methods
.tinyisher[sf::st_filter or terra::vect(. , filter = )]

]

.pull-right[
Buffer zones
.tinyisher[sf::st_buffer (combined with terra::vect())]

<img src="data:image/png;base64,#../img/fig_linking_buffer_sealing.png" width="75%" style="display: block; margin: auto;" />
]

.tinyisher[Sources: Leibniz Institute of Ecological Urban and Regional Development (2018) and Statistical Offices of the Federation and the Länder (2016) / Jünger, 2019]
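
A minimal sketch of filtering and buffering with `sf` only, assuming one hypothetical respondent and three hypothetical charging stations; the object names and coordinates are invented for illustration.

```r
library(sf)

# Hypothetical respondent and stations in EPSG:3035 (coordinates in meters)
respondent <- sf::st_sf(
  id = "r1",
  geometry = sf::st_sfc(sf::st_point(c(0, 0)), crs = 3035)
)
stations <- sf::st_sf(
  station = c("s1", "s2", "s3"),
  geometry = sf::st_sfc(
    sf::st_point(c(300, 0)),
    sf::st_point(c(0, 450)),
    sf::st_point(c(2000, 2000)),
    crs = 3035
  )
)

# Buffer zone: a circle of 500 m around the respondent
buffer_500m <- sf::st_buffer(respondent, dist = 500)

# Filter: keep only the stations that intersect the buffer
stations_nearby <- sf::st_filter(stations, buffer_500m)

# Count the stations per buffer
n_nearby <- lengths(sf::st_intersects(buffer_500m, stations))

stations_nearby$station
n_nearby
```

`sf::st_filter()` returns the matching features themselves, whereas `lengths(sf::st_intersects())` only returns how many there are per buffer.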

---

## Cheatsheet: Spatial Operations

An overview of spatial operations using the `sf` package can be accessed [here](https://ugoproto.github.io/ugo_r_doc/pdf/sf.pdf).

---

## Data Aggregation

If you want to aggregate attributes and geometries of a shapefile, you can rely on `st_combine(x)`, `st_union(x, y)`, and `st_intersection(x, y)` to combine geometries, dissolve borders, and return the intersection of two shapefiles, respectively.

For raster data, you can aggregate with the function `terra::aggregate()` (if you have matching raster files), in combination with `terra::resample()` (if your raster files don't match).

To deal with spatial misalignment:
- [`smile` package](https://lcgodoy.me/smile/)
- [`areal` package](https://chris-prener.github.io/areal/)
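
The difference between combining and dissolving is easiest to see on toy data. The sketch below assumes two hypothetical adjacent squares and a small made-up raster; the helper `sq()` and all values are invented for illustration.

```r
library(sf)
library(terra)

# Hypothetical helper: a unit square with its lower-left corner at (xmin, 0)
sq <- function(xmin) {
  sf::st_polygon(list(rbind(
    c(xmin, 0), c(xmin + 1, 0), c(xmin + 1, 1), c(xmin, 1), c(xmin, 0)
  )))
}

# Two adjacent squares sharing one border (EPSG:3035)
squares <- sf::st_sfc(sq(0), sq(1), crs = 3035)

# st_combine() only collects the geometries; the inner border remains
combined <- sf::st_combine(squares)

# st_union() dissolves the shared border into a single polygon
unioned <- sf::st_union(squares)

# Raster aggregation: merge blocks of 2 x 2 cells into one coarser cell
r <- terra::rast(
  nrows = 4, ncols = 4,
  xmin = 0, xmax = 4, ymin = 0, ymax = 4,
  vals = 1:16
)
r_coarse <- terra::aggregate(r, fact = 2, fun = "mean")

as.character(sf::st_geometry_type(combined))
as.character(sf::st_geometry_type(unioned))
dim(r_coarse)
```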

---

## Data Aggregation

.pull-left[
```r
german_districts <-
  sf::read_sf("./data/VG250_KRS.shp") %>% 
  sf::st_transform(3035) %>% 
  # the first two digits of the AGS identify the federal state
  dplyr::mutate(federal_state =
                  as.numeric(stringr::str_sub(AGS, 1, 2)))

german_states <-
  german_districts %>% 
  dplyr::group_by(federal_state) %>% 
  # dissolve district borders within each state
  dplyr::summarize(geometry = 
                     sf::st_union(geometry))

tm_shape(german_states) + 
  tm_borders()
```
]

.pull-right[
<img src="data:image/png;base64,#2_2_Applied_Data_Wrangling_files/figure-html/aggregate-data-map-1.png" style="display: block; margin: auto;" />
]

---

## Fake Research Question

.pull-left[
Say we're interested in the impact of neighborhood characteristics (e.g., mobility infrastructure) on individual-level attitudes towards the energy transition.

We plan to conduct a survey in the state of North Rhine-Westphalia.
]

.pull-right[
</br>
<img src="data:image/png;base64,#../img/4iq3kg.jpg" width="813" style="display: block; margin: auto;" />
.center[.tinyisher[https://imgflip.com/memegenerator/Trump-Bill-Signing]
]
]

---

## Our Sample Area: NRW's Boundaries

.pull-left[
```r
sampling_area <-
  osmdata::getbb(
    "Nordrhein-Westfalen", 
    format_out = "sf_polygon"
  ) %>% 
  .$multipolygon %>% 
  sf::st_transform(3035) 
```
]

.pull-right[
```r
tm_shape(sampling_area) +
  tm_borders() 
```

<img src="data:image/png;base64,#2_2_Applied_Data_Wrangling_files/figure-html/nrw-map-1.png" style="display: block; margin: auto;" />
]

---

## A Fake-Life Application

We can draw a fake sample this way and also add an identifier for the respondents:

.pull-left[
```r
set.seed(1234)
```

```r
fake_coordinates <-
  sf::st_sample(sampling_area, 1000) %>% 
  sf::st_sf() %>% 
  dplyr::mutate(
    id_2 = 
      stringi::stri_rand_strings(10000, 10) %>% 
      sample(1000, replace = FALSE)
  )
```
]

.pull-right[
```r
tm_shape(sampling_area) +
  tm_borders() +
  tm_shape(fake_coordinates) +
  tm_dots()
```

<img src="data:image/png;base64,#2_2_Applied_Data_Wrangling_files/figure-html/map-osm-coordinates-1.png" style="display: block; margin: auto;" />
]

---

## Correspondence Table

As in any survey that deals with addresses, we need a correspondence table of the distinct identifiers.

```r
correspondence_table <-
  dplyr::bind_cols(
    id = 
      stringi::stri_rand_strings(10000, 10) %>% 
      sample(1000, replace = FALSE),
    id_2 = fake_coordinates$id_2
  )

correspondence_table
```

```
## # A tibble: 1,000 × 2
##    id         id_2      
##    <chr>      <chr>     
##  1 ubsHG5McEM ihkXs9ejBD
##  2 WN7Ih0Y5Rz bN2W1BpZKx
##  3 lmLdwfl3cu wAbDkMovWz
##  4 uq2Rb6Dj2w R4eIulul4z
##  5 y7eYFQSuP3 XOvF2ZuGg1
##  6 UxERvtP2Kx EPuILKVeoq
##  7 67N3O8FPyO 39TfAAxmme
##  8 I0AUhXMPkD 2tolhpgrNl
##  9 41h2EPFU1S nGgofAl6iC
## 10 9YaVHR70jt 4sXgiH1ydA
## # ℹ 990 more rows
```
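
How such a table is used later can be sketched on toy data. The example assumes three hypothetical respondents; `survey`, `spatial`, `correspondence`, and all values are invented for illustration. The survey data and the spatial attributes never share an identifier directly; only the correspondence table connects them.

```r
library(dplyr)

# Hypothetical survey data, identified by the survey ID only
survey <- tibble::tibble(
  id = c("a1", "a2", "a3"),
  age = c(34, 58, 41)
)

# Hypothetical spatial attributes, identified by the address ID only
spatial <- tibble::tibble(
  id_2 = c("x9", "x7", "x8"),
  pop_dens = c(210, 530, 130)
)

# The correspondence table is the only place where both IDs meet
correspondence <- tibble::tibble(
  id = c("a1", "a2", "a3"),
  id_2 = c("x7", "x8", "x9")
)

linked <-
  correspondence |>
  dplyr::left_join(spatial, by = "id_2") |>
  # remove the address ID as soon as the link is made
  dplyr::select(-id_2) |>
  dplyr::left_join(survey, by = "id")

linked
```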

---

## Conduct the Survey

We ask respondents for some standard sociodemographics. 
But we also include an item from the [GLES Panel](https://doi.org/10.4232/1.14114) on energy transformation: 
"From 2030, no more new cars with petrol or diesel engines are to be registered in Germany. How much do you agree?" (entrans). 
Since we cannot share the actual data, we created fake data using the [`faux` package](https://cran.r-project.org/web/packages/faux/index.html).

```r
fake_survey_data <- 
  dplyr::bind_cols(
    id = correspondence_table$id,
    age = sample(18:100, 1000, replace = TRUE),
    gender = 
      sample(1:2, 1000, replace = TRUE) %>% 
      as.factor(),
    education =
      sample(1:4, 1000, replace = TRUE) %>% 
      as.factor(),
    income =
      sample(100:10000, 1000, replace = TRUE),
    entrans = secret_variable_we_are_hiding_from_you
  )
```

---

## Survey Data Structure

```r
fake_survey_data 
```

```
## # A tibble: 1,000 × 6
##    id           age gender education income entrans
##    <chr>      <int> <fct>  <fct>      <int>   <dbl>
##  1 ubsHG5McEM    72 2      1           6061    69.9
##  2 WN7Ih0Y5Rz    49 1      3           4548    50.6
##  3 lmLdwfl3cu    84 1      4           6850    45.0
##  4 uq2Rb6Dj2w    90 1      4           1186    55.0
##  5 y7eYFQSuP3    88 2      2           5888    61.4
##  6 UxERvtP2Kx    58 2      1           9210    59.5
##  7 67N3O8FPyO    90 2      3            789    52.4
##  8 I0AUhXMPkD    45 1      4           1925    49.8
##  9 41h2EPFU1S    36 1      3           9587    55.5
## 10 9YaVHR70jt    98 2      2           4455    49.7
## # ℹ 990 more rows
```

---

## What could explain support for energy transformation?

*Access to charging infrastructure*
> Better access to charging infrastructure, higher support for energy transformation.

*Alternative means of transport*
> Better access to public transportation, higher support for energy transformation.

*Rural-urban divide*
> Higher population density, higher support for energy transformation.

---

## District-level Data

We already collected most of this information or created the indicators at the district level yesterday.
Let's load the respective data, reduce it to NRW, and have a look.

```r
sampling_area_attributes <-
  # load district shapefile
  sf::read_sf("./data/VG250_KRS.shp") %>% 
  # transform crs
  sf::st_transform(3035) %>% 
  # some data cleaning
  dplyr::mutate(district_id = as.numeric(AGS)) %>% 
  dplyr::select(district_id) %>% 
  # reduce to area of nrw: x intersects with y
  sf::st_join(.,
              sampling_area, 
              join = sf::st_intersects, 
              # keep only districts that are intersecting
              left = FALSE) %>% 
  # add attribute table
  dplyr::left_join(. , 
                   readr::read_delim("./data/attributes_districts.csv",
                                     delim = ";"), 
                   by = "district_id") 
```

---

## District Operationalization

*Access to charging infrastructure*
> Charging stations per 1000 inhabitants in a district

*Alternative means of transport*
> Distance to public transportation in a district

*Rural-urban divide*
> Population density in a district

---

## Access to charging infrastructure

Luckily, we already calculated this yesterday!

```r
sampling_area_attributes <- 
  charger_nrw %>%
  # spatial join district ids
  sf::st_join(sampling_area_attributes %>% dplyr::select(district_id), 
              join = sf::st_within) %>%
  # Group by district ID
  dplyr::group_by(district_id) %>%
  # Summarize the number of chargers in each district
  dplyr::summarise(charger_count = dplyr::n()) %>%
  # Drop geometry column
  sf::st_drop_geometry() %>%
  # Left join with sampling area attributes
  dplyr::left_join(sampling_area_attributes, ., by = "district_id") %>%
  # Calculate charger density per 1000 population
  dplyr::mutate(charger_dens = (charger_count * 1000) / population)
```

---

## Alternative means of transport

We got this information from the INKAR database: population-weighted linear distance to the nearest public transport stop with at least 20 departures per day.

<img src="data:image/png;base64,#2_2_Applied_Data_Wrangling_files/figure-html/transport-map-disp-1.png" width="50%" height="50%" style="display: block; margin: auto;" />

---

## Rural-urban divide

Our attribute table contains the number of inhabitants per district but not the population density.
Therefore, we need to calculate the area of the district.

```r
# calculate area of districts
# areas will always be calculated
# in units according to the CRS 
sf::st_area(sampling_area_attributes) %>% 
  head(4)
```

```
## Units: [m^2]
## [1] 1269653841 1991051810  797826703  694472630
```

```r
sampling_area_attributes %>% 
  sf::st_transform(4326) %>% 
  sf::st_area(.) %>% 
  head(4)
```

```
## Units: [m^2]
## [1] 1264848801 1983072559  794743475  691822618
```
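
Area units are easy to check on a shape whose size is known in advance. This is a minimal sketch, assuming a hypothetical square of 2 km × 2 km in a projected CRS; the object names are invented for illustration.

```r
library(sf)
library(units)

# A hypothetical square of 2 km x 2 km in EPSG:3035 (coordinates in meters)
square <- sf::st_sfc(
  sf::st_polygon(list(rbind(
    c(0, 0), c(2000, 0), c(2000, 2000), c(0, 2000), c(0, 0)
  ))),
  crs = 3035
)

# In a projected CRS, the area is returned in the squared unit of the CRS (m^2)
area_m2 <- sf::st_area(square)

# Convert explicitly instead of dividing by 1,000,000 by hand
area_km2 <- units::set_units(area_m2, km^2)

as.numeric(area_m2)
as.numeric(area_km2)
```

Because the result carries its unit, an explicit conversion with `units::set_units()` is less error-prone than manual arithmetic.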

---

## Population Density

All that is left to do is a simple mutation:

.pull-left[
```r
# calculate population density
sampling_area_attributes <-
  sampling_area_attributes %>% 
  # calculate area of districts (areas will always
  # be calculated in units according to the CRS)
  dplyr::mutate(area = sf::st_area(.)) %>% 
  # change unit to square kilometers
  dplyr::mutate(
    area_km2 = units::set_units(area, km^2)
  ) %>% 
  # recode variable as numeric
  dplyr::mutate(
    area_km2 = as.numeric(area_km2)
  ) %>% 
  # calculate population density
  dplyr::mutate(
    pop_dens = population / area_km2
  )
```
```
]

.pull-right[
<img src="data:image/png;base64,#2_2_Applied_Data_Wrangling_files/figure-html/unnamed-chunk-2-1.png" style="display: block; margin: auto;" />
]

---

## Respondents in Districts

We have population density on the district level. Since our analysis focuses on the individual level, we can spatially join the information to our fake respondents' coordinates.

```r
district_linked_df <-
  sampling_area_attributes %>%
  # keeping just the variables we want
  dplyr::select(charger_dens, publictransport_meandist, pop_dens) %>% 
  # since we want to join the district data to the
  # respondents, we define the coordinates first
  sf::st_join(fake_coordinates,
              # district data second
              . ,
              # some points may lie on the border
              # choosing intersects 
              join = sf::st_intersects) %>% 
  # drop our coordinates for data protection
  sf::st_drop_geometry()
```

---

## Respondents in Districts

```r
head(district_linked_df, 5)
```

```
##         id_2 charger_dens publictransport_meandist
## 1 ihkXs9ejBD    0.6780067                    301.3
## 2 bN2W1BpZKx    0.4959913                    482.4
## 3 wAbDkMovWz    0.7682106                    368.7
## 4 R4eIulul4z    0.4959913                    482.4
## 5 XOvF2ZuGg1    0.5132439                    408.3
##   pop_dens
## 1 533.8628
## 2 214.0663
## 3 133.4900
## 4 214.0663
## 5 188.8392
```
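
One side effect of `sf::st_intersects` is worth checking after such a join: with planar geometries, a point that lies exactly on a shared border matches both neighboring polygons and is therefore duplicated. The sketch below assumes two hypothetical adjacent districts and two respondents; the helper `sq()` and all coordinates are invented for illustration.

```r
library(sf)

# Hypothetical helper: a 10 m square with its lower-left corner at (xmin, 0)
sq <- function(xmin) {
  sf::st_polygon(list(rbind(
    c(xmin, 0), c(xmin + 10, 0), c(xmin + 10, 10), c(xmin, 10), c(xmin, 0)
  )))
}

# Two adjacent districts in EPSG:3035 (coordinates in meters)
districts <- sf::st_sf(
  district = c("west", "east"),
  geometry = sf::st_sfc(sq(0), sq(10), crs = 3035)
)

# One respondent inside "west", one exactly on the shared border
respondents <- sf::st_sf(
  id_2 = c("inside", "on_border"),
  geometry = sf::st_sfc(
    sf::st_point(c(5, 5)),
    sf::st_point(c(10, 5)),
    crs = 3035
  )
)

linked <-
  sf::st_join(respondents, districts, join = sf::st_intersects) |>
  sf::st_drop_geometry()

# The border point matches both districts and therefore appears twice
table(linked$id_2)

# A quick check for duplicated respondents after any spatial join
any(duplicated(linked$id_2))
```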

---

## Too boring? Let's scale it down!

We have our nice fake coordinates, and we know that there is also variation within some districts (e.g., Cologne) concerning e-car mobility.
So, let's try to operationalize the variables on a smaller level of aggregation.

*Access to charging infrastructure*
> Charging stations in a 5000m buffer

*Alternative means of transport*
> Distance to the closest train stop

*Rural-urban divide*
> Population in a 5000m buffer

---

## Charging stations in 5000m Buffer

The procedure for calculating the number of chargers in a 5km buffer is very similar to calculating the chargers in a district.

```r
# Create 5000m buffers around the fake coordinates
buffers <- 
  fake_coordinates %>%
  sf::st_buffer(dist = 5000)

# Identify the chargers that intersect each buffer
inter <- 
  sf::st_intersects(buffers, charger_nrw)

# Count points within each buffer
coordinate_linked_df <- 
  fake_coordinates %>%
  dplyr::mutate(num_charger = lengths(inter))
```

---

## Distance Calculation I

To measure access to alternative transportation (e.g., public transport), we want to measure each respondent's distance to the closest train station. 
We can get the train station points from OSM.

```r
nrw_pt_stops <-
  osmdata::getbb(
    "Nordrhein-Westfalen" 
  ) %>% 
  osmdata::opq(timeout = 25*100) %>% 
  osmdata::add_osm_feature(key = "public_transport", value = "stop_position") %>% 
  osmdata::osmdata_sf()

nrw_pt_stops <-
  nrw_pt_stops$osm_points %>%  
  tibble::as_tibble() %>%  
  sf::st_as_sf() %>%  
  sf::st_transform(3035)

nrw_pt_trainstops <-
  nrw_pt_stops %>% 
  dplyr::filter(train == "yes") %>% 
  dplyr::select()

# takes a while, so sneaky preparation
nrw_pt_trainstops <- sf::st_read("./data/nrw_pt_osmtrainstops.shp", crs = 3035)
```

---

## Distance Calculation II

`sf::st_distance()` will calculate the distances between **all** respondents and **all** train stations, resulting in a matrix with 2,710,000 entries (1,000 respondents × 2,710 stations). 
We can make our lives easier by first identifying the nearest station and then calculating the distance.

.pull-left[
```r
# Find the nearest train station 
nearest_station <- 
  sf::st_nearest_feature(fake_coordinates, 
                         nrw_pt_trainstops)

# Calculate the distance between each point in
# fake_coordinates & its nearest train station
distances <-
  sf::st_distance(fake_coordinates, 
                  nrw_pt_trainstops[nearest_station,], 
                  by_element = TRUE)

# add a column for the distances
coordinate_linked_df  <- 
  coordinate_linked_df %>%
  dplyr::mutate(
    # Calculate distances in kilometers 
    dist_km = as.numeric(distances) / 1000) 
```
]

.pull-right[
```
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0833  2.1552  3.9025  4.5746  6.1253 15.5141
```
]
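
That the shortcut gives the same result as the full distance matrix can be verified on toy data. The sketch below assumes two hypothetical respondents and three hypothetical stations; all coordinates are invented for illustration.

```r
library(sf)
library(units)

# Hypothetical respondents and stations in EPSG:3035 (coordinates in meters)
respondents <- sf::st_sfc(
  sf::st_point(c(0, 0)),
  sf::st_point(c(1000, 0)),
  crs = 3035
)
stations <- sf::st_sfc(
  sf::st_point(c(0, 300)),
  sf::st_point(c(1000, 400)),
  sf::st_point(c(5000, 5000)),
  crs = 3035
)

# Full approach: a 2 x 3 distance matrix, then the minimum per row
full_matrix <- units::drop_units(sf::st_distance(respondents, stations))
min_from_matrix <- apply(full_matrix, 1, min)

# Shortcut: index of the nearest station, then element-wise distances
nearest <- sf::st_nearest_feature(respondents, stations)
min_from_nearest <- sf::st_distance(
  respondents,
  stations[nearest],
  by_element = TRUE
)

nearest
all.equal(min_from_matrix, as.numeric(min_from_nearest))
```

The full matrix grows with the product of both inputs, while the shortcut only computes one distance per respondent.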

---

## Population Buffers

...and we're not yet done: we still need the population in the neighborhood. Let's calculate buffers of 5000 meters and add the population mean values to our dataset.

```r
# download data & extract information
inhabitants_nrw <-
  z11::z11_get_100m_attribute(Einwohner) %>% 
  terra::crop(. , sampling_area)

# spatially link "on the fly"
population_buffers <- 
  terra::extract(
    inhabitants_nrw, 
    fake_coordinates %>% 
      sf::st_buffer(5000) %>% 
      terra::vect(), 
    fun = mean,
    na.rm = TRUE
  )

# link with data 
coordinate_linked_df <-
  coordinate_linked_df %>% 
  dplyr::mutate(population_buffer = population_buffers[[2]])
```
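
The extraction step can be tried without downloading any data. This is a minimal sketch, assuming a hypothetical population grid with a constant value per cell and one hypothetical respondent; it does not use the `z11` data, and all names and values are invented for illustration.

```r
library(sf)
library(terra)

# Hypothetical 100 m population grid: 10 x 10 cells, 5 inhabitants per cell
pop_grid <- terra::rast(
  nrows = 10, ncols = 10,
  xmin = 0, xmax = 1000, ymin = 0, ymax = 1000,
  crs = "EPSG:3035",
  vals = 5
)

# One hypothetical respondent in the center of the grid
respondent <- sf::st_sf(
  id_2 = "r1",
  geometry = sf::st_sfc(sf::st_point(c(500, 500)), crs = 3035)
)

# Buffer of 200 m, converted to a terra vector object
buffer <- terra::vect(sf::st_buffer(respondent, 200))

# Mean number of inhabitants across the grid cells covered by the buffer
pop_mean <- terra::extract(pop_grid, buffer, fun = mean, na.rm = TRUE)

pop_mean
```

The result is a data frame with an `ID` column for the polygon and one column per raster layer, which is why the value of interest sits in the second column.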

---

## Join with Survey

I hope you're not tired of joining data tables.
Since we care a tiny bit more about data protection than others, we have yet another joining task left: joining the information we received using our (protected) fake coordinates to the actual survey data via the correspondence table.

.pull-left[
```r
# last joins for now
fake_survey_data_spatial <-
  # first join the district-level attributes via id_2
  dplyr::left_join(
    correspondence_table, 
    district_linked_df, 
    by = "id_2"
  ) %>% 
  # then the coordinate-level attributes (without the geometry)
  dplyr::left_join(
    ., 
    sf::st_drop_geometry(coordinate_linked_df), 
    by = "id_2"
  ) %>% 
  # drop the fake_coordinate id
  dplyr::select(-id_2) %>% 
  # join the survey data
  dplyr::left_join(
    fake_survey_data,
    by = "id"
  ) 
```
]

.pull-right[
<img src="data:image/png;base64,#2_2_Applied_Data_Wrangling_files/figure-html/correlation-plot-1.png" width="75%" style="display: block; margin: auto;" />

]

---

[Exercise](https://stefanjuenger.github.io/gesis-workshop-geospatial-techniques-R-2024/exercises/2_2_1_Spatial_Joins.html)

[Solution](https://stefanjuenger.github.io/gesis-workshop-geospatial-techniques-R-2024/exercises/2_2_1_Spatial_Joins.html)

---

## Add-on Slides: Example Studies

---

## Environmental inequalities (Jünger, 2021)

> Is income associated with fewer environmental disadvantages, and are there differences between German people and people with a migration background?

.pull-left[
.small[
Theoretical Framework
- Social and Ethnic Inequalities (Crowder & Downey, 2010)
- Place Stratification (Lersch, 2013)

Data
- GGSS 2016 & 2018
- soil sealing & green spaces
]
]

.pull-right[
<img src="data:image/png;base64,#../img/fig_linking_buffer_sealing.png" width="65%" style="display: block; margin: auto;" />
]

---

## Results

.tinyisher[Data source: GGSS 2016 & 2018; N = 6,117; 95% confidence intervals based on cluster-robust standard errors (sample point); all models control for age, gender, education, household size, German region and survey year interaction, inhabitant size of the municipality, and distance to municipality administration]

---

## Attitudes towards minorities (Jünger & Schaeffer, 2022)

> Do people who live in ethnically homogeneous neighborhoods that are close to ethnically diverse ones have more negative attitudes towards minorities?

.pull-left[
.small[
Theoretical Framework
- Contact Theory (Allport, 1954)
- Ethnic Competition (Stephan et al., 2009)

Data
- GGSS 2016
- German Census 2011
]
]

.pull-right[
<img src="data:image/png;base64,#../img/Abb1.png" width="65%" style="display: block; margin: auto;" />
]

---

## Results

.tinyisher[Data source: GGSS 2016; N = 1,689; 95% confidence intervals based on cluster-robust standard errors (sample point); all models control for age, gender, education, income, unemployment, homeownership, immigrants and inhabitants in the neighborhood, inhabitant size of the municipality, and German region]

---

## Left Behind by the State? (Stroppe, 2023)

> Are political trust levels affected by the accessibility of public services and infrastructures for citizens?

.pull-left[
.small[
Theoretical Framework
- Political Performance-Trust Link (Easton 1965, Hetherington 2005)
- Context condition low-intensity information cue (Cho & Rudolph 2008)

Data
- GGSS 2018
- hospital, school, train station (distance measures)
- municipality data
]
]

.pull-right[
<br>
<img src="data:image/png;base64,#../img/meandist_trains.PNG" width="65%" style="display: block; margin: auto;" />

.tinyisher[Federal Statistical Office 2019, Deutsche Bahn 2017 and GeoBasis-DE / BKG 2022 / Stroppe, 2023]
]

---

## Results

.tinyisher[Data source: GGSS 2018 and Federal Statistical Office 2017. N = 3,030, Groups = 152 (Municipalities). Fitted Models: OLS multi-level random effect models. Individual-level controls: income, gender, education, age, personal trust, political interest. Municipality-level controls: population density and unemployment. Dependent variable: Trust in government. Survey weights are applied.]

---

layout: false
class: center
background-image: url(data:image/png;base64,#../assets/img/the_end.png)
background-size: cover

.left-column[
</br>
<img src="data:image/png;base64,#../img/Anne.png" width="75%" style="display: block; margin: auto;" />

]
.right-column[
.left[.small[<svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg">
  <path d="M464 64H48C21.49 64 0 85.49 0 112v288c0 26.51 21.49 48 48 48h416c26.51 0 48-21.49 48-48V112c0-26.51-21.49-48-48-48zm0 48v40.805c-22.422 18.259-58.168 46.651-134.587 106.49-16.841 13.247-50.201 45.072-73.413 44.701-23.208.375-56.579-31.459-73.413-44.701C106.18 199.465 70.425 171.067 48 152.805V112h416zM48 400V214.398c22.914 18.251 55.409 43.862 104.938 82.646 21.857 17.205 60.134 55.186 103.062 54.955 42.717.231 80.509-37.199 103.053-54.947 49.528-38.783 82.032-64.401 104.947-82.653V400H48z"></path>
</svg> [anne-kathrin.stroppe@gesis.org](mailto:anne-kathrin.stroppe@gesis.org)]
.small[<svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg">
  <path d="M459.37 151.716c.325 4.548.325 9.097.325 13.645 0 138.72-105.583 298.558-298.558 298.558-59.452 0-114.68-17.219-161.137-47.106 8.447.974 16.568 1.299 25.34 1.299 49.055 0 94.213-16.568 130.274-44.832-46.132-.975-84.792-31.188-98.112-72.772 6.498.974 12.995 1.624 19.818 1.624 9.421 0 18.843-1.3 27.614-3.573-48.081-9.747-84.143-51.98-84.143-102.985v-1.299c13.969 7.797 30.214 12.67 47.431 13.319-28.264-18.843-46.781-51.005-46.781-87.391 0-19.492 5.197-37.36 14.294-52.954 51.655 63.675 129.3 105.258 216.365 109.807-1.624-7.797-2.599-15.918-2.599-24.04 0-57.828 46.782-104.934 104.934-104.934 30.213 0 57.502 12.67 76.67 33.137 23.715-4.548 46.456-13.32 66.599-25.34-7.798 24.366-24.366 44.833-46.132 57.827 21.117-2.273 41.584-8.122 60.426-16.243-14.292 20.791-32.161 39.308-52.628 54.253z"></path>
</svg> [`@astroppe`](https://twitter.com/stroppann)]
.small[<svg viewBox="0 0 496 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg">
  <path d="M165.9 397.4c0 2-2.3 3.6-5.2 3.6-3.3.3-5.6-1.3-5.6-3.6 0-2 2.3-3.6 5.2-3.6 3-.3 5.6 1.3 5.6 3.6zm-31.1-4.5c-.7 2 1.3 4.3 4.3 4.9 2.6 1 5.6 0 6.2-2s-1.3-4.3-4.3-5.2c-2.6-.7-5.5.3-6.2 2.3zm44.2-1.7c-2.9.7-4.9 2.6-4.6 4.9.3 2 2.9 3.3 5.9 2.6 2.9-.7 4.9-2.6 4.6-4.6-.3-1.9-3-3.2-5.9-2.9zM244.8 8C106.1 8 0 113.3 0 252c0 110.9 69.8 205.8 169.5 239.2 12.8 2.3 17.3-5.6 17.3-12.1 0-6.2-.3-40.4-.3-61.4 0 0-70 15-84.7-29.8 0 0-11.4-29.1-27.8-36.6 0 0-22.9-15.7 1.6-15.4 0 0 24.9 2 38.6 25.8 21.9 38.6 58.6 27.5 72.9 20.9 2.3-16 8.8-27.1 16-33.7-55.9-6.2-112.3-14.3-112.3-110.5 0-27.5 7.6-41.3 23.6-58.9-2.6-6.5-11.1-33.3 2.6-67.9 20.9-6.5 69 27 69 27 20-5.6 41.5-8.5 62.8-8.5s42.8 2.9 62.8 8.5c0 0 48.1-33.6 69-27 13.7 34.7 5.2 61.4 2.6 67.9 16 17.7 25.8 31.5 25.8 58.9 0 96.5-58.9 104.2-114.8 110.5 9.2 7.9 17 22.9 17 46.4 0 33.7-.3 75.4-.3 83.6 0 6.5 4.6 14.4 17.3 12.1C428.2 457.8 496 362.9 496 252 496 113.3 383.5 8 244.8 8zM97.2 352.9c-1.3 1-1 3.3.7 5.2 1.6 1.6 3.9 2.3 5.2 1 1.3-1 1-3.3-.7-5.2-1.6-1.6-3.9-2.3-5.2-1zm-10.8-8.1c-.7 1.3.3 2.9 2.3 3.9 1.6 1 3.6.7 4.3-.7.7-1.3-.3-2.9-2.3-3.9-2-.6-3.6-.3-4.3.7zm32.4 35.6c-1.6 1.3-1 4.3 1.3 6.2 2.3 2.3 5.2 2.6 6.5 1 1.3-1.3.7-4.3-1.3-6.2-2.2-2.3-5.2-2.6-6.5-1zm-11.4-14.7c-1.6 1-1.6 3.6 0 5.9 1.6 2.3 4.3 3.3 5.6 2.3 1.6-1.3 1.6-3.9 0-6.2-1.4-2.3-4-3.3-5.6-2z"></path>
</svg> [`stroppann`](https://github.com/stroppann)]
]
]