class: center, middle, inverse, title-slide .title[ # Introduction to Geospatial Techniques for Social Scientists in R ] .subtitle[ ## Applied Data Wrangling ] .author[ ### Stefan Jünger & Anne-Kathrin Stroppe ] .institute[ ###
GESIS Workshop
] .date[ ### April 24, 2024 ] --- layout: true --- ## Now <table class="table" style="margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> Day </th> <th style="text-align:left;"> Time </th> <th style="text-align:left;"> Title </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;color: gray !important;"> April 23 </td> <td style="text-align:left;color: gray !important;"> 10:00-11:30 </td> <td style="text-align:left;font-weight: bold;"> Introduction to GIS </td> </tr> <tr> <td style="text-align:left;color: gray !important;"> April 23 </td> <td style="text-align:left;color: gray !important;"> 11:45-13:00 </td> <td style="text-align:left;font-weight: bold;"> Vector Data </td> </tr> <tr> <td style="text-align:left;color: gray !important;color: gray !important;"> April 23 </td> <td style="text-align:left;color: gray !important;color: gray !important;"> 13:00-14:00 </td> <td style="text-align:left;font-weight: bold;color: gray !important;"> Lunch Break </td> </tr> <tr> <td style="text-align:left;color: gray !important;"> April 23 </td> <td style="text-align:left;color: gray !important;"> 14:00-15:30 </td> <td style="text-align:left;font-weight: bold;"> Mapping </td> </tr> <tr> <td style="text-align:left;color: gray !important;border-bottom: 1px solid"> April 23 </td> <td style="text-align:left;color: gray !important;border-bottom: 1px solid"> 15:45-17:00 </td> <td style="text-align:left;font-weight: bold;border-bottom: 1px solid"> Raster Data </td> </tr> <tr> <td style="text-align:left;color: gray !important;"> April 24 </td> <td style="text-align:left;color: gray !important;"> 09:00-10:30 </td> <td style="text-align:left;font-weight: bold;"> Advanced Data Import & Processing </td> </tr> <tr> <td style="text-align:left;color: gray !important;background-color: yellow !important;"> April 24 </td> <td style="text-align:left;color: gray !important;background-color: yellow !important;"> 10:45-12:00 </td> <td 
style="text-align:left;font-weight: bold;background-color: yellow !important;"> Applied Data Wrangling & Linking </td> </tr> <tr> <td style="text-align:left;color: gray !important;color: gray !important;"> April 24 </td> <td style="text-align:left;color: gray !important;color: gray !important;"> 12:00-13:00 </td> <td style="text-align:left;font-weight: bold;color: gray !important;"> Lunch Break </td> </tr> <tr> <td style="text-align:left;color: gray !important;"> April 24 </td> <td style="text-align:left;color: gray !important;"> 13:00-14:30 </td> <td style="text-align:left;font-weight: bold;"> Investigating Spatial Autocorrelation </td> </tr> <tr> <td style="text-align:left;color: gray !important;"> April 24 </td> <td style="text-align:left;color: gray !important;"> 14:45-16:00 </td> <td style="text-align:left;font-weight: bold;"> Spatial Econometrics & Outlook </td> </tr> </tbody> </table> --- ## What Are Georeferenced Data? .pull-left[ </br> Data with a direct spatial reference `\(\rightarrow\)` **geo-coordinates** - Information about geometries - Optional: Content in relation to the geometries ] .pull-right[ <img src="data:image/png;base64,#../img/fig_geometries.png" width="85%" style="display: block; margin: auto;" /> .tinyisher[Sources: OpenStreetMap / GEOFABRIK (2018), City of Cologne (2014), and the Statistical Offices of the Federation and the Länder (2016) / Jünger, 2019] ] --- ## Georeferenced Survey Data Survey data enriched with geo-coordinates (or other direct spatial references) </br> <img src="data:image/png;base64,#../img/geo_surveys.png" width="85%" style="display: block; margin: auto;" /> </br> .center[**With georeferenced survey data, we can analyze how individual behaviors and attitudes interact with the environment.**] --- ## An Example Workflow .pull-left[ Getting from addresses to analyses with georeferenced survey data involves several steps and challenges along the way.
We will talk about: - Data Protection & Data Access - Geocoding - Spatial Data Linking - Applied Examples ] .pull-right[ <img src="data:image/png;base64,#../img/varreport.png" width="75%" style="display: block; margin: auto;" /> ] --- ## Data Protection </br> </br> That's one of the biggest issues: - Explicit spatial references increase the risk of re-identifying anonymized survey respondents - This risk arises during data processing but also during analysis </br> .center[**Affects all phases of research and data management!**] --- ## Data Availability .pull-left[ Geospatial Data - Often distributed in a decentralized way - Fragmented data landscape, at least in Germany Georeferenced Survey Data - Primarily, survey data - Depends on documentation - Access difficult due to data protection restrictions ] .pull-right[ <img src="data:image/png;base64,#../img/data_availability.png" width="75%" style="display: block; margin: auto;" /> .right[.tinyisher[ https://www.eea.europa.eu/data-and-maps https://datasearch.gesis.org/ https://datasetsearch.research.google.com/ ]] ] --- ## Distribution & Re-Identification Risk Even without (in)direct spatial references, data may still be sensitive - Geospatial attributes add new information to existing data - Checking them may be part of general data privacy checks, but we may not be able to distribute these data as is .pull-left[ Safe Rooms / Secure Data Centers - Control access - Check output ] .pull-right[ <img src="data:image/png;base64,#../img/safe_room.png" width="825" style="display: block; margin: auto;" /> .right[.tinyisher[https://www.gesis.org/en/services/processing-and-analyzing-data/guest-research-stays/secure-data-center-sdc]] ] --- ## Legal Regulations in Data Processing .pull-left[ Storing personal information such as addresses in the same place as actual survey attributes is not allowed in Germany - Projects keep them in separate locations - Can only be matched with a correspondence table - Necessary to conduct data linking ] .pull-right[ <img
src="data:image/png;base64,#../img/fig_linking_workflow_simple.png" width="949" style="display: block; margin: auto;" /> .right[.tinyisher[Jünger, 2019]] ] --- ## Geocoding Geocoding is the conversion of indirect spatial references (e.g., addresses) into direct spatial references (e.g., coordinates). However, this procedure is a bit tricky (not only in R). Many services are either - expensive (at least they cost money or have other restrictions) - probably not data-protection friendly (Hey Google) - or both --- ## OSM Is Your Friend We can use OSM's Nominatim API to geocode at least a couple of addresses (the public service allows at most one request per second) ```r library(tibble) library(tidygeocoder) leibniz_addresses <- tibble::tribble( ~street, ~housenumber, ~zip_code, ~place, ~institute, "B 2", "1", "68159", "Mannheim", "GESIS", "Unter Sachsenhausen", "6-8", "50667", "Köln", "GESIS", "Kellnerweg", "4", "37077", "Göttingen", "DPZ", "Reichsstr.", "4-6", "04109", "Leipzig", "GWZO", "Schöneckstraße", "6", "79104", "Freiburg", "KIS", "Albert-Einstein-Straße", "29a", "18059", "Rostock", "LIKAT", "L7", "1", "68161", "Mannheim", "ZEW", "Müggelseedamm", "310", "12587", "Berlin", "IGB", "Campus D2", "2", "66123", "Saarbrücken", "INM", "Eberswalder Straße", "84", "15374", "Müncheberg (Mark)", "ZALF" ) |> dplyr::mutate(whole_address = paste(street, housenumber, zip_code, place)) ``` --- ## Run the Geocoding ```r leibniz_addresses <- tidygeocoder::geocode( leibniz_addresses, address = whole_address ) leibniz_addresses ``` ``` ## # A tibble: 10 × 8 ## street housenumber zip_code place institute ## <chr> <chr> <chr> <chr> <chr> ## 1 B 2 1 68159 Mann… GESIS ## 2 Unter Sachsen… 6-8 50667 Köln GESIS ## 3 Kellnerweg 4 37077 Gött… DPZ ## 4 Reichsstr. 
4-6 04109 Leip… GWZO ## 5 Schöneckstraße 6 79104 Frei… KIS ## 6 Albert-Einste… 29a 18059 Rost… LIKAT ## 7 L7 1 68161 Mann… ZEW ## 8 Müggelseedamm 310 12587 Berl… IGB ## 9 Campus D2 2 66123 Saar… INM ## 10 Eberswalder S… 84 15374 Münc… ZALF ## # ℹ 3 more variables: whole_address <chr>, lat <dbl>, ## # long <dbl> ``` --- ## Convert To `sf` Object And Plot .pull-left[ ```r leibniz_addresses_sf <- leibniz_addresses |> dplyr::filter(!is.na(lat)) |> sf::st_as_sf(coords = c("long", "lat"), crs = 4326) tmaptools::read_osm(leibniz_addresses_sf, type = "esri-topo") |> terra::rast() |> tm_shape() + tm_rgb() + tm_shape(leibniz_addresses_sf) + tm_dots(size = 2, col = "red") ``` ] .pull-right[ <img src="data:image/png;base64,#2_2_Applied_Data_Wrangling_files/figure-html/conversion-and-plot-exec-1.png" style="display: block; margin: auto;" /> ] --- ## Our Approach We rely on a service offered by the Federal Agency for Cartography and Geodesy (BKG): - Online interface and API for geocoding - Offline geocoding possible based on raw data - But: Access to the data and the service is restricted --- ## `bkggeocoder` .pull-left[ R package `bkggeocoder`, developed at GESIS by Stefan Jünger and Jonas Lieth for (offline) geocoding: - Access via [GitHub](https://github.com/StefanJuenger/bkggeocoder) - Introduction in the [Meet the Experts Talk](https://www.youtube.com/watch?v=ZnA21LyKK88&feature=youtu.be) by Stefan ] .pull-right[ </br> </br> <img src="data:image/png;base64,#../img/bkggeocoder.png" width="65%" style="display: block; margin: auto;" /> ] --- ## Spatial Linking .pull-left[ The geocoding tool automatically retrieves point coordinates, administrative unit keys, and grid cell IDs.
Spatial joins based on coordinates for other units: - constituencies - administrative units across time (e.g., harmonized territorial status) ] .pull-right[ <img src="data:image/png;base64,#../img/fig_3d_.png" width="80%" style="display: block; margin: auto;" /> .tinyisher[Sources: OpenStreetMap / GEOFABRIK (2018), City of Cologne (2014), Leibniz Institute of Ecological Urban and Regional Development (2018), Statistical Offices of the Federation and the Länder (2016), and German Environmental Agency / EIONET Central Data Repository (2016) / Jünger, 2019] ] --- ## Data Linking Linking via IDs is the most common approach, but it comes with its own challenges (e.g., territorial status after administrative reforms, comparability of units, heterogeneity within units). <img src="data:image/png;base64,#../img/data_linking.png" width="75%" style="display: block; margin: auto;" /> --- ## Spatial Linking Methods (Examples) I .pull-left[ 1:1 .tinyisher[sf::st_join] <img src="data:image/png;base64,#../img/fig_linking_by_location_noise.png" width="75%" style="display: block; margin: auto;" /> ] .pull-right[ Distances .tinyisher[sf::st_distance] <img src="data:image/png;base64,#../img/fig_linking_distance_noise_appI.png" width="75%" style="display: block; margin: auto;" /> ] .tinyisher[Sources: German Environmental Agency / EIONET Central Data Repository (2016) and OpenStreetMap / GEOFABRIK (2018) / Jünger, 2019] --- ## Spatial Linking Methods (Examples) II
.pull-left[ Filter methods .tinyisher[sf::st_filter or terra::vect(., filter = )] <img src="data:image/png;base64,#../img/fig_linking_focal_immigrants.png" width="75%" style="display: block; margin: auto;" /> ] .pull-right[ Buffer zones .tinyisher[sf::st_buffer (combined with terra::vect())] <img src="data:image/png;base64,#../img/fig_linking_buffer_sealing.png" width="75%" style="display: block; margin: auto;" /> ] .tinyisher[Sources: Leibniz Institute of Ecological Urban and Regional Development (2018) and Statistical Offices of the Federation and the Länder (2016) / Jünger, 2019] --- ## Cheatsheet: Spatial Operations An overview of spatial operations using the `sf` package can be accessed [here](https://ugoproto.github.io/ugo_r_doc/pdf/sf.pdf). <img src="data:image/png;base64,#../img/cheatsheet.PNG" width="75%" style="display: block; margin: auto;" /> --- ## Data Aggregation If you want to aggregate attributes and geometries of a shapefile, you can rely on `st_combine(x)`, `st_union(x, y)`, and `st_intersection(x, y)` to combine geometries, dissolve borders, and return the intersection of two shapefiles, respectively. For raster data, you can aggregate cells with `terra::aggregate()` if your raster files already match, in combination with `terra::resample()` if they don't (e.g., different resolutions or cell alignments).
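To make raster aggregation concrete, here is a minimal base-R sketch of block aggregation by a factor of 2, similar in spirit to `terra::aggregate(x, fact = 2, fun = "mean")`. It works on a plain matrix instead of a `SpatRaster`, assumes the number of rows and columns are multiples of the factor, and uses made-up values; `aggregate_blocks()` is a hypothetical helper, not part of `terra`.

```r
# Hypothetical helper: aggregate a numeric matrix in fact x fact blocks,
# similar in spirit to terra::aggregate(x, fact = fact, fun = fun).
# Assumes nrow(m) and ncol(m) are multiples of fact.
aggregate_blocks <- function(m, fact = 2, fun = mean) {
  stopifnot(nrow(m) %% fact == 0, ncol(m) %% fact == 0)
  # coarse row/column index that each fine row/column falls into
  row_block <- (seq_len(nrow(m)) - 1) %/% fact + 1
  col_block <- (seq_len(ncol(m)) - 1) %/% fact + 1
  out <- matrix(NA_real_, nrow = max(row_block), ncol = max(col_block))
  for (i in seq_len(nrow(out))) {
    for (j in seq_len(ncol(out))) {
      # apply fun to all fine cells that belong to coarse cell (i, j)
      out[i, j] <- fun(m[row_block == i, col_block == j])
    }
  }
  out
}

# Made-up 4 x 4 grid, e.g., inhabitants per fine cell
fine <- matrix(1:16, nrow = 4, byrow = TRUE)

coarse_sum  <- aggregate_blocks(fine, fact = 2, fun = sum)   # keeps totals
coarse_mean <- aggregate_blocks(fine, fact = 2, fun = mean)  # average per fine cell
coarse_sum
```

The choice of `fun` matters: `sum` preserves the total of a count variable such as inhabitants (136 in both grids here), while `mean` returns the average per fine cell, which suits densities or shares.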
To deal with spatial misalignment: - [`smile` package](https://lcgodoy.me/smile/) - [`areal` package](https://chris-prener.github.io/areal/) --- ## Data Aggregation .pull-left[ ```r german_districts <- sf::read_sf("./data/VG250_KRS.shp") %>% sf::st_transform(3035) %>% dplyr::mutate(federal_state = as.numeric(stringr::str_sub(AGS, 1, 2))) german_states <- german_districts %>% dplyr::group_by(federal_state) %>% dplyr::summarize(geometry = sf::st_union(geometry)) tm_shape(german_states) + tm_borders() ``` ] .pull-right[ <img src="data:image/png;base64,#2_2_Applied_Data_Wrangling_files/figure-html/aggregate-data-map-1.png" style="display: block; margin: auto;" /> ] --- ## Fake Research Question .pull-left[ Say we're interested in the impact of neighborhood characteristics (e.g., mobility infrastructure) on individual-level attitudes towards the energy transition. We plan to conduct a survey in the state of North Rhine-Westphalia. ] .pull-right[ </br> <img src="data:image/png;base64,#../img/4iq3kg.jpg" width="813" style="display: block; margin: auto;" /> .center[.tinyisher[https://imgflip.com/memegenerator/Trump-Bill-Signing] ] ] --- ## Our Sample Area: NRW's Boundaries .pull-left[ ```r sampling_area <- osmdata::getbb( "Nordrhein-Westfalen", format_out = "sf_polygon" ) %>% .$multipolygon %>% sf::st_transform(3035) ``` ] -- .pull-right[ ```r tm_shape(sampling_area) + tm_borders() ``` <img src="data:image/png;base64,#2_2_Applied_Data_Wrangling_files/figure-html/nrw-map-1.png" style="display: block; margin: auto;" /> ] --- ## A Fake-Life Application .pull-left[ Let's sample 1,000 people to interview them about their lives.
We can draw a fake sample this way and also add an identifier for the respondents: ```r set.seed(1234) ``` ```r fake_coordinates <- sf::st_sample(sampling_area, 1000) %>% sf::st_sf() %>% dplyr::mutate( id_2 = stringi::stri_rand_strings(10000, 10) %>% sample(1000, replace = FALSE) ) ``` ] -- .pull-right[ ```r tm_shape(sampling_area) + tm_borders() + tm_shape(fake_coordinates) + tm_dots() ``` <img src="data:image/png;base64,#2_2_Applied_Data_Wrangling_files/figure-html/map-osm-coordinates-1.png" style="display: block; margin: auto;" /> ] --- ## Correspondence Table As in any survey that deals with addresses, we need a correspondence table of the distinct identifiers. ```r correspondence_table <- dplyr::bind_cols( id = stringi::stri_rand_strings(10000, 10) %>% sample(1000, replace = FALSE), id_2 = fake_coordinates$id_2 ) correspondence_table ``` ``` ## # A tibble: 1,000 × 2 ## id id_2 ## <chr> <chr> ## 1 ubsHG5McEM ihkXs9ejBD ## 2 WN7Ih0Y5Rz bN2W1BpZKx ## 3 lmLdwfl3cu wAbDkMovWz ## 4 uq2Rb6Dj2w R4eIulul4z ## 5 y7eYFQSuP3 XOvF2ZuGg1 ## 6 UxERvtP2Kx EPuILKVeoq ## 7 67N3O8FPyO 39TfAAxmme ## 8 I0AUhXMPkD 2tolhpgrNl ## 9 41h2EPFU1S nGgofAl6iC ## 10 9YaVHR70jt 4sXgiH1ydA ## # ℹ 990 more rows ``` --- ## Conduct the Survey We ask respondents for some standard sociodemographics. But we also include an item from the [GLES Panel](https://doi.org/10.4232/1.14114) on energy transformation: "From 2030, no more new cars with petrol or diesel engines are to be registered in Germany. How much do you agree?" (entrans). Since we cannot share the actual data, we created fake data using the [`faux` package](https://cran.r-project.org/web/packages/faux/index.html). 
```r fake_survey_data <- dplyr::bind_cols( id = correspondence_table$id, age = sample(18:100, 1000, replace = TRUE), gender = sample(1:2, 1000, replace = TRUE) %>% as.factor(), education = sample(1:4, 1000, replace = TRUE) %>% as.factor(), income = sample(100:10000, 1000, replace = TRUE), entrans = secret_variable_we_are_hiding_from_you ) ``` --- ## Survey Data Structure ```r fake_survey_data ``` ``` ## # A tibble: 1,000 × 6 ## id age gender education income entrans ## <chr> <int> <fct> <fct> <int> <dbl> ## 1 ubsHG5McEM 72 2 1 6061 69.9 ## 2 WN7Ih0Y5Rz 49 1 3 4548 50.6 ## 3 lmLdwfl3cu 84 1 4 6850 45.0 ## 4 uq2Rb6Dj2w 90 1 4 1186 55.0 ## 5 y7eYFQSuP3 88 2 2 5888 61.4 ## 6 UxERvtP2Kx 58 2 1 9210 59.5 ## 7 67N3O8FPyO 90 2 3 789 52.4 ## 8 I0AUhXMPkD 45 1 4 1925 49.8 ## 9 41h2EPFU1S 36 1 3 9587 55.5 ## 10 9YaVHR70jt 98 2 2 4455 49.7 ## # ℹ 990 more rows ``` --- ## What could explain our outcome? *Access to charging infrastructure* > The better the access to charging infrastructure, the higher the support for the energy transformation. -- *Alternative means of transport* > The better the access to public transportation, the higher the support for the energy transformation. -- *Rural-urban divide* > The higher the population density, the higher the support for the energy transformation. --- ## District-level Data We already gathered most of this information, or created the indicators, at the district level yesterday. Let's load the respective data, reduce it to NRW, and have a look. ```r sampling_area_attributes <- # load district shapefile sf::read_sf("./data/VG250_KRS.shp") %>% # transform crs sf::st_transform(3035) %>% # some data cleaning dplyr::mutate(district_id = as.numeric(AGS)) %>% dplyr::select(district_id) %>% # reduce to area of nrw: x intersects with y sf::st_join(., sampling_area, join = sf::st_intersects, # keep only districts that are intersecting left = FALSE) %>% # add attribute table dplyr::left_join(.
, readr::read_delim("./data/attributes_districts.csv", delim = ";"), by = "district_id") ``` --- ## District Operationalization *Access to charging infrastructure* > Charging stations per 1,000 inhabitants in a district *Alternative means of transport* > Distance to public transportation in a district *Rural-urban divide* > Population density in a district --- ## Access to charging infrastructure Luckily, we already calculated this yesterday!
```r
sampling_area_attributes <- charger_nrw %>%
  # spatial join district ids
  sf::st_join(
    sampling_area_attributes %>% dplyr::select(district_id),
    join = sf::st_within
  ) %>%
  # drop geometry first so that summarise() does not union point geometries
  sf::st_drop_geometry() %>%
  # count the chargers in each district
  dplyr::group_by(district_id) %>%
  dplyr::summarise(charger_count = dplyr::n()) %>%
  # left join with sampling area attributes
  dplyr::left_join(sampling_area_attributes, ., by = "district_id") %>%
  dplyr::mutate(
    # districts without chargers get a count of 0 instead of NA
    charger_count = dplyr::coalesce(charger_count, 0L),
    # charger density per 1,000 inhabitants
    charger_dens = (charger_count * 1000) / population
  )
```
--- ## Alternative means of transport We obtained this information from the INKAR database: population-weighted straight-line distance to the nearest public transport stop with at least 20 departures per day. <img src="data:image/png;base64,#2_2_Applied_Data_Wrangling_files/figure-html/transport-map-disp-1.png" width="50%" height="50%" style="display: block; margin: auto;" /> --- ## Rural-urban divide Our attribute table contains the number of inhabitants per district but not the population density. Therefore, we need to calculate the area of each district. ```r # calculate area of districts # (sf::st_area() returns values with units; # EPSG:3035 uses meters, hence m^2) sf::st_area(sampling_area_attributes) %>% head(4) ``` ``` ## Units: [m^2] ## [1] 1269653841 1991051810 797826703 694472630 ``` ```r sampling_area_attributes %>% sf::st_transform(4326) %>% sf::st_area(.)
%>% head(4) ``` ``` ## Units: [m^2] ## [1] 1264848801 1983072559 794743475 691822618 ``` The values differ slightly: for a geographic CRS such as EPSG:4326, `sf` computes areas on the sphere (or ellipsoid) instead of in the projected plane. --- ## Population Density All that is left to do is a simple mutation: .pull-left[ ```r # calculate population density sampling_area_attributes <- sampling_area_attributes %>% # calculate area of districts (sf::st_area() returns # values with units, here m^2) dplyr::mutate(area = sf::st_area(.)) %>% # change unit to square kilometers dplyr::mutate(area_km2 = units::set_units(area, km^2)) %>% # recode variable as numeric dplyr::mutate(area_km2 = as.numeric(area_km2)) %>% # calculate population density dplyr::mutate(pop_dens = population / area_km2) ``` ] .pull-right[ <img src="data:image/png;base64,#2_2_Applied_Data_Wrangling_files/figure-html/unnamed-chunk-2-1.png" style="display: block; margin: auto;" /> ] --- ## Respondents in Districts We now have all three indicators at the district level. Since our analysis focuses on the individual level, we spatially join this information to our fake respondents' coordinates. ```r district_linked_df <- sampling_area_attributes %>% # keeping just the variables we want dplyr::select(charger_dens, publictransport_meandist, pop_dens) %>% # we join districts to respondents, # so the coordinates come first sf::st_join(fake_coordinates, # district data second . , # some points may lie on the border # choosing intersects join = sf::st_intersects) %>% # drop our coordinates for data protection sf::st_drop_geometry() ``` --- ## Respondents in Districts ```r head(district_linked_df, 5) ``` ``` ## id_2 charger_dens publictransport_meandist ## 1 ihkXs9ejBD 0.6780067 301.3 ## 2 bN2W1BpZKx 0.4959913 482.4 ## 3 wAbDkMovWz 0.7682106 368.7 ## 4 R4eIulul4z 0.4959913 482.4 ## 5 XOvF2ZuGg1 0.5132439 408.3 ## pop_dens ## 1 533.8628 ## 2 214.0663 ## 3 133.4900 ## 4 214.0663 ## 5 188.8392 ``` --- ## Too boring? Let's scale it down!
We have our nice fake coordinates, and we know that e-car mobility also varies within some districts (e.g., Cologne). So, let's try to operationalize the variables at a lower level of aggregation. *Access to charging infrastructure* > Charging stations in a 5000m buffer *Alternative means of transport* > Distance to the closest train stop *Rural-urban divide* > Population in a 5000m buffer --- ## Charging stations in 5000m Buffer The procedure for counting the chargers in a 5 km buffer is very similar to counting the chargers in a district.
```r
# Create 5000 m buffers around the fake coordinates
buffers <- fake_coordinates %>%
  sf::st_buffer(dist = 5000)

# For each buffer, find the charging stations (charger_nrw) that intersect it
inter <- sf::st_intersects(buffers, charger_nrw)

# Count the chargers within each buffer
coordinate_linked_df <- fake_coordinates %>%
  dplyr::mutate(num_charger = lengths(inter))
```
--- ## Distance Calculation I To measure access to alternative transportation (e.g., public transport), we want to measure each respondent's distance to the closest train station. We can get the train station points from OSM.
```r
nrw_pt_stops <- osmdata::getbb("Nordrhein-Westfalen") %>%
  osmdata::opq(timeout = 25 * 100) %>%
  osmdata::add_osm_feature(key = "public_transport", value = "stop_position") %>%
  osmdata::osmdata_sf()

nrw_pt_stops <- nrw_pt_stops$osm_points %>%
  tibble::as_tibble() %>%
  sf::st_as_sf() %>%
  sf::st_transform(3035)

nrw_pt_trainstops <- nrw_pt_stops %>%
  dplyr::filter(train == "yes") %>%
  dplyr::select()

# takes a while, so sneaky preparation
nrw_pt_trainstops <- sf::st_read("./data/nrw_pt_osmtrainstops.shp", crs = 3035)
```
--- ## Distance Calculation II Applied to **all** respondents and **all** train stations, `sf::st_distance()` returns the full distance matrix: 1,000 respondents × 2,710 stations = 2,710,000 distances. We can make our lives easier by first identifying the nearest station and then calculating only that distance.
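The "nearest feature first, then distance" logic can be sketched in base R. This toy version assumes planar coordinates in meters (as in EPSG:3035) and uses made-up respondent and station locations. It loops over respondents, so only one vector of distances is held in memory at a time rather than the full respondents × stations matrix. It illustrates the idea behind combining `sf::st_nearest_feature()` with `sf::st_distance(..., by_element = TRUE)`, not how `sf` implements them.

```r
# Made-up coordinates in meters, assuming a projected CRS such as EPSG:3035
respondents <- data.frame(
  id_2 = c("a", "b", "c"),
  x = c(0, 3000, 5000),
  y = c(0, 0, 3000)
)
stations <- data.frame(
  x = c(300, 4000, 5000),
  y = c(400, 0, 6000)
)

# Euclidean distances from one respondent to every station
dist_to_stations <- function(px, py) {
  sqrt((stations$x - px)^2 + (stations$y - py)^2)
}

# Step 1: index of the nearest station per respondent
# (brute force; plays the role of sf::st_nearest_feature())
nearest_station <- vapply(
  seq_len(nrow(respondents)),
  function(i) which.min(dist_to_stations(respondents$x[i], respondents$y[i])),
  integer(1)
)

# Step 2: one distance per respondent to "its" station
# (plays the role of sf::st_distance(..., by_element = TRUE))
dist_m <- sqrt(
  (stations$x[nearest_station] - respondents$x)^2 +
    (stations$y[nearest_station] - respondents$y)^2
)

respondents$dist_km <- dist_m / 1000
respondents
```

With the workshop's numbers, the full matrix would hold 2,710,000 distances, whereas the two-step approach keeps just one distance per respondent (1,000 in total).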
.pull-left[
```r
# Find the nearest train station for each respondent
nearest_station <- sf::st_nearest_feature(fake_coordinates, nrw_pt_trainstops)

# Calculate the distance between each point in
# fake_coordinates & its nearest train station
distances <- sf::st_distance(
  fake_coordinates,
  nrw_pt_trainstops[nearest_station, ],
  by_element = TRUE
)

# add a column for the distances
coordinate_linked_df <- coordinate_linked_df %>%
  dplyr::mutate(
    # Calculate distances in kilometers
    dist_km = as.numeric(distances) / 1000
  )
```
] .pull-right[
```
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0833 2.1552 3.9025 4.5746 6.1253 15.5141
```
] --- ## Population Buffers ...and we're not done yet: we still need the population in the neighborhood. Let's calculate buffers of 5000 meters and add the mean number of inhabitants per 100 m grid cell within each buffer to our dataset.
```r
# download data & extract information
inhabitants_nrw <- z11::z11_get_100m_attribute(Einwohner) %>%
  terra::crop(., sampling_area)

# spatially link "on the fly"
population_buffers <- terra::extract(
  inhabitants_nrw,
  fake_coordinates %>% sf::st_buffer(5000) %>% terra::vect(),
  fun = mean,
  na.rm = TRUE
)

# link with data
coordinate_linked_df <- coordinate_linked_df %>%
  dplyr::mutate(population_buffer = population_buffers[[2]])
```
--- ## Join with Survey I hope you're not tired of joining data tables. Since we care a tiny bit more about data protection than others, we have yet another joining task left: joining the information we received using our (protected) fake coordinates to the actual survey data via the correspondence table.
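The logic of this last join can be sketched with base R's `merge()` on toy tables: spatial attributes keyed by the geo identifier `id_2`, a correspondence table mapping `id_2` to the survey `id`, and survey data keyed by `id`. The table contents are made up, and `merge()` stands in for `dplyr::left_join()`. The key step is dropping `id_2` (and any coordinates) before the survey data are touched.

```r
# Made-up toy tables standing in for the workshop objects
survey <- data.frame(id = c("s1", "s2", "s3"), entrans = c(70, 50, 45))
correspondence <- data.frame(
  id   = c("s1", "s2", "s3"),
  id_2 = c("g9", "g7", "g8")
)
spatial <- data.frame(
  id_2    = c("g7", "g8", "g9"),
  dist_km = c(1.2, 3.4, 0.5)
)

# Step 1: attach the spatial attributes to the survey id via the key table
linked <- merge(correspondence, spatial, by = "id_2", all.x = TRUE)

# Step 2: drop the geo-side identifier before touching the survey data
linked$id_2 <- NULL

# Step 3: join to the survey data on the survey identifier
survey_spatial <- merge(survey, linked, by = "id", all.x = TRUE)
survey_spatial
```

The survey data never see `id_2`, so the geo-side key and the survey records only meet through the correspondence table.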
.pull-left[
```r
# last joins for now
fake_survey_data_spatial <-
  # first join the id
  dplyr::left_join(
    correspondence_table,
    district_linked_df,
    by = "id_2"
  ) %>%
  dplyr::left_join(
    .,
    # drop the point geometries so no coordinates end up in the survey data
    sf::st_drop_geometry(coordinate_linked_df),
    by = "id_2"
  ) %>%
  # drop the fake_coordinate id
  dplyr::select(-id_2) %>%
  # join the survey data
  dplyr::left_join(
    fake_survey_data,
    by = "id"
  )
```
] .pull-right[ <img src="data:image/png;base64,#2_2_Applied_Data_Wrangling_files/figure-html/correlation-plot-1.png" width="75%" style="display: block; margin: auto;" /> ] --- class: middle ## Exercise 2_2_1: Spatial Joins [Exercise](https://stefanjuenger.github.io/gesis-workshop-geospatial-techniques-R-2024/exercises/2_2_1_Spatial_Joins.html) [Solution](https://stefanjuenger.github.io/gesis-workshop-geospatial-techniques-R-2024/exercises/2_2_1_Spatial_Joins.html) --- class: middle ## Add-on Slides: Example Studies --- ## Environmental inequalities (Jünger, 2021) > Is income associated with fewer environmental disadvantages, and are there differences between German people and people with a migration background?
.pull-left[ .small[ Theoretical Framework - Social and Ethnic Inequalities (Crowder & Downey, 2010) - Place Stratification (Lersch, 2013) Data - GGSS 2016 & 2018 - soil sealing & green spaces ] ] .pull-right[ <img src="data:image/png;base64,#../img/fig_linking_buffer_sealing.png" width="65%" style="display: block; margin: auto;" /> .tinyisher[Leibniz Institute of Ecological Urban and Regional Development (2018) / Jünger, 2019] ] --- ## Results <img src="data:image/png;base64,#../img/FIGURE_2.png" width="70%" style="display: block; margin: auto;" /> .tinyisher[Data source: GGSS 2016 & 2018; N = 6,117; 95% confidence intervals based on cluster-robust standard errors (sample point); all models control for age, gender, education, household size, German region and survey year interaction, number of inhabitants of the municipality, and distance to municipality administration] --- ## Attitudes towards minorities (Jünger & Schaeffer, 2022) > Do people who live in ethnically homogeneous neighborhoods that are close to ethnically diverse ones have more negative attitudes towards minorities? .pull-left[ .small[ Theoretical Framework - Contact Theory (Allport, 1954) - Ethnic Competition (Stephan et al., 2009) Data - GGSS 2016 - German Census 2011 ] ] .pull-right[ <img src="data:image/png;base64,#../img/Abb1.png" width="65%" style="display: block; margin: auto;" /> .tinyisher[German Census 2011, OpenStreetMap / Jünger & Schaeffer, 2022] ] --- ## Results <img src="data:image/png;base64,#../img/Abb2.png" width="70%" style="display: block; margin: auto;" /> .tinyisher[Data source: GGSS 2016; N = 1,689; 95% confidence intervals based on cluster-robust standard errors (sample point); all models control for age, gender, education, income, unemployment, homeownership, immigrants and inhabitants in the neighborhood, number of inhabitants of the municipality, German region] --- ## Left Behind by the State?
(Stroppe, 2023) > Are political trust levels affected by the accessibility of public services and infrastructures for citizens? .pull-left[ .small[ Theoretical Framework - Political Performance-Trust Link (Easton, 1965; Hetherington, 2005) - Context as a low-intensity information cue (Cho & Rudolph, 2008) Data - GGSS 2018 - hospital, school, train station (distance measures) - municipality data ] ] .pull-right[ <br> <img src="data:image/png;base64,#../img/meandist_trains.PNG" width="65%" style="display: block; margin: auto;" /> .tinyisher[Federal Statistical Office 2019, Deutsche Bahn 2017 and GeoBasis-DE / BKG 2022 / Stroppe, 2023] ] --- ## Results <br> <img src="data:image/png;base64,#../img/fig1_coefplot_colored.png" width="95%" style="display: block; margin: auto;" /> .tinyisher[Data source: GGSS 2018 and Federal Statistical Office 2017. N = 3,030, Groups = 152 (municipalities). Fitted models: OLS multilevel random-effects models. Individual-level controls: income, gender, education, age, personal trust, political interest. Municipality-level controls: population density and unemployment. Dependent variable: Trust in government. Survey weights are applied.]
--- layout: false class: center background-image: url(data:image/png;base64,#../assets/img/the_end.png) background-size: cover .left-column[ </br> <img src="data:image/png;base64,#../img/Anne.png" width="75%" style="display: block; margin: auto;" /> ] .right-column[ .left[.small[<svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M464 64H48C21.49 64 0 85.49 0 112v288c0 26.51 21.49 48 48 48h416c26.51 0 48-21.49 48-48V112c0-26.51-21.49-48-48-48zm0 48v40.805c-22.422 18.259-58.168 46.651-134.587 106.49-16.841 13.247-50.201 45.072-73.413 44.701-23.208.375-56.579-31.459-73.413-44.701C106.18 199.465 70.425 171.067 48 152.805V112h416zM48 400V214.398c22.914 18.251 55.409 43.862 104.938 82.646 21.857 17.205 60.134 55.186 103.062 54.955 42.717.231 80.509-37.199 103.053-54.947 49.528-38.783 82.032-64.401 104.947-82.653V400H48z"></path> </svg> [anne-kathrin.stroppe@gesis.org](mailto:anne-kathrin.stroppe@gesis.org)] .small[<svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M459.37 151.716c.325 4.548.325 9.097.325 13.645 0 138.72-105.583 298.558-298.558 298.558-59.452 0-114.68-17.219-161.137-47.106 8.447.974 16.568 1.299 25.34 1.299 49.055 0 94.213-16.568 130.274-44.832-46.132-.975-84.792-31.188-98.112-72.772 6.498.974 12.995 1.624 19.818 1.624 9.421 0 18.843-1.3 27.614-3.573-48.081-9.747-84.143-51.98-84.143-102.985v-1.299c13.969 7.797 30.214 12.67 47.431 13.319-28.264-18.843-46.781-51.005-46.781-87.391 0-19.492 5.197-37.36 14.294-52.954 51.655 63.675 129.3 105.258 216.365 109.807-1.624-7.797-2.599-15.918-2.599-24.04 0-57.828 46.782-104.934 104.934-104.934 30.213 0 57.502 12.67 76.67 33.137 23.715-4.548 46.456-13.32 66.599-25.34-7.798 24.366-24.366 44.833-46.132 57.827 21.117-2.273 41.584-8.122 60.426-16.243-14.292 20.791-32.161 39.308-52.628 54.253z"></path> </svg> 
[`@astroppe`](https://twitter.com/stroppann)] .small[<svg viewBox="0 0 496 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M165.9 397.4c0 2-2.3 3.6-5.2 3.6-3.3.3-5.6-1.3-5.6-3.6 0-2 2.3-3.6 5.2-3.6 3-.3 5.6 1.3 5.6 3.6zm-31.1-4.5c-.7 2 1.3 4.3 4.3 4.9 2.6 1 5.6 0 6.2-2s-1.3-4.3-4.3-5.2c-2.6-.7-5.5.3-6.2 2.3zm44.2-1.7c-2.9.7-4.9 2.6-4.6 4.9.3 2 2.9 3.3 5.9 2.6 2.9-.7 4.9-2.6 4.6-4.6-.3-1.9-3-3.2-5.9-2.9zM244.8 8C106.1 8 0 113.3 0 252c0 110.9 69.8 205.8 169.5 239.2 12.8 2.3 17.3-5.6 17.3-12.1 0-6.2-.3-40.4-.3-61.4 0 0-70 15-84.7-29.8 0 0-11.4-29.1-27.8-36.6 0 0-22.9-15.7 1.6-15.4 0 0 24.9 2 38.6 25.8 21.9 38.6 58.6 27.5 72.9 20.9 2.3-16 8.8-27.1 16-33.7-55.9-6.2-112.3-14.3-112.3-110.5 0-27.5 7.6-41.3 23.6-58.9-2.6-6.5-11.1-33.3 2.6-67.9 20.9-6.5 69 27 69 27 20-5.6 41.5-8.5 62.8-8.5s42.8 2.9 62.8 8.5c0 0 48.1-33.6 69-27 13.7 34.7 5.2 61.4 2.6 67.9 16 17.7 25.8 31.5 25.8 58.9 0 96.5-58.9 104.2-114.8 110.5 9.2 7.9 17 22.9 17 46.4 0 33.7-.3 75.4-.3 83.6 0 6.5 4.6 14.4 17.3 12.1C428.2 457.8 496 362.9 496 252 496 113.3 383.5 8 244.8 8zM97.2 352.9c-1.3 1-1 3.3.7 5.2 1.6 1.6 3.9 2.3 5.2 1 1.3-1 1-3.3-.7-5.2-1.6-1.6-3.9-2.3-5.2-1zm-10.8-8.1c-.7 1.3.3 2.9 2.3 3.9 1.6 1 3.6.7 4.3-.7.7-1.3-.3-2.9-2.3-3.9-2-.6-3.6-.3-4.3.7zm32.4 35.6c-1.6 1.3-1 4.3 1.3 6.2 2.3 2.3 5.2 2.6 6.5 1 1.3-1.3.7-4.3-1.3-6.2-2.2-2.3-5.2-2.6-6.5-1zm-11.4-14.7c-1.6 1-1.6 3.6 0 5.9 1.6 2.3 4.3 3.3 5.6 2.3 1.6-1.3 1.6-3.9 0-6.2-1.4-2.3-4-3.3-5.6-2z"></path> </svg> [`stroppann`](https://github.com/stroppann)] .small[<svg viewBox="0 0 576 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M280.37 148.26L96 300.11V464a16 16 0 0 0 16 16l112.06-.29a16 16 0 0 0 15.92-16V368a16 16 0 0 1 16-16h64a16 16 0 0 1 16 16v95.64a16 16 0 0 0 16 16.05L464 480a16 16 0 0 0 16-16V300L295.67 148.26a12.19 12.19 0 0 0-15.3 0zM571.6 251.47L488 182.56V44.05a12 12 0 0 0-12-12h-56a12 12 0 0 0-12 
12v72.61L318.47 43a48 48 0 0 0-61 0L4.34 251.47a12 12 0 0 0-1.6 16.9l25.5 31A12 12 0 0 0 45.15 301l235.22-193.74a12.19 12.19 0 0 1 15.3 0L530.9 301a12 12 0 0 0 16.9-1.6l25.5-31a12 12 0 0 0-1.7-16.93z"></path> </svg> ]] ]