Spatial Wrangling

GESIS Workshop: Introduction to Geospatial Techniques for Social Scientists in R

Stefan Jünger, Anne-Kathrin Stroppe, Dennis Abel

2026-04-23

Now

Day Time Title
April 09 10:00-11:30 Introduction
April 09 11:30-11:45 Coffee Break
April 09 11:45-13:00 Data Formats
April 09 13:00-14:00 Lunch Break
April 09 14:00-15:30 Mapping
April 09 15:30-15:45 Coffee Break
April 09 15:45-17:00 Spatial Wrangling
April 10 09:00-10:30 Spatial Wrangling
April 10 10:30-10:45 Coffee Break
April 10 10:45-12:00 Applied Spatial Linking
April 10 12:00-13:00 Lunch Break
April 10 13:00-14:30 Spatial Analysis
April 10 14:30-14:45 Coffee Break
April 10 14:45-16:00 Spatial Econometrics & Outlook

Mini wrap-up

Thus far, we have learned about

  • Different data formats
  • How to load them
  • First steps in interacting with them
  • Creating maps with ggplot2

But the real magic in geospatial data lies in their flexibility.

Our plan for this afternoon

In this session, you will learn

  • How to wrangle the different geospatial data formats even further
  • Link/join different datasets

1. Advanced Importing

Importing of non-spatial data

Say the data we want to use are not available as a shapefile. Point coordinates are often stored in table formats like .csv – like the location of charging stations for electric cars data in our ./data folder.

# Loading a simple CSV file
echarging_df <- readr::read_delim("./data/charging_points_ger.csv", delim = ";")

echarging_df
# A tibble: 60,560 × 7
   operator                      federal_state     latitude longitude power_kw type                   num_plugs
   <chr>                         <chr>                <dbl>     <dbl>    <dbl> <chr>                      <dbl>
 1 deer GmbH                     Baden-Württemberg     48.3      9.72       44 Normalladeeinrichtung          2
 2 EnBW mobility+ AG und Co.KG   Baden-Württemberg     48.6      9.87       93 Schnellladeeinrichtung         2
 3 SWU Energie GmbH              Baden-Württemberg     48.5     10.2        44 Normalladeeinrichtung          2
 4 SWU Energie GmbH              Baden-Württemberg     48.6     10.1        44 Normalladeeinrichtung          2
 5 SWU Energie GmbH              Baden-Württemberg     48.2     10.1        22 Normalladeeinrichtung          1
 6 EnBW mobility+ AG und Co.KG   Baden-Württemberg     48.5      9.98       30 Normalladeeinrichtung          2
 7 SWU Energie GmbH              Baden-Württemberg     48.5      9.99       22 Normalladeeinrichtung          2
 8 Albwerk GmbH & Co. KG         Baden-Württemberg     48.5      9.76       22 Normalladeeinrichtung          2
 9 EnBW ODR AG                   Baden-Württemberg     48.5     10.0        22 Normalladeeinrichtung          1
10 Autohaus Burger GmbH & Co. KG Baden-Württemberg     48.4      9.77       22 Normalladeeinrichtung          2
# ℹ 60,550 more rows

From data table to geospatial data

We see that besides our attributes (e.g., operator, power,…), the table contains the two variables “longitude” (X) and “latitude” (Y), our point coordinates. When using the command sf::st_as_sf(), it is easy to transform the table into a point layer.

# Transform to spatial data frame
echarging_sf <- 
  echarging_df |>  
  # there are some missings in the data which is not allowed 
  dplyr::filter(!is.na(longitude) & !is.na(latitude)) |> 
  sf::st_as_sf(coords = c("longitude", "latitude"))

# Check the class of the dataset
class(echarging_sf)
[1] "sf"         "tbl_df"     "tbl"        "data.frame"

Final check via plotting

plot(echarging_sf)

Check the CRS!

Make sure to use the option crs = [EPSG_ID]. If not used, your CRS will not be defined, and you can’t perform further commands depending on the CRS. Here, we tried EPSG IO or http://projfinder.com/ to find the correct CRS

echarging_sf <- 
  echarging_df |> 
  # there are some missings in the data which is not allowed  
  dplyr::filter(!is.na(longitude) & !is.na(latitude)) |> 
  sf::st_as_sf(   
    coords = c("longitude", "latitude"),
    crs = 4326
  )

plot(echarging_sf["operator"])

… and the other way round

Do you want to go back to handling a simple data frame? You can quickly achieve this by dropping the geometry column.

# Remove geometry column
echarging_df <- sf::st_drop_geometry(echarging_sf)

# Check the output
head(echarging_df, 2)
# A tibble: 2 × 5
  operator                    federal_state     power_kw type                   num_plugs
  <chr>                       <chr>                <dbl> <chr>                      <dbl>
1 deer GmbH                   Baden-Württemberg       44 Normalladeeinrichtung          2
2 EnBW mobility+ AG und Co.KG Baden-Württemberg       93 Schnellladeeinrichtung         2

APIs / data from the internet

Geospatial data tend to be quite big, and there’s a pressure to distribute data efficiently. Data dumps (on the internet) may not be helpful

  • When resources are low
  • Time’s a factor
  • The data have a large geographic extent

Instead, a Programming Application Interface (API) is often used.

Data providers offering geospatial data APIs

Example: access to public transport

Say, we’re interested in the accessibility of public transport in Cologne.

  • Bus, tram, etc.
  • All platforms and vehicles should be wheel-chair accessible

We can gather this information using OpenStreetMap!

Accessing OSM data: the Overpass API

The Overpass API (formerly known as OSM Server Side Scripting, or OSM3S before 2011) is a read-only API that serves up custom selected parts of the OSM map data. It acts as a database over the web: the client sends a query to the API and returns the data set that corresponds to the query.

Source: https://wiki.openstreetmap.org/wiki/Overpass_API

Starting with a geographic area to query

Many geospatial API requests start with a bounding box or another geographical extent to define which area should be accessed.

cologne_pt_stops <- osmdata::getbb("Köln")

cologne_pt_stops
       min       max
x  6.77253  7.162028
y 50.83044 51.084974

Defining some technical details

The Overpass API also requires a timeout parameter that repeats the request for a while if anything fails.

cologne_pt_stops <-
  osmdata::getbb("Köln") |> 
  osmdata::opq(timeout = 25)

cologne_pt_stops
$bbox
[1] "50.8304399,6.7725303,51.0849743,7.162028"

$prefix
[1] "[out:xml][timeout:25];\n(\n"

$suffix
[1] ");\n(._;>;);\nout body;"

$features
NULL

$osm_types
[1] "node"     "way"      "relation"

attr(,"class")
[1] "list"           "overpass_query"

Turning to the content

The content we aim to request is defined with key/value pairs. It’s best to learn them by looking them up in the official documentation.

cologne_pt_stops <-
  osmdata::getbb("Köln") |> 
  osmdata::opq(timeout = 25) |> 
  osmdata::add_osm_feature(key = "public_transport", value = "stop_position")

cologne_pt_stops
$bbox
[1] "50.8304399,6.7725303,51.0849743,7.162028"

$prefix
[1] "[out:xml][timeout:25];\n(\n"

$suffix
[1] ");\n(._;>;);\nout body;"

$features
[1] "[\"public_transport\"=\"stop_position\"]"

$osm_types
[1] "node"     "way"      "relation"

attr(,"class")
[1] "list"           "overpass_query"

Conduct actual request/download

The data is finally downloaded in the osmdata package, e.g., using the osmdata::osmdata_sf() function.

cologne_pt_stops <-
  osmdata::getbb("Köln") |> 
  osmdata::opq(timeout = 25) |> 
  osmdata::add_osm_feature(key = "public_transport", value = "stop_position") |> 
  osmdata::osmdata_sf()

cologne_pt_stops

Filter and transform

The data comprises a list that can be accessed like any other list in R. Here, we extract the points and wrangle them (spatially).

cologne_pt_stops <-
  osmdata::getbb("Köln") |> 
  osmdata::opq(timeout = 25*100) |> 
  osmdata::add_osm_feature(key = "public_transport", value = "stop_position") |> 
  osmdata::osmdata_sf() |> 
  _$osm_points |> 
  dplyr::filter(wheelchair == "yes")

cologne_pt_stops
Simple feature collection with 612 features and 94 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 6.776387 ymin: 50.83381 xmax: 7.161797 ymax: 51.08398
Geodetic CRS:  WGS 84
First 10 features:
           osm_id                    name FIXME      VRS:gemeinde                VRS:name VRS:old_ref VRS:ortsteil
361716     361716              Eifelplatz  <NA>              KÖLN                    <NA>        <NA>   Innenstadt
388550     388550            Brahmsstraße  <NA>              KÖLN                    <NA>        <NA>   Lindenthal
21033643 21033643  Weinsbergstraße/Gürtel  <NA>              KÖLN  Weinsbergstraße/Gürtel        <NA>    Ehrenfeld
27042401 27042401                   Markt  <NA> BERGISCH GLADBACH Bergisch Gladbach Markt        <NA>        Mitte
28122005 28122005                Heumarkt  <NA>              KÖLN                    <NA>        <NA>   Innenstadt
28301370 28301370         Kalker Friedhof  <NA>              KÖLN         Kalker Friedhof        <NA>      Merheim
28301373 28301373                 Merheim  <NA>              KÖLN                    <NA>        <NA>      Merheim
28301416 28301416                 Refrath  <NA> BERGISCH GLADBACH                    <NA>        <NA>      Refrath
30145575 30145575           Köln-Holweide  <NA>              KÖLN         Holweide S-Bahn        <NA>     Holweide
30388742 30388742 Mülheim Berliner Straße  <NA>              <NA>                    <NA>        <NA>         <NA>
         VRS:ref alt_name ashtray bench  bin  bus bus_bay bus_lines check_date check_date:crossing check_date:pole
361716     11509     <NA>    <NA>  <NA> <NA> <NA>    <NA>      <NA>       <NA>                <NA>            <NA>
388550     13313     <NA>    <NA>  <NA> <NA>  yes    <NA>      <NA>       <NA>                <NA>            <NA>
21033643   14221     <NA>    <NA>  <NA> <NA>  yes    <NA>      <NA>       <NA>                <NA>            <NA>
27042401   31241     <NA>    <NA>  <NA> <NA>  yes    <NA>      <NA>       <NA>                <NA>            <NA>
28122005   11110     <NA>    <NA>  <NA> <NA> <NA>    <NA>      <NA>       <NA>                <NA>            <NA>
28301370   18510     <NA>    <NA>  <NA> <NA> <NA>    <NA>      <NA>       <NA>                <NA>            <NA>
28301373   18511     <NA>    <NA>  <NA> <NA> <NA>    <NA>      <NA>       <NA>                <NA>            <NA>
28301416   31612     <NA>    <NA>  <NA> <NA> <NA>    <NA>      <NA>       <NA>                <NA>            <NA>
30145575   19401     <NA>    <NA>  <NA> <NA> <NA>    <NA>      <NA>       <NA>                <NA>            <NA>
30388742    <NA>     <NA>    <NA>  <NA> <NA> <NA>    <NA>      <NA>       <NA>                <NA>            <NA>
         check_date:shelter construction construction:railway construction:tram covered crossing departures_board
361716                 <NA>         <NA>                 <NA>              <NA>    <NA>     <NA>             <NA>
388550                 <NA>         <NA>                 <NA>              <NA>    <NA>     <NA>             <NA>
21033643               <NA>         <NA>                 <NA>              <NA>    <NA>     <NA>             <NA>
27042401               <NA>         <NA>                 <NA>              <NA>    <NA>     <NA>             <NA>
28122005               <NA>         <NA>                 <NA>              <NA>    <NA>     <NA>             <NA>
28301370               <NA>         <NA>                 <NA>              <NA>    <NA>     <NA>             <NA>
28301373               <NA>         <NA>                 <NA>              <NA>    <NA>     <NA>             <NA>
28301416               <NA>         <NA>                 <NA>              <NA>    <NA>     <NA>             <NA>
30145575               <NA>         <NA>                 <NA>              <NA>    <NA>     <NA>             <NA>
30388742               <NA>         <NA>                 <NA>              <NA>    <NA>     <NA>             <NA>
         description direction disused exit gtfs:name gtfs:stop_id highway image internet_access layer level light_rail
361716          <NA>      <NA>    <NA> <NA>      <NA>         <NA>    <NA>  <NA>            <NA>  <NA>  <NA>       <NA>
388550          <NA>      <NA>    <NA> <NA>      <NA>         <NA>    <NA>  <NA>            <NA>  <NA>  <NA>       <NA>
21033643        <NA>      <NA>    <NA> <NA>      <NA>         <NA>    <NA>  <NA>            <NA>  <NA>  <NA>       <NA>
27042401        <NA>      <NA>    <NA> <NA>      <NA>         <NA>    <NA>  <NA>            <NA>  <NA>  <NA>       <NA>
28122005        <NA>      <NA>    <NA> <NA>      <NA>         <NA>    <NA>  <NA>            <NA>  <NA>  <NA>       <NA>
28301370        <NA>      <NA>    <NA> <NA>      <NA>         <NA>    <NA>  <NA>            <NA>  <NA>  <NA>       <NA>
28301373        <NA>      <NA>    <NA> <NA>      <NA>         <NA>    <NA>  <NA>            <NA>  <NA>  <NA>       <NA>
28301416        <NA>      <NA>    <NA> <NA>      <NA>         <NA>    <NA>  <NA>            <NA>  <NA>  <NA>       <NA>
30145575        <NA>      <NA>    <NA> <NA>      <NA>         <NA>    <NA>  <NA>            <NA>  <NA>  <NA>       <NA>
30388742        <NA>      <NA>    <NA> <NA>      <NA>         <NA>    <NA>  <NA>            <NA>  <NA>  <NA>       <NA>
          lit local_ref name:ar name:en network network:long network:short network:wikidata network:wikipedia note
361716   <NA>      <NA>    <NA>    <NA>    <NA>         <NA>          <NA>             <NA>              <NA> <NA>
388550   <NA>      <NA>    <NA>    <NA>    <NA>         <NA>          <NA>             <NA>              <NA> <NA>
21033643 <NA>      <NA>    <NA>    <NA>    <NA>         <NA>          <NA>             <NA>              <NA> <NA>
27042401 <NA>      <NA>    <NA>    <NA>    <NA>         <NA>          <NA>             <NA>              <NA> <NA>
28122005 <NA>      <NA>    <NA>    <NA>    <NA>         <NA>          <NA>             <NA>              <NA> <NA>
28301370 <NA>      <NA>    <NA>    <NA>    <NA>         <NA>          <NA>             <NA>              <NA> <NA>
28301373 <NA>      <NA>    <NA>    <NA>    <NA>         <NA>          <NA>             <NA>              <NA> <NA>
28301416 <NA>      <NA>    <NA>    <NA>    <NA>         <NA>          <NA>             <NA>              <NA> <NA>
30145575 <NA>         1    <NA>    <NA>    <NA>         <NA>          <NA>             <NA>              <NA> <NA>
30388742 <NA>      <NA>    <NA>    <NA>    <NA>         <NA>          <NA>             <NA>              <NA> <NA>
         note:source note_2 official_name old_name operator operator:short operator:wikidata outdoor_seating panoramax
361716          <NA>   <NA>          <NA>     <NA>     <NA>           <NA>              <NA>            <NA>      <NA>
388550          <NA>   <NA>          <NA>     <NA>     <NA>           <NA>              <NA>            <NA>      <NA>
21033643        <NA>   <NA>          <NA>     <NA>     <NA>           <NA>              <NA>            <NA>      <NA>
27042401        <NA>   <NA>          <NA>     <NA>     <NA>           <NA>              <NA>            <NA>      <NA>
28122005        <NA>   <NA>          <NA>     <NA>     <NA>           <NA>              <NA>            <NA>      <NA>
28301370        <NA>   <NA>          <NA>     <NA>     <NA>           <NA>              <NA>            <NA>      <NA>
28301373        <NA>   <NA>          <NA>     <NA>     <NA>           <NA>              <NA>            <NA>      <NA>
28301416        <NA>   <NA>          <NA>     <NA>     <NA>           <NA>              <NA>            <NA>      <NA>
30145575        <NA>   <NA>          <NA>     <NA>     <NA>           <NA>              <NA>            <NA>      <NA>
30388742        <NA>   <NA>          <NA>     <NA>     <NA>           <NA>              <NA>            <NA>      <NA>
         passenger_information_display pole public_transport   railway railway:ref railway:ref:parent  ref ref:AST
361716                            <NA> <NA>    stop_position tram_stop        <NA>               <NA> <NA>    <NA>
388550                            <NA> <NA>    stop_position      <NA>        <NA>               <NA> <NA>    <NA>
21033643                          <NA> <NA>    stop_position      <NA>        <NA>               <NA> <NA>    <NA>
27042401                          <NA> <NA>    stop_position      <NA>        <NA>               <NA> <NA>    <NA>
28122005                          <NA> <NA>    stop_position tram_stop        <NA>               <NA> <NA>    <NA>
28301370                          <NA> <NA>    stop_position tram_stop        <NA>               <NA> <NA>    <NA>
28301373                          <NA> <NA>    stop_position tram_stop        <NA>               <NA> <NA>    <NA>
28301416                          <NA> <NA>    stop_position tram_stop        <NA>               <NA>    1    <NA>
30145575                          <NA> <NA>    stop_position      stop        <NA>               <NA>    1    <NA>
30388742                          <NA> <NA>    stop_position tram_stop        <NA>               <NA> <NA>    <NA>
                   ref:IFOPT ref:VRS ref:hafas ref:ibnr ref_name reg_name route_ref route_ref:note share_taxi
361716                  <NA>    <NA>      <NA>     <NA>     <NA>     <NA>      <NA>           <NA>       <NA>
388550                  <NA>    <NA>      <NA>     <NA>     <NA>     <NA>      <NA>           <NA>       <NA>
21033643 de:05315:14221:2:23    <NA>      <NA>     <NA>     <NA>     <NA>      <NA>           <NA>       <NA>
27042401      de:05378:31241    <NA>      <NA>     <NA>     <NA>     <NA>      <NA>           <NA>       <NA>
28122005                <NA>    <NA>      <NA>     <NA>     <NA>     <NA>   1, 7, 9           <NA>       <NA>
28301370                <NA>    <NA>      <NA>     <NA>     <NA>     <NA>      <NA>           <NA>       <NA>
28301373                <NA>    <NA>      <NA>     <NA>     <NA>     <NA>      <NA>           <NA>       <NA>
28301416 de:05378:31612:2:21    <NA>      <NA>     <NA>     <NA>     <NA>      <NA>           <NA>       <NA>
30145575                <NA>    <NA>      <NA>     <NA>     <NA>     <NA>      <NA>           <NA>       <NA>
30388742      de:05315:19713    <NA>      <NA>     <NA>     <NA>     <NA>      <NA>           <NA>       <NA>
         shared_taxi shelter source source:position start_date subway tactile_paving toilets toilets:access
361716          <NA>    <NA>   <NA>            <NA>       <NA>   <NA>           <NA>    <NA>           <NA>
388550          <NA>    <NA>   <NA>            <NA>       <NA>   <NA>           <NA>    <NA>           <NA>
21033643        <NA>    <NA>   <NA>            <NA>       <NA>   <NA>           <NA>    <NA>           <NA>
27042401        <NA>     yes   <NA>            <NA>       <NA>   <NA>           <NA>    <NA>           <NA>
28122005        <NA>    <NA>   <NA>            <NA>       <NA>   <NA>           <NA>    <NA>           <NA>
28301370        <NA>    <NA>   <NA>            <NA>       <NA>   <NA>           <NA>    <NA>           <NA>
28301373        <NA>    <NA>   <NA>            <NA>       <NA>   <NA>           <NA>    <NA>           <NA>
28301416        <NA>    <NA>   <NA>            <NA>       <NA>   <NA>           <NA>    <NA>           <NA>
30145575        <NA>    <NA>   <NA>            <NA>       <NA>   <NA>           <NA>    <NA>           <NA>
30388742        <NA>    <NA>   <NA>            <NA>       <NA>   <NA>           <NA>    <NA>           <NA>
         toilets:num_chambers toilets:wheelchair traffic_sign train tram uic_ref website wheelchair
361716                   <NA>                 no         <NA>  <NA>  yes    <NA>    <NA>        yes
388550                   <NA>               <NA>         <NA>  <NA> <NA>    <NA>    <NA>        yes
21033643                 <NA>               <NA>         <NA>  <NA> <NA>    <NA>    <NA>        yes
27042401                 <NA>               <NA>         <NA>  <NA> <NA>    <NA>    <NA>        yes
28122005                 <NA>               <NA>         <NA>  <NA>  yes    <NA>    <NA>        yes
28301370                 <NA>               <NA>         <NA>  <NA>  yes    <NA>    <NA>        yes
28301373                 <NA>               <NA>         <NA>  <NA>  yes    <NA>    <NA>        yes
28301416                 <NA>               <NA>         <NA>  <NA>  yes    <NA>    <NA>        yes
30145575                 <NA>               <NA>         <NA>   yes <NA>    <NA>    <NA>        yes
30388742                 <NA>               <NA>         <NA>  <NA>  yes    <NA>    <NA>        yes
         wheelchair:description wikidata wikimedia_commons wikipedia                  geometry
361716                     <NA>     <NA>              <NA>      <NA> POINT (6.943506 50.92335)
388550                     <NA>     <NA>              <NA>      <NA> POINT (6.897561 50.92583)
21033643                   <NA>     <NA>              <NA>      <NA>  POINT (6.913467 50.9448)
27042401                   <NA>     <NA>              <NA>      <NA> POINT (7.129956 50.99143)
28122005                   <NA>     <NA>              <NA>      <NA>  POINT (6.959924 50.9357)
28301370                   <NA>     <NA>              <NA>      <NA> POINT (7.042688 50.94265)
28301373                   <NA>     <NA>              <NA>      <NA>  POINT (7.048816 50.9443)
28301416                   <NA>     <NA>              <NA>      <NA>  POINT (7.11643 50.95397)
30145575                   <NA>     <NA>              <NA>      <NA> POINT (7.041192 50.97622)
30388742                   <NA>     <NA>              <NA>      <NA> POINT (7.014818 50.97478)

The data indeed are mappable

tm_shape(cologne_pt_stops) +
  tm_dots()

Raster data access

It is not only vector data that can be processed through these mechanisms.

The idea is the same for raster data.

  • Accessing the information through URLs
  • Just the downloaded data formats differ

One example is data from European Earth observation (EOD) program “Copernicus” and R packages that can be used to download them, such as ecmwfr.

📺Commercial Break📺:

We use ecmwfr for our interface to EOD linking in our R package gxc

Exercise 4_1: OSM Data💪

🖱[Click here for the exercise](https://stefanjuenger.github.io/gesis-workshop-geospatial-techniques-R-2026/exercises/4_1_OSM_Data.html)

2. Linking

‘Spreadsheet join’ 📅 + 📅 = 💖

While much of our previous data were points, we only had a column for the German federal states as administrative information so far. We’re now moving “a layer down” and looking at Germany on a more fine-grained spatial level: the district by joining data like simple spreadsheets.

‘Spreadsheet join’ 📅 + 📅 = 💖

# Load district shapefile
german_districts <- sf::read_sf("./data/VG250_KRS.shp")

german_districts
Simple feature collection with 431 features and 23 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 280371.1 ymin: 5235856 xmax: 921292.4 ymax: 6101487
Projected CRS: ETRS89 / UTM zone 32N
# A tibble: 431 × 24
     ADE    GF   BSG ARS   AGS   SDV_ARS   GEN   BEZ     IBZ BEM   NBD   SN_L  SN_R  SN_K  SN_V1 SN_V2 SN_G  FK_S3 NUTS 
   <int> <int> <int> <chr> <chr> <chr>     <chr> <chr> <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
 1     4     4     1 01001 01001 01001000… Flen… Krei…    40 --    ja    01    0     01    00    00    000   R     DEF01
 2     4     4     1 01002 01002 01002000… Kiel  Krei…    40 --    ja    01    0     02    00    00    000   R     DEF02
 3     4     4     1 01003 01003 01003000… Lübe… Krei…    40 --    ja    01    0     03    00    00    000   R     DEF03
 4     4     4     1 01004 01004 01004000… Neum… Krei…    40 --    ja    01    0     04    00    00    000   R     DEF04
 5     4     4     1 01051 01051 01051004… Dith… Kreis    42 --    ja    01    0     51    00    00    000   R     DEF05
 6     4     4     1 01053 01053 01053010… Herz… Kreis    42 --    ja    01    0     53    00    00    000   R     DEF06
 7     4     4     1 01054 01054 01054005… Nord… Kreis    42 --    ja    01    0     54    00    00    000   R     DEF07
 8     4     4     1 01055 01055 01055001… Osth… Kreis    42 --    ja    01    0     55    00    00    000   R     DEF08
 9     4     4     1 01056 01056 01056003… Pinn… Kreis    42 --    ja    01    0     56    00    00    000   R     DEF09
10     4     4     1 01057 01057 01057005… Plön  Kreis    42 --    ja    01    0     57    00    00    000   R     DEF0A
# ℹ 421 more rows
# ℹ 5 more variables: ARS_0 <chr>, AGS_0 <chr>, WSK <date>, DEBKG_ID <chr>, geometry <MULTIPOLYGON [m]>

‘Spreadsheet join’ 📅 + 📅 = 💖

# Load district shapefile
german_districts <- sf::read_sf("./data/VG250_KRS.shp")

german_districts

# Load district attributes
attributes_districts <- readr::read_csv2("./data/attributes_districts.csv") 

attributes_districts
# A tibble: 400 × 7
   AGS   car_density ecar_share publictransport_meandist population green_voteshare afd_voteshare
   <chr>       <dbl> <chr>                         <dbl>      <dbl>           <dbl>         <dbl>
 1 01001        4915 2.4                            1765      92550             242            54
 2 01002        4528 2.2                            1914     247717             289             5
 3 01003        4685 1.8                            2551     218095              23            66
 4 01004        5463 2.4                            1809      79502             146            94
 5 01051        6193 1.9                             583     135252              12            88
 6 01053        5949 2                               341     203712             158            84
 7 01054        6325 3.1                            8236     169043              17            52
 8 01055        6431 1.9                            6146     203606             155             7
 9 01056        5617 2.2                            3269     322130              18            69
10 01057        6253 2.4                             570     131266              19            63
# ℹ 390 more rows

‘Spreadsheet join’ 📅 + 📅 = 💖

# Load district shapefile
german_districts <- sf::read_sf("./data/VG250_KRS.shp")

german_districts

# Load district attributes
attributes_districts <- readr::read_csv2("./data/attributes_districts.csv") 

attributes_districts

# Join data
german_districts_enhanced <- 
  german_districts |>  
  dplyr::left_join(attributes_districts, by = "AGS")

german_districts_enhanced
Simple feature collection with 431 features and 29 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 280371.1 ymin: 5235856 xmax: 921292.4 ymax: 6101487
Projected CRS: ETRS89 / UTM zone 32N
# A tibble: 431 × 30
     ADE    GF   BSG ARS   AGS   SDV_ARS   GEN   BEZ     IBZ BEM   NBD   SN_L  SN_R  SN_K  SN_V1 SN_V2 SN_G  FK_S3 NUTS 
   <int> <int> <int> <chr> <chr> <chr>     <chr> <chr> <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
 1     4     4     1 01001 01001 01001000… Flen… Krei…    40 --    ja    01    0     01    00    00    000   R     DEF01
 2     4     4     1 01002 01002 01002000… Kiel  Krei…    40 --    ja    01    0     02    00    00    000   R     DEF02
 3     4     4     1 01003 01003 01003000… Lübe… Krei…    40 --    ja    01    0     03    00    00    000   R     DEF03
 4     4     4     1 01004 01004 01004000… Neum… Krei…    40 --    ja    01    0     04    00    00    000   R     DEF04
 5     4     4     1 01051 01051 01051004… Dith… Kreis    42 --    ja    01    0     51    00    00    000   R     DEF05
 6     4     4     1 01053 01053 01053010… Herz… Kreis    42 --    ja    01    0     53    00    00    000   R     DEF06
 7     4     4     1 01054 01054 01054005… Nord… Kreis    42 --    ja    01    0     54    00    00    000   R     DEF07
 8     4     4     1 01055 01055 01055001… Osth… Kreis    42 --    ja    01    0     55    00    00    000   R     DEF08
 9     4     4     1 01056 01056 01056003… Pinn… Kreis    42 --    ja    01    0     56    00    00    000   R     DEF09
10     4     4     1 01057 01057 01057005… Plön  Kreis    42 --    ja    01    0     57    00    00    000   R     DEF0A
# ℹ 421 more rows
# ℹ 11 more variables: ARS_0 <chr>, AGS_0 <chr>, WSK <date>, DEBKG_ID <chr>, geometry <MULTIPOLYGON [m]>,
#   car_density <dbl>, ecar_share <chr>, publictransport_meandist <dbl>, population <dbl>, green_voteshare <dbl>,
#   afd_voteshare <dbl>

Spatial join

But what can we do if we do not have a matching identifier? For example, there are no matching administrative identifiers in the German district data and the e-charger data.

german_districts_enhanced
Simple feature collection with 431 features and 29 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 280371.1 ymin: 5235856 xmax: 921292.4 ymax: 6101487
Projected CRS: ETRS89 / UTM zone 32N
# A tibble: 431 × 30
     ADE    GF   BSG ARS   AGS   SDV_ARS   GEN   BEZ     IBZ BEM   NBD   SN_L  SN_R  SN_K  SN_V1 SN_V2 SN_G  FK_S3 NUTS 
   <int> <int> <int> <chr> <chr> <chr>     <chr> <chr> <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
 1     4     4     1 01001 01001 01001000… Flen… Krei…    40 --    ja    01    0     01    00    00    000   R     DEF01
 2     4     4     1 01002 01002 01002000… Kiel  Krei…    40 --    ja    01    0     02    00    00    000   R     DEF02
 3     4     4     1 01003 01003 01003000… Lübe… Krei…    40 --    ja    01    0     03    00    00    000   R     DEF03
 4     4     4     1 01004 01004 01004000… Neum… Krei…    40 --    ja    01    0     04    00    00    000   R     DEF04
 5     4     4     1 01051 01051 01051004… Dith… Kreis    42 --    ja    01    0     51    00    00    000   R     DEF05
 6     4     4     1 01053 01053 01053010… Herz… Kreis    42 --    ja    01    0     53    00    00    000   R     DEF06
 7     4     4     1 01054 01054 01054005… Nord… Kreis    42 --    ja    01    0     54    00    00    000   R     DEF07
 8     4     4     1 01055 01055 01055001… Osth… Kreis    42 --    ja    01    0     55    00    00    000   R     DEF08
 9     4     4     1 01056 01056 01056003… Pinn… Kreis    42 --    ja    01    0     56    00    00    000   R     DEF09
10     4     4     1 01057 01057 01057005… Plön  Kreis    42 --    ja    01    0     57    00    00    000   R     DEF0A
# ℹ 421 more rows
# ℹ 11 more variables: ARS_0 <chr>, AGS_0 <chr>, WSK <date>, DEBKG_ID <chr>, geometry <MULTIPOLYGON [m]>,
#   car_density <dbl>, ecar_share <chr>, publictransport_meandist <dbl>, population <dbl>, green_voteshare <dbl>,
#   afd_voteshare <dbl>

Spatial join

But what can we do if we do not have a matching identifier? For example, there are no matching administrative identifiers in the German district data and the e-charger data.

german_districts_enhanced

echarging_sf
Simple feature collection with 60549 features and 5 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 5.243745 ymin: 47.2844 xmax: 15.54381 ymax: 55.50014
Geodetic CRS:  WGS 84
# A tibble: 60,549 × 6
   operator                      federal_state     power_kw type                   num_plugs            geometry
 * <chr>                         <chr>                <dbl> <chr>                      <dbl>         <POINT [°]>
 1 deer GmbH                     Baden-Württemberg       44 Normalladeeinrichtung          2  (9.72289 48.32789)
 2 EnBW mobility+ AG und Co.KG   Baden-Württemberg       93 Schnellladeeinrichtung         2  (9.87484 48.57853)
 3 SWU Energie GmbH              Baden-Württemberg       44 Normalladeeinrichtung          2  (10.1934 48.52898)
 4 SWU Energie GmbH              Baden-Württemberg       44 Normalladeeinrichtung          2 (10.08268 48.55354)
 5 SWU Energie GmbH              Baden-Württemberg       22 Normalladeeinrichtung          1 (10.07698 48.17996)
 6 EnBW mobility+ AG und Co.KG   Baden-Württemberg       30 Normalladeeinrichtung          2 (9.980588 48.48039)
 7 SWU Energie GmbH              Baden-Württemberg       22 Normalladeeinrichtung          2 (9.985855 48.48316)
 8 Albwerk GmbH & Co. KG         Baden-Württemberg       22 Normalladeeinrichtung          2 (9.760458 48.46448)
 9 EnBW ODR AG                   Baden-Württemberg       22 Normalladeeinrichtung          1 (10.02912 48.49882)
10 Autohaus Burger GmbH & Co. KG Baden-Württemberg       22 Normalladeeinrichtung          2 (9.774676 48.40366)
# ℹ 60,539 more rows

💡Solution: We conduct a spatial join!

Adding district information to the point layer

# Harmonize CRS between two datasets 
echarging_sf_transformed <-
  sf::st_crs(german_districts) |> 
  sf::st_transform(
    echarging_sf, 
    crs = _
  )

Adding district information to the point layer

# Harmonize CRS between two datasets 
echarging_sf_transformed <-
  sf::st_crs(german_districts) |> 
  sf::st_transform(
    echarging_sf, 
    crs = _
  )

# We are only interested in the municipality identifier
# AGS in the districts dataset; let's create a subset
german_districts_subset <- 
  dplyr::select(german_districts, AGS)

german_districts_subset
Simple feature collection with 431 features and 1 field
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 280371.1 ymin: 5235856 xmax: 921292.4 ymax: 6101487
Projected CRS: ETRS89 / UTM zone 32N
# A tibble: 431 × 2
   AGS                                                                                  geometry
   <chr>                                                                      <MULTIPOLYGON [m]>
 1 01001 (((526513.8 6075133, 526547.9 6074977, 526560.3 6074979, 526652.6 6074987, 526733.6 ...
 2 01002 (((575841.6 6032148, 575869.7 6032069, 575902.9 6032014, 575913.5 6031996, 575874.9 ...
 3 01003 (((623056.2 5983746, 623191.6 5983592, 623265.4 5983480, 623354.5 5983321, 623388.7 ...
 4 01004 (((565015.7 6000638, 565128.4 6000455, 565861.6 5999810, 565953.3 5999773, 566057.9 ...
 5 01051 (((505053.4 6023857, 505143.7 6023856, 505238.2 6023892, 505321.8 6023941, 505448.8 ...
 6 01053 (((614093.1 5965726, 614885.3 5964431, 614718.4 5964336, 614800.8 5964220, 614847.9 ...
 7 01054 (((464810.7 6100447, 464936.7 6100389, 465073.3 6100335, 465235 6100274, 465353.5 61...
 8 01055 (((637021.9 6029306, 637121.5 6029220, 637170.1 6029187, 637211.1 6029138, 637223.9 ...
 9 01056 (((549382.6 5971883, 550036.5 5970342, 550205.2 5969679, 550455.1 5969489, 550837.1 ...
10 01057 (((586149.4 6032802, 586197.3 6032788, 586276.6 6032789, 586301.4 6032818, 586330.5 ...
# ℹ 421 more rows

Adding district information to the point layer

# Harmonize CRS between two datasets 
echarging_sf_transformed <-
  sf::st_crs(german_districts) |> 
  sf::st_transform(
    echarging_sf, 
    crs = _
  )

# We are only interested in the municipality identifier
# AGS in the districts dataset; let's create a subset
german_districts_subset <- 
  dplyr::select(german_districts, AGS)

german_districts_subset

# Finally, joining the two datasets is easy
echarging_sf_joined <-
  sf::st_join(
    echarging_sf_transformed, 
    german_districts_subset
  )

plot(echarging_sf_joined["AGS"])

Counting objects

Imagine we want to count the number of charging stations in each German district.

# Again, adjust crs first
echarging_sf_transformed <-
  sf::st_transform(
    echarging_sf, 
    crs = sf::st_crs(german_districts)
  )

Counting objects

Imagine we want to count the number of charging stations in each German district. We now use sf::st_intersects() to identify all intersecting charging stations in each district.

# Again, adjust crs first
echarging_sf_transformed <-
  sf::st_transform(
    echarging_sf, 
    crs = sf::st_crs(german_districts)
  )

# Identify all chargers in each district
chargers_in_districts <- 
  sf::st_intersects(german_districts_enhanced, echarging_sf_transformed)

chargers_in_districts
Sparse geometry binary predicate list of length 431, where the predicate was `intersects'
first 10 elements:
 1: 58961, 58962, 58963, 58964, 58965, 58966, 58967, 58968, 58969, 58970, ...
 2: 58124, 59028, 59029, 59030, 59031, 59032, 59033, 59034, 59035, 59036, ...
 3: 59236, 59237, 59238, 59239, 59240, 59241, 59242, 59243, 59244, 59245, ...
 4: 59334, 59335, 59336, 59337, 59338, 59339, 59340, 59341, 59342, 59343, ...
 5: 57134, 57135, 57136, 57137, 57138, 57139, 57140, 57141, 57142, 57143, ...
 6: 37603, 57291, 57292, 57293, 57294, 57296, 57297, 57298, 57299, 57300, ...
 7: 37879, 57383, 57384, 57385, 57386, 57387, 57388, 57389, 57390, 57391, ...
 8: 39677, 57649, 57650, 57651, 57652, 57653, 57654, 57655, 57656, 57657, ...
 9: 28802, 57869, 57870, 57871, 57872, 57873, 57874, 57875, 57876, 57877, ...
 10: 58049, 58050, 58051, 58052, 58053, 58054, 58055, 58056, 58057, 58058, ...

Counting objects

Imagine we want to count the number of charging stations in each German district. We now use sf::st_intersects() to identify all intersecting charging stations in each district.

# Again, adjust crs first
echarging_sf_transformed <-
  sf::st_transform(
    echarging_sf, 
    crs = sf::st_crs(german_districts)
  )

# Identify all chargers in each district
chargers_in_districts <- 
  sf::st_intersects(german_districts_enhanced, echarging_sf_transformed)

chargers_in_districts

# Finally counting them and adding them as a variable
german_districts_enhanced$charger_in_districts <-
  lengths(chargers_in_districts)

german_districts_enhanced$charger_in_districts
  [1]   67  208   95   94  156  106  259  215  180   72  239  185  169   56  162 1223  216  121  452   92  100   48  101
 [24]   69   36  199 1032  122   54  118   33  111  117   77   63  114   26   98   78   96  161  136   48  130   33  107
 [47]   75  171   28   51  116  111  286   80  108   98   71  199  131   33   28  342   46  771  136  348   57  144   27
 [70]  120   65   79  119  204  366  292  204  279  240  659   55  543  178  213  103  143  138  138  347   61  105  167
 [93]  361  155  240  219  140  157  220  125   95  173  160  279  297  563   88  102   59  251  201  218   69  119  197
[116]  294  107  474   37  153  173  191  535  145  329  145   67  235   98  157  146  145  125  108   86  174  252   57
[139]  217   88   98   43   88   74   46   81   23   52  123   91   84   40  115   66   94   60   55   79   27   71   61
[162]   79  120   48   21   45   36   45   56   71   23   40   41   22   57   88  143   35 1304  768  457  241  391  353
[185]  217  433  156  185  105   84  232   55  202  293  209  132  199  123  306   67  173  128   96  233  194   83  383
[208]   92  129  109  175  144  131  229  244  114  231  434   96  246  337   72  782 1634   28  120   91   93  102  105
[231]  115  127  173  150  111   89  144   60  656   88  113  271  107  153  117   57   80   44  121  125  104  177  200
[254]   72   69   82  396   12  423   32   58   86  114   33  202  119   38   75   46   37   17   84   51   49   88   54
[277]   42   50   36   55   26   89  131  392   14  181   98   81  126   85  128   64   28   67  124  103   51   55   43
[300]  136   35  106  108   95  165   40   69   46   91  157   51   79  136   85  102  123  110  197  149   59   44   92
[323]   80   42 2534   33   38   20   81   96   88   32   70   62  114   42  608   73  136   26   45  113   73   81   40
[346]  115  115  192   73  108   80  183  159  123  114  234  406  147  101   95  110  371  129   82   40   99   93   30
[369]   57   89   91   91   26   72   82   82   64   30  107   56   53   19   56   27   63   43   32   44   28   69   54
[392]   32   43   93   42   33   50   63   66   29   41    0    0    0    0    0    0    0    0    0    0    0    0    0
[415]    0    0    0    0    0    0    0    0    0    0    0    0    0    0    1    0    0

Charger count per district

tm_shape(german_districts_enhanced) +
  tm_polygons(
    fill = "charger_in_districts",
    fill.legend = 
      tm_legend(
        title = "Charger Count (Quantiles)"
      ),
    fill.scale = 
      tm_scale_intervals(
        style = "quantile", 
        values = "viridis"
      ),
    col = "lightgrey"
  )

3. Subsetting

Subsetting vector data

One might be interested in only one specific area of Germany, like Cologne. To subset a sf object, you can often use your usual data wrangling workflow. In this case, I know the municipality ID (AGS), which is the only row (and column) I want to keep.

# Subsetting
cologne <-
  german_districts_enhanced |> 
  dplyr::filter(AGS == "05315") |>  
  dplyr::select(AGS) 

plot(cologne)

Using sf for subsetting

If you have no information about IDs but only about the geolocation, you can use methods like sf::st_touches() (or again sf::st_intersects(), or sf::st_within(), sf::st_crosses(), …) to identify, for example, all districts which share a border with Cologne.

# Extract the 'touch points' between two datasets
cologne_touch_points <-
  sf::st_touches(german_districts_enhanced, cologne)

cologne_touch_points
Sparse geometry binary predicate list of length 431, where the predicate was `touches'
first 10 elements:
 1: (empty)
 2: (empty)
 3: (empty)
 4: (empty)
 5: (empty)
 6: (empty)
 7: (empty)
 8: (empty)
 9: (empty)
 10: (empty)

Using sf for subsetting

If you have no information about IDs but only about the geolocation, you can use methods like sf::st_touches() (or again sf::st_intersects(), or sf::st_within(), sf::st_crosses(), …) to identify, for example, all districts which share a border with Cologne.

# Extract the 'touch points' between two datasets
cologne_touch_points <-
  sf::st_touches(german_districts_enhanced, cologne)

# Again, we are counting the numbers of touch points. 
cologne_touch_points <- lengths(cologne_touch_points) 

cologne_touch_points
  [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [58] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 1 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[115] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[172] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[229] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[286] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[343] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[400] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Using sf for subsetting

If you have no information about IDs but only about the geolocation, you can use methods like sf::st_touches() (or again sf::st_intersects(), or sf::st_within(), sf::st_crosses(), …) to identify, for example, all districts which share a border with Cologne.

# Extract the 'touch points' between two datasets
cologne_touch_points <-
  sf::st_touches(german_districts_enhanced, cologne)

# Again, we are counting the numbers of touch points. 
cologne_touch_points <- lengths(cologne_touch_points) 

cologne_touch_points

# Length of mutual border > 0
cologne_touch_points <- lengths(cologne_touch_points) > 0

cologne_touch_points
  [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
 [24] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
 [47] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
 [70] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
 [93] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[116] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[139] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[162] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[185] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[208] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[231] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[254] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[277] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[300] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[323] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[346] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[369] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[392] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[415] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

Using sf for subsetting

If you have no information about IDs but only about the geolocation, you can use methods like sf::st_touches() (or again sf::st_intersects(), or sf::st_within(), sf::st_crosses(), …) to identify, for example, all districts which share a border with Cologne.

# extract the 'touch points' between two datasets
cologne_touch_points <-
  sf::st_touches(german_districts_enhanced, cologne)

# again, we are counting the numbers of touch points. 
cologne_touch_points <- lengths(cologne_touch_points) 

cologne_touch_points

# length of mutual border > 0
cologne_touch_points <- cologne_touch_points > 0

cologne_touch_points

cologne_surrounding <-
  german_districts_enhanced |> 
  dplyr::select(AGS) |>  
  dplyr::filter(cologne_touch_points) 

plot(cologne_surrounding)

Cropping data

We could also use as simple spatial extent, e.g., from a raster dataset, to subset our e-charging data. But first, we have to adjust the CRS again (use sf::st_crs(echarging_sf) to look up EPSG code):

inhabitants_cologne <- terra::rast("./data/inhabitants_cologne.tif")

Now we can use a procedure called cropping to subset the data:

echarging_sf_cologne <-
  echarging_sf |>
  # if possible, always transform to CRS of the raster dataset
  sf::st_transform(3035) |> 
  sf::st_crop(inhabitants_cologne)

E-charging stations in Cologne

The result is an vector dataset comprising e-charging stations in Cologne.

tm_shape(echarging_sf_cologne) +
  tm_dots()

‘Subsetting’ Raster Layers

As you know, we can subset vector data by simply filtering for specific attribute values. For example, to subset Cologne’s districts only by the one of Deutz, we can use the dplyr for sf data:

# raster data for Cologne
inhabitants_cologne <- 
  terra::rast("./data/inhabitants_cologne.tif")

inhabitants_cologne[inhabitants_cologne < 0] <- NA

# Cologne district data
deutz <-
  sf::st_read(
    "./data/cologne.shp",
    quiet = TRUE
  ) |> 
  # convert to same CRS as raster data
  sf::st_transform(3035) |> 
  dplyr::filter(NAME == "Deutz")

tm_shape(inhabitants_cologne) +
  tm_raster() +
  tm_shape(deutz) +
  tm_borders()

Cropping raster data

Cropping is a method of cutting out a specific slice of a raster layer based on an input dataset or geospatial extent, such as a bounding box.

cropped_inhabitants_cologne <-
  terra::crop(
    inhabitants_cologne, 
    deutz
  )

tm_shape(cropped_inhabitants_cologne) +
  tm_raster()

Masking

Masking is similar to cropping, yet values outside the extent are set to missing values (NA).

masked_inhabitants_cologne <-
  raster::mask(
    inhabitants_cologne, 
    deutz
  )

tm_shape(masked_inhabitants_cologne) +
  tm_raster()

Combining Cropping and Masking

cropped_masked_inhabitants_cologne <-
  terra::crop(
    inhabitants_cologne, 
    deutz
  ) |> 
  terra::mask(deutz)

tm_shape(cropped_masked_inhabitants_cologne) +
  tm_raster()

4. Extraction & Aggregation

Changes in terminology

If we only want to add one attribute from a vector dataset Y to another vector dataset X, we can conduct a spatial join using sf::st_join() as before. There is nothing new to tell. In the raster data world, these operations are called raster extractions.

Extracting information from raster data

Raster data are helpful when we aim to

  • Apply calculations that are the same for all geometries in the dataset
  • Extract information from the raster fast and efficient

Do you remember these data operations?

immigrants_cologne <-  terra::rast("./data/immigrants_cologne.tif")
inhabitants_cologne <- terra::rast("./data/inhabitants_cologne.tif")

immigrants_cologne[immigrants_cologne == -9] <- NA
inhabitants_cologne[inhabitants_cologne == -9] <- NA

immigrant_rate <-
  immigrants_cologne * 100 / 
  inhabitants_cologne

Raster extraction

To extract the raster values at a specific point by location, also called lookup, we use the following:

immigrant_rate_lookup <-
  terra::extract(
    x = immigrant_rate, # raster layer to extract from
    y = echarging_sf_cologne, # points to extract to
    raw = TRUE, # output as raw data (no data.frame)
    ID = FALSE  # no ID column before values (really just the raw data...)
  )

# there are many missing values which is why we are looking at such a weird
# data range
immigrant_rate_lookup[170:180,] 
 [1] 38.46154 38.46154 38.46154 38.46154 38.46154 38.46154 38.46154 38.46154 36.36364 22.22222 58.50340

Add results to existing dataset

This information can be added to an existing dataset (our points in this example):

# Adding the extracted values to our dataset as a new column
# Note: the output of terra::extract() is a raw data matrix, which produces
# not so pretty column names. Therefore, we wrap it in the as.vector()
# function.
echarging_sf_cologne$immigrant_rate_value <- as.vector(immigrant_rate_lookup)

echarging_sf_cologne[170:180,]
Simple feature collection with 11 features and 6 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 4106438 ymin: 3096255 xmax: 4107468 ymax: 3097058
Projected CRS: ETRS89-extended / LAEA Europe
# A tibble: 11 × 7
   operator                     federal_state    power_kw type  num_plugs          geometry immigrant_rate_value
   <chr>                        <chr>               <dbl> <chr>     <dbl>       <POINT [m]>                <dbl>
 1 Ladegrün! eG                 Nordrhein-Westf…       22 Norm…         1 (4106442 3096473)                 38.5
 2 Ladegrün! eG                 Nordrhein-Westf…       22 Norm…         1 (4106444 3096481)                 38.5
 3 Ladegrün! eG                 Nordrhein-Westf…       22 Norm…         1 (4106445 3096478)                 38.5
 4 Ladegrün! eG                 Nordrhein-Westf…       22 Norm…         1 (4106440 3096483)                 38.5
 5 Ladegrün! eG                 Nordrhein-Westf…       22 Norm…         1 (4106445 3096490)                 38.5
 6 Ladegrün! eG                 Nordrhein-Westf…       22 Norm…         1 (4106447 3096490)                 38.5
 7 Ladegrün! eG                 Nordrhein-Westf…       22 Norm…         1 (4106443 3096486)                 38.5
 8 Ladegrün! eG                 Nordrhein-Westf…       22 Norm…         1 (4106438 3096481)                 38.5
 9 Q-Park Recharge Germany GmbH Nordrhein-Westf…       44 Norm…         4 (4106713 3096460)                 36.4
10 TankE GmbH                   Nordrhein-Westf…       11 Norm…         1 (4107039 3096255)                 22.2
11 TankE GmbH                   Nordrhein-Westf…       44 Norm…         2 (4107468 3097058)                 58.5

More elaborated: spatial buffers

Sometimes, extracting information 1:1 is not enough. - It’s too narrow - There is missing information about the surroundings of a point

For this endeavor, we can use the function sf::st_buffer(). Here’s an example of creating a circular area of 500 meters around each point in our e-charging subset data of Cologne:

# original data
sf::st_geometry_type(
  echarging_sf_cologne, 
  by_geometry = FALSE
)
[1] POINT
18 Levels: GEOMETRY POINT LINESTRING POLYGON MULTIPOINT MULTILINESTRING MULTIPOLYGON ... TRIANGLE
# buffered data
echarging_sf_cologne_buffered <-
  sf::st_buffer(echarging_sf_cologne, 500) 

sf::st_geometry_type(
  echarging_sf_cologne_buffered, 
  by_geometry = FALSE
)
[1] POLYGON
18 Levels: GEOMETRY POINT LINESTRING POLYGON MULTIPOINT MULTILINESTRING MULTIPOLYGON ... TRIANGLE

This is how buffers looks together with the raster data

tm_shape(immigrant_rate) +
  tm_raster() +
  tm_shape(
    echarging_sf_cologne_buffered 
  ) +
  tm_dots(size = .1) +
  tm_borders()

Buffer extraction

We can use spatial buffers of different sizes to extract information on surroundings:

immigrant_rate_buffer <-
  terra::extract(
    x = immigrant_rate, 
    y = echarging_sf_cologne_buffered,
    fun = mean,
    na.rm = TRUE,
    raw = TRUE,
    ID = FALSE
  )

immigrant_rate_buffer[170:180,]
 [1] 23.17138 23.17138 23.20291 23.17138 23.17138 23.20291 23.17138 23.17138 24.75987 24.59184 29.54656

Add results to existing dataset

This information can be added to an existing dataset (our points in this example):

# Adding the extracted values to our dataset as a new column
# Note: the output of terra::extract() is a bit annoying as it is stored as a
# list (despite the raw = TRUE argument). Therefore, we wrap the output in the
# unlist() function
echarging_sf_cologne$immigrant_rate_500 <- unlist(immigrant_rate_buffer)

echarging_sf_cologne[170:180,]
Simple feature collection with 11 features and 7 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 4106438 ymin: 3096255 xmax: 4107468 ymax: 3097058
Projected CRS: ETRS89-extended / LAEA Europe
# A tibble: 11 × 8
   operator     federal_state power_kw type  num_plugs          geometry immigrant_rate_value immigrant_rate_500
   <chr>        <chr>            <dbl> <chr>     <dbl>       <POINT [m]>                <dbl>              <dbl>
 1 Ladegrün! eG Nordrhein-We…       22 Norm…         1 (4106442 3096473)                 38.5               23.2
 2 Ladegrün! eG Nordrhein-We…       22 Norm…         1 (4106444 3096481)                 38.5               23.2
 3 Ladegrün! eG Nordrhein-We…       22 Norm…         1 (4106445 3096478)                 38.5               23.2
 4 Ladegrün! eG Nordrhein-We…       22 Norm…         1 (4106440 3096483)                 38.5               23.2
 5 Ladegrün! eG Nordrhein-We…       22 Norm…         1 (4106445 3096490)                 38.5               23.2
 6 Ladegrün! eG Nordrhein-We…       22 Norm…         1 (4106447 3096490)                 38.5               23.2
 7 Ladegrün! eG Nordrhein-We…       22 Norm…         1 (4106443 3096486)                 38.5               23.2
 8 Ladegrün! eG Nordrhein-We…       22 Norm…         1 (4106438 3096481)                 38.5               23.2
 9 Q-Park Rech… Nordrhein-We…       44 Norm…         4 (4106713 3096460)                 36.4               24.8
10 TankE GmbH   Nordrhein-We…       11 Norm…         1 (4107039 3096255)                 22.2               24.6
11 TankE GmbH   Nordrhein-We…       44 Norm…         2 (4107468 3097058)                 58.5               29.5

Raster aggregation

We can use the same procedure to aggregate a raster dataset into a vector polygon dataset. That’s a widespread use case. Let’s load our Cologne shapefile.

cologne <- 
  sf::st_read(
    "./data/cologne.shp",
    quiet = TRUE
  ) |> 
  # transform to CRS of raster data
  sf::st_transform(3035)

cologne
Simple feature collection with 86 features and 7 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 4094821 ymin: 3084022 xmax: 4121277 ymax: 3112952
Projected CRS: ETRS89-extended / LAEA Europe
First 10 features:
   OBJECTI NUMMER        NAME NR_STAD      STADTBE FLAECHE LINK                       geometry
1        1    202  Marienburg       2 Rodenkirchen 3046374 <NA> MULTIPOLYGON (((4108023 309...
2        2    201   Bayenthal       2 Rodenkirchen 1284438 <NA> MULTIPOLYGON (((4108228 309...
3        3    209        Weiß       2 Rodenkirchen 4159650 <NA> MULTIPOLYGON (((4113250 309...
4        4    207    Hahnwald       2 Rodenkirchen 2993683 <NA> MULTIPOLYGON (((4107811 309...
5        5    213  Meschenich       2 Rodenkirchen 4719852 <NA> MULTIPOLYGON (((4104640 308...
6        6    212   Immendorf       2 Rodenkirchen 5202232 <NA> MULTIPOLYGON (((4106764 308...
7        7    211      Godorf       2 Rodenkirchen 4592631 <NA> MULTIPOLYGON (((4109432 308...
8        8    308    Lövenich       3   Lindenthal 3692013 <NA> MULTIPOLYGON (((4098703 309...
9        9    307      Weiden       3   Lindenthal 3654710 <NA> MULTIPOLYGON (((4099642 309...
10      10    306 Junkersdorf       3   Lindenthal 7376623 <NA> MULTIPOLYGON (((4099963 309...

Add the aggregated data

cologne$immigrant_rate <-
  terra::extract(
    x = immigrant_rate, 
    y = cologne, 
    fun = mean, 
    na.rm = TRUE, 
    ID = FALSE
  ) |> 
  unlist()

plot(cologne["immigrant_rate"])

Other forms of data aggregation

If you ‘simply’ want to aggregate attributes and geometries of vector data, you can rely on st_combine(x) , st_union(x,y) and st_intersection(x, y) to combine vector datasets, resolve borders and return the intersection of two vector datasets

For raster data, you can aggregate with the function terra::aggregate()(if you have matching raster files) in combination with terra::resample() (if your raster files don’t match).

To deal with spatial misalignment:

Data aggregation

german_districts <-
  sf::read_sf("./data/VG250_KRS.shp") |> 
  dplyr::mutate(
    federal_state =
      as.numeric(stringr::str_sub(AGS,1,2))
  ) 

german_states <-
  german_districts |>  
  dplyr::group_by(federal_state) |>  
  dplyr::summarize(
    geometry = sf::st_union(geometry)
  )

plot(german_states["geometry"])

Exercise 4_2: Subsetting and Linking💪

🖱Click here for the exercise

Backup / For Home Use

5. Conversion & Analysis

Raster to points

raster_now_points <-
  immigrant_rate |> 
  terra::as.points()

plot(raster_now_points)

Points to raster

raster_target_layer <- 
  terra::ext(raster_now_points) |> 
  terra::rast(res = 100)

points_now_raster <- 
  raster_now_points |> 
  terra::rasterize(
    raster_target_layer, 
    field = "cat_0", # get it with names(raster_now_points) 
    fun = "mean",
    background = 0
  )

plot(points_now_raster)

Raster to polygons

polygon_raster <-
  immigrant_rate |>  
  terra::as.polygons() |> 
  sf::st_as_sf()

plot(polygon_raster)

Analysis application: creating a quick ‘heatmap’

Points of interest data are nice for analyzing urban infrastructure. Let’s draw a quick ‘heatmap’ using observation densities.

echarging_sf_cologne_raster <-
  terra::rast(
    echarging_sf_cologne, 
    res = 1000
  )

echarging_sf_cologne_densities <- 
  echarging_sf_cologne |> 
  terra::rasterize(
    echarging_sf_cologne_raster, 
    fun = length, 
    background = 0
  )

plot(echarging_sf_cologne_densities)

Focal statistics / spatial filter

Focal statistics are another method of including information near a point in space. However, it’s applied to the whole dataset and is independent of arbitrary points we project onto a map.

  • Relates focal cells to surrounding cells
  • Vastly used in image processing
  • But also applicable in social science research, as we will see

Analysis: edge detection

Source: https://en.wikipedia.org/wiki/Sobel_operator

Edges of immigrant rates

We can do that as well using a sobel filter

\[r_x = \begin{bmatrix}1 & 0 & -1 \\2 & 0 & -2 \\1 & 0 & -1\end{bmatrix} \times raster\_file \\r_y = \begin{bmatrix}1 & 2 & 1 \\0 & 0 & 0 \\-1 & -2 & -1\end{bmatrix}\times raster\_file \\r_{xy} = \sqrt{r_{x}^2 + r_{y}^2}\]

Implementation in R

From the official documentation:

sobel <- function(r) {
  fy <- matrix(c(1, 0, -1, 2, 0, -2, 1, 0, -1), nrow = 3)
  fx <- matrix(c(-1, -2, -1, 0, 0, 0, 1, 2, 1) , nrow = 3)
  rx <- terra::focal(r, fx)
  ry <- terra::focal(r, fy)
  sqrt(rx^2 + ry^2)
}

Data preparation and application of filter

old_extent <- terra::ext(immigrant_rate) 
new_extent <- old_extent + c(10000, -10000, 10000, -10000)

smaller_immigrant_rate <-
  immigrant_rate |> 
  terra::crop(new_extent)

smaller_immigrant_rate[smaller_immigrant_rate < 10] <- NA

immigrant_edges <- sobel(smaller_immigrant_rate)

Comparison

plot(smaller_immigrant_rate)
plot(immigrant_edges)