Applied Spatial Linking

GESIS Workshop: Introduction to Geospatial Techniques for Social Scientists in R

Stefan Jünger & Dennis Abel

2025-04-10

Now

Day       Time         Title
April 09  10:00-11:30  Introduction
April 09  11:30-11:45  Coffee Break
April 09  11:45-13:00  Data Formats
April 09  13:00-14:00  Lunch Break
April 09  14:00-15:30  Mapping
April 09  15:30-15:45  Coffee Break
April 09  15:45-17:00  Spatial Wrangling
April 10  09:00-10:30  Spatial Wrangling
April 10  10:30-10:45  Coffee Break
April 10  10:45-12:00  Applied Spatial Linking
April 10  12:00-13:00  Lunch Break
April 10  13:00-14:30  Spatial Analysis
April 10  14:30-14:45  Coffee Break
April 10  14:45-16:00  Spatial Econometrics & Outlook

What are georeferenced data?

Data with a direct spatial reference \(\rightarrow\) geo-coordinates

  • Information about geometries
  • Optional: Content in relation to the geometries

Sources: OpenStreetMap / GEOFABRIK (2018), City of Cologne (2014), and the Statistical Offices of the Federation and the Länder (2016) / Jünger, 2019

Georeferenced survey data

Survey data enriched with geo-coordinates (or other direct spatial references).

With georeferenced survey data, we can analyze interactions between individual behaviors and attitudes and the environment.

An example workflow

On the way from addresses to analyses with georeferenced survey data, there are several steps and challenges. We will talk about:

  • Data Protection & Data Access
  • Geocoding
  • Spatial Data Linking
  • An example workflow using the sora package

Data protection

That's one of the biggest issues.

  • Explicit spatial references increase the risk of re-identifying anonymized survey respondents
  • Can occur during the processing of data but also during the analysis

Affects all phases of research and data management!

Data availability

Geospatial Data

  • Often distributed in a decentralized way
  • Fragmented data landscape, at least in Germany

Georeferenced Survey Data

  • Primarily, survey data
  • Depends on documentation
  • Access difficult due to data protection restrictions

https://www.eea.europa.eu/data-and-maps
https://datasearch.gesis.org/
https://datasetsearch.research.google.com/

Distribution & re-identification risk

Even without (in)direct spatial references, data may still be sensitive.

  • Geospatial attributes add new information to existing data
  • May be covered by general data privacy checks, but we still may not distribute these data as is

Safe Rooms / Secure Data Centers

  • Control access
  • Check output

https://www.gesis.org/en/services/processing-and-analyzing-data/guest-research-stays/secure-data-center-sdc

Geocoding

Geocoding is the conversion of indirect spatial references (e.g., addresses) into direct spatial references (e.g., coordinates).

However, conducting this procedure is tricky (not only in R). Many services are either

  • Expensive (at least they cost money or have other restrictions)
  • Probably not data protection-friendly (Hey Google)
  • Or both

Our Approach

We rely on a service offered by the Federal Agency of Cartography and Geodesy (BKG):

  • Online interface and API for online geocoding
  • Offline geocoding possible based on raw data
  • But: Data and service used to be restricted

bkggeocoder

The R package bkggeocoder was developed at GESIS by Stefan and Jonas Lieth for (offline) geocoding.

New interface in the sora package

We can now also use the sora package to geocode addresses (but thus far, with fewer features than bkggeocoder).

leibniz_addresses <-
  tibble::tribble(
    ~id, ~street, ~house_number, ~zip_code, ~place, ~institute,
    1, "B 2", "1", "68159", "Mannheim", "GESIS",
    2, "Unter Sachsenhausen", "6-8",  "50667", "Köln", "GESIS",
    3, "Kellnerweg", "4", "37077", "Göttingen", "DPZ",
    4, "Reichsstr.", "4-6", "04109",  "Leipzig", "GWZO",
    5, "Schöneckstraße", "6", "79104", "Freiburg", "KIS",
    6, "Albert-Einstein-Straße", "29a", "18059", "Rostock", "LIKAT",
    7, "L7", "1", "68161", "Mannheim", "ZEW",
    8, "Müggelseedamm", "310", "12587", "Berlin", "IGB",
    9, "Campus D2", "2", "66123", "Saarbrücken", "INM",
    10, "Eberswalder Straße", "84", "15374", "Müncheberg (Mark)", "ZALF"
  )

leibniz_addresses
# A tibble: 10 × 6
      id street                 house_number zip_code place             institute
   <dbl> <chr>                  <chr>        <chr>    <chr>             <chr>    
 1     1 B 2                    1            68159    Mannheim          GESIS    
 2     2 Unter Sachsenhausen    6-8          50667    Köln              GESIS    
 3     3 Kellnerweg             4            37077    Göttingen         DPZ      
 4     4 Reichsstr.             4-6          04109    Leipzig           GWZO     
 5     5 Schöneckstraße         6            79104    Freiburg          KIS      
 6     6 Albert-Einstein-Straße 29a          18059    Rostock           LIKAT    
 7     7 L7                     1            68161    Mannheim          ZEW      
 8     8 Müggelseedamm          310          12587    Berlin            IGB      
 9     9 Campus D2              2            66123    Saarbrücken       INM      
10    10 Eberswalder Straße     84           15374    Müncheberg (Mark) ZALF     
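Note that zip_code and house_number are stored as character columns. The following minimal sketch illustrates why this matters for postal codes such as Leipzig's 04109, which lose their leading zero when read as numbers. The helper pad_zip_code() is hypothetical and not part of sora or bkggeocoder.

```r
# Hypothetical helper (not part of any package): restore leading zeros
# in five-digit German postal codes that were read as numbers.
pad_zip_code <- function(zip) {
  sprintf("%05d", as.integer(zip))
}

# reading the postal code as a number drops the leading zero
zip_as_number <- as.numeric("04109")

# padding to five digits restores the original code as a character string
zip_restored <- pad_zip_code(zip_as_number)

zip_as_number
zip_restored
```

If postal codes arrive as numbers (e.g., from a spreadsheet), padding them like this before geocoding may help avoid failed matches for regions whose codes start with zero.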

Set up and run the geocoding

# load sora package
library(sora)

# set API key for the session
Sys.setenv(SORA_API_KEY = readLines("sora_key"))

# check if the sora API can be reached
sora_available()
[1] TRUE
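Reading the key with readLines() assumes that the file "sora_key" exists and contains exactly one clean line. A slightly more defensive variant could look like the sketch below. The helper read_api_key() is hypothetical and not part of the sora package, and the key in the demonstration is made up.

```r
# Hypothetical helper (not part of sora): read an API key from a
# one-line text file and fail early if the file is missing or empty.
read_api_key <- function(path) {
  if (!file.exists(path)) {
    stop("Key file not found: ", path)
  }
  # warn = FALSE avoids a warning if the file lacks a final newline
  key <- trimws(readLines(path, n = 1, warn = FALSE))
  if (length(key) == 0 || !nzchar(key)) {
    stop("Key file is empty: ", path)
  }
  key
}

# demonstration with a temporary file and a made-up key
key_file <- tempfile()
writeLines("  not-a-real-key  ", key_file)
demo_key <- read_api_key(key_file)

# in a real session, one might then use:
# Sys.setenv(SORA_API_KEY = read_api_key("sora_key"))
```

Keeping the key in a separate file that is excluded from version control, as the slides do, keeps it out of shared scripts.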

# start the geocoding
leibniz_addresses <-
  sora::sora_geocoder(
    leibniz_addresses
  )
Information from SoRa: 

# check status
sora::sora_job_status(leibniz_addresses)
Information from SoRa: 
2026-04-22 15:03:58: WAITING ─ The geocoding job is waiting to be processed

# check status
sora::sora_job_status(leibniz_addresses)
[1] "Waiting for SoRa API to be finished..."
[1] "Waiting for SoRa API to be finished..."
Information from SoRa: 
2026-04-22 15:04:17: SUCCESSFUL ─ Linking was finished successfully

# pulling the results from the server
leibniz_addresses <- sora::sora_results(leibniz_addresses)

leibniz_addresses
Information from SoRa: 
# A tibble: 10 × 4
   id        y     x score     
   <chr> <dbl> <dbl> <chr>     
 1 1      49.5  8.46 0.999     
 2 2      50.9  6.95 0.9836667 
 3 3      51.6  9.95 0.98488575
 4 4      51.3 12.4  0.9822792 
 5 5      48.0  7.86 0.984953  
 6 6      54.1 12.1  0.99587816
 7 7      49.5  8.48 0.999     
 8 8      52.4 13.6  0.999     
 9 9      49.3  7.04 0.9396661 
10 10     52.5 14.1  0.9485    
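The result table only contains id, y, x, and score, and both id and score are returned as character columns. A minimal sketch of how the coordinates could be attached to the original attributes again, assuming the address table was kept under a separate name (in the code above it is overwritten). It uses base R and three rows copied from the printed output; the 0.95 threshold is an arbitrary illustration, not a recommendation from the service.

```r
# toy copies of three rows from the tables above (values as printed)
addresses <- data.frame(
  id = c(1, 9, 10),
  institute = c("GESIS", "INM", "ZALF")
)
geocoded <- data.frame(
  id = c("1", "9", "10"),
  y = c(49.5, 49.3, 52.5),
  x = c(8.46, 7.04, 14.1),
  score = c("0.999", "0.9396661", "0.9485")
)

# harmonize column types before joining
geocoded$id <- as.numeric(geocoded$id)
geocoded$score <- as.numeric(geocoded$score)

# attach the coordinates to the original attributes
linked <- merge(addresses, geocoded, by = "id")

# flag matches below an arbitrary, illustrative quality threshold
linked$low_score <- linked$score < 0.95

linked
```

Flagged rows are candidates for a manual check of the address before the coordinates are used for linking.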

Convert To sf Object And Plot

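# tmap provides tm_shape(), tm_rgb(), and tm_dots()
library(tmap)

# turn the geocoded coordinates into an sf point layer (WGS 84)
leibniz_addresses_sf <-
  leibniz_addresses |> 
  sf::st_as_sf(coords = c("x", "y"), crs = 4326)

# download a basemap for the extent of the points and plot them on top
# (read_osm() requires the OpenStreetMap package to be installed)
tmaptools::read_osm(
  leibniz_addresses_sf, 
  type = "esri-topo"
) |> 
  terra::rast() |> 
  tm_shape() +
  tm_rgb() +
  tm_shape(leibniz_addresses_sf) +
  tm_dots(size = 2, col = "red")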
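Once the coordinates are an sf object, they can be used for distance-based operations. The sketch below uses two of the geocoded points (rounded values as printed above) and assumes ETRS89 / UTM zone 32N (EPSG:25832) as a metric coordinate reference system for Germany. Both the choice of CRS and the 500 m buffer size are illustrative assumptions.

```r
library(sf)

# two of the geocoded points from above (rounded values as printed)
points_wgs84 <-
  data.frame(
    institute = c("GESIS", "ZEW"),
    x = c(8.46, 8.48),
    y = c(49.5, 49.5)
  ) |>
  sf::st_as_sf(coords = c("x", "y"), crs = 4326)

# assumption: EPSG:25832 as a projected CRS with meters as units
points_utm <- sf::st_transform(points_wgs84, 25832)

# pairwise distances between the points in meters
distances <- sf::st_distance(points_utm)
distances

# 500 m buffers around each point, e.g., as areas for later linking
buffers <- sf::st_buffer(points_utm, dist = 500)
buffers
```

Working in a projected CRS keeps buffer sizes and distances in meters, which is usually easier to interpret than degrees.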