Introduction

GESIS Workshop: Introduction to Geospatial Techniques for Social Scientists in R

Stefan Jünger, Anne Stroppe & Dennis Abel

2026-04-23

The goal of this course

This course will teach you how to exploit R and apply its geospatial techniques in a social science context.

By the end of this course, you should…

  • Be comfortable with using geospatial data in R, including importing, wrangling, and exploring it
  • Be able to create maps based on your very own processed geospatial data in R
  • Feel prepared for (your first steps in) spatial analysis

We are (necessarily) selective

There’s a multitude of spatial R packages

  • We cannot cover all of them
  • And we cannot cover all functions
  • You may have used some we are not familiar with

We will show the use of packages we exploit in practice

  • There’s always another way of doing things in R
  • Don’t hesitate to bring up your solutions

You can’t learn everything at once, but you also don’t have to!

Prerequisites for this course

  • Knowledge of R, its syntax, and internal logic
  • Affinity for using script-based languages
  • Don’t be scared to wrangle data with complex structures
  • Working versions of R (and RStudio) on your computer

About us (Stefan)

  • Senior Researcher in the team Survey Data Augmentation at the GESIS department Survey Data Curation
  • Ph.D. in Social Sciences, University of Cologne
  • Research interests:
    • Quantitative methods, Geographic Information Systems (GIS)
    • Social inequalities
    • Attitudes towards minorities
    • Environmental attitudes
    • Reproducible research

About us (Anne)

  • Postdoctoral Researcher in the team Survey Data Augmentation at the GESIS department Survey Data Curation
  • Ph.D. in Political Science, University of Mannheim
  • Research interests:
    • Quantitative methods, Geographic Information Systems (GIS)
    • Political trust, resentment and voting
    • Spatial Disparities
    • Data Quality of Linked Data

About us (Dennis)

  • Postdoctoral Researcher in the team Survey Data Augmentation at the GESIS department Survey Data Curation
  • Ph.D. in Political Economy, University of Cologne
  • Research interests:
    • Quantitative methods, Geographic Information Systems (GIS)
    • Environmental attitudes and behavior
    • Public policy
    • Open source software

About you

  • What’s your name?
  • Where do you work/research?
  • What are you working on/researching?
  • What is your experience with R or other programming languages?
  • Do you already have experience with geospatial data?

Course schedule

Day      | Time        | Title
April 09 | 10:00-11:30 | Introduction
April 09 | 11:30-11:45 | Coffee Break
April 09 | 11:45-13:00 | Data Formats
April 09 | 13:00-14:00 | Lunch Break
April 09 | 14:00-15:30 | Mapping
April 09 | 15:30-15:45 | Coffee Break
April 09 | 15:45-17:00 | Spatial Wrangling
April 10 | 09:00-10:30 | Spatial Wrangling
April 10 | 10:30-10:45 | Coffee Break
April 10 | 10:45-12:00 | Applied Spatial Linking
April 10 | 12:00-13:00 | Lunch Break
April 10 | 13:00-14:30 | Spatial Analysis
April 10 | 14:30-14:45 | Coffee Break
April 10 | 14:45-16:00 | Spatial Econometrics & Outlook

Now

Day      | Time        | Title
April 09 | 10:00-11:30 | Introduction

“All things are connected”

Catchphrase #1: Tobler’s Law: “I invoke the first law of geography: everything is related to everything else, but near things are more related than distant things.” (Tobler 1970)

Catchphrase #2: Tobler’s Addendum: “near can take on many meanings in different situations.” (Tobler 2004)

\(\rightarrow\) “Space is more than geography” (Beck et al. 2006)


Spatial Association

Source: Gretarsson based on Zensus 2022

“All things are connected”

A lot of (classic) theories inherently make use of space (e.g., Allport 1954)

  • It’s where people interact
  • It’s what people collectively shape
  • Space becomes place

Thus, there’s a deep intersection or even embeddedness of space in social science research

  • It’s what geographers call a “human-environment system”
  • But often, these links are only implicit in our data

Geographic information in social science

Exploiting geographic information is not new.

For example, Siegfried (1913) used soil composition information to explain election results in France.

The book is often seen as foundational for electoral geography because it demonstrates that political behavior is embedded in place.

Remember the Chicago School?

Park, Burgess, and McKenzie (1925) argue that the city should be studied not just as a physical settlement, but as a social and ecological order shaped by interaction, competition, mobility, institutions, and patterns of land use.

Today

Many studies still build on these ideas but incorporate space directly, e.g.,

  • Iyer, A., & Pryce, G. (2023). Theorising the causal impacts of social frontiers: The social and psychological implications of discontinuities in the geography of residential mix. Urban Studies, https://doi.org/10.1177/00420980231194834
  • Kent, J. (2022). Can urban fabric encourage tolerance? Evidence that the structure of cities influences attitudes toward migrants in Europe. Cities, 121, 103494. https://doi.org/10.1016/j.cities.2021.103494
  • Schmidt, K., Jacobsen, J., & Iglauer, T. (2023). Proximity to refugee accommodations does not affect locals’ attitudes toward refugees: Evidence from Germany. European Sociological Review, jcad028. https://doi.org/10.1093/esr/jcad028
  • Xu, A. Z. (2023). Segregation and the Spatial Externalities of Inequality: A Theory of Interdependence and Public Goods in Cities. American Political Science Review, 1–18. https://doi.org/10.1017/S0003055423000722
  • Jünger, S., & Schaeffer, M. (2023). Ethnic Diversity and Social Integration—What are the Consequences of Ethnic Residential Boundaries and Halos for Social Integration in Germany? KZfSS Kölner Zeitschrift Für Soziologie Und Sozialpsychologie. https://doi.org/10.1007/s11577-023-00888-1

Xu, 2023

PolSci perspectives

“Since 1815, the probability that a randomly chosen country will be a democracy is about 0.75 if the majority of its neighbors are democracies, but only 0.14 if the majority of its neighbors are non-democracies.” (Gleditsch and Ward 2006)

V-Dem. 2026

Some recent studies

Hoffmann et al. (2022) analyse how the experience of climate anomalies and extremes influences environmental attitudes and vote intention in Europe

    • Data integration of 1. harmonized Eurobarometer data, 2. EU parliamentary electoral data, and 3. climatological data
    • Aggregation on regional levels (NUTS-2 and NUTS-3)
    • Climatological data from ERA5 reanalysis (C3S)
    • Calculations of temperature anomalies and extremes based on a reference period (1971-2000)
    • Findings suggest an effect of temperature anomalies (heat, “dry spell”) on environmental concern and vote intention

Some recent studies

Jean et al. (2016) show how nighttime light maps can be used to estimate household consumption and assets

    • Economic indicators are hard to measure in poorer countries; satellite imagery could serve as an alternative proxy
    • The authors integrate 1. survey data (World Bank’s Living Standards Measurement Surveys - LSMS; and Demographic and Health Surveys - DHS) with 2. nighttime light data in five African countries - Nigeria, Tanzania, Uganda, Malawi, and Rwanda
    • ML approach for image feature extraction in nighttime maps
    • Daytime satellite images from Google Static Maps, nighttime lights from US DMSP
    • Model can explain up to 75% of variation in local-level economic outcomes

Our work: Deprivation & discontent

Our work: Climate risks

The challenge in a nutshell

What are geospatial data?

Data with a direct spatial reference

\(\rightarrow\) geo-coordinates x, y (and z)

Visualizing geometries in different styles depending on format:

  • Vector data (points, lines, polygons)
  • Raster data (grids)
  • Coordinate Reference System (CRS)
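A minimal sketch of what “data with a direct spatial reference” can look like in R, assuming the sf package is installed. The respondent IDs and coordinates are made up for illustration; the object names are not taken from the course materials.

```r
library(sf)

# Hypothetical survey respondents with longitude (x) and latitude (y)
respondents <- data.frame(
  id  = 1:3,
  lon = c(6.96, 8.47, 13.40),
  lat = c(50.94, 49.49, 52.52)
)

# Convert the plain table into vector data (points); EPSG:4326 is WGS84
respondents_sf <- st_as_sf(respondents, coords = c("lon", "lat"), crs = 4326)

# The coordinates now live in a geometry column instead of two numeric columns
print(respondents_sf)
st_geometry_type(respondents_sf)
```

The attribute columns (here only `id`) stay untouched, so the result can still be handled like a regular data frame.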

Sources: OpenStreetMap / GEOFABRIK (2018), City of Cologne (2014), and the Statistical Offices of the Federation and the Länder (2016) / Jünger, 2019

Coordinate reference system (CRS)

  • A CRS is a reference system for determining the precise location of points in space
  • GIS programs MUST know the CRS for accurate processing, visualization, and analysis of data

In practice, what matters most is that two or more layers share the same CRS when integrating them.
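As a rough sketch of what matching layers means in code, the following assumes the sf package and uses a single made-up point (approximately Cologne) stored in two different CRS. The object names are illustrative.

```r
library(sf)

# Two illustrative layers for the same location, stored in different CRS
point_wgs84 <- st_sfc(st_point(c(6.96, 50.94)), crs = 4326)  # degrees
point_laea  <- st_transform(point_wgs84, 3035)               # metres

# Compare the CRS of both layers; they differ at this stage
st_crs(point_wgs84) == st_crs(point_laea)

# Bring one layer into the CRS of the other before combining them
point_matched <- st_transform(point_wgs84, st_crs(point_laea))
st_crs(point_matched) == st_crs(point_laea)
```

Checking `st_crs()` before any spatial join or overlay is a cheap safeguard, since coordinates in degrees and coordinates in metres cannot be compared meaningfully.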

Coordinate reference system (CRS)

    You may hear of geographic, geocentric, projected, or local CRS.

    What’s the difference?

    • whether two-dimensional (longitude, latitude) or three-dimensional (+height) coordinates are used
    • the location of the coordinate system’s origin (center of Earth or not)
    • projection on a flat surface (transformation of longitudes and latitudes to x and y coordinates)
    • size of the covered area (the smaller the area, the more precise the projection)

    CRS is based on the:

    1. Geographic Coordinate System (GCS) +
    2. Projected Coordinate System (PCS)

Geographic Coordinate System (GCS)

Necessary to know where exactly on Earth’s surface data is located

    • A GCS uses a three-dimensional spherical surface to define locations based on a datum and lines of latitude and longitude
    • Datum: Mathematical model of the Earth that serves as a reference by defining the size and shape of the Earth
    • Local datum: Optimizes fit for a particular location (like NAD83)
    • Geocentric datum: Optimizes fit for the entire Earth (like World Geodetic System 1984 - WGS84)
    • WGS84 is the standard for GPS and many applications

Source: Caitlin Dempsey

Projected Coordinate System (PCS)

Necessary to draw the data on a flat map

  • A PCS represents Earth’s surface on a flat plane by mathematical transformations (projections)
  • Coordinate grid: Here we talk about x and y coordinates (= easting and northing)
  • Conversion of degrees of latitude and longitude into measurable units (like meters)
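To make the conversion from degrees to metres concrete, here is a small base R sketch of one well-known cylindrical projection, the spherical Mercator. It is only an illustration of the principle, not the projection used elsewhere in the course; the function name `mercator()` and the example coordinates are assumptions.

```r
# Spherical Mercator: longitude/latitude in degrees -> x/y in metres
# radius defaults to the semi-major axis of the GRS80/WGS84 ellipsoid
mercator <- function(lon, lat, radius = 6378137) {
  lon_rad <- lon * pi / 180
  lat_rad <- lat * pi / 180
  data.frame(
    x = radius * lon_rad,                          # easting
    y = radius * log(tan(pi / 4 + lat_rad / 2))    # northing
  )
}

# Illustrative coordinates (approximately Cologne)
mercator(lon = 6.96, lat = 50.94)
```

In practice, you would not code a projection by hand; the transformation functions of spatial packages handle this, including ellipsoidal corrections that this sketch ignores.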

Different projection approaches. Left: Planar, middle: conic, right: cylindrical.

Common PCS - UTM

Universal Transverse Mercator (UTM) is a global map projection which:

    • Projects the globe onto a transverse cylinder aligned with a central meridian
    • Divides the globe into 60 zones, each 6 degrees of longitude wide
    • Minimizes distortion within each zone
    • Provides high accuracy for small areas
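The zone logic can be sketched in a few lines of base R. The helper names `utm_zone()` and `utm_epsg_north()` are hypothetical, and the sketch ignores the special zone boundaries around Norway and Svalbard. It relies on the convention that WGS84 UTM zones in the northern hemisphere have the EPSG codes 32601 to 32660.

```r
# UTM zone number from longitude in degrees (zones are 6 degrees wide)
utm_zone <- function(lon) {
  (floor((lon + 180) / 6) %% 60) + 1
}

# EPSG code of the WGS84 UTM zone in the northern hemisphere
utm_epsg_north <- function(lon) {
  32600 + utm_zone(lon)
}

# Illustrative longitude (approximately Cologne)
utm_zone(6.96)
utm_epsg_north(6.96)
```

The resulting code can be passed to any function that expects an EPSG code, which is a convenient way to pick a local metric CRS for a small study area.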

Projections are political

Documentation of CRS

Every geodata object requires a description of the CRS

  • GCS and datum
  • PCS
  • x and y units (like meters)
  • Domain (maximum allowable x and y values)
  • Resolution

Old standard: PROJ.4 strings

This is how information about the CRS is defined in the classic PROJ.4 standard:

+proj=laea +lat_0=52 +lon_0=10 +x_0=4321000 +y_0=3210000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs 

Source: https://epsg.io/3035

(It’s nothing you would type by hand)

WKT (“Well Known Text”)


PROJCS["ETRS89 / LAEA Europe",
    GEOGCS["ETRS89",
        DATUM["European_Terrestrial_Reference_System_1989",
            SPHEROID["GRS 1980",6378137,298.257222101,
                AUTHORITY["EPSG","7019"]],
            TOWGS84[0,0,0,0,0,0,0],
            AUTHORITY["EPSG","6258"]],
        PRIMEM["Greenwich",0,
            AUTHORITY["EPSG","8901"]],
        UNIT["degree",0.0174532925199433,
            AUTHORITY["EPSG","9122"]],
        AUTHORITY["EPSG","4258"]],
    PROJECTION["Lambert_Azimuthal_Equal_Area"],
    PARAMETER["latitude_of_center",52],
    PARAMETER["longitude_of_center",10],
    PARAMETER["false_easting",4321000],
    PARAMETER["false_northing",3210000],
    UNIT["metre",1,
        AUTHORITY["EPSG","9001"]],
    AUTHORITY["EPSG","3035"]]

Source: https://epsg.io/3035

EPSG Codes

Fortunately, working with CRS in R is not as challenging as it may seem since we don’t have to use PROJ.4 or WKT strings directly.

Most of the time, it’s enough to use so-called EPSG Codes (“European Petroleum Survey Group Geodesy”), a short sequence of digits.
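A short sketch of how an EPSG code stands in for the longer definitions, assuming the sf package is installed. The object names are illustrative. The point used here is the projection centre from the EPSG:3035 definition shown earlier (false easting 4321000, false northing 3210000), so transforming it to WGS84 should return roughly longitude 10 and latitude 52.

```r
library(sf)

# Look up a CRS by its EPSG code (3035 = ETRS89 / LAEA Europe)
crs_laea <- st_crs(3035)

crs_laea$epsg          # the numeric code
crs_laea$proj4string   # PROJ-string representation
cat(crs_laea$wkt)      # WKT representation

# The same code can be passed wherever sf expects a CRS
origin <- st_sfc(st_point(c(4321000, 3210000)), crs = 3035)
origin_wgs84 <- st_transform(origin, 4326)
st_coordinates(origin_wgs84)
```

The exact text printed for the PROJ and WKT representations depends on the installed versions of sf, GDAL, and PROJ, so it may differ from the strings shown on the slides.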

Layers Must Match!

EPSG:3857

EPSG:3035

Source: Statistical Office of the European Union Eurostat (2018) / Jünger, 2019

Geospatial data and software

Increased amount of available data

  • Quantitative and on a small spatial scale
  • Often open data with free access

Better tools

  • Standard software, such as R, can be used as a Geographic Information System (GIS)

So this is GIS!

Most common understanding: Geographic Information Systems (GIS) as specific software to process geospatial data for

  • Visualization
  • Analysis
  • Interpretation

\(\rightarrow\) In our case, of course, it is R

But base R is limited when it comes to handling geospatial data

Packages in this course I 📦

We will use plenty of different packages during the course, but only a few are our main drivers (e.g., the sf package). Here’s a list of core packages from CRAN you may need for the exercises. The first exercise contains the full list, including Non-CRAN packages.

Packages in this course II 📦

Illustration by Allison Horst

Geospatial data in this course I

In the folder called ./data, you can find (most of) the data files prepped for all the exercises and slides. The following data are included:

Geospatial data in this course II

If you reuse any of the provided data, please make sure to cite the original data sources.

More details on geospatial data

Let’s learn about geospatial data as we learn about specific formats

Exercise 1: Package Installation

Exercise