Processing math: 100%
+ - 0:00:00
Notes for current slide
Notes for next slide

Introduction to Geospatial Techniques for Social Scientists in R

Raster Data

Stefan Jünger & Anne-Kathrin Stroppe

GESIS Workshop

April 23, 2024

1 / 60

Now

Day Time Title
April 23 10:00-11:30 Introduction to GIS
April 23 11:45-13:00 Vector Data
April 23 13:00-14:00 Lunch Break
April 23 14:00-15:30 Mapping
April 23 15:45-17:00 Raster Data
April 24 09:00-10:30 Advanced Data Import & Processing
April 24 10:45-12:00 Applied Data Wrangling & Linking
April 24 12:00-13:00 Lunch Break
April 24 13:00-14:30 Investigating Spatial Autocorrelation
April 24 14:45-16:00 Spatial Econometrics & Outlook
2 / 60

General Difference to Vector Data

Data Structure:

  • Other data format(s), different file extensions
  • geometries do not differ within one dataset

Implications:

  • Other geospatial operations possible

Benefits:

  • can be way more efficient
  • straightforward processing of raster values and extraction of zonal statistics
  • it's like working with simple tabular data
3 / 60

Visual Difference Between Vector and Raster Data

4 / 60

What Exactly Are Raster Data?

  • Hold information on (most of the time) evenly shaped grid cells
  • Basically, a simple data table
  • each cell represents one observation

5 / 60

Metadata

  • Information about geometries is globally stored
  • they are the same for all observations
  • their location in space is defined by their cell location in the data table
  • Without this information, raster data were simple image files
6 / 60

Important Metadata

  • Raster Dimensions
    • number of columns, rows, and cells
  • Extent
    • Similar to bounding box in vector data
  • Resolution
    • the size of each raster cell
  • Coordinate reference system
    • defines where on the earth's surface the raster layer lies
7 / 60

All Beginnings Are... Easy!

terra::rast()
## class : SpatRaster
## dimensions : 180, 360, 1 (nrow, ncol, nlyr)
## resolution : 1, 1 (x, y)
## extent : -180, 180, -90, 90 (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84
8 / 60

Feed With Data

input_data <-
sample(1:100, 16) |>
matrix(nrow = 4)
raster_layer <- terra::rast(input_data)
raster_layer
## class : SpatRaster
## dimensions : 4, 4, 1 (nrow, ncol, nlyr)
## resolution : 1, 1 (x, y)
## extent : 0, 4, 0, 4 (xmin, xmax, ymin, ymax)
## coord. ref. :
## source(s) : memory
## name : lyr.1
## min value : 10
## max value : 98
9 / 60

Plotting

terra::plot(raster_layer)

10 / 60

File Formats/Extensions

  • Gtiff/GeoTiff
  • JPEG2000
  • ...
  • .grd
  • netCDF
  • ...
  • sometimes, raster data come even in a text format, such as CSV

In this course, we will only use tiff files as it is pretty common. Just be aware that there a different formats out there.

11 / 60

Implementations in R

AFAIK terra is the most commonly used package for raster data in R.

Some other developments, e.g., in the stars package, also implement an interface to simple features in sf.

The terra package also helps to use more elaborate zonal statistics. The same holds for the spatstat package.

12 / 60

Basic Raster Operations

13 / 60

Loading Raster Tiffs (Census Data)

immigrants_cologne <-
terra::rast("./data/immigrants_cologne.tif")
inhabitants_cologne <-
terra::rast("./data/inhabitants_cologne.tif")
immigrants_cologne
## class : SpatRaster
## dimensions : 289, 264, 1 (nrow, ncol, nlyr)
## resolution : 100, 100 (x, y)
## extent : 4094850, 4121250, 3084050, 3112950 (xmin, xmax, ymin, ymax)
## coord. ref. : ETRS89-extended / LAEA Europe (with axis order normalized for visualization)
## source : immigrants_cologne.tif
## name : immigrants_cologne
## min value : 3
## max value : 639
inhabitants_cologne
## class : SpatRaster
## dimensions : 289, 264, 1 (nrow, ncol, nlyr)
## resolution : 100, 100 (x, y)
## extent : 4094850, 4121250, 3084050, 3112950 (xmin, xmax, ymin, ymax)
## coord. ref. : ETRS89-extended / LAEA Europe (with axis order normalized for visualization)
## source : inhabitants_cologne.tif
## name : inhabitants_cologne
## min value : 3
## max value : 956
14 / 60

Compare Layers by Plotting

terra::plot(immigrants_cologne)

terra::plot(inhabitants_cologne)

15 / 60

Btw: We Can Also Use tmap

tm_shape(immigrants_cologne) +
tm_raster()

tm_shape(inhabitants_cologne) +
tm_raster()

16 / 60

Preparing for Analysis / Base R Functionalities

immigrants_cologne[immigrants_cologne == -9] <- NA
inhabitants_cologne[inhabitants_cologne == -9] <- NA
immigrants_cologne
## class : SpatRaster
## dimensions : 289, 264, 1 (nrow, ncol, nlyr)
## resolution : 100, 100 (x, y)
## extent : 4094850, 4121250, 3084050, 3112950 (xmin, xmax, ymin, ymax)
## coord. ref. : ETRS89-extended / LAEA Europe (with axis order normalized for visualization)
## source(s) : memory
## varname : immigrants_cologne
## name : immigrants_cologne
## min value : 3
## max value : 639
inhabitants_cologne
## class : SpatRaster
## dimensions : 289, 264, 1 (nrow, ncol, nlyr)
## resolution : 100, 100 (x, y)
## extent : 4094850, 4121250, 3084050, 3112950 (xmin, xmax, ymin, ymax)
## coord. ref. : ETRS89-extended / LAEA Europe (with axis order normalized for visualization)
## source(s) : memory
## varname : inhabitants_cologne
## name : inhabitants_cologne
## min value : 3
## max value : 956
17 / 60

Simple Statistics

Working with raster data is straightforward

  • quite speedy
  • yet not as comfortable as working with sf objects

For example, to calculate the mean we would use:

terra::global(immigrants_cologne, fun = "mean", na.rm = TRUE)
## mean
## immigrants_cologne 15.1337
18 / 60

Get All Values As a Vector

We can also extract the values of a raster directly as a vector:

all_raster_values <- terra::values(immigrants_cologne)
mean(all_raster_values, na.rm = TRUE)
## [1] 15.1337

Nevertheless, although raster data are simple data tables, working with them is a bit different compared to, e.g., simple features.

19 / 60

Combining Raster Layers to Calculate New Values

immigrant_rate <-
immigrants_cologne * 100 /
inhabitants_cologne
immigrant_rate
## class : SpatRaster
## dimensions : 289, 264, 1 (nrow, ncol, nlyr)
## resolution : 100, 100 (x, y)
## extent : 4094850, 4121250, 3084050, 3112950 (xmin, xmax, ymin, ymax)
## coord. ref. : ETRS89-extended / LAEA Europe (with axis order normalized for visualization)
## source(s) : memory
## varname : immigrants_cologne
## name : immigrants_cologne
## min value : 0.6637168
## max value : 100.0000000
20 / 60

Combining Raster Layers to Calculate New Values

immigrant_rate <-
immigrants_cologne * 100 /
inhabitants_cologne
immigrant_rate
## class : SpatRaster
## dimensions : 289, 264, 1 (nrow, ncol, nlyr)
## resolution : 100, 100 (x, y)
## extent : 4094850, 4121250, 3084050, 3112950 (xmin, xmax, ymin, ymax)
## coord. ref. : ETRS89-extended / LAEA Europe (with axis order normalized for visualization)
## source(s) : memory
## varname : immigrants_cologne
## name : immigrants_cologne
## min value : 0.6637168
## max value : 100.0000000

21 / 60

'Subsetting' Raster Layers

We can subset vector data by simply filtering for specific attribute values. For example, to subset Cologne's districts only by the one of Deutz, we can use the Tidyverse for sf data:

deutz <-
sf::st_read("./data/cologne.shp") |>
dplyr::filter(NAME == "Deutz")
## Reading layer `cologne' from data source
## `C:\Users\mueller2\a_talks_presentations\gesis-workshop-geospatial-techniques-R-2024\data\cologne.shp' using driver `ESRI Shapefile'
## Simple feature collection with 86 features and 7 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 4094821 ymin: 3084022 xmax: 4121277 ymax: 3112952
## Projected CRS: ETRS89-extended / LAEA Europe

22 / 60

Cropping

Cropping is a method of cutting out a specific slice of a raster layer based on an input dataset or geospatial extent, such as a bounding box.

23 / 60

Cropping

Cropping is a method of cutting out a specific slice of a raster layer based on an input dataset or geospatial extent, such as a bounding box.

cropped_immigrant_rate <-
terra::crop(immigrant_rate, deutz)
24 / 60

Cropping

Cropping is a method of cutting out a specific slice of a raster layer based on an input dataset or geospatial extent, such as a bounding box.

cropped_immigrant_rate <-
terra::crop(immigrant_rate, deutz)

25 / 60

Masking

Masking is similar to cropping, yet values outside the extent are set to missing values (NA).

26 / 60

Masking

Masking is similar to cropping, yet values outside the extent are set to missing values (NA).

masked_immigrant_rate <-
raster::mask(immigrant_rate, terra::vect(deutz))
27 / 60

Masking

Masking is similar to cropping, yet values outside the extent are set to missing values (NA).

masked_immigrant_rate <-
raster::mask(immigrant_rate, terra::vect(deutz))

28 / 60

Combining Cropping and Masking

cropped_masked_immigrant_rate <-
terra::crop(immigrant_rate, deutz) |>
raster::mask(terra::vect(deutz))
29 / 60

Combining Cropping and Masking

cropped_masked_immigrant_rate <-
terra::crop(immigrant_rate, deutz) |>
raster::mask(terra::vect(deutz))

30 / 60

Exercise 1_4_1: Basic Raster Operations

Exercise

Solution

31 / 60

Raster Extraction / Zonal statistics

32 / 60

Sampling of some points

fancy_points <-
immigrants_cologne |>
terra::spatSample(size = 10, na.rm = TRUE, as.points = TRUE) |>
sf::st_as_sf() |>
dplyr::select(-1)
33 / 60

Sampling of some points

fancy_points <-
immigrants_cologne |>
terra::spatSample(size = 10, na.rm = TRUE, as.points = TRUE) |>
sf::st_as_sf() |>
dplyr::select(-1)
plot(fancy_points)

34 / 60

Extract Information From Rasters

Raster data are helpful when we aim to

  • apply calculations that are the same for all geometries in the dataset
  • extract information from raster fast and efficient
tm_shape(immigrant_rate) +
tm_raster() +
tm_shape(fancy_points) +
tm_dots(size = .25)

35 / 60

Raster Extraction

To extract the raster values at a specific point by location, we use the following:

terra::extract(immigrants_cologne, fancy_points, ID = FALSE)
## immigrants_cologne
## 1 5
## 2 3
## 3 4
## 4 3
## 5 27
## 6 9
## 7 9
## 8 32
## 9 27
## 10 3
36 / 60

Add Results to Existing Dataset

This information can be added to an existing dataset (our points in this example):

fancy_points <-
fancy_points |>
dplyr::mutate(
immigrant_rate_value =
as.vector(
terra::extract(immigrant_rate, fancy_points, ID = FALSE, raw = TRUE)
)
)
fancy_points
## Simple feature collection with 10 features and 1 field
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 4102000 ymin: 3089700 xmax: 4120200 ymax: 3110100
## Projected CRS: ETRS89-extended / LAEA Europe (with axis order normalized for visualization)
## geometry immigrant_rate_value
## 1 POINT (4106300 3102400) 26.315789
## 2 POINT (4107700 3110100) 2.400000
## 3 POINT (4120200 3100000) 7.547170
## 4 POINT (4103300 3099200) 3.125000
## 5 POINT (4114100 3089900) 20.000000
## 6 POINT (4102000 3089700) 11.538462
## 7 POINT (4113200 3097500) 8.108108
## 8 POINT (4106200 3092300) 21.917808
## 9 POINT (4103800 3105400) 40.298507
## 10 POINT (4117700 3102700) 5.882353
37 / 60

More Elaborated: Spatial Buffers

Sometimes, extracting information 1:1 is not enough

  • too narrow
  • missing information about the surroundings of a point
tm_shape(immigrants_cologne) +
tm_raster() +
tm_shape(
sf::st_buffer(fancy_points, 500)
) +
tm_dots(size = .1) +
tm_borders()

38 / 60

A Closer Look

Jünger, 2021

39 / 60

Buffer Extraction

We can use spatial buffers of different sizes to extract information on surroundings:

terra::extract(
immigrants_cologne,
sf::st_buffer(fancy_points, 500),
fun = mean,
na.rm = TRUE,
ID = FALSE,
raw = TRUE
)
## immigrants_cologne
## 1 16.160000
## 2 14.075000
## 3 8.090909
## 4 8.923077
## 5 17.568627
## 6 8.150000
## 7 8.111111
## 8 17.186047
## 9 57.326087
## 10 7.468750
terra::extract(
immigrants_cologne,
sf::st_buffer(fancy_points, 1000),
fun = mean,
na.rm = TRUE,
ID = FALSE,
raw = TRUE
)
## immigrants_cologne
## 1 15.913043
## 2 14.333333
## 3 7.065789
## 4 22.723757
## 5 18.500000
## 6 7.982143
## 7 5.884615
## 8 14.905109
## 9 50.307018
## 10 6.943925
40 / 60

Exercise 1_4_2: Fancier Raster Operations

Exercise

Solution

41 / 60

Raster Stacks

So far, raster data have been unidimensional: we only had one attribute for each dataset.

But they can also be stacked:

census_stack <- c(immigrants_cologne, inhabitants_cologne)
census_stack
## class : SpatRaster
## dimensions : 289, 264, 2 (nrow, ncol, nlyr)
## resolution : 100, 100 (x, y)
## extent : 4094850, 4121250, 3084050, 3112950 (xmin, xmax, ymin, ymax)
## coord. ref. : ETRS89-extended / LAEA Europe (with axis order normalized for visualization)
## source(s) : memory
## varnames : immigrants_cologne
## inhabitants_cologne
## names : immigrants_cologne, inhabitants_cologne
## min values : 3, 3
## max values : 639, 956
42 / 60

Non-Regular Grids

In this course, we only use regular grid-based raster data. However, be aware that non-regular grid data also exists.

https://r-spatial.github.io/stars/

43 / 60

Relevant In This Context: Earth Observation Data

Earth observation data have become increasingly popular in social science applications. They can be used, among others, to develop alternative measures for inequality, particularly in the global south. For example, Weidmann & Theunissen (2017) use nighttime light emissions data to construct measures of local inequality.

Satellite data or remote sensing data are often not only geographically fine-grained but also temporary. This characteristic makes them attractive for (near) real-time analysis of events.

Weidmann, N. B., & Theunissen, G. (2021). Estimating Local Inequality from Nighttime Lights. Remote Sensing, 13(22), 4624. https://doi.org/10.3390/rs13224624

44 / 60

Tomorrow

Day Time Title
April 23 10:00-11:30 Introduction to GIS
April 23 11:45-13:00 Vector Data
April 23 13:00-14:00 Lunch Break
April 23 14:00-15:30 Mapping
April 23 15:45-17:00 Raster Data
April 24 09:00-10:30 Advanced Data Import & Processing
April 24 10:45-12:00 Applied Data Wrangling & Linking
April 24 12:00-13:00 Lunch Break
April 24 13:00-14:30 Investigating Spatial Autocorrelation
April 24 14:45-16:00 Spatial Econometrics & Outlook
46 / 60

Add-on Slides: Conversion and possible applications

47 / 60

Raster to Points

raster_now_points <-
immigrant_rate |>
terra::as.points()
raster_now_points
## class : SpatVector
## geometry : points
## dimensions : 13867, 1 (geometries, attributes)
## extent : 4094900, 4121200, 3084100, 3112900 (xmin, xmax, ymin, ymax)
## coord. ref. : ETRS89-extended / LAEA Europe (with axis order normalized for visualization)
## names : immigrants_cologne
## type : <num>
## values : 6.509
## 45.03
## 33.19
48 / 60

Points to Raster

points_now_raster <-
raster_now_points |>
terra::rast(vals = raster_now_points$immigrants_cologne, resolution = 100)
points_now_raster
## class : SpatRaster
## dimensions : 288, 263, 1 (nrow, ncol, nlyr)
## resolution : 100, 100 (x, y)
## extent : 4094900, 4121200, 3084100, 3112900 (xmin, xmax, ymin, ymax)
## coord. ref. : ETRS89-extended / LAEA Europe (with axis order normalized for visualization)
## source(s) : memory
## name : lyr.1
## min value : 0.6637168
## max value : 100.0000000
49 / 60

Application: Point Pattern Analysis Using Global Kernel Densities (‘Heatmap’)

kernel_densities <-
raster_now_points |>
sf::st_as_sf() |>
dplyr::filter(immigrants_cologne > 25) |>
sf::as_Spatial() |>
as("ppp") |>
spatstat.explore::density.ppp(sigma = 250) |>
terra::rast()
terra::crs(kernel_densities) <- "epsg:3035"

50 / 60

Raster to Polygons

polygon_raster <-
immigrant_rate |>
terra::as.polygons() |>
sf::st_as_sf()

51 / 60

Application: Rotation in Space

52 / 60

Focal Statistics

Focal statistics are another method of including information near a point in space. However, it's applied to the whole dataset and is independent of arbitrary points we project onto a map.

  • relates focal cells to surrounding cells
  • vastly used in image processing
  • but also applicable in social science research, as we will see
53 / 60

Focal Application: Edge Detection

Source: https://en.wikipedia.org/wiki/Sobel_operator

54 / 60

Edges of Immigrant Rates

55 / 60

We Can Do That As Well Using a Sobel Filter

rx=[101202101]×raster_filery=[121000121]×raster_filerxy=r2x+r2y

56 / 60

Implementation in R

From the official documentation:

sobel <- function(r) {
fy <- matrix(c(1, 0, -1, 2, 0, -2, 1, 0, -1), nrow = 3)
fx <- matrix(c(-1, -2, -1, 0, 0, 0, 1, 2, 1) , nrow = 3)
rx <- terra::focal(r, fx)
ry <- terra::focal(r, fy)
sqrt(rx^2 + ry^2)
}
57 / 60

Data Preparation and Application of Filter

old_extent <- terra::ext(immigrant_rate)
new_extent <- old_extent + c(10000, -10000, 10000, -10000)
smaller_immigrant_rate <-
immigrant_rate |>
terra::crop(new_extent)
smaller_immigrant_rate[smaller_immigrant_rate < 15] <- NA
immigrant_edges <-
sobel(smaller_immigrant_rate)
58 / 60

Comparison

59 / 60

Now

Day Time Title
April 23 10:00-11:30 Introduction to GIS
April 23 11:45-13:00 Vector Data
April 23 13:00-14:00 Lunch Break
April 23 14:00-15:30 Mapping
April 23 15:45-17:00 Raster Data
April 24 09:00-10:30 Advanced Data Import & Processing
April 24 10:45-12:00 Applied Data Wrangling & Linking
April 24 12:00-13:00 Lunch Break
April 24 13:00-14:30 Investigating Spatial Autocorrelation
April 24 14:45-16:00 Spatial Econometrics & Outlook
2 / 60
Paused

Help

Keyboard shortcuts

, , Pg Up, k Go to previous slide
, , Pg Dn, Space, j Go to next slide
Home Go to first slide
End Go to last slide
Number + Return Go to specific slide
b / m / f Toggle blackout / mirrored / fullscreen mode
c Clone slideshow
p Toggle presenter mode
t Restart the presentation timer
?, h Toggle this help
oTile View: Overview of Slides
Esc Back to slideshow