Introduction to Geospatial Techniques for Social Scientists in R

Raster Data

Stefan Jünger & Anne-Kathrin Stroppe

GESIS Workshop

April 23, 2024

1 / 60

Now
 
    Day 
    Time 
    Title 
  


    April 23 
    10:00-11:30 
    Introduction to GIS 
  

    April 23 
    11:45-13:00 
    Vector Data 
  

    April 23 
    13:00-14:00 
    Lunch Break 
  

    April 23 
    14:00-15:30 
    Mapping 
  

    April 23 
    15:45-17:00 
    Raster Data 
  

    April 24 
    09:00-10:30 
    Advanced Data Import & Processing 
  

    April 24 
    10:45-12:00 
    Applied Data Wrangling & Linking 
  

    April 24 
    12:00-13:00 
    Lunch Break 
  

    April 24 
    13:00-14:30 
    Investigating Spatial Autocorrelation 
  

    April 24 
    14:45-16:00 
    Spatial Econometrics & Outlook 
  

2 / 60

Day	Time	Title
April 23	10:00-11:30	Introduction to GIS
April 23	11:45-13:00	Vector Data
April 23	13:00-14:00	Lunch Break
April 23	14:00-15:30	Mapping
April 23	15:45-17:00	Raster Data
April 24	09:00-10:30	Advanced Data Import & Processing
April 24	10:45-12:00	Applied Data Wrangling & Linking
April 24	12:00-13:00	Lunch Break
April 24	13:00-14:30	Investigating Spatial Autocorrelation
April 24	14:45-16:00	Spatial Econometrics & Outlook

General Difference to Vector Data

Data Structure:

Other data format(s), different file extensions
geometries do not differ within one dataset

Implications:

Other geospatial operations possible

Benefits:

can be way more efficient
straightforward processing of raster values and extraction of zonal statistics
it's like working with simple tabular data

3 / 60

Visual Difference Between Vector and Raster Data

4 / 60

What Exactly Are Raster Data?

Hold information on (most of the time) evenly shaped grid cells
Basically, a simple data table
each cell represents one observation

5 / 60

MetadataInformation about geometries is globally stored
they are the same for all observations
their location in space is defined by their cell location in the data table
Without this information, raster data were simple image files
6 / 60

Important MetadataRaster Dimensionsnumber of columns, rows, and cells

ExtentSimilar to bounding box in vector data

Resolutionthe size of each raster cell

Coordinate reference systemdefines where on the earth's surface the raster layer lies

7 / 60

All Beginnings Are... Easy!

terra::rast()

## class       : SpatRaster 
## dimensions  : 180, 360, 1  (nrow, ncol, nlyr)
## resolution  : 1, 1  (x, y)
## extent      : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84

8 / 60

Feed With Data

input_data <- 
  sample(1:100, 16) |> 
  matrix(nrow = 4)
raster_layer <- terra::rast(input_data)
raster_layer

## class       : SpatRaster 
## dimensions  : 4, 4, 1  (nrow, ncol, nlyr)
## resolution  : 1, 1  (x, y)
## extent      : 0, 4, 0, 4  (xmin, xmax, ymin, ymax)
## coord. ref. :  
## source(s)   : memory
## name        : lyr.1 
## min value   :    10 
## max value   :    98

9 / 60

Plotting

terra::plot(raster_layer)

10 / 60

File Formats/Extensions

Gtiff/GeoTiff
JPEG2000
...
.grd
netCDF
...
sometimes, raster data come even in a text format, such as CSV

In this course, we will only use tiff files as it is pretty common. Just be aware that there a different formats out there.

11 / 60

Implementations in `R`

AFAIK terra is the most commonly used package for raster data in R.

Some other developments, e.g., in the stars package, also implement an interface to simple features in sf.

The terra package also helps to use more elaborate zonal statistics. The same holds for the spatstat package.

12 / 60

Basic Raster Operations13 / 60

Loading Raster Tiffs (Census Data)

immigrants_cologne <-
  terra::rast("./data/immigrants_cologne.tif")
inhabitants_cologne <-
  terra::rast("./data/inhabitants_cologne.tif")
immigrants_cologne

## class       : SpatRaster 
## dimensions  : 289, 264, 1  (nrow, ncol, nlyr)
## resolution  : 100, 100  (x, y)
## extent      : 4094850, 4121250, 3084050, 3112950  (xmin, xmax, ymin, ymax)
## coord. ref. : ETRS89-extended / LAEA Europe (with axis order normalized for visualization) 
## source      : immigrants_cologne.tif 
## name        : immigrants_cologne 
## min value   :                  3 
## max value   :                639

inhabitants_cologne

## class       : SpatRaster 
## dimensions  : 289, 264, 1  (nrow, ncol, nlyr)
## resolution  : 100, 100  (x, y)
## extent      : 4094850, 4121250, 3084050, 3112950  (xmin, xmax, ymin, ymax)
## coord. ref. : ETRS89-extended / LAEA Europe (with axis order normalized for visualization) 
## source      : inhabitants_cologne.tif 
## name        : inhabitants_cologne 
## min value   :                   3 
## max value   :                 956

14 / 60

Compare Layers by Plotting

terra::plot(immigrants_cologne)

terra::plot(inhabitants_cologne)

15 / 60

Btw: We Can Also Use `tmap`

tm_shape(immigrants_cologne) +
  tm_raster()

tm_shape(inhabitants_cologne) +
  tm_raster()

16 / 60

Preparing for Analysis / Base `R` Functionalities

immigrants_cologne[immigrants_cologne == -9] <- NA
inhabitants_cologne[inhabitants_cologne == -9] <- NA
immigrants_cologne

## class       : SpatRaster 
## dimensions  : 289, 264, 1  (nrow, ncol, nlyr)
## resolution  : 100, 100  (x, y)
## extent      : 4094850, 4121250, 3084050, 3112950  (xmin, xmax, ymin, ymax)
## coord. ref. : ETRS89-extended / LAEA Europe (with axis order normalized for visualization) 
## source(s)   : memory
## varname     : immigrants_cologne 
## name        : immigrants_cologne 
## min value   :                  3 
## max value   :                639

inhabitants_cologne

## class       : SpatRaster 
## dimensions  : 289, 264, 1  (nrow, ncol, nlyr)
## resolution  : 100, 100  (x, y)
## extent      : 4094850, 4121250, 3084050, 3112950  (xmin, xmax, ymin, ymax)
## coord. ref. : ETRS89-extended / LAEA Europe (with axis order normalized for visualization) 
## source(s)   : memory
## varname     : inhabitants_cologne 
## name        : inhabitants_cologne 
## min value   :                   3 
## max value   :                 956

17 / 60

Simple Statistics

Working with raster data is straightforward

quite speedy
yet not as comfortable as working with sf objects

For example, to calculate the mean we would use:

terra::global(immigrants_cologne, fun = "mean", na.rm = TRUE)

##                       mean
## immigrants_cologne 15.1337

18 / 60

Get All Values As a Vector

We can also extract the values of a raster directly as a vector:

all_raster_values <- terra::values(immigrants_cologne)
mean(all_raster_values, na.rm = TRUE)

## [1] 15.1337

Nevertheless, although raster data are simple data tables, working with them is a bit different compared to, e.g., simple features.

19 / 60

Combining Raster Layers to Calculate New Values

immigrant_rate <-
  immigrants_cologne * 100 / 
  inhabitants_cologne
immigrant_rate

## class       : SpatRaster 
## dimensions  : 289, 264, 1  (nrow, ncol, nlyr)
## resolution  : 100, 100  (x, y)
## extent      : 4094850, 4121250, 3084050, 3112950  (xmin, xmax, ymin, ymax)
## coord. ref. : ETRS89-extended / LAEA Europe (with axis order normalized for visualization) 
## source(s)   : memory
## varname     : immigrants_cologne 
## name        : immigrants_cologne 
## min value   :          0.6637168 
## max value   :        100.0000000

20 / 60

Combining Raster Layers to Calculate New Values

immigrant_rate <-
  immigrants_cologne * 100 / 
  inhabitants_cologne
immigrant_rate

## class       : SpatRaster 
## dimensions  : 289, 264, 1  (nrow, ncol, nlyr)
## resolution  : 100, 100  (x, y)
## extent      : 4094850, 4121250, 3084050, 3112950  (xmin, xmax, ymin, ymax)
## coord. ref. : ETRS89-extended / LAEA Europe (with axis order normalized for visualization) 
## source(s)   : memory
## varname     : immigrants_cologne 
## name        : immigrants_cologne 
## min value   :          0.6637168 
## max value   :        100.0000000

21 / 60

'Subsetting' Raster Layers

We can subset vector data by simply filtering for specific attribute values. For example, to subset Cologne's districts only by the one of Deutz, we can use the Tidyverse for sf data:

deutz <-
  sf::st_read("./data/cologne.shp") |> 
  dplyr::filter(NAME == "Deutz")

## Reading layer `cologne' from data source 
##   `C:\Users\mueller2\a_talks_presentations\gesis-workshop-geospatial-techniques-R-2024\data\cologne.shp' using driver `ESRI Shapefile'
## Simple feature collection with 86 features and 7 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 4094821 ymin: 3084022 xmax: 4121277 ymax: 3112952
## Projected CRS: ETRS89-extended / LAEA Europe

22 / 60

Cropping

Cropping is a method of cutting out a specific slice of a raster layer based on an input dataset or geospatial extent, such as a bounding box.

23 / 60

Cropping

Cropping is a method of cutting out a specific slice of a raster layer based on an input dataset or geospatial extent, such as a bounding box.

cropped_immigrant_rate <-
  terra::crop(immigrant_rate, deutz)

24 / 60

Cropping

Cropping is a method of cutting out a specific slice of a raster layer based on an input dataset or geospatial extent, such as a bounding box.

cropped_immigrant_rate <-
  terra::crop(immigrant_rate, deutz)

25 / 60

Masking

Masking is similar to cropping, yet values outside the extent are set to missing values (NA).

26 / 60

Masking

Masking is similar to cropping, yet values outside the extent are set to missing values (NA).

masked_immigrant_rate <-
  raster::mask(immigrant_rate, terra::vect(deutz))

27 / 60

Masking

Masking is similar to cropping, yet values outside the extent are set to missing values (NA).

masked_immigrant_rate <-
  raster::mask(immigrant_rate, terra::vect(deutz))

28 / 60

Combining Cropping and Masking

cropped_masked_immigrant_rate <-
  terra::crop(immigrant_rate, deutz) |> 
  raster::mask(terra::vect(deutz))

29 / 60

Combining Cropping and Masking

cropped_masked_immigrant_rate <-
  terra::crop(immigrant_rate, deutz) |> 
  raster::mask(terra::vect(deutz))

30 / 60

Exercise 1_4_1: Basic Raster Operations

Exercise

Solution

31 / 60

Raster Extraction / Zonal statistics32 / 60

Sampling of some pointsfancy_points <-
  immigrants_cologne |> 
  terra::spatSample(size = 10, na.rm = TRUE, as.points = TRUE) |> 
  sf::st_as_sf() |> 
  dplyr::select(-1)

33 / 60

Sampling of some points

fancy_points <-
  immigrants_cologne |> 
  terra::spatSample(size = 10, na.rm = TRUE, as.points = TRUE) |> 
  sf::st_as_sf() |> 
  dplyr::select(-1)

plot(fancy_points)

34 / 60

Extract Information From Rasters

Raster data are helpful when we aim to

apply calculations that are the same for all geometries in the dataset
extract information from raster fast and efficient

tm_shape(immigrant_rate) +
  tm_raster() +
  tm_shape(fancy_points) +
  tm_dots(size = .25)

35 / 60

Raster Extraction

To extract the raster values at a specific point by location, we use the following:

terra::extract(immigrants_cologne, fancy_points, ID = FALSE)

##    immigrants_cologne
## 1                   5
## 2                   3
## 3                   4
## 4                   3
## 5                  27
## 6                   9
## 7                   9
## 8                  32
## 9                  27
## 10                  3

36 / 60

Add Results to Existing Dataset

This information can be added to an existing dataset (our points in this example):

fancy_points <-
  fancy_points |> 
  dplyr::mutate(
    immigrant_rate_value = 
      as.vector(
        terra::extract(immigrant_rate, fancy_points, ID = FALSE, raw = TRUE)
      )
  )
fancy_points

## Simple feature collection with 10 features and 1 field
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: 4102000 ymin: 3089700 xmax: 4120200 ymax: 3110100
## Projected CRS: ETRS89-extended / LAEA Europe (with axis order normalized for visualization)
##                   geometry immigrant_rate_value
## 1  POINT (4106300 3102400)            26.315789
## 2  POINT (4107700 3110100)             2.400000
## 3  POINT (4120200 3100000)             7.547170
## 4  POINT (4103300 3099200)             3.125000
## 5  POINT (4114100 3089900)            20.000000
## 6  POINT (4102000 3089700)            11.538462
## 7  POINT (4113200 3097500)             8.108108
## 8  POINT (4106200 3092300)            21.917808
## 9  POINT (4103800 3105400)            40.298507
## 10 POINT (4117700 3102700)             5.882353

37 / 60

More Elaborated: Spatial Buffers

Sometimes, extracting information 1:1 is not enough

too narrow
missing information about the surroundings of a point

tm_shape(immigrants_cologne) +
  tm_raster() +
  tm_shape(
    sf::st_buffer(fancy_points, 500) 
  ) +
  tm_dots(size = .1) +
  tm_borders()

38 / 60

A Closer Look

Jünger, 2021

39 / 60

Buffer Extraction

We can use spatial buffers of different sizes to extract information on surroundings:

terra::extract(
  immigrants_cologne, 
  sf::st_buffer(fancy_points, 500),
  fun = mean,
  na.rm = TRUE,
  ID = FALSE,
  raw = TRUE
)

##    immigrants_cologne
## 1           16.160000
## 2           14.075000
## 3            8.090909
## 4            8.923077
## 5           17.568627
## 6            8.150000
## 7            8.111111
## 8           17.186047
## 9           57.326087
## 10           7.468750

terra::extract(
  immigrants_cologne, 
  sf::st_buffer(fancy_points, 1000),
  fun = mean,
  na.rm = TRUE,
  ID = FALSE,
  raw = TRUE
)

##    immigrants_cologne
## 1           15.913043
## 2           14.333333
## 3            7.065789
## 4           22.723757
## 5           18.500000
## 6            7.982143
## 7            5.884615
## 8           14.905109
## 9           50.307018
## 10           6.943925

40 / 60

Exercise 1_4_2: Fancier Raster Operations

Exercise

Solution

41 / 60

Raster Stacks

So far, raster data have been unidimensional: we only had one attribute for each dataset.

But they can also be stacked:

census_stack <- c(immigrants_cologne, inhabitants_cologne)
census_stack

## class       : SpatRaster 
## dimensions  : 289, 264, 2  (nrow, ncol, nlyr)
## resolution  : 100, 100  (x, y)
## extent      : 4094850, 4121250, 3084050, 3112950  (xmin, xmax, ymin, ymax)
## coord. ref. : ETRS89-extended / LAEA Europe (with axis order normalized for visualization) 
## source(s)   : memory
## varnames    : immigrants_cologne 
##               inhabitants_cologne 
## names       : immigrants_cologne, inhabitants_cologne 
## min values  :                  3,                   3 
## max values  :                639,                 956

42 / 60

Non-Regular Grids

In this course, we only use regular grid-based raster data. However, be aware that non-regular grid data also exists.

https://r-spatial.github.io/stars/

43 / 60

Relevant In This Context: Earth Observation Data

Earth observation data have become increasingly popular in social science applications. They can be used, among others, to develop alternative measures for inequality, particularly in the global south. For example, Weidmann & Theunissen (2017) use nighttime light emissions data to construct measures of local inequality.

Satellite data or remote sensing data are often not only geographically fine-grained but also temporary. This characteristic makes them attractive for (near) real-time analysis of events.

Weidmann, N. B., & Theunissen, G. (2021). Estimating Local Inequality from Nighttime Lights. Remote Sensing, 13(22), 4624. https://doi.org/10.3390/rs13224624

44 / 60

Magic of Data Cubes In the `stars` Package

https://raw.githubusercontent.com/r-spatial/stars/master/images/cube2.png

45 / 60

Tomorrow
 
    Day 
    Time 
    Title 
  


    April 23 
    10:00-11:30 
    Introduction to GIS 
  

    April 23 
    11:45-13:00 
    Vector Data 
  

    April 23 
    13:00-14:00 
    Lunch Break 
  

    April 23 
    14:00-15:30 
    Mapping 
  

    April 23 
    15:45-17:00 
    Raster Data 
  

    April 24 
    09:00-10:30 
    Advanced Data Import & Processing 
  

    April 24 
    10:45-12:00 
    Applied Data Wrangling & Linking 
  

    April 24 
    12:00-13:00 
    Lunch Break 
  

    April 24 
    13:00-14:30 
    Investigating Spatial Autocorrelation 
  

    April 24 
    14:45-16:00 
    Spatial Econometrics & Outlook 
  

46 / 60

Day	Time	Title
April 23	10:00-11:30	Introduction to GIS
April 23	11:45-13:00	Vector Data
April 23	13:00-14:00	Lunch Break
April 23	14:00-15:30	Mapping
April 23	15:45-17:00	Raster Data
April 24	09:00-10:30	Advanced Data Import & Processing
April 24	10:45-12:00	Applied Data Wrangling & Linking
April 24	12:00-13:00	Lunch Break
April 24	13:00-14:30	Investigating Spatial Autocorrelation
April 24	14:45-16:00	Spatial Econometrics & Outlook

Add-on Slides: Conversion and possible applications47 / 60

Raster to Points

raster_now_points <-
  immigrant_rate |> 
  terra::as.points()
raster_now_points

##  class       : SpatVector 
##  geometry    : points 
##  dimensions  : 13867, 1  (geometries, attributes)
##  extent      : 4094900, 4121200, 3084100, 3112900  (xmin, xmax, ymin, ymax)
##  coord. ref. : ETRS89-extended / LAEA Europe (with axis order normalized for visualization) 
##  names       : immigrants_cologne
##  type        :              <num>
##  values      :              6.509
##                             45.03
##                             33.19

48 / 60

Points to Raster

points_now_raster <-
  raster_now_points |> 
  terra::rast(vals = raster_now_points$immigrants_cologne, resolution = 100)
points_now_raster

## class       : SpatRaster 
## dimensions  : 288, 263, 1  (nrow, ncol, nlyr)
## resolution  : 100, 100  (x, y)
## extent      : 4094900, 4121200, 3084100, 3112900  (xmin, xmax, ymin, ymax)
## coord. ref. : ETRS89-extended / LAEA Europe (with axis order normalized for visualization) 
## source(s)   : memory
## name        :       lyr.1 
## min value   :   0.6637168 
## max value   : 100.0000000

49 / 60

Application: Point Pattern Analysis Using Global Kernel Densities (‘Heatmap’)

kernel_densities <- 
  raster_now_points |>  
  sf::st_as_sf() |> 
  dplyr::filter(immigrants_cologne > 25) |> 
  sf::as_Spatial() |> 
  as("ppp") |> 
  spatstat.explore::density.ppp(sigma = 250) |>  
  terra::rast()
terra::crs(kernel_densities) <- "epsg:3035"

50 / 60

Raster to Polygons

polygon_raster <-
  immigrant_rate |>  
  terra::as.polygons() |> 
  sf::st_as_sf()

51 / 60

Application: Rotation in Space

52 / 60

Focal Statistics

Focal statistics are another method of including information near a point in space. However, it's applied to the whole dataset and is independent of arbitrary points we project onto a map.

relates focal cells to surrounding cells
vastly used in image processing
but also applicable in social science research, as we will see

53 / 60

Focal Application: Edge Detection

Source: https://en.wikipedia.org/wiki/Sobel_operator

54 / 60

Edges of Immigrant Rates

55 / 60

We Can Do That As Well Using a Sobel Filter

$r_x = \begin{bmatrix}1 & 0 & -1 \\2 & 0 & -2 \\1 & 0 & -1\end{bmatrix} \times raster\_file \\r_y = \begin{bmatrix}1 & 2 & 1 \\0 & 0 & 0 \\-1 & -2 & -1\end{bmatrix}\times raster\_file \\r_{xy} = \sqrt{r_{x}^2 + r_{y}^2}$

56 / 60

Implementation in R

From the official documentation:

sobel <- function(r) {
  fy <- matrix(c(1, 0, -1, 2, 0, -2, 1, 0, -1), nrow = 3)
  fx <- matrix(c(-1, -2, -1, 0, 0, 0, 1, 2, 1) , nrow = 3)
  rx <- terra::focal(r, fx)
  ry <- terra::focal(r, fy)
  sqrt(rx^2 + ry^2)
}

57 / 60

Data Preparation and Application of Filter

old_extent <- terra::ext(immigrant_rate) 
new_extent <- old_extent + c(10000, -10000, 10000, -10000)
smaller_immigrant_rate <-
  immigrant_rate |> 
  terra::crop(new_extent)
smaller_immigrant_rate[smaller_immigrant_rate < 15] <- NA
immigrant_edges <-
  sobel(smaller_immigrant_rate)

58 / 60

Comparison

59 / 60

stefan.juenger@gesis.org

@StefanJuenger

StefanJuenger

https://stefanjuenger.github.io

60 / 60

Day	Time	Title
April 23	10:00-11:30	Introduction to GIS
April 23	11:45-13:00	Vector Data
April 23	13:00-14:00	Lunch Break
April 23	14:00-15:30	Mapping
April 23	15:45-17:00	Raster Data
April 24	09:00-10:30	Advanced Data Import & Processing
April 24	10:45-12:00	Applied Data Wrangling & Linking
April 24	12:00-13:00	Lunch Break
April 24	13:00-14:30	Investigating Spatial Autocorrelation
April 24	14:45-16:00	Spatial Econometrics & Outlook

Raster Data

Stefan Jünger & Anne-Kathrin Stroppe

GESIS Workshop

April 23, 2024

1 / 60

Now
 
    Day 
    Time 
    Title 
  


    April 23 
    10:00-11:30 
    Introduction to GIS 
  

    April 23 
    11:45-13:00 
    Vector Data 
  

    April 23 
    13:00-14:00 
    Lunch Break 
  

    April 23 
    14:00-15:30 
    Mapping 
  

    April 23 
    15:45-17:00 
    Raster Data 
  

    April 24 
    09:00-10:30 
    Advanced Data Import & Processing 
  

    April 24 
    10:45-12:00 
    Applied Data Wrangling & Linking 
  

    April 24 
    12:00-13:00 
    Lunch Break 
  

    April 24 
    13:00-14:30 
    Investigating Spatial Autocorrelation 
  

    April 24 
    14:45-16:00 
    Spatial Econometrics & Outlook 
  

2 / 60

Day	Time	Title
April 23	10:00-11:30	Introduction to GIS
April 23	11:45-13:00	Vector Data
April 23	13:00-14:00	Lunch Break
April 23	14:00-15:30	Mapping
April 23	15:45-17:00	Raster Data
April 24	09:00-10:30	Advanced Data Import & Processing
April 24	10:45-12:00	Applied Data Wrangling & Linking
April 24	12:00-13:00	Lunch Break
April 24	13:00-14:30	Investigating Spatial Autocorrelation
April 24	14:45-16:00	Spatial Econometrics & Outlook

General Difference to Vector Data

Data Structure:

Other data format(s), different file extensions
geometries do not differ within one dataset

Implications:

Other geospatial operations possible

Benefits:

can be way more efficient
straightforward processing of raster values and extraction of zonal statistics
it's like working with simple tabular data

3 / 60

Visual Difference Between Vector and Raster Data

4 / 60

What Exactly Are Raster Data?

Hold information on (most of the time) evenly shaped grid cells
Basically, a simple data table
each cell represents one observation

5 / 60

MetadataInformation about geometries is globally stored
they are the same for all observations
their location in space is defined by their cell location in the data table
Without this information, raster data were simple image files
6 / 60

Important MetadataRaster Dimensionsnumber of columns, rows, and cells

ExtentSimilar to bounding box in vector data

Resolutionthe size of each raster cell

Coordinate reference systemdefines where on the earth's surface the raster layer lies

7 / 60

All Beginnings Are... Easy!

terra::rast()

## class       : SpatRaster 
## dimensions  : 180, 360, 1  (nrow, ncol, nlyr)
## resolution  : 1, 1  (x, y)
## extent      : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84

8 / 60

Feed With Data

input_data <- 
  sample(1:100, 16) |> 
  matrix(nrow = 4)
raster_layer <- terra::rast(input_data)
raster_layer

## class       : SpatRaster 
## dimensions  : 4, 4, 1  (nrow, ncol, nlyr)
## resolution  : 1, 1  (x, y)
## extent      : 0, 4, 0, 4  (xmin, xmax, ymin, ymax)
## coord. ref. :  
## source(s)   : memory
## name        : lyr.1 
## min value   :    10 
## max value   :    98

9 / 60

Plotting

terra::plot(raster_layer)

10 / 60

File Formats/Extensions

Gtiff/GeoTiff
JPEG2000
...
.grd
netCDF
...
sometimes, raster data come even in a text format, such as CSV

In this course, we will only use tiff files as it is pretty common. Just be aware that there a different formats out there.

11 / 60

Implementations in `R`

AFAIK terra is the most commonly used package for raster data in R.

Some other developments, e.g., in the stars package, also implement an interface to simple features in sf.

The terra package also helps to use more elaborate zonal statistics. The same holds for the spatstat package.

12 / 60

Basic Raster Operations13 / 60

Loading Raster Tiffs (Census Data)

immigrants_cologne <-
  terra::rast("./data/immigrants_cologne.tif")
inhabitants_cologne <-
  terra::rast("./data/inhabitants_cologne.tif")
immigrants_cologne

## class       : SpatRaster 
## dimensions  : 289, 264, 1  (nrow, ncol, nlyr)
## resolution  : 100, 100  (x, y)
## extent      : 4094850, 4121250, 3084050, 3112950  (xmin, xmax, ymin, ymax)
## coord. ref. : ETRS89-extended / LAEA Europe (with axis order normalized for visualization) 
## source      : immigrants_cologne.tif 
## name        : immigrants_cologne 
## min value   :                  3 
## max value   :                639

inhabitants_cologne

## class       : SpatRaster 
## dimensions  : 289, 264, 1  (nrow, ncol, nlyr)
## resolution  : 100, 100  (x, y)
## extent      : 4094850, 4121250, 3084050, 3112950  (xmin, xmax, ymin, ymax)
## coord. ref. : ETRS89-extended / LAEA Europe (with axis order normalized for visualization) 
## source      : inhabitants_cologne.tif 
## name        : inhabitants_cologne 
## min value   :                   3 
## max value   :                 956

14 / 60

Compare Layers by Plotting

terra::plot(immigrants_cologne)

terra::plot(inhabitants_cologne)

15 / 60

Btw: We Can Also Use `tmap`

tm_shape(immigrants_cologne) +
  tm_raster()

tm_shape(inhabitants_cologne) +
  tm_raster()

16 / 60

Preparing for Analysis / Base `R` Functionalities

immigrants_cologne[immigrants_cologne == -9] <- NA
inhabitants_cologne[inhabitants_cologne == -9] <- NA
immigrants_cologne

## class       : SpatRaster 
## dimensions  : 289, 264, 1  (nrow, ncol, nlyr)
## resolution  : 100, 100  (x, y)
## extent      : 4094850, 4121250, 3084050, 3112950  (xmin, xmax, ymin, ymax)
## coord. ref. : ETRS89-extended / LAEA Europe (with axis order normalized for visualization) 
## source(s)   : memory
## varname     : immigrants_cologne 
## name        : immigrants_cologne 
## min value   :                  3 
## max value   :                639

inhabitants_cologne

## class       : SpatRaster 
## dimensions  : 289, 264, 1  (nrow, ncol, nlyr)
## resolution  : 100, 100  (x, y)
## extent      : 4094850, 4121250, 3084050, 3112950  (xmin, xmax, ymin, ymax)
## coord. ref. : ETRS89-extended / LAEA Europe (with axis order normalized for visualization) 
## source(s)   : memory
## varname     : inhabitants_cologne 
## name        : inhabitants_cologne 
## min value   :                   3 
## max value   :                 956

17 / 60

Simple Statistics

Working with raster data is straightforward

quite speedy
yet not as comfortable as working with sf objects

For example, to calculate the mean we would use:

terra::global(immigrants_cologne, fun = "mean", na.rm = TRUE)

##                       mean
## immigrants_cologne 15.1337

18 / 60

Get All Values As a Vector

We can also extract the values of a raster directly as a vector:

all_raster_values <- terra::values(immigrants_cologne)
mean(all_raster_values, na.rm = TRUE)

## [1] 15.1337

Nevertheless, although raster data are simple data tables, working with them is a bit different compared to, e.g., simple features.

19 / 60

Combining Raster Layers to Calculate New Valuesimmigrant_rate <-
  immigrants_cologne * 100 / 
  inhabitants_cologne
immigrant_rate

## class       : SpatRaster 
## dimensions  : 289, 264, 1  (nrow, ncol, nlyr)
## resolution  : 100, 100  (x, y)
## extent      : 4094850, 4121250, 3084050, 3112950  (xmin, xmax, ymin, ymax)
## coord. ref. : ETRS89-extended / LAEA Europe (with axis order normalized for visualization) 
## source(s)   : memory
## varname     : immigrants_cologne 
## name        : immigrants_cologne 
## min value   :          0.6637168 
## max value   :        100.0000000
20 / 60

Combining Raster Layers to Calculate New Values

immigrant_rate <-
  immigrants_cologne * 100 / 
  inhabitants_cologne
immigrant_rate

## class       : SpatRaster 
## dimensions  : 289, 264, 1  (nrow, ncol, nlyr)
## resolution  : 100, 100  (x, y)
## extent      : 4094850, 4121250, 3084050, 3112950  (xmin, xmax, ymin, ymax)
## coord. ref. : ETRS89-extended / LAEA Europe (with axis order normalized for visualization) 
## source(s)   : memory
## varname     : immigrants_cologne 
## name        : immigrants_cologne 
## min value   :          0.6637168 
## max value   :        100.0000000

21 / 60

'Subsetting' Raster Layers

We can subset vector data by simply filtering for specific attribute values. For example, to subset Cologne's districts only by the one of Deutz, we can use the Tidyverse for sf data:

deutz <-
  sf::st_read("./data/cologne.shp") |> 
  dplyr::filter(NAME == "Deutz")

## Reading layer `cologne' from data source 
##   `C:\Users\mueller2\a_talks_presentations\gesis-workshop-geospatial-techniques-R-2024\data\cologne.shp' using driver `ESRI Shapefile'
## Simple feature collection with 86 features and 7 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 4094821 ymin: 3084022 xmax: 4121277 ymax: 3112952
## Projected CRS: ETRS89-extended / LAEA Europe

22 / 60

Cropping

Cropping is a method of cutting out a specific slice of a raster layer based on an input dataset or geospatial extent, such as a bounding box.

23 / 60

Cropping

Cropping is a method of cutting out a specific slice of a raster layer based on an input dataset or geospatial extent, such as a bounding box.

cropped_immigrant_rate <-
  terra::crop(immigrant_rate, deutz)

24 / 60

Cropping

Cropping is a method of cutting out a specific slice of a raster layer based on an input dataset or geospatial extent, such as a bounding box.

cropped_immigrant_rate <-
  terra::crop(immigrant_rate, deutz)

25 / 60

Masking

Masking is similar to cropping, yet values outside the extent are set to missing values (NA).

26 / 60

Masking

Masking is similar to cropping, yet values outside the extent are set to missing values (NA).

masked_immigrant_rate <-
  raster::mask(immigrant_rate, terra::vect(deutz))

27 / 60

Masking

Masking is similar to cropping, yet values outside the extent are set to missing values (NA).

masked_immigrant_rate <-
  raster::mask(immigrant_rate, terra::vect(deutz))

28 / 60

Combining Cropping and Maskingcropped_masked_immigrant_rate <-
  terra::crop(immigrant_rate, deutz) |> 
  raster::mask(terra::vect(deutz))

29 / 60

Combining Cropping and Masking

cropped_masked_immigrant_rate <-
  terra::crop(immigrant_rate, deutz) |> 
  raster::mask(terra::vect(deutz))

30 / 60

Exercise 1_4_1: Basic Raster Operations

Exercise

Solution

31 / 60

Raster Extraction / Zonal statistics32 / 60

Sampling of some pointsfancy_points <-
  immigrants_cologne |> 
  terra::spatSample(size = 10, na.rm = TRUE, as.points = TRUE) |> 
  sf::st_as_sf() |> 
  dplyr::select(-1)

33 / 60

Sampling of some points

fancy_points <-
  immigrants_cologne |> 
  terra::spatSample(size = 10, na.rm = TRUE, as.points = TRUE) |> 
  sf::st_as_sf() |> 
  dplyr::select(-1)

plot(fancy_points)

34 / 60

Extract Information From Rasters

Raster data are helpful when we aim to

apply calculations that are the same for all geometries in the dataset
extract information from raster fast and efficient

tm_shape(immigrant_rate) +
  tm_raster() +
  tm_shape(fancy_points) +
  tm_dots(size = .25)

35 / 60

Raster Extraction

To extract the raster values at a specific point by location, we use the following:

terra::extract(immigrants_cologne, fancy_points, ID = FALSE)

##    immigrants_cologne
## 1                   5
## 2                   3
## 3                   4
## 4                   3
## 5                  27
## 6                   9
## 7                   9
## 8                  32
## 9                  27
## 10                  3

36 / 60

Add Results to Existing Dataset

This information can be added to an existing dataset (our points in this example):

fancy_points <-
  fancy_points |> 
  dplyr::mutate(
    immigrant_rate_value = 
      as.vector(
        terra::extract(immigrant_rate, fancy_points, ID = FALSE, raw = TRUE)
      )
  )
fancy_points

## Simple feature collection with 10 features and 1 field
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: 4102000 ymin: 3089700 xmax: 4120200 ymax: 3110100
## Projected CRS: ETRS89-extended / LAEA Europe (with axis order normalized for visualization)
##                   geometry immigrant_rate_value
## 1  POINT (4106300 3102400)            26.315789
## 2  POINT (4107700 3110100)             2.400000
## 3  POINT (4120200 3100000)             7.547170
## 4  POINT (4103300 3099200)             3.125000
## 5  POINT (4114100 3089900)            20.000000
## 6  POINT (4102000 3089700)            11.538462
## 7  POINT (4113200 3097500)             8.108108
## 8  POINT (4106200 3092300)            21.917808
## 9  POINT (4103800 3105400)            40.298507
## 10 POINT (4117700 3102700)             5.882353

37 / 60

More Elaborated: Spatial Buffers

Sometimes, extracting information 1:1 is not enough

too narrow
missing information about the surroundings of a point

tm_shape(immigrants_cologne) +
  tm_raster() +
  tm_shape(
    sf::st_buffer(fancy_points, 500) 
  ) +
  tm_dots(size = .1) +
  tm_borders()

38 / 60

A Closer Look

Jünger, 2021

39 / 60

Buffer Extraction

We can use spatial buffers of different sizes to extract information on surroundings:

terra::extract(
  immigrants_cologne, 
  sf::st_buffer(fancy_points, 500),
  fun = mean,
  na.rm = TRUE,
  ID = FALSE,
  raw = TRUE
)

##    immigrants_cologne
## 1           16.160000
## 2           14.075000
## 3            8.090909
## 4            8.923077
## 5           17.568627
## 6            8.150000
## 7            8.111111
## 8           17.186047
## 9           57.326087
## 10           7.468750

terra::extract(
  immigrants_cologne, 
  sf::st_buffer(fancy_points, 1000),
  fun = mean,
  na.rm = TRUE,
  ID = FALSE,
  raw = TRUE
)

##    immigrants_cologne
## 1           15.913043
## 2           14.333333
## 3            7.065789
## 4           22.723757
## 5           18.500000
## 6            7.982143
## 7            5.884615
## 8           14.905109
## 9           50.307018
## 10           6.943925

40 / 60

Exercise 1_4_2: Fancier Raster Operations

Exercise

Solution

41 / 60

Raster Stacks

So far, raster data have been unidimensional: we only had one attribute for each dataset.

But they can also be stacked:

census_stack <- c(immigrants_cologne, inhabitants_cologne)
census_stack

## class       : SpatRaster 
## dimensions  : 289, 264, 2  (nrow, ncol, nlyr)
## resolution  : 100, 100  (x, y)
## extent      : 4094850, 4121250, 3084050, 3112950  (xmin, xmax, ymin, ymax)
## coord. ref. : ETRS89-extended / LAEA Europe (with axis order normalized for visualization) 
## source(s)   : memory
## varnames    : immigrants_cologne 
##               inhabitants_cologne 
## names       : immigrants_cologne, inhabitants_cologne 
## min values  :                  3,                   3 
## max values  :                639,                 956

42 / 60

Non-Regular Grids

In this course, we only use regular grid-based raster data. However, be aware that non-regular grid data also exists.

https://r-spatial.github.io/stars/

43 / 60

Relevant In This Context: Earth Observation Data

Satellite data or remote sensing data are often not only geographically fine-grained but also temporary. This characteristic makes them attractive for (near) real-time analysis of events.

Weidmann, N. B., & Theunissen, G. (2021). Estimating Local Inequality from Nighttime Lights. Remote Sensing, 13(22), 4624. https://doi.org/10.3390/rs13224624

44 / 60

Magic of Data Cubes In the `stars` Package

https://raw.githubusercontent.com/r-spatial/stars/master/images/cube2.png

45 / 60

Tomorrow
 
    Day 
    Time 
    Title 
  


    April 23 
    10:00-11:30 
    Introduction to GIS 
  

    April 23 
    11:45-13:00 
    Vector Data 
  

    April 23 
    13:00-14:00 
    Lunch Break 
  

    April 23 
    14:00-15:30 
    Mapping 
  

    April 23 
    15:45-17:00 
    Raster Data 
  

    April 24 
    09:00-10:30 
    Advanced Data Import & Processing 
  

    April 24 
    10:45-12:00 
    Applied Data Wrangling & Linking 
  

    April 24 
    12:00-13:00 
    Lunch Break 
  

    April 24 
    13:00-14:30 
    Investigating Spatial Autocorrelation 
  

    April 24 
    14:45-16:00 
    Spatial Econometrics & Outlook 
  

46 / 60

Day	Time	Title
April 23	10:00-11:30	Introduction to GIS
April 23	11:45-13:00	Vector Data
April 23	13:00-14:00	Lunch Break
April 23	14:00-15:30	Mapping
April 23	15:45-17:00	Raster Data
April 24	09:00-10:30	Advanced Data Import & Processing
April 24	10:45-12:00	Applied Data Wrangling & Linking
April 24	12:00-13:00	Lunch Break
April 24	13:00-14:30	Investigating Spatial Autocorrelation
April 24	14:45-16:00	Spatial Econometrics & Outlook

Add-on Slides: Conversion and possible applications47 / 60

Raster to Points

raster_now_points <-
  immigrant_rate |> 
  terra::as.points()
raster_now_points

##  class       : SpatVector 
##  geometry    : points 
##  dimensions  : 13867, 1  (geometries, attributes)
##  extent      : 4094900, 4121200, 3084100, 3112900  (xmin, xmax, ymin, ymax)
##  coord. ref. : ETRS89-extended / LAEA Europe (with axis order normalized for visualization) 
##  names       : immigrants_cologne
##  type        :              <num>
##  values      :              6.509
##                             45.03
##                             33.19

48 / 60

Points to Raster

points_now_raster <-
  raster_now_points |> 
  terra::rast(vals = raster_now_points$immigrants_cologne, resolution = 100)
points_now_raster

## class       : SpatRaster 
## dimensions  : 288, 263, 1  (nrow, ncol, nlyr)
## resolution  : 100, 100  (x, y)
## extent      : 4094900, 4121200, 3084100, 3112900  (xmin, xmax, ymin, ymax)
## coord. ref. : ETRS89-extended / LAEA Europe (with axis order normalized for visualization) 
## source(s)   : memory
## name        :       lyr.1 
## min value   :   0.6637168 
## max value   : 100.0000000

49 / 60

Application: Point Pattern Analysis Using Global Kernel Densities (‘Heatmap’)

kernel_densities <- 
  raster_now_points |>  
  sf::st_as_sf() |> 
  dplyr::filter(immigrants_cologne > 25) |> 
  sf::as_Spatial() |> 
  as("ppp") |> 
  spatstat.explore::density.ppp(sigma = 250) |>  
  terra::rast()
terra::crs(kernel_densities) <- "epsg:3035"

50 / 60

Raster to Polygons

polygon_raster <-
  immigrant_rate |>  
  terra::as.polygons() |> 
  sf::st_as_sf()

51 / 60

Application: Rotation in Space

52 / 60

Focal Statistics

Focal statistics are another method of including information near a point in space. However, it's applied to the whole dataset and is independent of arbitrary points we project onto a map.

relates focal cells to surrounding cells
vastly used in image processing
but also applicable in social science research, as we will see

53 / 60

Focal Application: Edge Detection

Source: https://en.wikipedia.org/wiki/Sobel_operator

54 / 60

Edges of Immigrant Rates

55 / 60

We Can Do That As Well Using a Sobel Filter

56 / 60

Implementation in R

From the official documentation:

sobel <- function(r) {
  fy <- matrix(c(1, 0, -1, 2, 0, -2, 1, 0, -1), nrow = 3)
  fx <- matrix(c(-1, -2, -1, 0, 0, 0, 1, 2, 1) , nrow = 3)
  rx <- terra::focal(r, fx)
  ry <- terra::focal(r, fy)
  sqrt(rx^2 + ry^2)
}

57 / 60

Data Preparation and Application of Filter

old_extent <- terra::ext(immigrant_rate) 
new_extent <- old_extent + c(10000, -10000, 10000, -10000)
smaller_immigrant_rate <-
  immigrant_rate |> 
  terra::crop(new_extent)
smaller_immigrant_rate[smaller_immigrant_rate < 15] <- NA
immigrant_edges <-
  sobel(smaller_immigrant_rate)

58 / 60

Comparison

59 / 60

stefan.juenger@gesis.org

@StefanJuenger

StefanJuenger

https://stefanjuenger.github.io

60 / 60

↑, ←, Pg Up, k	Go to previous slide
↓, →, Pg Dn, Space, j	Go to next slide
Home	Go to first slide
End	Go to last slide
Number + Return	Go to specific slide
b / m / f	Toggle blackout / mirrored / fullscreen mode
c	Clone slideshow
p	Toggle presenter mode
t	Restart the presentation timer
?, h	Toggle this help
o	Tile View: Overview of Slides

Introduction to Geospatial Techniques for Social Scientists in R

Raster Data

Stefan Jünger & Anne-Kathrin Stroppe

GESIS Workshop

April 23, 2024

Now

General Difference to Vector Data

Visual Difference Between Vector and Raster Data

What Exactly Are Raster Data?

Metadata

Important Metadata

All Beginnings Are... Easy!

Feed With Data

Plotting

File Formats/Extensions

Implementations in R

Basic Raster Operations

Loading Raster Tiffs (Census Data)

Compare Layers by Plotting

Btw: We Can Also Use tmap

Preparing for Analysis / Base R Functionalities

Simple Statistics

Get All Values As a Vector

Combining Raster Layers to Calculate New Values

Combining Raster Layers to Calculate New Values

'Subsetting' Raster Layers

Cropping

Cropping

Cropping

Masking

Masking

Masking

Combining Cropping and Masking

Combining Cropping and Masking

Exercise 1_4_1: Basic Raster Operations

Raster Extraction / Zonal statistics

Sampling of some points

Sampling of some points

Extract Information From Rasters

Raster Extraction

Add Results to Existing Dataset

More Elaborated: Spatial Buffers

A Closer Look

Buffer Extraction

Exercise 1_4_2: Fancier Raster Operations

Raster Stacks

Non-Regular Grids

Relevant In This Context: Earth Observation Data

Magic of Data Cubes In the stars Package

Tomorrow

Add-on Slides: Conversion and possible applications

Raster to Points

Points to Raster

Application: Point Pattern Analysis Using Global Kernel Densities (‘Heatmap’)

Raster to Polygons

Application: Rotation in Space

Focal Statistics

Focal Application: Edge Detection

Edges of Immigrant Rates

We Can Do That As Well Using a Sobel Filter

Implementation in R

Data Preparation and Application of Filter

Comparison

Now

Help

Introduction to Geospatial Techniques for Social Scientists in R

Introduction to Geospatial Techniques for Social Scientists in R

Raster Data

Stefan Jünger & Anne-Kathrin Stroppe

GESIS Workshop

April 23, 2024

Now

General Difference to Vector Data

Visual Difference Between Vector and Raster Data

What Exactly Are Raster Data?

Metadata

Important Metadata

All Beginnings Are... Easy!

Feed With Data

Plotting

Implementations in `R`

Btw: We Can Also Use `tmap`

Preparing for Analysis / Base `R` Functionalities

Magic of Data Cubes In the `stars` Package

Implementations in `R`

Btw: We Can Also Use `tmap`

Preparing for Analysis / Base `R` Functionalities

Magic of Data Cubes In the `stars` Package