+ - 0:00:00
Notes for current slide
Notes for next slide

Introduction to Geospatial Techniques for Social Scientists in R

Advanced Data Import & Processing

Stefan Jünger & Anne-Kathrin Stroppe

GESIS Workshop

April 24, 2024

1 / 47

Now

Day Time Title
April 23 10:00-11:30 Introduction to GIS
April 23 11:45-13:00 Vector Data
April 23 13:00-14:00 Lunch Break
April 23 14:00-15:30 Mapping
April 23 15:45-17:00 Raster Data
April 24 09:00-10:30 Advanced Data Import & Processing
April 24 10:45-12:00 Applied Data Wrangling & Linking
April 24 12:00-13:00 Lunch Break
April 24 13:00-14:30 Investigating Spatial Autocorrelation
April 24 14:45-16:00 Spatial Econometrics & Outlook
2 / 47

More on the Geospatial Data Landscape

Geospatial data tend to be quite big

  • pressure to distribute data efficiently

Data dumps (on the internet) may not be helpful

  • when resources are low
  • time's a factor
  • the data have a large geographic extent

Often a Programming Application Interfaces (API) is used

3 / 47

What Is an API?

An Application Programming Interface (API) serves as an entry point to data distributed over the internet to

  • get data
  • push data

Standardized mechanisms

  • to query data
  • often just a simple text string to enter in your URL address bar
  • it gets complicated when login data are required
4 / 47

Data Providers Offering Geospatial Data APIs

5 / 47

Example: Access to Public Transport

Say, we're interested in the accessibility of public transport in Cologne

  • bus, tram, etc.
  • all platforms and vehicles should be wheel-chair accessible

We can gather this information using OpenStreetMap!

6 / 47

Accessing OSM Data: The Overpass API

The Overpass API (formerly known as OSM Server Side Scripting, or OSM3S before 2011) is a read-only API that serves up custom selected parts of the OSM map data. It acts as a database over the web: the client sends a query to the API and returns the data set that corresponds to the query.

Source: https://wiki.openstreetmap.org/wiki/Overpass_API

7 / 47

Starting With a Geographic Area to Query

Many geospatial API requests start with a bounding box or another geographical extent to define which area should be accessed.

cologne_pt_stops <-
osmdata::getbb(
"Köln"
)
8 / 47

Starting With a Geographic Area to Query

Many geospatial API requests start with a bounding box or another geographical extent to define which area should be accessed.

cologne_pt_stops <-
osmdata::getbb(
"Köln"
)

9 / 47

Defining Some Technical Details

The Overpass API also requires a timeout parameter that repeats the request for a while if anything fails.

cologne_pt_stops <-
cologne_pt_stops |>
osmdata::opq(timeout = 25*100)
cologne_pt_stops
## $bbox
## [1] "50.8304399,6.7725303,51.0849743,7.162028"
##
## $prefix
## [1] "[out:xml][timeout:2500];\n(\n"
##
## $suffix
## [1] ");\n(._;>;);\nout body;"
##
## $features
## NULL
##
## $osm_types
## [1] "node" "way" "relation"
##
## attr(,"class")
## [1] "list" "overpass_query"
## attr(,"nodes_only")
## [1] FALSE
10 / 47

Turning to the Content

The content we aim to request is defined with key/value pairs. It's best to learn them by looking them up in the official documentation.

cologne_pt_stops <-
cologne_pt_stops |>
osmdata::add_osm_feature(key = "public_transport", value = "stop_position")
cologne_pt_stops
## $bbox
## [1] "50.8304399,6.7725303,51.0849743,7.162028"
##
## $prefix
## [1] "[out:xml][timeout:2500];\n(\n"
##
## $suffix
## [1] ");\n(._;>;);\nout body;"
##
## $features
## [1] "[\"public_transport\"=\"stop_position\"]"
##
## $osm_types
## [1] "node" "way" "relation"
##
## attr(,"class")
## [1] "list" "overpass_query"
## attr(,"nodes_only")
## [1] FALSE
11 / 47

Conduct Actual Request/Download

Data is finally downloaded in the osmdata package, e.g., using the osmdata::osmdata_sf() function.

cologne_pt_stops <-
cologne_pt_stops |>
osmdata::osmdata_sf()
cologne_pt_stops
12 / 47

Filter and Transform

The data comprises a list that can be accessed as any list in R. Here, we extract the points and wrangle them spatially.

cologne_pt_stops <-
cologne_pt_stops$osm_points |>
tibble::as_tibble() |>
sf::st_as_sf() |>
sf::st_transform(3035) |>
dplyr::filter(wheelchair == "yes")
cologne_pt_stops
## Simple feature collection with 597 features and 89 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 4094554 ymin: 3084876 xmax: 4121662 ymax: 3112024
## Projected CRS: ETRS89-extended / LAEA Europe
## # A tibble: 597 × 90
## osm_id name FIXME `VRS:gemeinde` `VRS:name` `VRS:old_ref` `VRS:ortsteil`
## * <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 361716 Eifel… <NA> KÖLN <NA> <NA> Innenstadt
## 2 388550 Brahm… <NA> KÖLN <NA> <NA> Lindenthal
## 3 21033643 Weins… <NA> KÖLN Weinsberg… <NA> Ehrenfeld
## 4 27042401 Markt <NA> BERGISCH GLAD… Bergisch … <NA> Mitte
## 5 28122005 Heuma… <NA> KÖLN <NA> <NA> Innenstadt
## 6 28301370 Kalke… <NA> KÖLN Kalker Fr… <NA> Merheim
## 7 28301373 Merhe… <NA> KÖLN <NA> <NA> Merheim
## 8 28301416 Refra… <NA> BERGISCH GLAD… <NA> <NA> Refrath
## 9 30145575 Köln-… <NA> KÖLN Holweide … <NA> Holweide
## 10 30388742 Mülhe… <NA> <NA> <NA> <NA> <NA>
## # ℹ 587 more rows
## # ℹ 83 more variables: `VRS:ref` <chr>, alt_name <chr>, ashtray <chr>,
## # bench <chr>, bin <chr>, bus <chr>, bus_bay <chr>, bus_lines <chr>,
## # check_date <chr>, `check_date:crossing` <chr>, covered <chr>,
## # crossing <chr>, departures_board <chr>, description <chr>,
## # direction <chr>, disused <chr>, fixme <chr>, `gtfs:name` <chr>,
## # `gtfs:stop_id` <chr>, highway <chr>, image <chr>, internet_access <chr>, …
13 / 47

The Data Indeed Are Mappable

tm_shape(cologne_pt_stops) +
tm_dots()
14 / 47

The Data Indeed Are Mappable

tm_shape(cologne_pt_stops) +
tm_dots()

15 / 47

Fiddling With the Data: Creating a Quick 'Heatmap'

OpenStreetMap points data are nice for analyzing urban infrastructure. Let's draw a quick 'heatmap' using kernel densities.

cologne_pt_stops_densities <-
cologne_pt_stops |>
sf::as_Spatial() |>
as("ppp") |>
spatstat.explore::density.ppp(sigma = 500) |>
terra::rast()
terra::crs(cologne_pt_stops_densities) <- "epsg:3035"
16 / 47

Fiddling With the Data: Creating a Quick 'Heatmap'

OpenStreetMap points data are nice for analyzing urban infrastructure. Let's draw a quick 'heatmap' using kernel densities.

cologne_pt_stops_densities <-
cologne_pt_stops |>
sf::as_Spatial() |>
as("ppp") |>
spatstat.explore::density.ppp(sigma = 500) |>
terra::rast()
terra::crs(cologne_pt_stops_densities) <- "epsg:3035"

17 / 47

Exercise 2_1_1: Working with OSM Data

Exercise

Solution

18 / 47

Accessing Unpackaged (Vector) Data

Not all data come as pretty as OpenStreetMap data in, e.g., the osmdata package.

Don't worry. There are methods to import data from a source that

  • only provide a URL
  • not yet prepared for analysis
19 / 47

Example: GeoJSON Files

JSON files are pretty popular

  • standardized and well-structured
  • similar to XML-files, but human readability is a bit better
  • also easy to parse for editors and browser

There's also a JSON flavor for geospatial data...

20 / 47

GeoJSON Snippet

{
"type": "FeatureCollection",
"features": [
{
"type": "Feature",
"id": 12,
"geometry": {
"type": "Polygon",
"coordinates": [
[
[
6.957362270020273,
50.94308762750329
]
...

Source: https://www.offenedaten-koeln.de/

21 / 47

An Application From Cologne’s Open Data Portal

park_and_ride <-
glue::glue(
"https://geoportal.stadt-koeln.de/arcgis/rest/services/verkehr/",
"verkehrskalender/MapServer/8/query?where=objectid+is+not+null&text=&",
"objectIds=&time=&geometry=&geometryType=esriGeometryEnvelope&inSR=&",
"spatialRel=esriSpatialRelIntersects&distance=&units=esriSRUnit_Foot&",
"relationParam=&outFields=*&returnGeometry=true&returnTrueCurves=false&",
"maxAllowableOffset=&geometryPrecision=&outSR=4326&havingClause=&",
"returnIdsOnly=false&returnCountOnly=false&orderByFields=&",
"groupByFieldsForStatistics=&outStatistics=&returnZ=false&returnM=false&",
"gdbVersion=&historicMoment=&returnDistinctValues=false&resultOffset=&",
"resultRecordCount=&returnExtentOnly=false&datumTransformation=&",
"parameterValues=&rangeValues=&quantizationParameters=&featureEncoding=",
"esriDefault&f=pjson"
) |>
sf::st_read(as_tibble = TRUE)
## Reading layer `file' from data source
## `https://geoportal.stadt-koeln.de/arcgis/rest/services/verkehr/verkehrskalender/MapServer/8/query?where=objectid+is+not+null&text=&objectIds=&time=&geometry=&geometryType=esriGeometryEnvelope&inSR=&spatialRel=esriSpatialRelIntersects&distance=&units=esriSRUnit_Foot&relationParam=&outFields=*&returnGeometry=true&returnTrueCurves=false&maxAllowableOffset=&geometryPrecision=&outSR=4326&havingClause=&returnIdsOnly=false&returnCountOnly=false&orderByFields=&groupByFieldsForStatistics=&outStatistics=&returnZ=false&returnM=false&gdbVersion=&historicMoment=&returnDistinctValues=false&resultOffset=&resultRecordCount=&returnExtentOnly=false&datumTransformation=&parameterValues=&rangeValues=&quantizationParameters=&featureEncoding=esriDefault&f=pjson'
## using driver `ESRIJSON'
## Simple feature collection with 24 features and 9 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 6.814912 ymin: 50.84826 xmax: 7.16169 ymax: 51.05211
## Geodetic CRS: WGS 84

Source: https://www.offenedaten-koeln.de/dataset/park-and-ride-anlagen-koeln

22 / 47

Park & Ride Parking Spaces

With just two 'simple' commands, we can retrieve geospatial data about Cologne's Park & Ride parking spaces in R. Not too bad, right?

tm_shape(park_and_ride) +
tm_dots()
23 / 47

Park & Ride Parking Spaces

With just two 'simple' commands, we can retrieve geospatial data about Cologne's Park & Ride parking spaces in R. Not too bad, right?

tm_shape(park_and_ride) +
tm_dots()

24 / 47

Raster Data Access

It is not only vector data that can be processed through these mechanisms.

The idea is the same for raster data

  • accessing the information through URLs
  • just the downloaded data formats differ
25 / 47

Example: Downloading German Weather Data

The German Weather Service provides excellent weather data in its Climate Data Center (CDC): https://opendata.dwd.de/climate_environment/CDC/

Let's download some summer temperature data using a custom function.

download_dwd_data <- function(url, path) {
download.file(url, dest = paste0(path, "/", "lyr.1.asc.gz"))
R.utils::gunzip(
paste0(path, "/", "lyr.1.asc.gz"),
overwrite = TRUE,
remove = TRUE
)
raster_file <-
terra::rast(paste0(path, "/", "lyr.1.asc"))
unlink(paste0(path, "/", "lyr.1.asc.gz"))
raster_file
}
26 / 47

Voilà

paste0(
"https://opendata.dwd.de/climate_environment/CDC/grids_germany/seasonal/",
"air_temperature_max/14_JJA/grids_germany_seasonal_air_temp_max_202314.",
"asc.gz") |>
download_dwd_data(path = "./data/") |>
terra::plot()
27 / 47

Voilà

paste0(
"https://opendata.dwd.de/climate_environment/CDC/grids_germany/seasonal/",
"air_temperature_max/14_JJA/grids_germany_seasonal_air_temp_max_202314.",
"asc.gz") |>
download_dwd_data(path = "./data/") |>
terra::plot()

28 / 47

OSM Data Can Be Gathered As Raster Data, Too

The tmaptools package is handy for downloading OpenStreetMap data (tiles).

cologne_raster <-
tmaptools::read_osm(park_and_ride) |>
terra::rast()
cologne_raster
## class : SpatRaster
## dimensions : 939, 1006, 3 (nrow, ncol, nlyr)
## resolution : 38.37281, 38.36172 (x, y)
## extent : 758632.6, 797235.6, 6594496, 6630517 (xmin, xmax, ymin, ymax)
## coord. ref. : WGS 84 / Pseudo-Mercator
## source(s) : memory
## names : red, green, blue
## min values : 3, 2, 2
## max values : 254, 254, 254
29 / 47

Mapped OSM Raster Data

The resulting data can be packed in a tmap workflow.

tm_shape(cologne_raster) +
tm_rgb()
30 / 47

Mapped OSM Raster Data

The resulting data can be packed in a tmap workflow.

tm_shape(cologne_raster) +
tm_rgb()

31 / 47

Use It As a Background Map

These data are images, so they are perfect for use as background maps when mapping other geospatial attributes, such as our park and ride data.

tm_shape(cologne_raster) +
tm_rgb() +
tm_shape(park_and_ride) +
tm_dots(size = .3)
32 / 47

Use It As a Background Map

These data are images, so they are perfect for use as background maps when mapping other geospatial attributes, such as our park and ride data.

tm_shape(cologne_raster) +
tm_rgb() +
tm_shape(park_and_ride) +
tm_dots(size = .3)

33 / 47

Playing With Different Map Types

A list of available type names can be seen with the function call OpenStreetMap::getMapInfo().

34 / 47

Playing With Different Map Types

A list of available type names can be seen with the function call OpenStreetMap::getMapInfo().

tmaptools::read_osm(
park_and_ride,
type = "bing"
) |>
tm_shape() +
tm_rgb() +
tm_shape(park_and_ride) +
tm_dots(size = .3, col = "red")

35 / 47

Playing With Different Map Types

A list of available type names can be seen with the function call OpenStreetMap::getMapInfo().

tmaptools::read_osm(
park_and_ride,
type = "bing"
) |>
tm_shape() +
tm_rgb() +
tm_shape(park_and_ride) +
tm_dots(size = .3, col = "red")

tmaptools::read_osm(
park_and_ride,
type = "esri-topo"
) |>
tm_shape() +
tm_rgb() +
tm_shape(park_and_ride) +
tm_dots(size = .3)

36 / 47

Shameless Advertising: The z11 Package

There are times when no APIs are available to download geospatial data.

The German Census 2011 is a prime example

  • only a data dump of Gigabytes of data exists for hundreds of attributes
  • when you often work with these data, it's a pain

Thus, I created my own (pseudo-)API for these data hosted over Github.

37 / 47

Accessing Data

Details on using the package can be found here, but it's straightforward. For example, if you want to download data on immigrant rates on a 1 km² grid, you can use the following function.

immigrants_germany <-
z11::z11_get_1km_attribute(Auslaender_A)
immigrants_germany[immigrants_germany <= -1] <- NA
immigrants_germany
## class : SpatRaster
## dimensions : 867, 641, 1 (nrow, ncol, nlyr)
## resolution : 1000, 1000 (x, y)
## extent : 4031500, 4672500, 2684500, 3551500 (xmin, xmax, ymin, ymax)
## coord. ref. : ETRS89-extended / LAEA Europe (EPSG:3035)
## source(s) : memory
## name : Auslaender_A
## min value : 0
## max value : 100
38 / 47

It’s a Raster

As it is a raster file, you can plot it easily.

39 / 47

It’s a Raster

As it is a raster file, you can plot it easily.

tm_shape(immigrants_germany) +
tm_raster(palette = "viridis")
40 / 47

It’s a Raster

As it is a raster file, you can plot it easily.

tm_shape(immigrants_germany) +
tm_raster(palette = "viridis")

41 / 47

Exercise 2_1_2: Wrangling the German Census

Exercise

Solution

42 / 47

Add-on Slides

43 / 47

OGC Web Services

OSM and others provide a standardized interface to their data.

The Open Geospatial Consortium developed a more broadly used interface design for more diverse data sources

  • often used by public authorities all over the world
  • support of well-supported data formats
  • good documentation
44 / 47

Displaying vs. Downloading

These web services can broadly be divided into services to

  • display data
  • download data

Let's briefly focus on download services!

45 / 47

Download Services

Web Feature Service (WFS)

  • vector data

Web Coverage Service (WCS)

  • raster data

Unfortunately, as of today, no ready-to-play packages are providing full access to OWS services in R

  • The ows4R package lets you only use WFS services
  • But you could establish an interface with Python and use its OWSLib to access WCS data (see add-on slides.
46 / 47

Now

Day Time Title
April 23 10:00-11:30 Introduction to GIS
April 23 11:45-13:00 Vector Data
April 23 13:00-14:00 Lunch Break
April 23 14:00-15:30 Mapping
April 23 15:45-17:00 Raster Data
April 24 09:00-10:30 Advanced Data Import & Processing
April 24 10:45-12:00 Applied Data Wrangling & Linking
April 24 12:00-13:00 Lunch Break
April 24 13:00-14:30 Investigating Spatial Autocorrelation
April 24 14:45-16:00 Spatial Econometrics & Outlook
2 / 47
Paused

Help

Keyboard shortcuts

, , Pg Up, k Go to previous slide
, , Pg Dn, Space, j Go to next slide
Home Go to first slide
End Go to last slide
Number + Return Go to specific slide
b / m / f Toggle blackout / mirrored / fullscreen mode
c Clone slideshow
p Toggle presenter mode
t Restart the presentation timer
?, h Toggle this help
oTile View: Overview of Slides
Esc Back to slideshow