class: center, middle, inverse, title-slide .title[ # Introduction to Geospatial Techniques for Social Scientists in R ] .subtitle[ ## Vector Data ] .author[ ### Stefan Jünger & Anne-Kathrin Stroppe ] .institute[ ###
GESIS Workshop
] .date[ ### April 23, 2024 ] --- layout: true --- ## Now <table class="table" style="margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> Day </th> <th style="text-align:left;"> Time </th> <th style="text-align:left;"> Title </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;color: gray !important;"> April 23 </td> <td style="text-align:left;color: gray !important;"> 10:00-11:30 </td> <td style="text-align:left;font-weight: bold;"> Introduction to GIS </td> </tr> <tr> <td style="text-align:left;color: gray !important;background-color: yellow !important;"> April 23 </td> <td style="text-align:left;color: gray !important;background-color: yellow !important;"> 11:45-13:00 </td> <td style="text-align:left;font-weight: bold;background-color: yellow !important;"> Vector Data </td> </tr> <tr> <td style="text-align:left;color: gray !important;color: gray !important;"> April 23 </td> <td style="text-align:left;color: gray !important;color: gray !important;"> 13:00-14:00 </td> <td style="text-align:left;font-weight: bold;color: gray !important;"> Lunch Break </td> </tr> <tr> <td style="text-align:left;color: gray !important;"> April 23 </td> <td style="text-align:left;color: gray !important;"> 14:00-15:30 </td> <td style="text-align:left;font-weight: bold;"> Mapping </td> </tr> <tr> <td style="text-align:left;color: gray !important;border-bottom: 1px solid"> April 23 </td> <td style="text-align:left;color: gray !important;border-bottom: 1px solid"> 15:45-17:00 </td> <td style="text-align:left;font-weight: bold;border-bottom: 1px solid"> Raster Data </td> </tr> <tr> <td style="text-align:left;color: gray !important;"> April 24 </td> <td style="text-align:left;color: gray !important;"> 09:00-10:30 </td> <td style="text-align:left;font-weight: bold;"> Advanced Data Import & Processing </td> </tr> <tr> <td style="text-align:left;color: gray !important;"> April 24 </td> <td style="text-align:left;color: gray !important;"> 10:45-12:00 
</td> <td style="text-align:left;font-weight: bold;"> Applied Data Wrangling & Linking </td> </tr> <tr> <td style="text-align:left;color: gray !important;color: gray !important;"> April 24 </td> <td style="text-align:left;color: gray !important;color: gray !important;"> 12:00-13:00 </td> <td style="text-align:left;font-weight: bold;color: gray !important;"> Lunch Break </td> </tr> <tr> <td style="text-align:left;color: gray !important;"> April 24 </td> <td style="text-align:left;color: gray !important;"> 13:00-14:30 </td> <td style="text-align:left;font-weight: bold;"> Investigating Spatial Autocorrelation </td> </tr> <tr> <td style="text-align:left;color: gray !important;"> April 24 </td> <td style="text-align:left;color: gray !important;"> 14:45-16:00 </td> <td style="text-align:left;font-weight: bold;"> Spatial Econometrics & Outlook </td> </tr> </tbody> </table>

---

## Why care about data types and formats?

There are differences in the way spatial information is stored, processed, and visually represented.

- Different commands for data import and manipulation
- Spatial linking techniques and analyses partly determined by data format
- Visualization of data can vary

So: Always know what kind of data you are dealing with!

---

## Representing the world in vectors

.pull-left[
.center[
<img src="data:image/png;base64,#1_2_Vector_Data_files/figure-html/world-cities-1.png" width="120%" style="display: block; margin: auto;" />
]
]

.pull-right[
The surface of the earth is represented by simple geometries and attributes.

Each object is defined by longitude (x) and latitude (y) values.
]

---

## Vector data: Geometries

.pull-left[
Every real-world feature is one of three types of geometry:

- Points: discrete location (e.g., city)
- Lines: linear feature (e.g., river)
- Polygons: enclosed area (e.g., country, administrative boundaries)
]

.pull-right[
<img src="data:image/png;base64,#../img/vector_geometries.png" width="90%" style="display: block; margin: auto;" />
<br>
<small><small><small> National Ecological Observatory Network (NEON), cited by [Datacarpentry](https://datacarpentry.org/organization-geospatial/instructor/02-intro-vector-data.html)</small></small></small>
]

---

## Vector data: Attribute tables

Geometries alone do not carry any other information.

We need to assign attributes to each geometry to hold additional information `\(\rightarrow\)` data tables called attribute tables

- Each row represents a geometric object, which we can also call observation or case.
- Each column holds an attribute or, in "our" language, a variable.

---

## Vector data: Attribute tables

.center[
<img src="data:image/png;base64,#../img/attr_table.png" width="90%" style="display: block; margin: auto;" />
]

---

## New best friend: Shapefiles

Both the geometric information and the attribute table can be saved together. Rather often, *ESRI Shapefiles* are used to store vector data. A shapefile consists of at least three mandatory files with the extensions:

- .shp : shape format
- .shx : shape index format
- .dbf : attribute format
- (.prj: CRS/Projection)

You don't have to remember what they stand for, but you cannot load the data if one of the mandatory files is missing.

---

## Welcome to `simple features`

.pull-left[
.small[
Several packages are out there to wrangle and visualize spatial and, especially, vector data within `R`. We will use a package called `sf` ("simple features").

Why? `simple features` refers to a formal standard for representing spatial geometries, and the package supports interfaces to other programming languages and GIS systems.
]
]

.pull-right[
<img src="data:image/png;base64,#../img/sf.jpg" width="1600" style="display: block; margin: auto;" />
<small><small>Illustration by [Allison Horst](https://allisonhorst.com/r-packages-functions) </small></small>
]

---

## Load a shapefile

The first step is, of course, loading the data. We want to import the shapefile of the administrative borders of the German states (*Bundesländer*) called `VG250_LAN.shp`.

```r
# load library
library(sf)

# load data
german_states <- sf::read_sf("./data/VG250_LAN.shp")
```

---

## Inspect your data: Classics

Let's have a quick look at the imported data. Like every other data set, we inspect the data to check some metadata and see if the importing worked correctly.

```r
# object type
class(german_states)
```

```
## [1] "sf"         "tbl_df"     "tbl"        "data.frame"
```

```r
# number of rows
nrow(german_states)
```

```
## [1] 35
```

```r
# number of columns
ncol(german_states)
```

```
## [1] 24
```

---

## Inspect your data: Classics

You can see that there are no huge differences between the shapefile we just imported and a regular data table.

```r
# head of data table
head(german_states, 2)
```

```
## Simple feature collection with 2 features and 23 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 426205.6 ymin: 5913462 xmax: 650128.7 ymax: 6101487
## Projected CRS: ETRS89 / UTM zone 32N
## # A tibble: 2 × 24
##     ADE    GF   BSG ARS   AGS   SDV_ARS     GEN   BEZ     IBZ BEM   NBD   SN_L  SN_R
##   <int> <int> <int> <chr> <chr> <chr>       <chr> <chr> <int> <chr> <chr> <chr> <chr>
## 1     2     4     1 01    01    0100200000… Schl… Land     20 --    ja    01    0
## 2     2     4     1 02    02    0200000000… Hamb… Frei…    22 --    ja    02    0
## # ℹ 11 more variables: SN_K <chr>, SN_V1 <chr>, SN_V2 <chr>, SN_G <chr>,
## #   FK_S3 <chr>, NUTS <chr>, ARS_0 <chr>, AGS_0 <chr>, WSK <date>, DEBKG_ID <chr>,
## #   geometry <MULTIPOLYGON [m]>
```

---

## Inspect your data: Spatial features

Besides our general data inspection, we also want to check the spatial features of our import.
This includes the geometry type (points? lines? polygons?) and the coordinate reference system.

```r
# type of geometry
sf::st_geometry(german_states)
```

```
## Geometry set for 35 features
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 280371.1 ymin: 5235856 xmax: 921292.4 ymax: 6106244
## Projected CRS: ETRS89 / UTM zone 32N
## First 5 geometries:
```

---

## Inspect your data: Spatial features

Each polygon is defined by several points that are connected to build an enclosed area. A feature that is made up of several polygons (e.g., a state with islands) has the `sf` geometry type MULTIPOLYGON. Just as Germany consists of several states, the polygon Germany consists of several smaller polygons.

```r
# the simple features column
attr(german_states, "sf_column")
```

```
## [1] "geometry"
```

```r
# further inspecting
dplyr::glimpse(german_states$geometry)
```

```
## sfc_MULTIPOLYGON of length 35; first list element: List of 28
##  $ :List of 1
##   ..$ : num [1:11707, 1:2] 464811 464937 465073 465235 465354 ...
##  $ :List of 1
##   ..$ : num [1:757, 1:2] 634993 635176 635330 635473 635552 ...
##  $ :List of 1
##   ..$ : num [1:306, 1:2] 471895 472061 472272 472312 472384 ...
##  $ :List of 1
##   ..$ : num [1:190, 1:2] 480925 480903 481160 481210 481281 ...
##  $ :List of 1
##   ..$ : num [1:230, 1:2] 457616 457561 457500 457464 457503 ...
##  $ :List of 1
##   ..$ : num [1:234, 1:2] 477852 477719 477605 477561 477535 ...
##  $ :List of 1
##   ..$ : num [1:117, 1:2] 468857 468880 468943 469077 469224 ...
##  $ :List of 1
##   ..$ : num [1:105, 1:2] 534788 534808 534478 534402 534334 ...
##  $ :List of 1
##   ..$ : num [1:123, 1:2] 535979 536027 536039 536060 536084 ...
##  $ :List of 1
##   ..$ : num [1:68, 1:2] 483056 483131 482991 482989 482964 ...
##  $ :List of 1
##   ..$ : num [1:98, 1:2] 479878 479947 479994 480009 480007 ...
##  $ :List of 1
##   ..$ : num [1:68, 1:2] 488077 488141 488316 488413 488445 ...
##  $ :List of 1
##   ..$ : num [1:39, 1:2] 527492 527482 527421 527332 527237 ...
##  $ :List of 1
##   ..$ : num [1:60, 1:2] 480594 480694 480760 481193 481251 ...
##  $ :List of 1
##   ..$ : num [1:59, 1:2] 427174 427275 427432 427585 427458 ...
##  $ :List of 1
##   ..$ : num [1:43, 1:2] 429524 429512 429537 429550 429563 ...
##  $ :List of 1
##   ..$ : num [1:35, 1:2] 488694 488986 489140 489227 489050 ...
##  $ :List of 1
##   ..$ : num [1:38, 1:2] 471007 470960 470930 470874 470840 ...
##  $ :List of 1
##   ..$ : num [1:42, 1:2] 482577 482657 482684 482774 482823 ...
##  $ :List of 1
##   ..$ : num [1:39, 1:2] 536845 536814 536782 536747 536704 ...
##  $ :List of 1
##   ..$ : num [1:34, 1:2] 485251 485277 485279 485261 485235 ...
##  $ :List of 1
##   ..$ : num [1:28, 1:2] 635056 635121 635182 635220 635266 ...
##  $ :List of 1
##   ..$ : num [1:23, 1:2] 468313 468243 468176 468114 468108 ...
##  $ :List of 1
##   ..$ : num [1:14, 1:2] 547663 547739 547807 547906 548038 ...
##  $ :List of 1
##   ..$ : num [1:34, 1:2] 459204 459240 459271 459299 459325 ...
##  $ :List of 1
##   ..$ : num [1:33, 1:2] 536538 536569 536602 536621 536612 ...
##  $ :List of 1
##   ..$ : num [1:15, 1:2] 643372 643346 643314 643289 643245 ...
##  $ :List of 1
##   ..$ : num [1:7, 1:2] 546233 546342 545809 545894 546030 ...
##  - attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"
```

---

## Inspect your data: Spatial features

Remember: The Coordinate Reference System is very important. A crucial step is to check the CRS of your geospatial data.
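
If the full CRS printout is more than you need, a shorter check can be enough. The following is only a minimal sketch on a made-up point (the object `pt` and its coordinates are assumptions for illustration, not workshop data): `sf::st_crs()$epsg` returns the EPSG code, and `sf::st_is_longlat()` tells you whether the coordinates are unprojected longitude/latitude values.

```r
library(sf)

# hypothetical toy point (roughly Cologne) in WGS84 longitude/latitude
pt <- sf::st_sfc(sf::st_point(c(6.96, 50.94)), crs = 4326)

# EPSG code of the CRS
sf::st_crs(pt)$epsg

# are the coordinates unprojected longitude/latitude values?
sf::st_is_longlat(pt)

# the same checks after projecting to ETRS89-extended / LAEA Europe
pt_3035 <- sf::st_transform(pt, crs = 3035)
sf::st_crs(pt_3035)$epsg
sf::st_is_longlat(pt_3035)
```
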
```r
# coordinate reference system
sf::st_crs(german_states)
```

```
## Coordinate Reference System:
##   User input: ETRS89 / UTM zone 32N
##   wkt:
## PROJCRS["ETRS89 / UTM zone 32N",
##     BASEGEOGCRS["ETRS89",
##         ENSEMBLE["European Terrestrial Reference System 1989 ensemble",
##             MEMBER["European Terrestrial Reference Frame 1989"],
##             MEMBER["European Terrestrial Reference Frame 1990"],
##             MEMBER["European Terrestrial Reference Frame 1991"],
##             MEMBER["European Terrestrial Reference Frame 1992"],
##             MEMBER["European Terrestrial Reference Frame 1993"],
##             MEMBER["European Terrestrial Reference Frame 1994"],
##             MEMBER["European Terrestrial Reference Frame 1996"],
##             MEMBER["European Terrestrial Reference Frame 1997"],
##             MEMBER["European Terrestrial Reference Frame 2000"],
##             MEMBER["European Terrestrial Reference Frame 2005"],
##             MEMBER["European Terrestrial Reference Frame 2014"],
##             ELLIPSOID["GRS 1980",6378137,298.257222101,
##                 LENGTHUNIT["metre",1]],
##             ENSEMBLEACCURACY[0.1]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["degree",0.0174532925199433]],
##         ID["EPSG",4258]],
##     CONVERSION["UTM zone 32N",
##         METHOD["Transverse Mercator",
##             ID["EPSG",9807]],
##         PARAMETER["Latitude of natural origin",0,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8801]],
##         PARAMETER["Longitude of natural origin",9,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8802]],
##         PARAMETER["Scale factor at natural origin",0.9996,
##             SCALEUNIT["unity",1],
##             ID["EPSG",8805]],
##         PARAMETER["False easting",500000,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8806]],
##         PARAMETER["False northing",0,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8807]]],
##     CS[Cartesian,2],
##         AXIS["(E)",east,
##             ORDER[1],
##             LENGTHUNIT["metre",1]],
##         AXIS["(N)",north,
##             ORDER[2],
##             LENGTHUNIT["metre",1]],
##     USAGE[
##         SCOPE["Engineering survey, topographic mapping."],
##         AREA["Europe between 6°E and 12°E: Austria; Belgium; Denmark - onshore and offshore; Germany - onshore and offshore; Norway including - onshore and offshore; Spain - offshore."],
##         BBOX[38.76,6,84.33,12.01]],
##     ID["EPSG",25832]]
```

---

## `sf::st_transform()`

When a CRS is messed up, or one wants to combine data with non-matching CRS, things quickly go wrong. The good thing is that the command `sf::st_transform()` allows us to *translate* our spatial data from one coordinate reference system to another.

```r
# transform crs
german_states <- sf::st_transform(german_states, crs = 3035)

# check crs
sf::st_crs(german_states)
```

```
## Coordinate Reference System:
##   User input: EPSG:3035
##   wkt:
## PROJCRS["ETRS89-extended / LAEA Europe",
##     BASEGEOGCRS["ETRS89",
##         ENSEMBLE["European Terrestrial Reference System 1989 ensemble",
##             MEMBER["European Terrestrial Reference Frame 1989"],
##             MEMBER["European Terrestrial Reference Frame 1990"],
##             MEMBER["European Terrestrial Reference Frame 1991"],
##             MEMBER["European Terrestrial Reference Frame 1992"],
##             MEMBER["European Terrestrial Reference Frame 1993"],
##             MEMBER["European Terrestrial Reference Frame 1994"],
##             MEMBER["European Terrestrial Reference Frame 1996"],
##             MEMBER["European Terrestrial Reference Frame 1997"],
##             MEMBER["European Terrestrial Reference Frame 2000"],
##             MEMBER["European Terrestrial Reference Frame 2005"],
##             MEMBER["European Terrestrial Reference Frame 2014"],
##             ELLIPSOID["GRS 1980",6378137,298.257222101,
##                 LENGTHUNIT["metre",1]],
##             ENSEMBLEACCURACY[0.1]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["degree",0.0174532925199433]],
##         ID["EPSG",4258]],
##     CONVERSION["Europe Equal Area 2001",
##         METHOD["Lambert Azimuthal Equal Area",
##             ID["EPSG",9820]],
##         PARAMETER["Latitude of natural origin",52,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8801]],
##         PARAMETER["Longitude of natural origin",10,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8802]],
##         PARAMETER["False easting",4321000,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8806]],
##         PARAMETER["False northing",3210000,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8807]]],
##     CS[Cartesian,2],
##         AXIS["northing (Y)",north,
##             ORDER[1],
##             LENGTHUNIT["metre",1]],
##         AXIS["easting (X)",east,
##             ORDER[2],
##             LENGTHUNIT["metre",1]],
##     USAGE[
##         SCOPE["Statistical analysis."],
##         AREA["Europe - European Union (EU) countries and candidates. Europe - onshore and offshore: Albania; Andorra; Austria; Belgium; Bosnia and Herzegovina; Bulgaria; Croatia; Cyprus; Czechia; Denmark; Estonia; Faroe Islands; Finland; France; Germany; Gibraltar; Greece; Hungary; Iceland; Ireland; Italy; Kosovo; Latvia; Liechtenstein; Lithuania; Luxembourg; Malta; Monaco; Montenegro; Netherlands; North Macedonia; Norway including Svalbard and Jan Mayen; Poland; Portugal including Madeira and Azores; Romania; San Marino; Serbia; Slovakia; Slovenia; Spain including Canary Islands; Sweden; Switzerland; Türkiye (Turkey); United Kingdom (UK) including Channel Islands and Isle of Man; Vatican City State."],
##         BBOX[24.6,-35.58,84.73,44.83]],
##     ID["EPSG",3035]]
```

---

## A very, very first map

To inspect the data and check whether we actually loaded what we wanted to load, we can take a very first look.

```r
# plot sf object
plot(german_states)
```

.center[
<img src="data:image/png;base64,#../img/plot_german_states.png" width="60%" style="display: block; margin: auto;" />
]

---

## Import point layer

Unfortunately, the data we want to visualize or analyze are not always available as shapefiles. Point coordinates are often stored in table formats like `.csv`, as are the locations of charging stations for electric cars in our `./data` folder.
```r
# dplyr provides the pipe operator %>% used below
library(dplyr)

echarging_df <- readr::read_delim("./data/charging_points_ger.csv", delim = ";")

head(echarging_df)
```

```
## # A tibble: 6 × 7
##   operator                  federal_state latitude longitude power_kw type  num_plugs
##   <chr>                     <chr>            <dbl>     <dbl>    <dbl> <chr>     <dbl>
## 1 deer GmbH                 Baden-Württe…     48.3      9.72       44 Norm…         2
## 2 EnBW mobility+ AG und Co… Baden-Württe…     48.6      9.87       93 Schn…         2
## 3 SWU Energie GmbH          Baden-Württe…     48.5     10.2        44 Norm…         2
## 4 SWU Energie GmbH          Baden-Württe…     48.6     10.1        44 Norm…         2
## 5 SWU Energie GmbH          Baden-Württe…     48.2     10.1        22 Norm…         1
## 6 EnBW mobility+ AG und Co… Baden-Württe…     48.5      9.98       30 Norm…         2
```

---

## From data table to geospatial data

Besides our attributes (e.g., operator, power, ...), the table contains the two variables "longitude" (X) and "latitude" (Y), our point coordinates. With the command `sf::st_as_sf()`, it is easy to transform the table into a point layer.

.pull-left[
```r
# transform to spatial data frame
echarging_sf <-
  sf::st_as_sf(
    echarging_df %>%
      # rows with missing coordinates are
      # not allowed, so we drop them first
      dplyr::filter(!is.na(longitude) & !is.na(latitude)),
    coords = c("longitude", "latitude")
  )

# inspect data
class(echarging_sf)
sf::st_geometry(echarging_sf)
```
]

.pull-right[
```
## [1] "sf"         "tbl_df"     "tbl"        "data.frame"
```

```
## Geometry set for 60549 features
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: 5.243745 ymin: 47.2844 xmax: 15.54381 ymax: 55.50014
## CRS:           NA
## First 5 geometries:
```
]

---

## Our point data

```r
plot(echarging_sf)
```

<img src="data:image/png;base64,#1_2_Vector_Data_files/figure-html/plot-charging-1.png" width="55%" style="display: block; margin: auto;" />

---

## Once again: Check the CRS!

Make sure to use the option `crs = [EPSG_ID]` in `sf::st_as_sf()`. If it is not set, the CRS of your data stays undefined (`NA`), and you cannot run any command that depends on the CRS. To find out which EPSG code fits the coordinates, I used [EPSG IO](https://epsg.io) or [http://projfinder.com/](http://projfinder.com/).
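
To see what an undefined CRS looks like in practice, here is a small sketch with a made-up coordinate table (`toy_df` and its values are assumptions for illustration only). It also shows `sf::st_set_crs()`, which merely attaches a CRS to existing coordinates and does not reproject them, in contrast to `sf::st_transform()`.

```r
library(sf)

# hypothetical coordinate table with two made-up points
toy_df <- data.frame(
  name      = c("a", "b"),
  longitude = c(6.96, 13.40),
  latitude  = c(50.94, 52.52)
)

# without the crs option, the coordinates are just numbers
toy_no_crs <- sf::st_as_sf(toy_df, coords = c("longitude", "latitude"))
is.na(sf::st_crs(toy_no_crs))

# with the crs option, the layer knows it is in WGS84 (EPSG:4326)
toy_wgs84 <- sf::st_as_sf(
  toy_df,
  coords = c("longitude", "latitude"),
  crs = 4326
)
sf::st_crs(toy_wgs84)$epsg

# a missing CRS can also be assigned afterwards (no reprojection happens)
toy_fixed <- sf::st_set_crs(toy_no_crs, 4326)
sf::st_crs(toy_fixed) == sf::st_crs(toy_wgs84)
```
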
.pull-left[
```r
# transform to spatial data frame
echarging_sf <-
  sf::st_as_sf(
    echarging_df %>%
      # rows with missing coordinates are
      # not allowed, so we drop them first
      dplyr::filter(!is.na(longitude) & !is.na(latitude)),
    coords = c("longitude", "latitude"),
    crs = 4326
  )
```
]

.pull-right[
<img src="data:image/png;base64,#1_2_Vector_Data_files/figure-html/plot-charging-crs-1.png" style="display: block; margin: auto;" />
]

---

## ... and the other way round

Do you want to go back to handling a simple data frame? You can quickly achieve this by dropping the geometry column.

```r
# check class
class(german_states)
```

```
## [1] "sf"         "tbl_df"     "tbl"        "data.frame"
```

```r
# remove geometry
sf::st_drop_geometry(german_states) %>%
  head(2)
```

```
## # A tibble: 2 × 23
##     ADE    GF   BSG ARS   AGS   SDV_ARS     GEN   BEZ     IBZ BEM   NBD   SN_L  SN_R
##   <int> <int> <int> <chr> <chr> <chr>       <chr> <chr> <int> <chr> <chr> <chr> <chr>
## 1     2     4     1 01    01    0100200000… Schl… Land     20 --    ja    01    0
## 2     2     4     1 02    02    0200000000… Hamb… Frei…    22 --    ja    02    0
## # ℹ 10 more variables: SN_K <chr>, SN_V1 <chr>, SN_V2 <chr>, SN_G <chr>,
## #   FK_S3 <chr>, NUTS <chr>, ARS_0 <chr>, AGS_0 <chr>, WSK <date>, DEBKG_ID <chr>
```

---
class: middle

## Exercise 1_2_1: Import Vector Data

[Exercise](https://stefanjuenger.github.io/gesis-workshop-geospatial-techniques-R-2024/exercises/1_2_1_Import_Vector_Data.html)

[Solution](https://stefanjuenger.github.io/gesis-workshop-geospatial-techniques-R-2024/solutions/1_2_1_Import_Vector_Data.html)

---

## Data wrangling

After importing the data sets, we are now ready to manipulate our data. We use the `dplyr` package for all regular data wrangling tasks.

But if you are used to working with the base R language, feel free to do so.
.center[
<img src="data:image/png;base64,#../img/tidyverse.png" width="50%" style="display: block; margin: auto;" />
<small><small>Meme found on [Reddit](https://www.reddit.com/r/Rlanguage/comments/anv1d5/my_meme_of_the_day/?utm_source=share&utm_medium=web2x&context=3) </small></small>
]

---

## Data Intro: German districts

We're moving "a layer down" and looking at Germany on a more fine-grained spatial level: the district.

```r
german_districts <-
  sf::read_sf("./data/VG250_KRS.shp") %>%
  sf::st_transform(crs = 3035)

german_districts
```

```
## Simple feature collection with 431 features and 23 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 4031313 ymin: 2684076 xmax: 4672526 ymax: 3551489
## Projected CRS: ETRS89-extended / LAEA Europe
## # A tibble: 431 × 24
##      ADE    GF   BSG ARS   AGS   SDV_ARS    GEN   BEZ     IBZ BEM   NBD   SN_L  SN_R
##  * <int> <int> <int> <chr> <chr> <chr>      <chr> <chr> <int> <chr> <chr> <chr> <chr>
##  1     4     4     1 01001 01001 010010000… Flen… Krei…    40 --    ja    01    0
##  2     4     4     1 01002 01002 010020000… Kiel  Krei…    40 --    ja    01    0
##  3     4     4     1 01003 01003 010030000… Lübe… Krei…    40 --    ja    01    0
##  4     4     4     1 01004 01004 010040000… Neum… Krei…    40 --    ja    01    0
##  5     4     4     1 01051 01051 010510044… Dith… Kreis    42 --    ja    01    0
##  6     4     4     1 01053 01053 010530100… Herz… Kreis    42 --    ja    01    0
##  7     4     4     1 01054 01054 010540056… Nord… Kreis    42 --    ja    01    0
##  8     4     4     1 01055 01055 010550012… Osth… Kreis    42 --    ja    01    0
##  9     4     4     1 01056 01056 010560039… Pinn… Kreis    42 --    ja    01    0
## 10     4     4     1 01057 01057 010570057… Plön  Kreis    42 --    ja    01    0
## # ℹ 421 more rows
## # ℹ 11 more variables: SN_K <chr>, SN_V1 <chr>, SN_V2 <chr>, SN_G <chr>,
## #   FK_S3 <chr>, NUTS <chr>, ARS_0 <chr>, AGS_0 <chr>, WSK <date>, DEBKG_ID <chr>,
## #   geometry <MULTIPOLYGON [m]>
```

---

## German districts

.small[
<img src="data:image/png;base64,#1_2_Vector_Data_files/figure-html/plot_districts-1.png" width="70%" style="display: block; margin: auto;" />
]

---

## Data Intro: Attributes

Since it would be a bit boring
to work with just administrative information, there's an extra table with more attributes called *attributes_districts.csv*.

```r
attributes_districts <-
  readr::read_delim("./data/attributes_districts.csv", delim = ";")
```

```
## # A tibble: 2 × 7
##   district_id car_density ecar_share publictransport_meandist population
##         <dbl>       <dbl>      <dbl>                    <dbl>      <dbl>
## 1        1001        492.        2.4                     176.      92550
## 2        1002        453.        2.2                     191.     247717
## # ℹ 2 more variables: green_voteshare <dbl>, afd_voteshare <dbl>
```

---

## Add attributes: Join data table

You might already see that both data tables contain an ID for the districts (*AGS* and *district_id*). Because `sf` objects behave like regular data frames, we can join them with the usual `dplyr` commands.

```r
german_districts_enhanced <-
  german_districts %>%
  dplyr::mutate(district_id = as.numeric(AGS)) %>%
  dplyr::select(district_id) %>%
  dplyr::left_join(attributes_districts, by = "district_id")

head(german_districts_enhanced, 2)
```

```
## Simple feature collection with 2 features and 7 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 4279627 ymin: 3460480 xmax: 4335232 ymax: 3524426
## Projected CRS: ETRS89-extended / LAEA Europe
## # A tibble: 2 × 8
##   district_id                  geometry car_density ecar_share publictransport_mean…¹
##         <dbl>        <MULTIPOLYGON [m]>       <dbl>      <dbl>                  <dbl>
## 1        1001 (((4283235 3524256, 4283…        492.        2.4                   176.
## 2        1002 (((4331981 3480575, 4332…        453.        2.2                   191.
## # ℹ abbreviated name: ¹publictransport_meandist
## # ℹ 3 more variables: population <dbl>, green_voteshare <dbl>, afd_voteshare <dbl>
```

---

## Add (more) attributes

Besides the regular join, we can also perform a so-called *spatial join*. For example, we want to count the number of charging stations in each German district.
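
The logic of a spatial join is easier to follow on a tiny example. The sketch below uses two made-up unit squares as "districts" and three made-up points; `make_square()`, `toy_districts`, and `toy_points` are hypothetical helpers and objects for illustration only, and the counting is done with base R's `table()`.

```r
library(sf)

# hypothetical helper: an axis-aligned square polygon
make_square <- function(xmin, ymin, size = 1) {
  sf::st_polygon(list(rbind(
    c(xmin, ymin),
    c(xmin + size, ymin),
    c(xmin + size, ymin + size),
    c(xmin, ymin + size),
    c(xmin, ymin)
  )))
}

# two neighboring toy "districts"
toy_districts <- sf::st_sf(
  district_id = c(1, 2),
  geometry = sf::st_sfc(make_square(0, 0), make_square(1, 0), crs = 3035)
)

# three toy points: two in the first square, one in the second
toy_points <- sf::st_sf(
  point_id = 1:3,
  geometry = sf::st_sfc(
    sf::st_point(c(0.2, 0.5)),
    sf::st_point(c(0.7, 0.5)),
    sf::st_point(c(1.5, 0.5)),
    crs = 3035
  )
)

# each point receives the attributes of the square it lies within
toy_joined <- sf::st_join(toy_points, toy_districts, join = sf::st_within)

# count the points per district
toy_counts <- table(toy_joined$district_id)
toy_counts
```

Both layers must share the same CRS before joining, which is why the toy objects are created with the same `crs` value.
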
```r
# adjust crs first
echarging_sf_3035 <- sf::st_transform(echarging_sf, crs = 3035)

# perform spatial join to identify for each charger the correct district id
charger_in_districts <-
  sf::st_join(echarging_sf_3035, german_districts_enhanced, join = sf::st_within)

# count the number of chargers within a district
charger_districts_count <-
  dplyr::count(charger_in_districts, district_id, name = "charger_count")

# join the charger count with the German district attributes
german_districts_enhanced <-
  dplyr::left_join(
    german_districts_enhanced,
    charger_districts_count %>% sf::st_drop_geometry(),
    by = "district_id"
  ) %>%
  # assumption: there simply is no charger in districts without a match
  dplyr::mutate(charger_count = tidyr::replace_na(charger_count, 0))
```

---

## Charger Count per District

<img src="data:image/png;base64,#1_2_Vector_Data_files/figure-html/charger-count-plot-1.png" style="display: block; margin: auto;" />

---

## Subsetting the data

One might be interested in only one specific area of Germany, like Cologne. To subset an `sf` object, you can often use your usual data wrangling workflow. In this case, I know the district_id, and that is the only row I want to keep.

.pull-left[
```r
cologne <-
  german_districts_enhanced %>%
  dplyr::filter(district_id == 5315) %>%
  dplyr::select(district_id)

plot(cologne)
```
]

.pull-right[
<img src="data:image/png;base64,#1_2_Vector_Data_files/figure-html/stouches-1.png" style="display: block; margin: auto;" />
]

---

## Using `sf` for subsetting

If you have no information about *ids* but only about the geolocation, you can use `sf::st_touches()` (or other spatial predicates such as `sf::st_within()`, `sf::st_intersects()`, `sf::st_crosses()`, ...) to identify, for example, all districts that share a border with Cologne.
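
How the predicates differ is easiest to see on toy geometries. The following sketch uses three made-up squares (`make_square()` and `toy` are hypothetical names for illustration): `sf::st_touches()` only matches neighbors that share a boundary, whereas `sf::st_intersects()` also matches the reference geometry itself.

```r
library(sf)

# hypothetical helper: an axis-aligned square polygon
make_square <- function(xmin, ymin, size = 1) {
  sf::st_polygon(list(rbind(
    c(xmin, ymin),
    c(xmin + size, ymin),
    c(xmin + size, ymin + size),
    c(xmin, ymin + size),
    c(xmin, ymin)
  )))
}

# "left" and "middle" share an edge, "right" is far away
toy <- sf::st_sf(
  id = c("left", "middle", "right"),
  geometry = sf::st_sfc(
    make_square(0, 0),
    make_square(1, 0),
    make_square(3, 0),
    crs = 3035
  )
)

middle <- toy[toy$id == "middle", ]

# st_touches(): boundaries meet, interiors do not overlap
touching <- toy[lengths(sf::st_touches(toy, middle)) > 0, ]
touching$id

# st_intersects(): any shared point counts, including the square itself
intersecting <- toy[lengths(sf::st_intersects(toy, middle)) > 0, ]
intersecting$id
```
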
.pull-left[
```r
cologne_surrounding <-
  german_districts_enhanced %>%
  dplyr::select(district_id) %>%
  # keep districts that touch Cologne
  # (at least one match)
  dplyr::filter(
    lengths(sf::st_touches(., cologne)) > 0
  )

plot(cologne_surrounding)
```
]

--

.pull-right[
<img src="data:image/png;base64,#1_2_Vector_Data_files/figure-html/surround-1.png" style="display: block; margin: auto;" />
]

---

## Export the data

After wrangling and adjusting the data, you can save it. There are, again, several options to do so. Two notes:

.small[
1. Be careful when saving shapefiles: column names will automatically be abbreviated!
2. Make sure that the CRS is documented in the folder or the file name.
]

```r
# Export as Shapefile
sf::st_write(
  german_districts_enhanced,
  "./participant_materials/districts_enhanced_epsg3035.shp"
)

# Export data frame as csv without geometric attributes
german_districts_enhanced_df <- sf::st_drop_geometry(german_districts_enhanced)

readr::write_csv(
  german_districts_enhanced_df,
  "./participant_materials/german_districts_enhanced.csv"
)

# Export data frame as csv with geometric attributes
sf::st_write(
  echarging_sf_3035,
  "./participant_materials/echarging_epsg3035.csv",
  layer_options = "GEOMETRY=AS_XY"
)
```

---
class: middle

## Exercise 1_2_2: Manipulate Vector Data

[Exercise](https://stefanjuenger.github.io/gesis-workshop-geospatial-techniques-R-2024/exercises/1_2_2_Manipulate_Vector_Data.html)

[Solution](https://stefanjuenger.github.io/gesis-workshop-geospatial-techniques-R-2024/solutions/1_2_2_Manipulate_Vector_Data.html)

---

## Wrap-Up

.pull-left[
We made it through our first session dealing with vector data!

You can:

- load
- transform
- manipulate
- and export vector data.

The next step is producing an awesome map! 🎊
]

.pull-right[
<img src="data:image/png;base64,#1_2_Vector_Data_files/figure-html/cologne-ecars-1.png" style="display: block; margin: auto;" />
]

---
class: middle

## Lunch Break 🌮

.center[
But, that is for after lunch!
] --- layout: false class: center background-image: url(data:image/png;base64,#../assets/img/the_end.png) background-size: cover .left-column[ </br> <img src="data:image/png;base64,#../img/Anne.png" width="75%" style="display: block; margin: auto;" /> ] .right-column[ .left[.small[<svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M464 64H48C21.49 64 0 85.49 0 112v288c0 26.51 21.49 48 48 48h416c26.51 0 48-21.49 48-48V112c0-26.51-21.49-48-48-48zm0 48v40.805c-22.422 18.259-58.168 46.651-134.587 106.49-16.841 13.247-50.201 45.072-73.413 44.701-23.208.375-56.579-31.459-73.413-44.701C106.18 199.465 70.425 171.067 48 152.805V112h416zM48 400V214.398c22.914 18.251 55.409 43.862 104.938 82.646 21.857 17.205 60.134 55.186 103.062 54.955 42.717.231 80.509-37.199 103.053-54.947 49.528-38.783 82.032-64.401 104.947-82.653V400H48z"></path> </svg> [anne-kathrin.stroppe@gesis.org](mailto:anne-kathrin.stroppe@gesis.org)] .small[<svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M459.37 151.716c.325 4.548.325 9.097.325 13.645 0 138.72-105.583 298.558-298.558 298.558-59.452 0-114.68-17.219-161.137-47.106 8.447.974 16.568 1.299 25.34 1.299 49.055 0 94.213-16.568 130.274-44.832-46.132-.975-84.792-31.188-98.112-72.772 6.498.974 12.995 1.624 19.818 1.624 9.421 0 18.843-1.3 27.614-3.573-48.081-9.747-84.143-51.98-84.143-102.985v-1.299c13.969 7.797 30.214 12.67 47.431 13.319-28.264-18.843-46.781-51.005-46.781-87.391 0-19.492 5.197-37.36 14.294-52.954 51.655 63.675 129.3 105.258 216.365 109.807-1.624-7.797-2.599-15.918-2.599-24.04 0-57.828 46.782-104.934 104.934-104.934 30.213 0 57.502 12.67 76.67 33.137 23.715-4.548 46.456-13.32 66.599-25.34-7.798 24.366-24.366 44.833-46.132 57.827 21.117-2.273 41.584-8.122 60.426-16.243-14.292 20.791-32.161 39.308-52.628 54.253z"></path> </svg> 
[`@astroppe`](https://twitter.com/stroppann)] .small[<svg viewBox="0 0 496 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M165.9 397.4c0 2-2.3 3.6-5.2 3.6-3.3.3-5.6-1.3-5.6-3.6 0-2 2.3-3.6 5.2-3.6 3-.3 5.6 1.3 5.6 3.6zm-31.1-4.5c-.7 2 1.3 4.3 4.3 4.9 2.6 1 5.6 0 6.2-2s-1.3-4.3-4.3-5.2c-2.6-.7-5.5.3-6.2 2.3zm44.2-1.7c-2.9.7-4.9 2.6-4.6 4.9.3 2 2.9 3.3 5.9 2.6 2.9-.7 4.9-2.6 4.6-4.6-.3-1.9-3-3.2-5.9-2.9zM244.8 8C106.1 8 0 113.3 0 252c0 110.9 69.8 205.8 169.5 239.2 12.8 2.3 17.3-5.6 17.3-12.1 0-6.2-.3-40.4-.3-61.4 0 0-70 15-84.7-29.8 0 0-11.4-29.1-27.8-36.6 0 0-22.9-15.7 1.6-15.4 0 0 24.9 2 38.6 25.8 21.9 38.6 58.6 27.5 72.9 20.9 2.3-16 8.8-27.1 16-33.7-55.9-6.2-112.3-14.3-112.3-110.5 0-27.5 7.6-41.3 23.6-58.9-2.6-6.5-11.1-33.3 2.6-67.9 20.9-6.5 69 27 69 27 20-5.6 41.5-8.5 62.8-8.5s42.8 2.9 62.8 8.5c0 0 48.1-33.6 69-27 13.7 34.7 5.2 61.4 2.6 67.9 16 17.7 25.8 31.5 25.8 58.9 0 96.5-58.9 104.2-114.8 110.5 9.2 7.9 17 22.9 17 46.4 0 33.7-.3 75.4-.3 83.6 0 6.5 4.6 14.4 17.3 12.1C428.2 457.8 496 362.9 496 252 496 113.3 383.5 8 244.8 8zM97.2 352.9c-1.3 1-1 3.3.7 5.2 1.6 1.6 3.9 2.3 5.2 1 1.3-1 1-3.3-.7-5.2-1.6-1.6-3.9-2.3-5.2-1zm-10.8-8.1c-.7 1.3.3 2.9 2.3 3.9 1.6 1 3.6.7 4.3-.7.7-1.3-.3-2.9-2.3-3.9-2-.6-3.6-.3-4.3.7zm32.4 35.6c-1.6 1.3-1 4.3 1.3 6.2 2.3 2.3 5.2 2.6 6.5 1 1.3-1.3.7-4.3-1.3-6.2-2.2-2.3-5.2-2.6-6.5-1zm-11.4-14.7c-1.6 1-1.6 3.6 0 5.9 1.6 2.3 4.3 3.3 5.6 2.3 1.6-1.3 1.6-3.9 0-6.2-1.4-2.3-4-3.3-5.6-2z"></path> </svg> [`stroppann`](https://github.com/stroppann)] .small[<svg viewBox="0 0 576 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M280.37 148.26L96 300.11V464a16 16 0 0 0 16 16l112.06-.29a16 16 0 0 0 15.92-16V368a16 16 0 0 1 16-16h64a16 16 0 0 1 16 16v95.64a16 16 0 0 0 16 16.05L464 480a16 16 0 0 0 16-16V300L295.67 148.26a12.19 12.19 0 0 0-15.3 0zM571.6 251.47L488 182.56V44.05a12 12 0 0 0-12-12h-56a12 12 0 0 0-12 
12v72.61L318.47 43a48 48 0 0 0-61 0L4.34 251.47a12 12 0 0 0-1.6 16.9l25.5 31A12 12 0 0 0 45.15 301l235.22-193.74a12.19 12.19 0 0 1 15.3 0L530.9 301a12 12 0 0 0 16.9-1.6l25.5-31a12 12 0 0 0-1.7-16.93z"></path> </svg> [`NA`](NA)]] ]