In this final set of exercises for the second data wrangling session, we want to work with factors and conditional recoding.

The packages we need here are the same as before (sjlabelled, tidyverse, haven) plus the naniar package.

As in the previous exercises, to be on the safe side, we can import the data once more.

allbus_2021 <- read_sav("./data/allbus_2021/ZA5280_v1-0-0.sav") %>% 
  remove_all_labels()

1

First, let’s create an unordered factor based on the variable representing marital status (mstat). The factor levels should be the English translations of the first five value labels listed in the codebook: “Married and living with spouse”, “Married and living separately”, “Widowed”, “Divorced”, “Unmarried”. Note: For simplicity, we only want to focus on five categories here.
The dplyr function we need to use here (in combination with mutate()) is recode_factor().
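To illustrate the pattern, here is a minimal sketch on a toy tibble rather than the real survey file. It assumes that the numeric codes 1 to 5 map, in order, onto the five labels above (please check the codebook), and the names toy and mstat_fac are made up for this example. Note that recode_factor() orders the levels by the order of the replacements, and the toy data contain only the five codes plus a missing value.

```r
library(dplyr)

# Toy data standing in for allbus_2021; the codes 1-5 are an assumption
toy <- tibble(mstat = c(1, 2, 3, 4, 5, NA))

toy <- toy %>%
  mutate(
    # Numeric codes are referenced by name using backticks
    mstat_fac = recode_factor(
      mstat,
      `1` = "Married and living with spouse",
      `2` = "Married and living separately",
      `3` = "Widowed",
      `4` = "Divorced",
      `5` = "Unmarried"
    )
  )

# Inspect the resulting levels and their frequencies
levels(toy$mstat_fac)
table(toy$mstat_fac, useNA = "ifany")
```

In the real data, mstat may contain further codes beyond these five, which would need to be handled separately (for example by setting them to missing before recoding).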

2

Based on what we did in task 1, let’s create another unordered factor named unmarried that has the level “unmarried” if the respondent has never been married and “is or has been married” otherwise. Note: For creating the new factor variable, we can use the as.factor() function from base R.
For this simple conditional recode, we can use the ifelse() function from base R.
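The combination of ifelse() and as.factor() could look roughly like the following sketch. It runs on a toy tibble, and the column name mstat_fac is a hypothetical name for the factor created in task 1, so it may differ from the name you chose.

```r
library(dplyr)

# Toy data; mstat_fac is an assumed name for the factor from task 1
toy <- tibble(
  mstat_fac = factor(c("Married and living with spouse", "Unmarried", "Widowed", NA))
)

toy <- toy %>%
  mutate(
    # ifelse() returns a character vector, as.factor() turns it into a factor
    unmarried = as.factor(
      ifelse(mstat_fac == "Unmarried", "unmarried", "is or has been married")
    )
  )

table(toy$unmarried, useNA = "ifany")
```

Missing values in the input stay missing in the new variable, and as.factor() sorts the levels alphabetically, which is fine for an unordered factor.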

3

Finally, let’s create an ordered factor based on the numeric income variable (di01a) named inc_cat with the following levels: “up to 1499 Euro”, “1500 to 2499 Euro”, “2500 to 3499 Euro”, “3500 to 4499 Euro”, “4500 to 5499 Euro”, “5500 to 6499 Euro”, “more than 6500 Euro”.
We can use case_when() from dplyr for the conditional recode of the numeric income variable and wrap it in factor() from base R to create an ordered factor. Within case_when(), we can also use the between() helper function we encountered in the first data wrangling session. NB: For the levels to be in the correct order, we need to pass them explicitly via the levels argument of factor() and set ordered = TRUE.
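A minimal sketch of this approach on toy income values might look as follows. The tibble toy and the vector inc_levels are made up for the example. The sketch also assumes that incomes are recorded in whole euros and that exactly 6500 belongs to the top category, which the level labels leave open.

```r
library(dplyr)

# Levels in the intended order (lowest to highest income)
inc_levels <- c(
  "up to 1499 Euro", "1500 to 2499 Euro", "2500 to 3499 Euro",
  "3500 to 4499 Euro", "4500 to 5499 Euro", "5500 to 6499 Euro",
  "more than 6500 Euro"
)

# Toy data standing in for the income variable
toy <- tibble(di01a = c(800, 1500, 3000, 4499, 5000, 6000, 6500, NA))

toy <- toy %>%
  mutate(
    inc_cat = factor(
      case_when(
        di01a <= 1499 ~ "up to 1499 Euro",
        between(di01a, 1500, 2499) ~ "1500 to 2499 Euro",
        between(di01a, 2500, 3499) ~ "2500 to 3499 Euro",
        between(di01a, 3500, 4499) ~ "3500 to 4499 Euro",
        between(di01a, 4500, 5499) ~ "4500 to 5499 Euro",
        between(di01a, 5500, 6499) ~ "5500 to 6499 Euro",
        di01a >= 6500 ~ "more than 6500 Euro" # assumption: 6500 counts as top category
      ),
      levels = inc_levels, # fixes the order of the levels
      ordered = TRUE       # makes the factor ordered
    )
  )

table(toy$inc_cat, useNA = "ifany")
```

Without the levels argument, factor() would sort the labels alphabetically, so the categories would not follow the income order.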