In this set of exercises, we will practice filtering and rearranging the order of rows.

As in the previous set of data wrangling exercises, we first need to load the tidyverse package(s) and import the data. For these exercises, it is also advisable to have the codebook for the data set open.

library(tidyverse)
library(haven)
library(sjlabelled)
## Warning: package 'sjlabelled' was built under R version 4.1.3
allbus_2021 <- read_sav("./data/allbus_2021ZA5280_v1-0-0.sav") %>% 
  remove_all_labels() %>% 
  as_tibble()

1

As a first exercise, using base R, let’s create a new data set named allbus_single that only contains data from respondents who reported being single.
The variable representing marital status is named mstat, and the value indicating that the respondent is single is 5. Remember that there are two options in base R for filtering rows (the same ones as for selecting columns).
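As a rough illustration of the two base R approaches (assumed here to be indexing with [ ] and the subset() function), the following sketch uses a small made-up data frame named toy rather than the real ALLBUS data. The column names mirror the exercise, but all values are hypothetical.

```r
# Toy stand-in for the ALLBUS data (all values are made up for illustration)
toy <- data.frame(
  respid = 1:6,
  mstat  = c(1, 5, 5, 2, NA, 5)
)

# Option 1: logical indexing with [ ]
# which() keeps only rows where the condition is TRUE, so rows with NA are dropped
toy_single_a <- toy[which(toy$mstat == 5), ]

# Option 2: subset(), which also drops rows where the condition evaluates to NA
toy_single_b <- subset(toy, mstat == 5)

toy_single_a
```

Note that plain logical indexing without which(), i.e. toy[toy$mstat == 5, ], would return an extra row of NAs for the respondent with a missing mstat value, which is one reason to prefer one of the two variants shown.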

2

Now, let’s use the dplyr function for filtering rows: Create an object named allbus_male_sober_driver that only contains respondents who report being male and never having driven a car while drunk.
The names of the variables we need here are sex and cs02 (drunk driving) and the values we want to filter for are 1 (male), and 1 (never), respectively.
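A minimal sketch of combining two conditions in filter(), again on a made-up tibble (toy and its values are hypothetical; only the variable names and codes are taken from the task). Conditions separated by a comma are combined with a logical AND.

```r
library(dplyr)

# Toy stand-in for the ALLBUS data (all values are made up for illustration)
toy <- tibble(
  respid = 1:6,
  sex    = c(1, 2, 1, 1, 2, 1),
  cs02   = c(1, 1, 2, 1, 3, NA)
)

# Keep rows where both conditions hold; rows where a condition is NA are dropped
toy_male_sober_driver <- toy %>%
  filter(sex == 1, cs02 == 1)

toy_male_sober_driver
```

Writing filter(sex == 1 & cs02 == 1) should give the same result; the comma form is simply the more common dplyr idiom.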

3

Using the same function from dplyr, create another subset of cases called allbus_social_media_news that only includes respondents who stated that they use social media as a news source on three to five days per week (inclusive).
The variable we need for this is called lm35 and the values of that variable we are looking for are 3 to 5. You can use the helper function between() here (remember that the values you provide to this function are inclusive).
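To illustrate how between() behaves at the boundaries, here is a small sketch on made-up data (the toy tibble and its values are hypothetical). between(x, 3, 5) is shorthand for x >= 3 & x <= 5.

```r
library(dplyr)

# Toy stand-in for the ALLBUS data (all values are made up for illustration)
toy <- tibble(
  respid = 1:7,
  lm35   = c(0, 2, 3, 4, 5, 6, NA)
)

# Both boundaries (3 and 5) are included; rows with a missing lm35 are dropped
toy_social_media_news <- toy %>%
  filter(between(lm35, 3, 5))

toy_social_media_news
```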

4

Let’s briefly turn back to base R for this task: Sort the allbus_2021 data set in descending order of the ls01 (general life satisfaction) variable. You can overwrite the original allbus_2021 object for this task. Have a look at the resulting data frame to check if your code worked.
You need the base R function order() here. You can check your result using head(). To limit the amount of output, you can subset columns using [ ] within the head() command (general life satisfaction is the 513th variable in the data set, so you could, e.g., subset columns 510:513).
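The following sketch shows the order() pattern on a made-up data frame (toy, the age column, and all values are hypothetical; in the real data the column positions would differ). order() returns the row positions in sorted order, which are then used as a row index inside [ ].

```r
# Toy stand-in for the ALLBUS data (all values are made up for illustration)
toy <- data.frame(
  respid = 1:5,
  age    = c(34, 51, 27, 62, 45),
  ls01   = c(7, 10, 3, 8, 10)
)

# Sort rows by ls01 in descending order and overwrite the object
toy <- toy[order(toy$ls01, decreasing = TRUE), ]

# Inspect the first rows, limited to a range of columns (here columns 2 to 3)
head(toy[, 2:3], n = 3)
```

An alternative for numeric variables is order(-toy$ls01), which sorts by the negated values instead of using the decreasing argument.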

5

Let’s rearrange the order of rows again, this time using a function from the dplyr package. To restore the original order of the allbus_2021 data set, sort it in ascending order of the respid variable. As in the previous task, check whether your code worked, but this time use a (short) pipe chain and a dplyr function for catching a glimpse of your data.
The dplyr function you are looking for is in another castle… Just kidding (and apologies for the silly “Super Mario” reference here… that’s what happens when you work with pipes more than a plumber does), it’s arrange().
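A short sketch of arrange() followed by a quick check with glimpse(), using a made-up tibble whose rows start out of order (toy and its values are hypothetical). arrange() sorts in ascending order by default; wrapping a variable in desc() would reverse that.

```r
library(dplyr)

# Toy stand-in for the ALLBUS data, deliberately out of order (made-up values)
toy <- tibble(
  respid = c(4L, 2L, 5L, 1L, 3L),
  ls01   = c(10, 10, 8, 7, 3)
)

# Restore ascending order of the ID variable
toy <- toy %>%
  arrange(respid)

# Short pipe chain: pick a few columns, then take a compact look at them
toy %>%
  select(respid, ls01) %>%
  glimpse()
```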