In this set of exercises, we will practice filtering rows and rearranging their order.
As with the previous set of data wrangling exercises, we first need to load the tidyverse
package(s) and import the data. For these exercises, it is also advisable to open the codebook for the data set.
library(tidyverse)
library(haven)
library(sjlabelled)
## Warning: package 'sjlabelled' was built under R version 4.1.3
allbus_2021 <- read_sav("./data/allbus_2021ZA5280_v1-0-0.sav") %>%
  remove_all_labels() %>%
  as_tibble()
Using base R, let’s create a new data set named allbus_single that only contains data from respondents who reported being single.
The variable is mstat, and the value indicating that the respondent is single is 5. Remember that there are two options in base R for filtering rows (the same ones as for selecting columns).
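The two base R options can be sketched on a small made-up data frame instead of the real ALLBUS file. Only the column name mstat and the code 5 come from the exercise; the object names starting with toy and all data values are invented for illustration.

```r
# Toy stand-in for allbus_2021 (values are invented for illustration)
toy <- data.frame(
  respid = 1:6,
  mstat  = c(1, 5, 3, 5, NA, 2)
)

# Option 1: logical indexing with [ ]; which() drops rows where mstat is NA
toy_single_brackets <- toy[which(toy$mstat == 5), ]

# Option 2: subset(), which also drops rows where the condition is NA
toy_single_subset <- subset(toy, mstat == 5)

toy_single_brackets
```

Without which(), rows with a missing mstat would show up as all-NA rows in the bracket version, which is why the sketch wraps the condition.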
Use the dplyr function for filtering rows: Create an object named allbus_male_sober_driver that only contains respondents who report that they are male and that they have never driven a car while drunk.
The variables are sex and cs02 (drunk driving), and the values we want to filter for are 1 (male) and 1 (never), respectively.
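A minimal sketch of combining two conditions in filter(), again on invented toy data rather than the real data set. The column names sex and cs02 and the codes are taken from the exercise; everything else is an assumption.

```r
library(dplyr)

# Toy stand-in for allbus_2021 (values are invented for illustration)
toy <- data.frame(
  respid = 1:5,
  sex    = c(1, 2, 1, 1, 2),
  cs02   = c(1, 1, 3, 1, NA)
)

# Conditions separated by a comma are combined with a logical AND
toy_male_sober_driver <- toy %>%
  filter(sex == 1, cs02 == 1)

toy_male_sober_driver
```

Writing filter(sex == 1 & cs02 == 1) would be equivalent; the comma form is simply the more common dplyr idiom.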
Using dplyr, create another subset of cases called allbus_social_media_news that only includes respondents who stated that they use social media as a news source on three to five days per week (inclusive).
The variable is lm35, and the values we are looking for are 3 to 5. You can use the helper function between() here (remember that the values you provide to this function are inclusive).
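The following sketch shows how between() behaves at the boundaries, using an invented toy data frame. Only the column name lm35 and the range 3 to 5 come from the exercise.

```r
library(dplyr)

# Toy stand-in for allbus_2021 (values are invented for illustration)
toy <- data.frame(
  respid = 1:7,
  lm35   = c(1, 3, 4, 5, 6, NA, 2)
)

# between(x, 3, 5) is shorthand for x >= 3 & x <= 5 (both bounds included)
toy_social_media_news <- toy %>%
  filter(between(lm35, 3, 5))

toy_social_media_news
```

Rows with a missing lm35 are dropped, because filter() only keeps rows for which the condition is TRUE.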
Use base R for this task: Sort the allbus_2021 data set in descending order of the ls01 (general life satisfaction) variable. You can overwrite the original allbus_2021 object for this task. Have a look at the resulting data frame to check if your code worked.
You can use the base R function order() here. You can check your result using head(). To limit the amount of output, you can subset columns using [ ] within the head() command (general life satisfaction is the 513th variable in the data set, so you could, e.g., subset columns 510:513).
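As a rough illustration of the order() approach, here is a sketch on an invented toy data frame. It selects columns by name rather than by position, since the toy data has only two columns; the values are made up.

```r
# Toy stand-in for allbus_2021 (values are invented for illustration)
toy <- data.frame(
  respid = 1:5,
  ls01   = c(7, 10, 3, 8, 9)
)

# order() returns the row indices that sort ls01; decreasing = TRUE reverses the direction
toy <- toy[order(toy$ls01, decreasing = TRUE), ]

# Check the result, limiting the output to the columns of interest
head(toy[, c("respid", "ls01")])
```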
This time, use the dplyr package. To restore the original order of the allbus_2021 data set, sort in ascending order of the respid variable. As in the previous task, check whether your code works, but this time use a (short) pipe chain and a dplyr function for catching a glimpse of your data.
The dplyr function you are looking for is in another castle… Just kidding (and apologies for the silly “Super Mario” reference here… that’s what happens when you work with pipes more than a plumber does), it’s arrange().
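A minimal sketch of sorting with arrange() and inspecting the result with glimpse(), using an invented toy data frame in place of the real data. Only the column names respid and ls01 come from the exercises.

```r
library(dplyr)

# Toy stand-in for allbus_2021, deliberately out of order (values are invented)
toy <- data.frame(
  respid = c(3, 1, 2),
  ls01   = c(10, 7, 8)
)

# arrange() sorts in ascending order by default; wrap a column in desc() to reverse
toy <- toy %>%
  arrange(respid)

# Short pipe chain to check the result
toy %>%
  glimpse()
```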