Exercise 2_1_1: Selecting & renaming variables

As in the presentation, we will use data from the German General Social Survey (ALLBUS) 2021 for this exercise. You should have downloaded the dataset in .sav format and saved it in a folder called data within the folder containing the materials for this workshop. Also remember that it is helpful to consult the codebook for the data set. That being said, let’s get wrangling…

…but before we can do that, we need to load the tidyverse, haven, and sjlabelled packages and import the data.

library(tidyverse)
library(haven)

## Warning: package 'haven' was built under R version 4.1.3

library(sjlabelled)

## Warning: package 'sjlabelled' was built under R version 4.1.3

allbus_2021 <- read_sav("./data/allbus_2021ZA5280_v1-0-0.sav") %>% 
  remove_all_labels() %>% 
  as_tibble()

1

Before we apply any changes to our data, let’s first pipe them into a function to catch a glimpse (hint hint).

Clues

The clue for this task is already “hidden” in the text of the task ;-)

2

Using base R, create a new object called allbus_institut_trust that contains all variables that assess how much people trust institutions (e.g., European Commission, Bundestag, police). To find the required variable names, you can check the codebook (search for “trust”) or have a look at the clue for this task.

Clues

The first variable we want to select for our subset is named pt01, and the last one is pt20. They appear consecutively in the data set. Remember that there are two options for selecting columns in base R: One is subsetting using [ ], the other is the subset() function.
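To illustrate the two base R options mentioned in the clue, here is a minimal sketch on a made-up data frame. The object toy and its column names are hypothetical and are not part of the ALLBUS data; the same pattern should carry over to a range of consecutive columns in any data frame.

```r
# Toy data frame; all names are invented for illustration
toy <- data.frame(
  id = 1:3,
  a1 = c(1, 2, 3),
  a2 = c(4, 5, 6),
  a3 = c(7, 8, 9),
  z  = c("x", "y", "z")
)

# Option 1a: subsetting with [ ] and a vector of column names
sel_names <- toy[, c("a1", "a2", "a3")]

# Option 1b: subsetting with [ ] and a range of column positions
first <- which(colnames(toy) == "a1")
last  <- which(colnames(toy) == "a3")
sel_range <- toy[, first:last]

# Option 2: subset() accepts a range of names in its select argument
sel_subset <- subset(toy, select = a1:a3)

# All three objects contain the same columns
colnames(sel_names)
colnames(sel_range)
colnames(sel_subset)
```

The subset() variant is usually the shortest when the columns sit next to each other, because it allows the first_name:last_name shorthand.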

3

Use a function from the dplyr package to create a new object named allbus_2021_info that only contains the (binary) variables that ask which devices respondents use to access the Internet. Again, you can consult the codebook to find the right variable names (search for “Internet”) or have a look at the clue for this task.

Clues

The first variable we want to select for our subset is named lm27, and the last one is lm34. They appear consecutively in the data set.
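As a rough sketch of how a range of consecutive columns can be picked with dplyr, the example below uses select() with the first:last shorthand on a made-up tibble. The object toy and its column names are hypothetical.

```r
library(dplyr)

# Toy tibble; all names are invented for illustration
toy <- tibble(
  id   = 1:3,
  q1   = c(0, 1, 0),
  q2   = c(1, 1, 0),
  q3   = c(0, 0, 1),
  note = c("a", "b", "c")
)

# Keep every column from q1 up to and including q3
toy_q <- toy %>%
  select(q1:q3)

colnames(toy_q)
```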

4

Again, using a function from the tidyverse package dplyr, select only the character variables from the allbus_2021 data set and assign them to an object named allbus_char.

Clues

You need to use the selection helper where() for this task.
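The following sketch shows the general pattern of where() on a made-up tibble: it takes a function that returns TRUE or FALSE for each column, and select() keeps the columns for which it returns TRUE. The object toy and its column names are hypothetical, and the example selects numeric rather than character columns so that the task itself is left to you.

```r
library(dplyr)

# Toy tibble; all names are invented for illustration
toy <- tibble(
  id    = 1:3,
  city  = c("Mannheim", "Cologne", "Berlin"),
  score = c(2.5, 3.0, 4.5)
)

# Keep only the columns for which is.numeric() returns TRUE
toy_num <- toy %>%
  select(where(is.numeric))

colnames(toy_num)
```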

5

After creating subsets of variables, let’s now rename the variables in the allbus_2021_info object in one step, again using a dplyr function. Rename lm27 to internet_use_pc, lm28 to internet_use_laptop, and lm29 to internet_use_tablet, and rename lm30, lm31, lm32, lm33, and lm34 to internet_use_smartphone, internet_use_TV, internet_use_playstation, internet_use_ebook, and internet_use_other, respectively.

Clues

You can also rename variables within the select() command.
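A minimal sketch of renaming inside select(), using a made-up tibble: each argument has the form new_name = old_name, and columns that are not listed are dropped. The object toy and all column names are hypothetical.

```r
library(dplyr)

# Toy tibble; all names are invented for illustration
toy <- tibble(
  v1 = c(1, 0),
  v2 = c(0, 1),
  v3 = c(1, 1)
)

# Select v1 and v2 and rename them in the same step; v3 is dropped
toy_renamed <- toy %>%
  select(uses_phone = v1,
         uses_tablet = v2)

colnames(toy_renamed)
```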

6

As the final task in this set of exercises, repeat the previous renaming procedure, this time starting in base R. That is, first rename the variables lm27 to internet_use_pc, lm28 to internet_use_laptop, and lm29 to internet_use_tablet with base R.

Then, rename the variables lm30, lm31, lm32, lm33, and lm34 to internet_use_smartphone, internet_use_TV, internet_use_playstation, internet_use_ebook, and internet_use_other, respectively, using a function from dplyr.

When using the dplyr function for renaming the variables, assign the result to the same object name as before (i.e., overwrite the allbus_2021_info object).

Clues

The base R function we need here is colnames(), and the dplyr function is rename(). Remember that the correct syntax for the rename() function is new_name = old_name.
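The two approaches named in the clue can be sketched on a made-up data frame as follows. The object toy and all column names are hypothetical; this is only one of several ways to use colnames() for renaming.

```r
library(dplyr)

# Toy data frame; all names are invented for illustration
toy <- data.frame(v1 = 1:2, v2 = 3:4, v3 = 5:6)

# Base R: find the position of the old name, then assign the new name
colnames(toy)[colnames(toy) == "v1"] <- "first_var"

# dplyr: rename() uses new_name = old_name; overwrite the same object
toy <- toy %>%
  rename(second_var = v2,
         third_var = v3)

colnames(toy)
```

Matching on the old name in the base R step, rather than hard-coding a column position, keeps the code working if the column order changes.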

Johannes Breuer, Stefan Jünger, & Veronika Batzdorfer

Introduction to R for Data Analysis