Exercise 5_1_1: R Markdown

The structure of this final exercise for the course is a bit different. We want you to create and work with R Markdown documents (with HTML output) to revisit some of the things we covered in the previous sessions. There are no coding tasks in this document.

We have created an R Markdown document (and already knitted it to an HTML output file) that demonstrates some of what you can do with R Markdown and revisits a few of the topics and steps from the earlier sessions.

This document uses the Gapminder data. It is called explore_gapminder.Rmd, and you can find it in the exercises folder.

1

The first thing we want you to do is to open the explore_gapminder.Rmd file in RStudio and explore it a bit to see what it contains. You can also open explore_gapminder.html (in your browser) to see the .Rmd and the resulting output document side by side.

Clues

You can open the .Rmd file via the File tab or the menu (File -> Open File) in RStudio.

You might notice that there are quite a few things specified in the YAML header. Let’s briefly go through them:

toc: true -> The document will contain a table of contents (ToC)

toc_depth: 3 -> The ToC will contain header levels 1 to 3

number_sections: true -> The sections divided by headers will be numbered

toc_float: true -> The ToC is floating, meaning that it moves when you scroll

code_folding: hide -> By default, the code chunks are hidden, but you can display them by clicking the Code buttons in the HTML document

theme: flatly -> The Bootswatch theme flatly is used to style the document

highlight: tango -> The document uses the Pandoc code highlighting style tango

code_download: true -> The document includes a button allowing you to download the full code

df_print: paged -> When data frames are printed in the document they are printed in paged tables
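The YAML fields above correspond to arguments of rmarkdown::html_document(), so the same configuration can also be written in R. This is only a sketch for illustration: the object name gapminder_format is our own, and in the .Rmd itself you would keep using the YAML header.

```r
# Requires the rmarkdown package (see the package list below)
library(rmarkdown)

# Each argument mirrors one field of the YAML header described above
gapminder_format <- html_document(
  toc = TRUE,              # include a table of contents
  toc_depth = 3,           # ToC lists header levels 1 to 3
  number_sections = TRUE,  # number the sections
  toc_float = TRUE,        # floating ToC that stays visible while scrolling
  code_folding = "hide",   # code chunks hidden by default, shown via Code buttons
  theme = "flatly",        # Bootswatch theme
  highlight = "tango",     # Pandoc code highlighting style
  code_download = TRUE,    # button to download the .Rmd source
  df_print = "paged"       # print data frames as paged tables
)

# Hypothetical usage: knit from the console with these settings
# render("explore_gapminder.Rmd", output_format = gapminder_format)
```

Passing a modified format object to render() is one way to try out different settings without editing the YAML header itself.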

NB: To knit the document, you need to have the packages it uses installed. These are: rmarkdown, knitr, tidyverse, visdat, janitor, pander, patchwork, correlation, GGally, broom, sjPlot, scales.

An easy way to check whether these packages are installed, install any that are missing, and load them all in one go is the packages() function from the easypackages package. To use it for this purpose, you can run the following code:

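# Install easypackages only if it is not available yet, then load it
if (!requireNamespace("easypackages", quietly = TRUE)) install.packages("easypackages")
library(easypackages)

# Install any missing packages from the list (without asking first) and load all of them
packages("rmarkdown", "knitr", "tidyverse", "visdat", "janitor", "pander", "patchwork", "correlation", "GGally", "broom", "sjPlot", "scales", prompt = FALSE)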
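If you would rather not add easypackages as a dependency, roughly the same check-install-load logic can be written in base R. The helper name load_or_install() below is hypothetical (it is not part of any package); it is a minimal sketch that installs only the missing packages and then attaches all of them.

```r
# Hypothetical helper (not from any package): install missing packages, then load all
load_or_install <- function(pkgs) {
  # Packages whose namespace cannot be loaded (usually: not installed)
  missing <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
  if (length(missing) > 0) install.packages(missing)
  # Attach every package; character.only = TRUE lets library() take a string
  for (pkg in pkgs) library(pkg, character.only = TRUE)
  # Return the names of the packages that had to be installed
  invisible(missing)
}

# Usage with the packages needed for explore_gapminder.Rmd:
# load_or_install(c("rmarkdown", "knitr", "tidyverse", "visdat", "janitor", "pander",
#                   "patchwork", "correlation", "GGally", "broom", "sjPlot", "scales"))
```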

Feel free to play around a bit with explore_gapminder.Rmd and its output (you can also change parts of the YAML header to see how that affects the output).

2

The main task for this exercise is to create a similar R Markdown document for a subset of the 2021 German General Social Survey (ALLBUS) data, gesis_2021_subset. Using explore_gapminder.Rmd as a starting point and guide, please do the following in this document:

Load and wrangle the data the same way as in the exploratory data analysis session to create the ALLBUS subset on xenophobic attitudes (e.g., select variables such as xenophobic attitudes, contact, and demographic variables; handle missing values; recode values; and calculate aggregate measures), with one difference: also include two additional variables, ls01 (life satisfaction), which should be renamed to life_satf, and pt12 (trust in the federal government), which should be renamed to trust_gov. Note: The “trust in people” variable only has three values/levels (have a look at the codebook). If we want it to reflect trust on an ordinal level, we need to recode its values: 2 = 1, 3 = 2, 1 = 3.
In addition, define a function called inverter2 as follows and use it to create a new variable called distrust_gov based on trust_gov: inverter2 <- function (var) {max(var, na.rm = TRUE) - var + 1}

Note: You can do both of these first steps in a code chunk that is not displayed in the output document (e.g., by setting the chunk option include = FALSE), and you can include all steps in a single long data wrangling pipe(line) if you want.
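The renaming, recoding, and inversion steps can be sketched in base R on a few made-up rows. The data frame allbus_toy, its values, and the column name trust_people are hypothetical placeholders (check the codebook for the actual name of the trust-in-people variable); in your document, these steps would go into the wrangling pipe on gesis_2021_subset.

```r
# Hypothetical toy data standing in for the ALLBUS 2021 subset;
# trust_people is a placeholder name, not the real ALLBUS variable name
allbus_toy <- data.frame(
  ls01 = c(8, 5, NA, 10, 3),
  pt12 = c(1, 4, 7, NA, 6),
  trust_people = c(1, 2, 3, 2, NA)
)

# Rename ls01 -> life_satf and pt12 -> trust_gov
names(allbus_toy)[names(allbus_toy) == "ls01"] <- "life_satf"
names(allbus_toy)[names(allbus_toy) == "pt12"] <- "trust_gov"

# Recode trust in people to an ordinal order: 2 = 1, 3 = 2, 1 = 3 (NA stays NA)
trust_map <- c("1" = 3, "2" = 1, "3" = 2)
allbus_toy$trust_people <- unname(trust_map[as.character(allbus_toy$trust_people)])

# Function from the exercise: reverses a scale using its observed maximum
inverter2 <- function (var) {max(var, na.rm = TRUE) - var + 1}

# Higher values of distrust_gov now mean less trust in the federal government
allbus_toy$distrust_gov <- inverter2(allbus_toy$trust_gov)
allbus_toy
```

In a tidyverse pipe, the same steps map onto rename(life_satf = ls01, trust_gov = pt12) and mutate(distrust_gov = inverter2(trust_gov)). Note that inverter2() uses the observed maximum, so if nobody chose the highest answer category, the reversed scale is shifted; reversing against the scale's known maximum avoids that.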

Get an overview of the missing data using the vis_miss() function from the visdat package.
Look at the relative frequencies for the variables sex, agec, educ, and party_vote using a function from the janitor package.
Create bar plots with ggplot2 to visualize the relative frequencies (percentages) for the variables educ and party_vote.
Using the pander package, include a table with the output of the base R function summary() for the variables on xenophobic attitudes.
Create ggplot2 bar plots to visualize the distribution of the variables life_satf and left_right.
Create a ggplot2 boxplot to show differences in xenophobic attitudes between supporters of different parties (party_vote), also showing the (jittered) individual data points.
Calculate correlations between xenophobia, left_right, and life_satf using the correlation package and display them in a table using the kable() function from the knitr package.
Create a plot with the GGally package to visualize these correlations.
Calculate a logistic regression model with a dichotomized version of distrust_gov as the dependent variable and contact, life_satf, and xenophobia as predictors (also include an intercept). To dichotomize distrust_gov, simply recode every value > 4 to 1 and all other values to 0.
Turn the output of this model into a table with the broom package and display it with knitr::kable().
Create one regression plot with the coefficients and another one with the predictions using the sjPlot package.
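The dichotomization and the logistic regression can be sketched with base R's glm() on simulated data. Everything about allbus_sim (sample size, value ranges, the 0/1 coding of contact) is made up purely so that the code runs; it says nothing about the real ALLBUS results.

```r
# Hypothetical simulated data using the variable names from the task
set.seed(42)
n <- 200
allbus_sim <- data.frame(
  contact = rbinom(n, 1, 0.5),
  life_satf = sample(0:10, n, replace = TRUE),
  xenophobia = runif(n, 1, 7),
  distrust_gov = sample(1:7, n, replace = TRUE)
)

# Dichotomize: values > 4 become 1, all other values 0 (NA would stay NA)
allbus_sim$distrust_gov_d <- ifelse(allbus_sim$distrust_gov > 4, 1, 0)

# Logistic regression; R formulas include an intercept by default
logit_model <- glm(
  distrust_gov_d ~ contact + life_satf + xenophobia,
  family = binomial(link = "logit"),
  data = allbus_sim
)

# Coefficient table on the log-odds scale (estimate, SE, z value, p value)
coef_table <- summary(logit_model)$coefficients
round(coef_table, 3)

# Odds ratios, if you prefer that scale
exp(coef(logit_model))
```

For the exercise itself, broom::tidy(logit_model) returns these estimates as a data frame that you can pass to knitr::kable(), sjPlot::plot_model(logit_model) draws a coefficient plot, and plot_model(logit_model, type = "pred", terms = "xenophobia") is one way to plot predicted probabilities.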

This is a lot, but you can find template code (in explore_gapminder.Rmd) for most of this in the solutions folder (and the rest in the slides for the previous sessions). To get you started, we have created an almost empty template .Rmd called explore_gesis_allbus_template.Rmd in the exercises folder.

Clues

If you’re stuck, you can find the solutions in the explore_gesis_allbus.Rmd in the solutions folder.


Johannes Breuer, Stefan Jünger, & Veronika Batzdorfer

Introduction to R for Data Analysis
