The structure of this final exercise for the course is a bit
different. We want you to create and work with R Markdown
documents (creating HTML output) to go through some of the
things we covered and did in the previous sessions. There are no coding
tasks in this document.
We have created an R Markdown document (which we have
also already knitted to create an HTML output file) that
demonstrates some of the things you can do/create with
R Markdown and repeats a few of the topics and steps we
went through in the sessions before this one.
This document uses Gapminder data and you can find it in the
exercises folder. It is called
explore_gapminder.Rmd.
Open the explore_gapminder.Rmd file in RStudio and explore
it a bit to see what it contains. You can also open
explore_gapminder.html (in your browser) to see the
.Rmd and the resulting output document side-by-side.
You can open the .Rmd file via the Files tab or
the menu (File -> Open File) in
RStudio.
You might notice that there are quite a few things specified in the
YAML header. Let’s briefly go through them:
toc: true -> The document will contain a table of contents (ToC)
toc_depth: 3 -> The ToC will contain header levels 1 to 3
number_sections: true -> The sections divided by headers will be numbered
toc_float: true -> The ToC is floating, meaning that it moves when you scroll
code_folding: hide -> By default, the code chunks are hidden, but
you can display them by clicking the Code buttons in the
HTML document
theme: flatly -> The Bootswatch theme flatly is used to style the document
highlight: tango -> The document uses the Pandoc code highlighting style tango
code_download: true -> The document includes a button allowing you to download the full code
df_print: paged -> When data frames are printed in the document they are printed in paged tables
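Put together, the options listed above could sit in the YAML header roughly as follows. This is only a sketch: the title is a placeholder, and the actual header of explore_gapminder.Rmd may contain further fields. What matters is that all of these options are nested under output: html_document:.

```yaml
---
title: "Placeholder title"   # hypothetical value, not taken from explore_gapminder.Rmd
output:
  html_document:
    toc: true
    toc_depth: 3
    number_sections: true
    toc_float: true
    code_folding: hide
    theme: flatly
    highlight: tango
    code_download: true
    df_print: paged
---
```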
NB: To knit the document you need to have the
packages it uses installed. These are the following ones:
rmarkdown, knitr, tidyverse,
visdat, janitor, pander,
patchwork, correlation, GGally,
broom, sjPlot, scales.
An easy way to check whether these packages are installed,
install any that are missing, and load them all in one go
is the packages() function from the
easypackages package. To use it for this purpose, you can
run the following code:
if (!require(easypackages)) install.packages("easypackages")
library(easypackages)
packages("rmarkdown", "knitr", "tidyverse", "visdat", "janitor", "pander",
         "patchwork", "correlation", "GGally", "broom", "sjPlot", "scales",
         prompt = FALSE)
Feel free to play around a bit with the
explore_gapminder.Rmd and its output (you can also change
parts of the YAML header to see how that influences the
output).
The big task we have for you for this exercise is to create a similar
R Markdown document for a subset of the German General
Social Survey - ALLBUS 2021 data gesis_2021_subset.
Using the explore_gapminder.Rmd as a starting point and
guidance, we want you to do the following in this document:
Load and wrangle the data the same way as before to create the
allbus subset regarding xenophobic attitudes in the
exploratory data analysis (e.g., select variables like xenophobic
attitudes, contact, and demographic variables, handle missing values,
recode values, and calculate aggregate measures), but with one
difference: also include two additional variables, namely ls01
(life satisfaction), which should be renamed to life_satf, and pt12
(trust in the federal government), which should be renamed to
trust_gov. Note: The “trust in people” variable
only has three values/levels (have a look at the codebook). If we want
it to reflect trust on an ordinal level, we need to recode its values: 2
= 1, 3 = 2, 1 = 3.
In addition to that, define a function called
inverter2 as follows and use it to create a new variable
called distrust_gov based on trust_gov:
inverter2 <- function(var) { max(var, na.rm = TRUE) - var + 1 }
Note: You can do both of these first things in a code chunk that is not displayed in the output document and you can include all steps in a single long data wrangling pipe(line) if you want.
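To see what the recoding and the inversion do, here is a minimal sketch on toy data (not the ALLBUS data). The column name trust_people and all values are made up for illustration; the function is the one from the code line above, and case_when() is just one of several ways to recode.

```r
library(dplyr)

# Function from the exercise text: reverses the direction of a scale
inverter2 <- function(var) {
  max(var, na.rm = TRUE) - var + 1
}

# Toy data; `trust_people` is a hypothetical column name, values are made up
toy <- tibble(
  trust_people = c(1, 2, 3, NA),
  trust_gov    = c(1, 4, 7, NA)
)

toy <- toy %>%
  mutate(
    # Recode 2 -> 1, 3 -> 2, 1 -> 3; values matching no condition stay NA
    trust_people = case_when(
      trust_people == 2 ~ 1,
      trust_people == 3 ~ 2,
      trust_people == 1 ~ 3
    ),
    # Invert trust_gov so that higher values mean more distrust
    distrust_gov = inverter2(trust_gov)
  )

toy
```

Both steps sit in a single mutate() call here, so they would fit into a longer wrangling pipeline without extra intermediate objects.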
Get an overview of the missing data using the
vis_miss() function from the visdat
package.
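As a quick illustration of vis_miss(), the following sketch uses the airquality data set that ships with R (it has missing values in two columns) as a stand-in for the ALLBUS subset.

```r
library(visdat)

# airquality is a built-in data set with missing values in Ozone and Solar.R
miss_plot <- vis_miss(airquality)
miss_plot
```

The result is a ggplot object, so it can be customized further with the usual ggplot2 functions.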
Look at the relative frequencies for the variables
sex, agec, educ, and
party_vote using a function from the janitor
package.
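One janitor function that produces relative frequencies is tabyl(). The sketch below uses a made-up toy column rather than the ALLBUS data, and it assumes that tabyl() is the function meant here.

```r
library(dplyr)
library(janitor)

# Toy data; the values are made up for illustration
toy <- tibble(sex = c("female", "male", "female", "female", NA))

# One-way frequency table with counts (n) and proportions (percent)
sex_freq <- toy %>%
  tabyl(sex)

sex_freq

# Same table with the proportions formatted as percentages for display
sex_freq %>%
  adorn_pct_formatting(digits = 1)
```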
Create bar plots with ggplot2 to visualize the
relative frequencies (percentages) for the variables educ
and party_vote.
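A bar plot of relative frequencies could be sketched as follows. The data are made up, and computing the proportions inside geom_bar() with after_stat() is only one option; you could also calculate them beforehand and use geom_col().

```r
library(ggplot2)

# Toy data; the values are made up for illustration
toy <- data.frame(
  educ = c("low", "medium", "medium", "high", "high", "high")
)

educ_plot <- ggplot(toy, aes(x = educ)) +
  # after_stat() turns the computed counts into proportions of all rows
  geom_bar(aes(y = after_stat(count / sum(count)))) +
  # Show the y-axis as percentages
  scale_y_continuous(labels = scales::label_percent()) +
  labs(x = "Education", y = "Relative frequency")

educ_plot
```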
Using the pander package, include a table with the
output of the base R function summary() for
the variables on xenophobic attitudes.
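For the combination of summary() and pander, a minimal sketch with built-in data might look like this. The three airquality columns merely stand in for the xenophobia items.

```r
library(pander)

# Built-in data as a stand-in for the variables on xenophobic attitudes
stand_in <- airquality[, c("Ozone", "Wind", "Temp")]

# summary() returns a table object, which pander() renders as a Markdown table
pander(summary(stand_in))
```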
Create ggplot2 bar plots to visualize the
distribution of the variables life_satf and
left_right.
Create a ggplot2 boxplot to show differences in
xenophobic attitudes between supporters of different parties
(party_vote), also showing (jittered) individual data points.
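A boxplot with jittered points on top could be sketched like this. The data are simulated and the party labels are placeholders; only the layering of geom_boxplot() and geom_jitter() is the point.

```r
library(ggplot2)

# Simulated toy data; party labels and scores are made up for illustration
set.seed(42)
toy <- data.frame(
  party_vote = rep(c("Party A", "Party B", "Party C"), each = 30),
  xenophobia = c(rnorm(30, mean = 3), rnorm(30, mean = 4), rnorm(30, mean = 5))
)

box_plot <- ggplot(toy, aes(x = party_vote, y = xenophobia)) +
  # Hide the boxplot outliers so they are not drawn twice
  geom_boxplot(outlier.shape = NA) +
  # Spread the individual points horizontally to reduce overplotting
  geom_jitter(width = 0.2, alpha = 0.4) +
  labs(x = "Party", y = "Xenophobia score")

box_plot
```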
Calculate correlations between xenophobia,
left_right, life_satf and
xenophobia using the correlation package and
display them in a table using the kable() function from the
knitr package.
Create a plot with the GGally package to visualize
these correlations.
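The two correlation steps can be sketched with built-in data as follows. The mtcars columns are stand-ins for the ALLBUS variables, and ggpairs() is just one GGally function that fits here (ggcorr() would be another).

```r
library(correlation)
library(knitr)
library(GGally)

# Built-in data as a stand-in for the ALLBUS variables
vars <- mtcars[, c("mpg", "hp", "wt")]

# Pairwise correlations in long format (one row per pair of variables)
cors <- correlation(vars)

# Convert to a plain data frame and print it as a table
kable(as.data.frame(cors), digits = 2)

# Scatterplot matrix with correlation coefficients in the upper panels
cor_plot <- ggpairs(vars)
cor_plot
```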
Calculate a logistic regression model with a dichotomized version
of distrust_gov as the dependent variable and
contact, life_satf, and
xenophobia as predictors (also include an intercept). For
the dichotomization of the trust variable, we will just
recode every value > 4 to 1 and all other values to 0.
Turn the output of this model into a table with the
broom package and display it with
knitr::kable().
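The dichotomization, the model, and the table could be sketched as follows. The data are simulated at random (so the coefficients carry no meaning), and distrust_bin is a hypothetical name for the dichotomized variable.

```r
library(broom)
library(knitr)

# Simulated toy data; values are random and carry no substantive meaning
set.seed(123)
n <- 200
toy <- data.frame(
  contact      = rbinom(n, 1, 0.5),
  life_satf    = sample(0:10, n, replace = TRUE),
  distrust_gov = sample(1:7, n, replace = TRUE)
)

# Dichotomize: values > 4 become 1, all other values become 0
# (`distrust_bin` is a hypothetical variable name)
toy$distrust_bin <- ifelse(toy$distrust_gov > 4, 1, 0)

# Logistic regression; the intercept is included by default
model <- glm(distrust_bin ~ contact + life_satf,
             data = toy, family = binomial(link = "logit"))

# One row per term with estimate, standard error, test statistic and p-value
model_table <- tidy(model)
kable(model_table, digits = 3)
```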
Create one regression plot with the coefficients and another one
with the predictions using the sjPlot package.
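With sjPlot, both plots can come from plot_model(). The sketch below again uses simulated data and a hypothetical outcome name (distrust_bin); for the predictions it picks a single predictor via the terms argument.

```r
library(sjPlot)

# Simulated toy data; values are made up for illustration
set.seed(123)
n <- 200
toy <- data.frame(
  contact   = rbinom(n, 1, 0.5),
  life_satf = sample(0:10, n, replace = TRUE)
)
# Hypothetical binary outcome that depends weakly on life_satf
toy$distrust_bin <- rbinom(n, 1, plogis(-0.5 + 0.1 * toy$life_satf))

model <- glm(distrust_bin ~ contact + life_satf,
             data = toy, family = binomial(link = "logit"))

# Coefficient plot (for logistic models, estimates are shown as odds ratios)
coef_plot <- plot_model(model)
coef_plot

# Predicted probabilities across the values of one predictor
pred_plot <- plot_model(model, type = "pred", terms = "life_satf")
pred_plot
```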
You can find code (in explore_gapminder.Rmd) for most of this in the
solutions folder (and the rest in the slides for the
previous sessions). To get you started, we have created an almost empty
template .Rmd called
explore_gesis_allbus_template.Rmd in the
exercises folder.
A solution is available as explore_gesis_allbus.Rmd in the solutions
folder.