The structure of this final exercise for the course is a bit different.
We want you to create and work with R Markdown documents (creating HTML
output) to go through some of the things we covered and did in the
previous sessions. There are no coding tasks in this document.
We have created an R Markdown document (which we have also already
knitted to create an HTML output file) that demonstrates some of the
things you can do/create with R Markdown and repeats a few of the topics
and steps we went through in the sessions before this one.
This document uses Gapminder data and you can find it in the exercises
folder: It is called explore_gapminder.Rmd.
Open the explore_gapminder.Rmd file in RStudio and explore it a bit to
see what it contains. You can also open explore_gapminder.html (in your
browser) to see the .Rmd and the resulting output document side-by-side.
You can open the .Rmd file via the Files tab or the menu (File -> Open
File) in RStudio.
You might notice that there are quite a few things specified in the YAML
header. Let’s briefly go through them:
toc: true -> The document will contain a table of contents (ToC)
toc_depth: 3 -> The ToC will contain header levels 1 to 3
number_sections: true -> The sections divided by headers will be numbered
toc_float: true -> The ToC is floating, meaning that it stays visible and moves along when you scroll
code_folding: hide -> By default, the code chunks are hidden, but you can display them by clicking the Code buttons in the HTML document
theme: flatly -> The Bootswatch theme flatly is used to style the document
highlight: tango -> The document uses the Pandoc code highlighting style tango
code_download: true -> The document includes a button allowing you to download the full code
df_print: paged -> When data frames are printed in the document they are printed in paged tables
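As a rough illustration of how these options fit together, the same settings can also be passed from R instead of the YAML header. The sketch below assumes the rmarkdown package is installed; the relative file path is an assumption based on the exercises folder mentioned above, and the document is only rendered if that file exists.

```r
library(rmarkdown)

# Output format object mirroring the YAML options described above
fmt <- html_document(
  toc = TRUE,              # table of contents
  toc_depth = 3,           # header levels 1 to 3
  number_sections = TRUE,  # numbered sections
  toc_float = TRUE,        # floating ToC
  code_folding = "hide",   # code hidden by default
  theme = "flatly",        # Bootswatch theme
  highlight = "tango",     # code highlighting style
  code_download = TRUE,    # button for downloading the code
  df_print = "paged"       # paged tables for data frames
)

# Assumed relative path; adjust it to where the file lives on your machine
rmd_file <- file.path("exercises", "explore_gapminder.Rmd")
if (file.exists(rmd_file)) {
  render(rmd_file, output_format = fmt)
}
```

Options passed this way take the place of the ones in the YAML header, so changing a value here is one way to try out different settings without editing the document itself.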
NB: To knit the document you need to have the packages it uses
installed. These are the following ones: rmarkdown, knitr, tidyverse,
visdat, janitor, pander, patchwork, correlation, GGally, broom, sjPlot,
scales.
An easy way to check whether these packages are installed, install any
that are missing in one go, and load them all is the packages() function
from the easypackages package. To use it for this purpose, you can run
the following code:
# Install easypackages if it is missing, then install (without prompting) and load all packages
if (!require(easypackages)) install.packages("easypackages")
library(easypackages)
packages("rmarkdown", "knitr", "tidyverse", "visdat", "janitor", "pander",
         "patchwork", "correlation", "GGally", "broom", "sjPlot", "scales",
         prompt = FALSE)
Feel free to play around a bit with the explore_gapminder.Rmd and its
output (you can also change parts of the YAML header to see how that
influences the output).
The big task we have for you for this exercise is to create a similar
R Markdown document for a subset of the German General Social Survey -
ALLBUS 2021 data gesis_2021_subset.
Using the explore_gapminder.Rmd as a starting point and guidance, we
want you to do the following in this document:
Load and wrangle the data the same way as before to create the allbus
subset regarding xenophobic attitudes in the exploratory data analysis
(e.g., select variables like xenophobic attitudes, contact, demographic
variables, handle missing values, recode values, and calculate aggregate
measures), but with one difference: also include two additional
variables, ls01 (life satisfaction), which should be renamed to
life_satf, and pt12 (trust in the federal government), which should be
renamed to trust_gov. Note: The “trust in people” variable only has three
values/levels (have a look at the codebook). If we want it to reflect
trust on an ordinal level, we need to recode its values: 2 = 1, 3 = 2,
1 = 3.
In addition to that, define a function called inverter2 as follows and
use it to create a new variable called distrust_gov based on trust_gov:
inverter2 <- function(var) {max(var, na.rm = TRUE) - var + 1}
Note: You can do both of these first things in a code chunk that is not displayed in the output document and you can include all steps in a single long data wrangling pipe(line) if you want.
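To illustrate these first wrangling steps, here is a minimal sketch on toy data rather than the ALLBUS file. It assumes the dplyr package (part of the tidyverse); the values are made up, and `trust_3lvl` is a hypothetical name for a three-level variable, not a name taken from the codebook.

```r
library(dplyr)

# Toy data standing in for the ALLBUS subset (values are made up);
# `trust_3lvl` is a hypothetical variable name
toy <- tibble(
  ls01 = c(7, 9, NA, 4),
  pt12 = c(1, 4, 7, NA),
  trust_3lvl = c(1, 2, 3, 2)
)

# Reverse the direction of a scale: the maximum becomes 1 and vice versa
inverter2 <- function(var) {
  max(var, na.rm = TRUE) - var + 1
}

toy_wrangled <- toy %>%
  rename(life_satf = ls01, trust_gov = pt12) %>%
  mutate(
    # Recode 2 -> 1, 3 -> 2, 1 -> 3; unmatched values (e.g., NA) stay NA
    trust_3lvl = case_when(
      trust_3lvl == 2 ~ 1,
      trust_3lvl == 3 ~ 2,
      trust_3lvl == 1 ~ 3
    ),
    distrust_gov = inverter2(trust_gov)
  )

toy_wrangled
```

In the .Rmd file, a chunk with the option include = FALSE runs such code without showing the code or its output in the knitted document.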
Get an overview of the missing data using the vis_miss() function from
the visdat package.
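As a small sketch of what this step can look like, the following uses a made-up data frame with a few missing values; it assumes the visdat package is installed, and the values are illustrative only.

```r
library(visdat)

# Toy data with some missing values (made up for illustration)
toy <- data.frame(
  life_satf = c(7, 9, NA, 4, 10),
  trust_gov = c(1, NA, 7, NA, 3),
  sex = c("female", "male", "male", NA, "female")
)

# vis_miss() returns a ggplot object showing which cells are missing
p_miss <- vis_miss(toy)
p_miss
```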
Look at the relative frequencies for the variables sex, agec, educ, and
party_vote using a function from the janitor package.
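One function from janitor that produces relative frequencies is tabyl(). The sketch below applies it to a made-up sex variable; which function the course materials use is not stated here, so treat this as one possible option.

```r
library(janitor)

# Toy data (made up); NA marks a missing answer
toy <- data.frame(
  sex = c("female", "male", "male", "female", "male", NA)
)

# Frequency table with counts and proportions
freq_sex <- tabyl(toy, sex)

# Show the proportions as formatted percentages
adorn_pct_formatting(freq_sex, digits = 1)
```

When the variable contains missing values, tabyl() also reports a valid_percent column that leaves them out of the denominator.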
Create bar plots with ggplot2 to visualize the relative frequencies
(percentages) for the variables educ and party_vote.
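A bar plot of percentages can be built by computing the shares first and plotting them with geom_col(). The following is a minimal sketch on toy data, assuming dplyr, ggplot2, and scales are installed; the education categories are made up.

```r
library(dplyr)
library(ggplot2)

# Toy data (made-up categories)
toy <- tibble(
  educ = c("low", "medium", "medium", "high", "high", "high", NA)
)

# Relative frequencies, leaving out missing values
educ_pct <- toy %>%
  filter(!is.na(educ)) %>%
  count(educ) %>%
  mutate(pct = n / sum(n))

# Bar plot with a percentage axis
p_educ <- ggplot(educ_pct, aes(x = educ, y = pct)) +
  geom_col() +
  scale_y_continuous(labels = scales::label_percent()) +
  labs(x = "Education", y = "Relative frequency")
p_educ
```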
Using the pander package, include a table with the output of the base R
function summary() for the variables on xenophobic attitudes.
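As a sketch of this step, summary() can be applied to the relevant columns and the result handed to pander(). The column names `xeno_1` and `xeno_2` below are hypothetical placeholders, not the names used in the ALLBUS subset, and the values are made up.

```r
library(pander)

# Toy data; `xeno_1` and `xeno_2` are hypothetical variable names
toy <- data.frame(
  xeno_1 = c(1, 3, 5, 7, NA),
  xeno_2 = c(2, 2, 4, 6, 7)
)

# summary() of a data frame returns a table object
xeno_summary <- summary(toy)

# pander() turns it into a Markdown table
pander(xeno_summary)
```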
Create ggplot2 bar plots to visualize the distribution of the variables
life_satf and left_right.
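For two distributions, one option is to create a bar plot for each variable and place them side by side with patchwork, which is among the packages listed earlier. The sketch below uses simulated values; the value ranges (0 to 10 and 1 to 10) are assumptions made for the toy data only.

```r
library(ggplot2)
library(patchwork)

set.seed(42)

# Simulated toy data; the value ranges are assumptions for illustration
toy <- data.frame(
  life_satf = sample(0:10, 100, replace = TRUE),
  left_right = sample(1:10, 100, replace = TRUE)
)

p_ls <- ggplot(toy, aes(x = life_satf)) +
  geom_bar() +
  labs(x = "Life satisfaction", y = "Count")

p_lr <- ggplot(toy, aes(x = left_right)) +
  geom_bar() +
  labs(x = "Left-right placement", y = "Count")

# patchwork combines the two plots side by side
p_both <- p_ls + p_lr
p_both
```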
Create a ggplot2 boxplot to show differences in xenophobic attitudes
between supporters of different parties (party_vote), also showing
(jittered) individual data points.
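A boxplot with jittered points can be sketched by layering geom_jitter() on top of geom_boxplot(). The data below are simulated, and the party labels are placeholders.

```r
library(ggplot2)

set.seed(42)

# Simulated toy data with placeholder party labels
toy <- data.frame(
  party_vote = rep(c("Party A", "Party B", "Party C"), each = 20),
  xenophobia = round(runif(60, min = 1, max = 7), 1)
)

p_box <- ggplot(toy, aes(x = party_vote, y = xenophobia)) +
  # Hide the boxplot's own outlier points so they are not drawn twice
  geom_boxplot(outlier.shape = NA) +
  # Individual observations, spread out horizontally
  geom_jitter(width = 0.2, height = 0, alpha = 0.5) +
  labs(x = "Party", y = "Xenophobic attitudes")
p_box
```

Setting height = 0 keeps the points at their actual y values, so the jitter only affects the horizontal position.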
Calculate correlations between xenophobia, left_right, and life_satf
using the correlation package and display them in a table using the
kable() function from the knitr package.
Create a plot with the GGally package to visualize these correlations.
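The two correlation steps can be sketched as follows on simulated data. The example assumes the correlation, knitr, and GGally packages are installed; ggpairs() is used here as one of several GGally functions that could serve this purpose.

```r
library(correlation)
library(knitr)
library(GGally)

set.seed(42)
n <- 100

# Simulated toy data with some built-in association
xeno <- rnorm(n)
toy <- data.frame(
  xenophobia = xeno,
  left_right = 0.5 * xeno + rnorm(n),
  life_satf = -0.3 * xeno + rnorm(n)
)

# Pairwise correlations in long format (one row per variable pair)
cor_tab <- correlation(toy)

# Display the result as a table
kable(cor_tab, digits = 2)

# Scatterplot matrix with correlation coefficients
ggpairs(toy)
```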
Calculate a logistic regression model with a dichotomized version of
distrust_gov as the dependent variable and contact, life_satf, and
xenophobia as predictors (also include an intercept). For the
dichotomization of the trust variable, we will just recode every value
> 4 to 1 and all other values to 0.
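The dichotomization and the model can be sketched with base R alone. The data below are simulated, and `distrust_gov_bin` is a hypothetical name for the dichotomized variable.

```r
set.seed(42)
n <- 200

# Simulated toy data (not the ALLBUS values)
toy <- data.frame(
  distrust_gov = sample(1:7, n, replace = TRUE),
  contact = rnorm(n),
  life_satf = sample(0:10, n, replace = TRUE),
  xenophobia = rnorm(n)
)

# Values > 4 become 1, all other values 0 (missing values stay NA);
# `distrust_gov_bin` is a hypothetical variable name
toy$distrust_gov_bin <- ifelse(toy$distrust_gov > 4, 1, 0)

# Logistic regression; the formula includes an intercept by default
model <- glm(
  distrust_gov_bin ~ contact + life_satf + xenophobia,
  data = toy,
  family = binomial(link = "logit")
)
summary(model)
```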
Turn the output of this model into a table with the broom package and
display it with knitr::kable().
Create one regression plot with the coefficients and another one with
the predictions using the sjPlot package.
You can find example code (in explore_gapminder.Rmd) for most of this in
the solutions folder (and the rest in the slides for the previous
sessions). To get you started, we have created an almost empty template
.Rmd called explore_gesis_allbus_template.Rmd in the exercises folder.
You can find a solution in explore_gesis_allbus.Rmd in the solutions
folder.