As before, we may need to load the data again if they are not already in our workspace.

allbus_2021_eda <- readRDS("./data/allbus_2021_eda.rds")

Besides base R and the tidyverse, this set of exercises also requires the janitor and correlation packages, so make sure to install and load them.

1

As a first exercise, use base R to create a crosstab for the variables agec (rows) and party_vote (columns) showing row percentages.
We need to combine round(), table(), and prop.table() here, add an argument to prop.table() to get row proportions, and multiply the results by 100 to turn them into percentages. Extra hint: Rows are the first dimension of tables and data frames in R, so the margin argument of prop.table() needs to be 1.
round(prop.table(table(allbus_2021_eda$agec, allbus_2021_eda$party_vote), 1)*100, 2)
##                 
##                  CDU-CSU   SPD   FDP Gruene Linke   AfD Other party Would not vote
##   <= 25 years      11.73 11.30 14.50  32.20  9.81  6.61        7.68           6.18
##   26 to 30 years   19.95 10.22 13.22  28.25  7.21  7.69        6.97           6.49
##   31 to 35 years   26.12 12.08 11.52  26.67  7.34  6.88        2.97           6.41
##   36 to 40 years   30.05 17.80 10.73  20.21  7.33  6.71        2.59           4.56
##   41 to 45 years   38.45 25.56 10.02   7.98  7.16  5.11        1.64           4.09
##   46 to 50 years   43.48 13.04 13.04  13.04  8.70  4.35        0.00           4.35
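The role of the margin argument is easiest to see on a small table. The following is a minimal sketch that uses the built-in mtcars data instead of the ALLBUS data, so the object names (tab, row_prop, col_prop, row_pct) are purely illustrative.

```r
# Toy example with the built-in mtcars data (not the ALLBUS data)
tab <- table(cyl = mtcars$cyl, gear = mtcars$gear)

# margin = 1: proportions within rows (each row sums to 1)
row_prop <- prop.table(tab, margin = 1)

# margin = 2: proportions within columns (each column sums to 1)
col_prop <- prop.table(tab, margin = 2)

# Convert the row proportions to percentages with two decimals
row_pct <- round(row_prop * 100, 2)
print(row_pct)

# Quick check of which dimension was used as the denominator
rowSums(row_prop)
colSums(col_prop)
```

If the row sums are all 1, the percentages are row percentages, which is what the exercise asks for.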

2

Now, let’s use the janitor package to get the same results.
We want to create a tabyl object and add some additional functions to get the row percentages. As the table() function excludes missing values by default, we need to make sure that missing values for the party_vote variable are excluded here as well.
library(janitor)

allbus_2021_eda %>% 
  filter(!is.na(party_vote)) %>% 
  tabyl(agec, party_vote) %>% 
  adorn_percentages(denominator = "row") %>% 
  adorn_pct_formatting(digits = 2)
##            agec CDU-CSU    SPD    FDP Gruene Linke    AfD Other party Would not vote
##     <= 25 years  11.73% 11.30% 14.50% 32.20% 9.81%  6.61%       7.68%          6.18%
##  26 to 30 years  19.95% 10.22% 13.22% 28.25% 7.21%  7.69%       6.97%          6.49%
##  31 to 35 years  26.12% 12.08% 11.52% 26.67% 7.34%  6.88%       2.97%          6.41%
##  36 to 40 years  30.05% 17.80% 10.73% 20.21% 7.33%  6.71%       2.59%          4.56%
##  41 to 45 years  38.45% 25.56% 10.02%  7.98% 7.16%  5.11%       1.64%          4.09%
##  46 to 50 years  43.48% 13.04% 13.04% 13.04% 8.70%  4.35%       0.00%          4.35%
##            <NA>  36.84%  0.00%  5.26% 15.79% 5.26% 26.32%       0.00%         10.53%
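The output above still contains an <NA> row because only party_vote was filtered, not agec. As a hedged alternative to filtering, tabyl() has a show_na argument that, to my understanding, drops missing values on both variables in a two-way table. The sketch below uses a small made-up data frame (toy, age, and vote are hypothetical names) and assumes that janitor is installed.

```r
library(janitor)

# Made-up data with one missing value in each variable
toy <- data.frame(
  age  = c("young", "young", "old", "old", NA, "old"),
  vote = c("A", "B", "A", NA, "B", "A")
)

# show_na = FALSE leaves out cases with NA on either variable
counts <- tabyl(toy, age, vote, show_na = FALSE)

# Same adorn_* steps as in the exercise, written without the pipe
row_pct <- adorn_percentages(counts, denominator = "row")
row_pct <- adorn_pct_formatting(row_pct, digits = 2)

print(counts)
print(row_pct)
```

Whether to filter explicitly or rely on show_na is mostly a matter of taste; filtering makes the exclusion visible in the pipeline.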

3

As a final exercise on crosstabs, compute a chi-square test for the tabyl we have created before.
This time, we need to filter out missing values for both of our variables of interest. The chi-square test works on the raw counts, so we need neither the row percentages nor the percentage formatting for this.
allbus_2021_eda %>% 
  filter(!is.na(party_vote),
         !is.na(agec)) %>% 
  tabyl(agec, party_vote) %>% 
  chisq.test()
## 
##  Pearson's Chi-squared test
## 
## data:  .
## X-squared = 307.85, df = 35, p-value < 2.2e-16
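A common rule of thumb is that the chi-square approximation becomes unreliable when expected cell counts are small, so it can be worth inspecting them. The following base R sketch uses made-up counts (the matrix and its labels are hypothetical, not ALLBUS results) to show where the expected counts and the degrees of freedom are stored in the test object.

```r
# Made-up 2 x 3 table of counts (filled column by column)
counts <- matrix(
  c(30, 20,
    25, 25,
    10, 40),
  nrow = 2,
  dimnames = list(group = c("g1", "g2"), choice = c("a", "b", "c"))
)

test <- chisq.test(counts)

# Degrees of freedom: (rows - 1) * (columns - 1)
test$parameter

# Expected counts under independence
test$expected

# Rule-of-thumb check: are all expected counts at least 5?
all(test$expected >= 5)
```

The same components are available when chisq.test() is applied to a table of the actual survey data.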

4

Let’s turn to correlations. Use the correlation package to calculate and print the correlations between the following variables: left_right, sat_dem, xenophobia, and contact.
The name of the function you need is the same as that of the package we use here.
library(correlation)

allbus_2021_eda %>% 
  select(left_right,
         sat_dem,
         xenophobia,
         contact) %>% 
  correlation()
## # Correlation Matrix (pearson-method)
## 
## Parameter1 | Parameter2 |     r |         95% CI |      t |   df |         p
## ----------------------------------------------------------------------------
## left_right |    sat_dem | -0.11 | [-0.14, -0.08] |  -6.48 | 3389 | < .001***
## left_right | xenophobia |  0.38 | [ 0.35,  0.41] |  23.16 | 3140 | < .001***
## left_right |    contact | -0.05 | [-0.09, -0.02] |  -2.82 | 2867 | 0.010**  
## sat_dem    | xenophobia | -0.30 | [-0.34, -0.25] | -12.54 | 1599 | < .001***
## sat_dem    |    contact |  0.04 | [-0.01,  0.09] |   1.42 | 1463 | 0.155    
## xenophobia |    contact | -0.31 | [-0.34, -0.28] | -17.55 | 2870 | < .001***
## 
## p-value adjustment method: Holm (1979)
## Observations: 1465-3391
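Two details of the output above are worth unpacking: the number of observations differs between pairs of variables, and the p-values are Holm-adjusted. The base R sketch below illustrates both ideas on made-up data (d, x, y, and z are hypothetical). It is not how the correlation package is implemented internally, only a way to reproduce the logic.

```r
# Made-up data with missing values in different rows
d <- data.frame(
  x = c(1, 2, 3, 4, 5, NA),
  y = c(2, 4, 5, 4, 5, 7),
  z = c(NA, 1, 3, 2, 5, 6)
)

# Pairwise deletion: each correlation uses all cases
# that are complete for that particular pair of variables
r_mat <- cor(d, use = "pairwise.complete.obs")

# Number of complete cases per pair of variables
present <- (!is.na(as.matrix(d))) * 1
n_mat <- crossprod(present)

print(round(r_mat, 2))
print(n_mat)

# Holm adjustment of a set of unadjusted p-values
p_raw <- c(0.01, 0.02, 0.03)
p_holm <- p.adjust(p_raw, method = "holm")
p_holm  # 0.03 0.04 0.04
```

Because each pair can have a different number of complete cases, the degrees of freedom vary from row to row in the correlation table.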

5

As a final exercise, compute the correlations between sat_dem, xenophobia, and contact using the same function as in the previous exercise, but group the results by agec this time.
You need to group the data by agec before computing the correlations.
allbus_2021_eda %>% 
  select(agec,
         sat_dem,
         xenophobia,
         contact) %>% 
  group_by(agec) %>% 
  correlation()
## # Correlation Matrix (pearson-method)
## 
## Group          | Parameter1 | Parameter2 |     r |         95% CI |      t |  df |         p
## --------------------------------------------------------------------------------------------
## <= 25 years    |    sat_dem | xenophobia | -0.28 | [-0.41, -0.14] |  -3.92 | 176 | < .001***
## <= 25 years    |    sat_dem |    contact |  0.06 | [-0.09,  0.21] |   0.80 | 167 | 0.423    
## <= 25 years    | xenophobia |    contact | -0.12 | [-0.22, -0.02] |  -2.31 | 350 | 0.043*   
## 26 to 30 years |    sat_dem | xenophobia | -0.31 | [-0.40, -0.21] |  -5.94 | 336 | < .001***
## 26 to 30 years |    sat_dem |    contact | -0.07 | [-0.18,  0.04] |  -1.29 | 323 | 0.199    
## 26 to 30 years | xenophobia |    contact | -0.24 | [-0.31, -0.16] |  -6.16 | 627 | < .001***
## 31 to 35 years |    sat_dem | xenophobia | -0.31 | [-0.39, -0.22] |  -6.68 | 431 | < .001***
## 31 to 35 years |    sat_dem |    contact |  0.04 | [-0.06,  0.13] |   0.74 | 416 | 0.457    
## 31 to 35 years | xenophobia |    contact | -0.20 | [-0.27, -0.14] |  -5.94 | 817 | < .001***
## 36 to 40 years |    sat_dem | xenophobia | -0.44 | [-0.51, -0.36] | -10.08 | 435 | < .001***
## 36 to 40 years |    sat_dem |    contact |  0.19 | [ 0.09,  0.28] |   3.74 | 383 | < .001***
## 36 to 40 years | xenophobia |    contact | -0.33 | [-0.39, -0.26] |  -9.55 | 766 | < .001***
## 41 to 45 years |    sat_dem | xenophobia | -0.25 | [-0.38, -0.12] |  -3.63 | 197 | 0.001**  
## 41 to 45 years |    sat_dem |    contact |  0.02 | [-0.14,  0.18] |   0.27 | 153 | 0.791    
## 41 to 45 years | xenophobia |    contact | -0.20 | [-0.31, -0.09] |  -3.43 | 271 | 0.001**  
## 46 to 50 years |    sat_dem | xenophobia | -0.35 | [-0.79,  0.31] |  -1.13 |   9 | 0.858    
## 46 to 50 years |    sat_dem |    contact |  0.35 | [-0.47,  0.85] |   0.93 |   6 | 0.858    
## 46 to 50 years | xenophobia |    contact | -0.03 | [-0.51,  0.45] |  -0.13 |  15 | 0.897    
## 
## p-value adjustment method: Holm (1979)
## Observations: 8-819
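Grouped correlations amount to splitting the data by the grouping variable and computing the correlation within each subset. The base R sketch below shows this idea with the built-in iris data (not the ALLBUS data). It also reports the group sizes, because very small groups, like the oldest age group above, produce wide confidence intervals.

```r
# Split the built-in iris data by group
by_species <- split(iris, iris$Species)

# Pearson correlation within each group
r_by_group <- sapply(
  by_species,
  function(d) cor(d$Sepal.Length, d$Petal.Length)
)

# Group sizes, to judge how much to trust each estimate
n_by_group <- sapply(by_species, nrow)

print(round(r_by_group, 2))
print(n_by_group)
```

With group_by() and correlation(), the splitting step is handled for us, but checking the group sizes separately remains a good habit.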