Exercise 3_1_3: Crosstabs & correlations

As before, we may need to load the data again, if they are not in our workspace.

allbus_2021_eda <- readRDS("./data/allbus_2021_eda.rds")

Besides base R and the tidyverse, this set of exercises also requires the janitor and correlation packages. So, make sure to install and load them.

1

As a first exercise, use base R to create a crosstab for the variables agec (rows) and party_vote (columns) showing row percentages.

Clues

We need to combine round(), table(), and prop.table() here, add the margin argument to prop.table() to get row percentages, and multiply the result by 100 to express it in percent. Extra hint: Rows are the first dimension in R data frames.

solution

round(prop.table(table(allbus_2021_eda$agec, allbus_2021_eda$party_vote), 1)*100, 2)

##                 
##                  CDU-CSU   SPD   FDP Gruene Linke   AfD Other party Would not vote
##   <= 25 years      11.73 11.30 14.50  32.20  9.81  6.61        7.68           6.18
##   26 to 30 years   19.95 10.22 13.22  28.25  7.21  7.69        6.97           6.49
##   31 to 35 years   26.12 12.08 11.52  26.67  7.34  6.88        2.97           6.41
##   36 to 40 years   30.05 17.80 10.73  20.21  7.33  6.71        2.59           4.56
##   41 to 45 years   38.45 25.56 10.02   7.98  7.16  5.11        1.64           4.09
##   46 to 50 years   43.48 13.04 13.04  13.04  8.70  4.35        0.00           4.35
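To see what the margin argument does, here is a minimal sketch on a small made-up data frame (the names toy, age_group, and vote are hypothetical and not part of the ALLBUS data). With margin = 1 each row adds up to 100, with margin = 2 each column does.

```r
# Hypothetical toy data, only for illustrating the margin argument
toy <- data.frame(
  age_group = c("young", "young", "young", "old", "old"),
  vote      = c("A", "A", "B", "A", "B")
)

# Absolute frequencies: age_group in rows, vote in columns
counts <- table(toy$age_group, toy$vote)

# margin = 1 divides by the row totals (row percentages)
row_pct <- round(prop.table(counts, margin = 1) * 100, 2)

# margin = 2 divides by the column totals (column percentages)
col_pct <- round(prop.table(counts, margin = 2) * 100, 2)

row_pct
rowSums(row_pct)  # each row adds up to (roughly) 100
colSums(col_pct)  # each column adds up to (roughly) 100
```

Checking rowSums() or colSums() like this is a quick way to confirm that the percentages run in the intended direction.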

2

Now, let’s use the janitor package to get the same results.

Clues

We want to create a tabyl() object and add some additional functions to get the row percentages. As the table() function excludes missing values by default, we need to make sure that missing values for the party_vote variable are excluded here as well. Because we only filter on party_vote, the output still contains an <NA> row for agec.

solution

library(janitor)

allbus_2021_eda %>% 
  filter(!is.na(party_vote)) %>% 
  tabyl(agec, party_vote) %>% 
  adorn_percentages(denominator = "row") %>% 
  adorn_pct_formatting(digits = 2)

##            agec CDU-CSU    SPD    FDP Gruene Linke    AfD Other party Would not vote
##     <= 25 years  11.73% 11.30% 14.50% 32.20% 9.81%  6.61%       7.68%          6.18%
##  26 to 30 years  19.95% 10.22% 13.22% 28.25% 7.21%  7.69%       6.97%          6.49%
##  31 to 35 years  26.12% 12.08% 11.52% 26.67% 7.34%  6.88%       2.97%          6.41%
##  36 to 40 years  30.05% 17.80% 10.73% 20.21% 7.33%  6.71%       2.59%          4.56%
##  41 to 45 years  38.45% 25.56% 10.02%  7.98% 7.16%  5.11%       1.64%          4.09%
##  46 to 50 years  43.48% 13.04% 13.04% 13.04% 8.70%  4.35%       0.00%          4.35%
##            <NA>  36.84%  0.00%  5.26% 15.79% 5.26% 26.32%       0.00%         10.53%
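The <NA> row in the output above appears because tabyl() keeps missing values by default. As an alternative to filtering beforehand, the following sketch uses the show_na argument of tabyl() on made-up data (toy, age_group, and vote are hypothetical names, not ALLBUS variables).

```r
library(janitor)

# Hypothetical toy data with one missing value in age_group
toy <- data.frame(
  age_group = c("young", "young", "old", NA),
  vote      = c("A", "B", "A", "A")
)

# Default behaviour: missing values get their own row
with_na <- tabyl(toy, age_group, vote)

# show_na = FALSE drops cases with missing values on either variable
without_na <- tabyl(toy, age_group, vote, show_na = FALSE)

with_na
without_na
```

Applied to the exercise data, this should give a table without the <NA> row, which is closer to what table() returns in base R.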

3

As a final exercise on crosstabs, compute a chi-square test for the tabyl we have created before.

Clues

This time, we need to filter out missing values for both of our variables of interest. We need neither the row percentages nor the percentage formatting for this, as the test is computed on the absolute frequencies.

solution

allbus_2021_eda %>% 
  filter(!is.na(party_vote),
         !is.na(agec)) %>% 
  tabyl(agec, party_vote) %>% 
  chisq.test()

## 
##  Pearson's Chi-squared test
## 
## data:  .
## X-squared = 307.85, df = 35, p-value < 2.2e-16
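The degrees of freedom reported above follow from the table dimensions: (rows - 1) * (columns - 1), which is (6 - 1) * (8 - 1) = 35 for six age groups and eight vote categories. The sketch below shows, on a small made-up 2 x 2 table (hypothetical numbers, not ALLBUS data), how to inspect the expected frequencies that the test compares against the observed ones.

```r
# Hypothetical 2 x 2 table of counts, only for illustration
counts <- matrix(
  c(30, 10, 20, 40),
  nrow = 2,
  dimnames = list(age_group = c("young", "old"), vote = c("A", "B"))
)

# correct = FALSE switches off the continuity correction,
# which chisq.test() would otherwise apply to 2 x 2 tables
test <- chisq.test(counts, correct = FALSE)

test$expected   # row total * column total / grand total
test$parameter  # degrees of freedom: (2 - 1) * (2 - 1)
test$statistic  # sum of (observed - expected)^2 / expected
```

Looking at the expected frequencies is useful because very small expected counts, for example in sparsely populated groups, make the chi-square approximation less reliable.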

4

Let’s turn to correlations. Use the correlation package to calculate and print the correlations between the following variables: left_right, sat_dem, xenophobia, and contact.

Clues

The name of the function you need is the same as that of the package we use here.

solution

library(correlation)

allbus_2021_eda %>% 
  select(left_right,
         sat_dem,
         xenophobia,
         contact) %>% 
  correlation()

## # Correlation Matrix (pearson-method)
## 
## Parameter1 | Parameter2 |     r |         95% CI |      t |   df |         p
## ----------------------------------------------------------------------------
## left_right |    sat_dem | -0.11 | [-0.14, -0.08] |  -6.48 | 3389 | < .001***
## left_right | xenophobia |  0.38 | [ 0.35,  0.41] |  23.16 | 3140 | < .001***
## left_right |    contact | -0.05 | [-0.09, -0.02] |  -2.82 | 2867 | 0.010**  
## sat_dem    | xenophobia | -0.30 | [-0.34, -0.25] | -12.54 | 1599 | < .001***
## sat_dem    |    contact |  0.04 | [-0.01,  0.09] |   1.42 | 1463 | 0.155    
## xenophobia |    contact | -0.31 | [-0.34, -0.28] | -17.55 | 2870 | < .001***
## 
## p-value adjustment method: Holm (1979)
## Observations: 1465-3391
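Two details of the output above are worth unpacking: the number of observations differs between pairs of variables, and the p-values are Holm-adjusted. The following base R sketch illustrates both ideas on made-up numbers (toy and p_raw are hypothetical). It is a rough analogue, not a reimplementation of what correlation() does internally.

```r
# Hypothetical toy data with missing values in different columns
toy <- data.frame(
  x = c(1, 2, 3, 4, 5, NA),
  y = c(2, 4, 6, 8, 10, 12),
  z = c(5, 3, NA, 1, 2, 4)
)

# Pairwise deletion: each correlation uses all cases that are
# complete for that particular pair of variables
r_pairwise <- cor(toy, use = "pairwise.complete.obs")
r_pairwise

# Holm adjustment of three made-up raw p-values
p_raw  <- c(0.001, 0.02, 0.04)
p_holm <- p.adjust(p_raw, method = "holm")
p_holm
```

With pairwise deletion, a variable with many missing values only reduces the sample size for the pairs it is part of, which is why the output reports a range of observations.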

5

As a final exercise, compute the correlations between sat_dem, xenophobia, and contact, using the same function as in the previous exercise, but group them by agec this time.

Clues

You need to group the data by agec before computing the correlations.

solution

allbus_2021_eda %>% 
  select(agec,
         sat_dem,
         xenophobia,
         contact) %>% 
  group_by(agec) %>% 
  correlation()

## # Correlation Matrix (pearson-method)
## 
## Group          | Parameter1 | Parameter2 |     r |         95% CI |      t |  df |         p
## --------------------------------------------------------------------------------------------
## <= 25 years    |    sat_dem | xenophobia | -0.28 | [-0.41, -0.14] |  -3.92 | 176 | < .001***
## <= 25 years    |    sat_dem |    contact |  0.06 | [-0.09,  0.21] |   0.80 | 167 | 0.423    
## <= 25 years    | xenophobia |    contact | -0.12 | [-0.22, -0.02] |  -2.31 | 350 | 0.043*   
## 26 to 30 years |    sat_dem | xenophobia | -0.31 | [-0.40, -0.21] |  -5.94 | 336 | < .001***
## 26 to 30 years |    sat_dem |    contact | -0.07 | [-0.18,  0.04] |  -1.29 | 323 | 0.199    
## 26 to 30 years | xenophobia |    contact | -0.24 | [-0.31, -0.16] |  -6.16 | 627 | < .001***
## 31 to 35 years |    sat_dem | xenophobia | -0.31 | [-0.39, -0.22] |  -6.68 | 431 | < .001***
## 31 to 35 years |    sat_dem |    contact |  0.04 | [-0.06,  0.13] |   0.74 | 416 | 0.457    
## 31 to 35 years | xenophobia |    contact | -0.20 | [-0.27, -0.14] |  -5.94 | 817 | < .001***
## 36 to 40 years |    sat_dem | xenophobia | -0.44 | [-0.51, -0.36] | -10.08 | 435 | < .001***
## 36 to 40 years |    sat_dem |    contact |  0.19 | [ 0.09,  0.28] |   3.74 | 383 | < .001***
## 36 to 40 years | xenophobia |    contact | -0.33 | [-0.39, -0.26] |  -9.55 | 766 | < .001***
## 41 to 45 years |    sat_dem | xenophobia | -0.25 | [-0.38, -0.12] |  -3.63 | 197 | 0.001**  
## 41 to 45 years |    sat_dem |    contact |  0.02 | [-0.14,  0.18] |   0.27 | 153 | 0.791    
## 41 to 45 years | xenophobia |    contact | -0.20 | [-0.31, -0.09] |  -3.43 | 271 | 0.001**  
## 46 to 50 years |    sat_dem | xenophobia | -0.35 | [-0.79,  0.31] |  -1.13 |   9 | 0.858    
## 46 to 50 years |    sat_dem |    contact |  0.35 | [-0.47,  0.85] |   0.93 |   6 | 0.858    
## 46 to 50 years | xenophobia |    contact | -0.03 | [-0.51,  0.45] |  -0.13 |  15 | 0.897    
## 
## p-value adjustment method: Holm (1979)
## Observations: 8-819
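The grouped output is a separate set of correlations for each level of the grouping variable. As a rough base R analogue, the sketch below splits the built-in iris data by Species (a stand-in for the exercise data, chosen only because it ships with R) and computes one correlation per group.

```r
# Split the built-in iris data into one data frame per species
groups <- split(iris, iris$Species)

# Compute one Pearson correlation per group, using pairwise deletion
r_by_group <- sapply(groups, function(d) {
  cor(d$Sepal.Length, d$Petal.Length, use = "pairwise.complete.obs")
})

# Number of cases per group, to judge how stable each estimate is
n_by_group <- sapply(groups, nrow)

round(r_by_group, 2)
n_by_group
```

Keeping an eye on the group sizes matters for interpretation: in the output above, the 46 to 50 years group has very few cases, which is reflected in its wide confidence intervals.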


Johannes Breuer, Stefan Jünger, & Veronika Batzdorfer

Introduction to R for Data Analysis
