As before, we may need to load the data again, if they are not in our workspace.
allbus_2021_eda <- readRDS("./data/allbus_2021_eda.rds")
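If the script is rerun several times, the check for whether the data are already in the workspace can be made explicit. The following is a minimal sketch; `load_if_missing()` is a hypothetical helper name introduced here for illustration and is not part of the course materials.

```r
# Hypothetical helper: read an .rds file only if no object with that name
# exists in the given environment yet.
load_if_missing <- function(name, path, envir = globalenv()) {
  if (!exists(name, envir = envir, inherits = FALSE)) {
    assign(name, readRDS(path), envir = envir)
  }
  invisible(get(name, envir = envir))
}

# Usage with the path from above (assumes the file exists at this location):
# load_if_missing("allbus_2021_eda", "./data/allbus_2021_eda.rds")
```

An object that is already present is left untouched, so any changes made to it during the session are not overwritten.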
Besides base R and the tidyverse, this set of exercises also requires the janitor and correlation packages, so make sure to install and load them.
Use base R to create a crosstab for the variables agec (rows) and party_vote (columns) showing row percentages.
You can combine round(), table(), and prop.table() here: add an argument to prop.table() to get row proportions, and transform the results to represent percentages. Extra hint: Rows are the first dimension in R dataframes.
round(prop.table(table(allbus_2021_eda$agec, allbus_2021_eda$party_vote), 1)*100, 2)
##
##                 CDU-CSU   SPD   FDP Gruene Linke   AfD Other party Would not vote
## <= 25 years      11.73 11.30 14.50  32.20  9.81  6.61        7.68           6.18
## 26 to 30 years   19.95 10.22 13.22  28.25  7.21  7.69        6.97           6.49
## 31 to 35 years   26.12 12.08 11.52  26.67  7.34  6.88        2.97           6.41
## 36 to 40 years   30.05 17.80 10.73  20.21  7.33  6.71        2.59           4.56
## 41 to 45 years   38.45 25.56 10.02   7.98  7.16  5.11        1.64           4.09
## 46 to 50 years   43.48 13.04 13.04  13.04  8.70  4.35        0.00           4.35
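To see what the second argument of prop.table() does, it can help to try both margins on a tiny table. The sketch below uses made-up toy data (not taken from the ALLBUS) and only base R.

```r
# Toy data, invented for illustration
toy <- data.frame(
  age_group = c("young", "young", "young", "old", "old"),
  vote      = c("A", "B", "B", "A", "A")
)

counts <- table(toy$age_group, toy$vote)

# margin = 1: percentages are computed within rows, so each row sums to 100
row_pct <- round(prop.table(counts, margin = 1) * 100, 2)

# margin = 2: percentages are computed within columns, so each column sums to 100
col_pct <- round(prop.table(counts, margin = 2) * 100, 2)

rowSums(row_pct)
colSums(col_pct)
```

Checking rowSums() or colSums() in this way is a quick sanity check that the intended margin was used.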
Use the janitor package to get the same results.
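You need to create a tabyl() object and add some additional functions to get the row percentages. As the table() function excludes missing values by default, we need to make sure that missing values for the party_vote variable are excluded here as well. Note that filtering on party_vote alone still leaves a row for cases with a missing agec value, which table() would also have dropped.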
library(janitor)
allbus_2021_eda %>%
  filter(!is.na(party_vote)) %>%
  tabyl(agec, party_vote) %>%
  adorn_percentages(denominator = "row") %>%
  adorn_pct_formatting(digits = 2)
## agec           CDU-CSU    SPD    FDP Gruene Linke    AfD Other party Would not vote
## <= 25 years     11.73% 11.30% 14.50% 32.20% 9.81%  6.61%       7.68%          6.18%
## 26 to 30 years  19.95% 10.22% 13.22% 28.25% 7.21%  7.69%       6.97%          6.49%
## 31 to 35 years  26.12% 12.08% 11.52% 26.67% 7.34%  6.88%       2.97%          6.41%
## 36 to 40 years  30.05% 17.80% 10.73% 20.21% 7.33%  6.71%       2.59%          4.56%
## 41 to 45 years  38.45% 25.56% 10.02%  7.98% 7.16%  5.11%       1.64%          4.09%
## 46 to 50 years  43.48% 13.04% 13.04% 13.04% 8.70%  4.35%       0.00%          4.35%
## <NA>            36.84%  0.00%  5.26% 15.79% 5.26% 26.32%       0.00%         10.53%
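The last row of this output collects cases with a missing agec value, because only party_vote was filtered. As an alternative to filter(), tabyl() has a show_na argument. The sketch below uses made-up toy data and assumes that janitor is installed; it is meant as an illustration, not as the intended solution.

```r
library(janitor)

# Toy data, invented for illustration; both variables contain a missing value
toy <- data.frame(
  age_group = c("young", "young", "old", "old", NA),
  vote      = c("A", "B", "A", NA, "B")
)

# show_na = FALSE drops cases with a missing value on either variable
counts <- tabyl(toy, age_group, vote, show_na = FALSE)

# Same adornments as above, written without the pipe
row_pct <- adorn_percentages(counts, denominator = "row")
adorn_pct_formatting(row_pct, digits = 2)
```

With this option the crosstab contains neither a row nor a column for missing values, which matches the default behaviour of table().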
Run a chi-squared test on the tabyl we have created before.
allbus_2021_eda %>%
  filter(!is.na(party_vote),
         !is.na(agec)) %>%
  tabyl(agec, party_vote) %>%
  chisq.test()
##
## Pearson's Chi-squared test
##
## data: .
## X-squared = 307.85, df = 35, p-value < 2.2e-16
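The degrees of freedom of the test are (rows - 1) * (columns - 1), which gives (6 - 1) * (8 - 1) = 35 for the table above. The chi-squared approximation also relies on sufficiently large expected cell counts, which is worth checking when some groups are small. A minimal base R sketch on a made-up table (the counts are invented for illustration):

```r
# Toy 2 x 3 table of counts, invented for illustration
toy_tab <- matrix(
  c(12, 5,  7,
     3, 9, 14),
  nrow = 2, byrow = TRUE,
  dimnames = list(age_group = c("young", "old"), vote = c("A", "B", "C"))
)

test <- chisq.test(toy_tab)

test$parameter      # degrees of freedom: (2 - 1) * (3 - 1)
test$expected       # expected counts under independence
min(test$expected)  # smallest expected count in the table
```

If the smallest expected count is very low, chisq.test() offers the simulate.p.value argument as one possible alternative to the asymptotic p-value.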
Use the correlation package to calculate and print correlations between the following variables: left_right, sat_dem, xenophobia, contact.
library(correlation)
allbus_2021_eda %>%
  select(left_right,
         sat_dem,
         xenophobia,
         contact) %>%
  correlation()
## # Correlation Matrix (pearson-method)
##
## Parameter1 | Parameter2 | r | 95% CI | t | df | p
## ----------------------------------------------------------------------------
## left_right | sat_dem | -0.11 | [-0.14, -0.08] | -6.48 | 3389 | < .001***
## left_right | xenophobia | 0.38 | [ 0.35, 0.41] | 23.16 | 3140 | < .001***
## left_right | contact | -0.05 | [-0.09, -0.02] | -2.82 | 2867 | 0.010**
## sat_dem | xenophobia | -0.30 | [-0.34, -0.25] | -12.54 | 1599 | < .001***
## sat_dem | contact | 0.04 | [-0.01, 0.09] | 1.42 | 1463 | 0.155
## xenophobia | contact | -0.31 | [-0.34, -0.28] | -17.55 | 2870 | < .001***
##
## p-value adjustment method: Holm (1979)
## Observations: 1465-3391
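The degrees of freedom differ between rows, which suggests that each correlation is based on the cases that are complete for that particular pair of variables, and the footer states that the p-values are Holm-adjusted. A rough base R sketch of that logic, using the built-in mtcars data in place of the ALLBUS variables; this illustrates the idea and is not the correlation package's internal implementation:

```r
vars  <- c("mpg", "wt", "hp")
pairs <- combn(vars, 2, simplify = FALSE)

# One Pearson test per pair; cor.test() only uses cases that are
# complete on both variables of the pair
results <- do.call(rbind, lapply(pairs, function(p) {
  test <- cor.test(mtcars[[p[1]]], mtcars[[p[2]]])
  data.frame(
    var1 = p[1],
    var2 = p[2],
    r    = unname(test$estimate),
    df   = unname(test$parameter),
    p    = test$p.value
  )
}))

# Adjust the p-values for multiple testing with the Holm method
results$p_holm <- p.adjust(results$p, method = "holm")
results
```

The adjusted p-values are never smaller than the unadjusted ones, which is why a correlation can look less clearly significant in the table than a single cor.test() call would suggest.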
Calculate the correlations between sat_dem, xenophobia, and contact, using the same function as in the previous exercise, but group them by agec this time.
You need to group the data by agec before computing the correlations.
allbus_2021_eda %>%
  select(agec,
         sat_dem,
         xenophobia,
         contact) %>%
  group_by(agec) %>%
  correlation()
## # Correlation Matrix (pearson-method)
##
## Group | Parameter1 | Parameter2 | r | 95% CI | t | df | p
## --------------------------------------------------------------------------------------------
## <= 25 years | sat_dem | xenophobia | -0.28 | [-0.41, -0.14] | -3.92 | 176 | < .001***
## <= 25 years | sat_dem | contact | 0.06 | [-0.09, 0.21] | 0.80 | 167 | 0.423
## <= 25 years | xenophobia | contact | -0.12 | [-0.22, -0.02] | -2.31 | 350 | 0.043*
## 26 to 30 years | sat_dem | xenophobia | -0.31 | [-0.40, -0.21] | -5.94 | 336 | < .001***
## 26 to 30 years | sat_dem | contact | -0.07 | [-0.18, 0.04] | -1.29 | 323 | 0.199
## 26 to 30 years | xenophobia | contact | -0.24 | [-0.31, -0.16] | -6.16 | 627 | < .001***
## 31 to 35 years | sat_dem | xenophobia | -0.31 | [-0.39, -0.22] | -6.68 | 431 | < .001***
## 31 to 35 years | sat_dem | contact | 0.04 | [-0.06, 0.13] | 0.74 | 416 | 0.457
## 31 to 35 years | xenophobia | contact | -0.20 | [-0.27, -0.14] | -5.94 | 817 | < .001***
## 36 to 40 years | sat_dem | xenophobia | -0.44 | [-0.51, -0.36] | -10.08 | 435 | < .001***
## 36 to 40 years | sat_dem | contact | 0.19 | [ 0.09, 0.28] | 3.74 | 383 | < .001***
## 36 to 40 years | xenophobia | contact | -0.33 | [-0.39, -0.26] | -9.55 | 766 | < .001***
## 41 to 45 years | sat_dem | xenophobia | -0.25 | [-0.38, -0.12] | -3.63 | 197 | 0.001**
## 41 to 45 years | sat_dem | contact | 0.02 | [-0.14, 0.18] | 0.27 | 153 | 0.791
## 41 to 45 years | xenophobia | contact | -0.20 | [-0.31, -0.09] | -3.43 | 271 | 0.001**
## 46 to 50 years | sat_dem | xenophobia | -0.35 | [-0.79, 0.31] | -1.13 | 9 | 0.858
## 46 to 50 years | sat_dem | contact | 0.35 | [-0.47, 0.85] | 0.93 | 6 | 0.858
## 46 to 50 years | xenophobia | contact | -0.03 | [-0.51, 0.45] | -0.13 | 15 | 0.897
##
## p-value adjustment method: Holm (1979)
## Observations: 8-819
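Grouping means that the correlations are computed separately within each level of the grouping variable. In base R, the same idea can be sketched with split(), here on the built-in mtcars data grouped by cyl. This is an illustration only; the correlation package additionally reports confidence intervals and adjusted p-values.

```r
# One data frame per level of the grouping variable
by_group <- split(mtcars, mtcars$cyl)

# Group size and Pearson correlation between mpg and wt within each group
grouped <- data.frame(
  group = names(by_group),
  n     = vapply(by_group, nrow, integer(1)),
  r     = vapply(by_group, function(d) cor(d$mpg, d$wt), numeric(1))
)
grouped
```

Keeping the group size next to each coefficient is useful because estimates from very small groups, such as the 46 to 50 years group above, come with wide confidence intervals.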