Exercise 4_1_1: t-test & ANOVA

As in the slides, we will again use the data from the **German General Social Survey - ALLBUS 2021**. If they are not yet (or no longer) in your workspace, you first need to load them.

library(haven)

allbus_2021_cda_1 <- read_spss("./data/allbus_2021/ZA5280_v1-0-0.sav")

## Converting atomic to factors. Please wait...

1

Let’s start with a very simple analysis: compute a t-test comparing the means of the variable pa01 between the two eastwest groups in the data. You may get an error message complaining that the variable is not numeric.

Clues

For the t-test, you can use a base R function named after the test. To convert the variable to numeric, use the function as.numeric().
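A caveat about this clue, assuming pa01 was imported as a factor (which the "not numeric" error suggests): as.numeric() applied to a factor returns the internal level codes, not the numbers printed as labels. The toy factor `f` below is purely illustrative; for the ALLBUS variables, the codes match the scale values only if the levels correspond one-to-one, in order, to the values 1, 2, 3, and so on.

```r
# Hypothetical toy factor whose labels look like numbers
f <- factor(c("10", "2", "5", "2"))

# Levels are sorted as character strings, so "10" comes before "2"
levels(f)

# as.numeric() returns the position of each value's level, not the value itself
codes <- as.numeric(f)
codes

# Going through as.character() recovers the numbers written in the labels
values <- as.numeric(as.character(f))
values
```

Converting via as.character() is one common workaround when factor labels are themselves numbers.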

solution

t.test(as.numeric(pa01) ~ eastwest, data = allbus_2021_cda_1)

## 
##  Welch Two Sample t-test
## 
## data:  as.numeric(pa01) by eastwest
## t = 5.6564, df = 3312.6, p-value = 1.677e-08
## alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
## 95 percent confidence interval:
##  0.1974363 0.4069274
## sample estimates:
## mean in group 1 mean in group 2 
##        4.964733        4.662551
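t.test() runs Welch’s test by default (var.equal = FALSE), which is why the output reads "Welch Two Sample t-test" and reports fractional degrees of freedom. The sketch below uses simulated data with hypothetical names (`dat`, `score`, `group`) instead of the ALLBUS file. It shows that the formula interface is equivalent to passing the two group vectors, how Student’s equal-variance test is requested, and how individual results can be extracted from the returned htest object.

```r
set.seed(42)

# Simulated stand-in for the real data: a score and a two-level grouping factor
dat <- data.frame(
  score = c(rnorm(60, mean = 5, sd = 2), rnorm(40, mean = 4.6, sd = 1.5)),
  group = factor(rep(c("1", "2"), times = c(60, 40)))
)

# Formula interface: Welch's test is the default (var.equal = FALSE)
welch <- t.test(score ~ group, data = dat)

# The same test via the two-vector interface (first level as x)
welch_xy <- t.test(dat$score[dat$group == "1"], dat$score[dat$group == "2"])

# Student's t-test assumes equal variances in both groups
student <- t.test(score ~ group, data = dat, var.equal = TRUE)

# Individual results stored in the "htest" object
welch$statistic    # t value
welch$parameter    # (fractional) Welch degrees of freedom
welch$p.value      # p-value
welch$conf.int     # CI for the difference in means (group 1 minus group 2)
welch$estimate     # the two group means
student$parameter  # n1 + n2 - 2 = 98
```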

2

Next, use a base R function to run an ANOVA testing the relationship between left-right self-placement (pa01) and age groups (agec).

Clues

The base R function for an ANOVA is aov(). You can get more detailed information about the results by passing the fitted model to the summary() function.

solution

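# Naming the object anova_model (rather than anova) avoids confusion with stats::anova()
anova_model <- aov(as.numeric(pa01) ~ agec, data = allbus_2021_cda_1)

summary(anova_model)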

##               Df Sum Sq Mean Sq F value   Pr(>F)    
## agec           5    110  22.003   6.881 2.07e-06 ***
## Residuals   5099  16304   3.197                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 237 observations deleted due to missingness
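The F value in this table is the ratio of the two mean squares (for agec, roughly 22.003 / 3.197), and Pr(>F) is the upper tail of the F distribution with 5 and 5099 degrees of freedom. A minimal sketch on simulated data, with three hypothetical groups a, b and c, recomputes both quantities from the table that summary() returns.

```r
set.seed(1)

# Simulated outcome for three hypothetical groups of unequal size
dat <- data.frame(group = factor(rep(c("a", "b", "c"), times = c(30, 25, 35))))
dat$y <- rnorm(90) + c(0, 0.5, 1)[as.integer(dat$group)]

fit <- aov(y ~ group, data = dat)

# summary() on an aov fit returns a list whose first element is the ANOVA table
tab <- summary(fit)[[1]]

ms_group <- tab[["Mean Sq"]][1]
ms_resid <- tab[["Mean Sq"]][2]

# F is the between-group mean square divided by the residual mean square
f_manual <- ms_group / ms_resid

# Pr(>F) is the upper tail of F(Df_group, Df_residual) at the observed F
p_manual <- pf(f_manual, tab[["Df"]][1], tab[["Df"]][2], lower.tail = FALSE)
```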

3

Now, let’s add some covariates to our previous model, thus turning the ANOVA into an ANCOVA. The covariates we want to include are sex and, again, German region (eastwest).

Clues

Remember that you can add covariates to a model formula in R with +.

solution

ancova <- 
  aov(as.numeric(pa01) ~ agec + sex + eastwest, data = allbus_2021_cda_1)

summary(ancova)

##               Df Sum Sq Mean Sq F value   Pr(>F)    
## agec           5    108   21.69   6.939 1.82e-06 ***
## sex            2    249  124.60  39.867  < 2e-16 ***
## eastwest       1    115  114.88  36.757 1.43e-09 ***
## Residuals   5089  15905    3.13                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 244 observations deleted due to missingness
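summary() on an aov() fit reports sequential (Type I) sums of squares: each term is tested after the terms listed before it, so with unbalanced data the order of terms in the formula matters. Because agec comes first here, its Sum Sq is not adjusted for sex or eastwest; the small change from 110 to 108 compared with the ANOVA reflects the slightly smaller set of complete cases (244 instead of 237 dropped observations). The sketch below uses simulated data with two hypothetical, correlated factors a and b, and uses update() to add a term to an existing model, as an alternative to retyping the formula.

```r
set.seed(7)

# Simulated unbalanced data: factor b is correlated with factor a
n <- 200
a <- factor(sample(c("x", "y"), n, replace = TRUE))
b <- factor(ifelse(runif(n) < ifelse(a == "x", 0.7, 0.3), "p", "q"))
y <- 0.5 * (a == "y") + 0.8 * (b == "q") + rnorm(n)
dat <- data.frame(y, a, b)

# Start from a one-factor model and add b with update()
fit_a  <- aov(y ~ a, data = dat)
fit_ab <- update(fit_a, . ~ . + b)

# Same terms, reversed order
fit_ba <- aov(y ~ b + a, data = dat)

tab_ab <- summary(fit_ab)[[1]]
tab_ba <- summary(fit_ba)[[1]]

# Sum Sq for "a" entered first vs. entered after "b"
ss_a_first <- tab_ab[["Sum Sq"]][1]
ss_a_last  <- tab_ba[["Sum Sq"]][2]

# The residual sum of squares is the same in both orders
rss_ab <- tail(tab_ab[["Sum Sq"]], 1)
rss_ba <- tail(tab_ba[["Sum Sq"]], 1)

# Sequential sums of squares plus the residual add up to the total sum of squares
ss_total <- sum((dat$y - mean(dat$y))^2)
```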

4

As with the ANOVA, this output does not tell us which groups differ from each other. Compute Tukey’s Honest Significant Differences to find out which pairs of groups differ.

Clues

There is the TukeyHSD() function for this purpose.

solution

TukeyHSD(ancova)

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = as.numeric(pa01) ~ agec + sex + eastwest, data = allbus_2021_cda_1)
## 
## $agec
##            diff          lwr       upr     p adj
## 2-1  0.26180648  0.005725639 0.5178873 0.0416814
## 3-1  0.31708172  0.070916747 0.5632467 0.0033169
## 4-1  0.24999760  0.003726460 0.4962687 0.0442308
## 5-1  0.53906625  0.245988090 0.8321444 0.0000024
## 6-1  1.05751521  0.129218233 1.9858122 0.0148671
## 3-2  0.05527524 -0.148118207 0.2586687 0.9717911
## 4-2 -0.01180888 -0.215330808 0.1917131 0.9999829
## 5-2  0.27725977  0.019061187 0.5354584 0.0269054
## 6-2  0.79570874 -0.122173057 1.7135905 0.1328196
## 4-3 -0.06708412 -0.257979303 0.1238111 0.9174821
## 5-3  0.22198454 -0.026382746 0.4703518 0.1106845
## 6-3  0.74043350 -0.174731397 1.6555984 0.1914659
## 5-4  0.28906865  0.040596139 0.5375412 0.0118156
## 6-4  0.80751761 -0.107675845 1.7227111 0.1197189
## 6-5  0.51844896 -0.410434453 1.4473324 0.6044681
## 
## $sex
##           diff        lwr         upr     p adj
## 2-1 -0.4287245 -0.5448517 -0.31259734 0.0000000
## 3-1 -2.4358522 -4.8301060 -0.04159845 0.0450681
## 3-2 -2.0071277 -4.4013743  0.38711889 0.1209809
## 
## $eastwest
##          diff        lwr        upr p adj
## 2-1 -0.317333 -0.4204898 -0.2141761     0
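Each element of the TukeyHSD() result is a matrix with the columns diff, lwr, upr and p adj, so it can be filtered like any matrix. The sketch below uses a simulated, balanced one-way design with hypothetical groups g1 to g4 and also recomputes one comparison by hand: the difference of the group means is divided by sqrt(MSE / 2 * (1/n_i + 1/n_j)) and referred to the studentized range distribution via ptukey(). The hand calculation is this simple only for a balanced one-way model; the TukeyHSD() documentation notes that its intervals are exact for balanced designs and include an adjustment for mildly unbalanced ones, which is worth keeping in mind for unequal group sizes like those in the ALLBUS data.

```r
set.seed(3)

# Simulated balanced one-way design: 4 hypothetical groups, 20 cases each
k <- 4
n_per <- 20
dat <- data.frame(group = factor(rep(paste0("g", 1:k), each = n_per)))
dat$y <- rnorm(k * n_per, mean = c(0, 0.3, 0.6, 1.2)[as.integer(dat$group)])

fit <- aov(y ~ group, data = dat)
tk <- TukeyHSD(fit, which = "group", conf.level = 0.95)

# Results for one factor: a matrix with rows like "g2-g1"
res <- tk$group

# Keep only comparisons with an adjusted p-value below .05
significant <- res[res[, "p adj"] < 0.05, , drop = FALSE]

# Manual check of the g2 - g1 comparison
means <- sapply(split(dat$y, dat$group), mean)
mse <- sum(residuals(fit)^2) / fit$df.residual
se <- sqrt(mse / 2 * (1 / n_per + 1 / n_per))
diff_21 <- unname(means["g2"] - means["g1"])
p_21 <- ptukey(abs(diff_21) / se, nmeans = k, df = fit$df.residual, lower.tail = FALSE)

# Half-width of the family-wise confidence interval
half_width <- qtukey(0.95, nmeans = k, df = fit$df.residual) * se
```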

Johannes Breuer, Stefan Jünger, Veronika Batzdorfer

Introduction to R for Data Analysis
