Exercise 4_1_2: Regression analysis

As per usual, we first need to load our data. This time, we also wrangle some variables at the beginning. So please execute the following R commands.

library(dplyr)
library(haven)

allbus_2021_cda_2 <- 
  read_spss("./data/allbus_2021/ZA5280_v1-0-0.sav") %>% 
  mutate(
    trust_parliament = as.numeric(pt03),
    party = pv01,
    political_orientation = as.numeric(pa01),
    satisfaction_democracy = ifelse(as.numeric(ps03) <= 2, 1, 0)
  )

## Converting atomic to factors. Please wait...

In case you have not done so yet, please also install the performance package for this set of exercises.

if (!require(performance)) install.packages("performance")

In the following exercise, we will cover/repeat some of the basics of regression analysis in R.

1

To begin with, run a simple linear regression model with trust in the German federal parliament (trust_parliament) as the outcome variable and choice of party (party) and political orientation (political_orientation) as predictors.

Clues

To get some (more) informative output, you can use summary() again.

solution

reg_linear <- 
  lm(trust_parliament ~ party + political_orientation, data = allbus_2021_cda_2)

summary(reg_linear)

## 
## Call:
## lm(formula = trust_parliament ~ party + political_orientation, 
##     data = allbus_2021_cda_2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.7280 -0.8571  0.1745  1.0988  4.6058 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            5.13222    0.11352  45.212  < 2e-16 ***
## party2                -0.20914    0.09399  -2.225   0.0262 *  
## party3                -0.59221    0.09616  -6.158 8.49e-10 ***
## party4                -0.14834    0.08259  -1.796   0.0726 .  
## party6                -1.14565    0.12212  -9.382  < 2e-16 ***
## party42               -2.15264    0.11833 -18.191  < 2e-16 ***
## party90               -1.49061    0.14501 -10.279  < 2e-16 ***
## party91               -1.74320    0.13065 -13.342  < 2e-16 ***
## political_orientation -0.09756    0.01812  -5.386 7.87e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.395 on 2584 degrees of freedom
##   (2749 Beobachtungen als fehlend gelöscht)
## Multiple R-squared:  0.213,  Adjusted R-squared:  0.2105 
## F-statistic:  87.4 on 8 and 2584 DF,  p-value: < 2.2e-16

2

As the next step in our analyses, we want assess whether these whole previous findings have something to do with satisfaction in democracy. For this purpose, we switch the dependent variable to satisfaction with democracy and move the the trust in parliament variable to the independent ones. Attention: We now have to run a logistic regression.

Clues

This time, you need the glm() function in which you need to specify a link function. The name of the outcome variable is satisfaction_democracy.

solution

reg_logistic <- 
  glm(
    satisfaction_democracy ~ trust_parliament + party + political_orientation,
    family = binomial(link = "logit"),
    data = allbus_2021_cda_2
  )

summary(reg_logistic)

## 
## Call:
## glm(formula = satisfaction_democracy ~ trust_parliament + party + 
##     political_orientation, family = binomial(link = "logit"), 
##     data = allbus_2021_cda_2)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.4639  -0.7866   0.4881   0.7956   2.6556  
## 
## Coefficients:
##                       Estimate Std. Error z value Pr(>|z|)    
## (Intercept)           -1.95856    0.26770  -7.316 2.55e-13 ***
## trust_parliament       0.67886    0.03802  17.857  < 2e-16 ***
## party2                -0.20142    0.16547  -1.217   0.2235    
## party3                -0.69078    0.16135  -4.281 1.86e-05 ***
## party4                -0.27840    0.14632  -1.903   0.0571 .  
## party6                -1.32804    0.20658  -6.429 1.29e-10 ***
## party42               -2.44766    0.28420  -8.612  < 2e-16 ***
## party90               -1.23118    0.25110  -4.903 9.44e-07 ***
## party91               -1.41016    0.23566  -5.984 2.18e-09 ***
## political_orientation  0.03852    0.03240   1.189   0.2344    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 3474.9  on 2578  degrees of freedom
## Residual deviance: 2611.9  on 2569  degrees of freedom
##   (2763 Beobachtungen als fehlend gelöscht)
## AIC: 2631.9
## 
## Number of Fisher Scoring iterations: 5

3

Reviewer 2 is at it again… As Cauchit links are all the rage in her/his field, she/he wants us to run the same model with the sole difference of using a cauchit link…. sigh!

Clues

Have a look at the help page ?family to see how you can include a cauchit link.

solution

reg_cauchit <- 
   glm(
    satisfaction_democracy ~ trust_parliament + party + political_orientation,
    family = binomial(link = "cauchit"),
    data = allbus_2021_cda_2
  )

summary(reg_cauchit)

## 
## Call:
## glm(formula = satisfaction_democracy ~ trust_parliament + party + 
##     political_orientation, family = binomial(link = "cauchit"), 
##     data = allbus_2021_cda_2)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.1584  -0.7185   0.5479   0.7551   2.2811  
## 
## Coefficients:
##                       Estimate Std. Error z value Pr(>|z|)    
## (Intercept)           -2.07312    0.30902  -6.709 1.96e-11 ***
## trust_parliament       0.73461    0.05343  13.748  < 2e-16 ***
## party2                -0.29184    0.18114  -1.611   0.1072    
## party3                -0.68554    0.17267  -3.970 7.18e-05 ***
## party4                -0.29669    0.16647  -1.782   0.0747 .  
## party6                -1.45171    0.22357  -6.493 8.40e-11 ***
## party42               -2.99393    0.48022  -6.235 4.53e-10 ***
## party90               -1.15162    0.27056  -4.256 2.08e-05 ***
## party91               -1.62878    0.27738  -5.872 4.30e-09 ***
## political_orientation  0.01956    0.03589   0.545   0.5857    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 3474.9  on 2578  degrees of freedom
## Residual deviance: 2623.6  on 2569  degrees of freedom
##   (2763 Beobachtungen als fehlend gelöscht)
## AIC: 2643.6
## 
## Number of Fisher Scoring iterations: 6

4

Compare both regression models using an ANOVA. Use the argument test = "LRT" in the function we need for this to perform a likelihood ratio test. What’s your interpretation?

Clues

A p-value considered as statistically significant would indicate a difference between the models.

solution

anova(reg_logistic, reg_cauchit, test = "LRT")

## Analysis of Deviance Table
## 
## Model 1: satisfaction_democracy ~ trust_parliament + party + political_orientation
## Model 2: satisfaction_democracy ~ trust_parliament + party + political_orientation
##   Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1      2569     2611.9                     
## 2      2569     2623.6  0  -11.754

# Interpretation of difference:
# It seems there is no big difference. We can make the reviewer happy with the
# cauchit models and still keep our main findings.

5

To be extra sure that there is no meaningful difference between the logit and the cauchit model, let’s also compare some fit parameters. We want the following model fit metrics: AIC, BIC, R², and RMSE

Clues

The performance package provides a function for comparing the performance of different models in terms of their fit.

solution

compare_performance(
  reg_logistic,
  reg_cauchit,
  metrics = c("AIC", "BIC", "R2", "RMSE")
)

## Some of the nested models seem to be identical

## # Comparison of Model Performance Indices
## 
## Name         | Model |      AIC | AIC weights |      BIC | BIC weights |  RMSE | Tjur's R2 | Nagelkerke's R2
## ------------------------------------------------------------------------------------------------------------
## reg_logistic |   glm | 2631.874 |       0.997 | 2690.425 |       0.997 | 0.409 |     0.305 |                
## reg_cauchit  |   glm | 2643.628 |       0.003 | 2702.180 |       0.003 | 0.409 |           |           0.380

Exercise 4_1_2: Regression analysis

Johannes Breuer, Stefan Jünger, Veronika Batzdorfer

Introduction to R for Data Analysis

1

Clues

solution

2

Clues

solution

3

Clues

solution

4

Clues

solution

5

Clues

solution