As per usual, we first need to load our data. This time, we also wrangle some variables at the beginning. So please execute the following R commands.
library(dplyr)
library(haven)
allbus_2021_cda_2 <-
read_spss("./data/allbus_2021/ZA5280_v1-0-0.sav") %>%
mutate(
trust_parliament = as.numeric(pt03),
party = pv01,
political_orientation = as.numeric(pa01),
satisfaction_democracy = ifelse(as.numeric(ps03) <= 2, 1, 0)
)
## Converting atomic to factors. Please wait...
In case you have not done so yet, please also install the
performance package for this set of exercises.
if (!require(performance)) install.packages("performance")
In the following exercise, we will cover/repeat some of the basics of
regression analysis in R.
trust_parliament) as the outcome
variable and choice of party (party) and political
orientation (political_orientation) as predictors.
summary() again.
reg_linear <-
lm(trust_parliament ~ party + political_orientation, data = allbus_2021_cda_2)
summary(reg_linear)
##
## Call:
## lm(formula = trust_parliament ~ party + political_orientation,
## data = allbus_2021_cda_2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.7280 -0.8571 0.1745 1.0988 4.6058
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.13222 0.11352 45.212 < 2e-16 ***
## party2 -0.20914 0.09399 -2.225 0.0262 *
## party3 -0.59221 0.09616 -6.158 8.49e-10 ***
## party4 -0.14834 0.08259 -1.796 0.0726 .
## party6 -1.14565 0.12212 -9.382 < 2e-16 ***
## party42 -2.15264 0.11833 -18.191 < 2e-16 ***
## party90 -1.49061 0.14501 -10.279 < 2e-16 ***
## party91 -1.74320 0.13065 -13.342 < 2e-16 ***
## political_orientation -0.09756 0.01812 -5.386 7.87e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.395 on 2584 degrees of freedom
## (2749 Beobachtungen als fehlend gelöscht)
## Multiple R-squared: 0.213, Adjusted R-squared: 0.2105
## F-statistic: 87.4 on 8 and 2584 DF, p-value: < 2.2e-16
glm() function in which you need to
specify a link function. The name of the outcome variable is
satisfaction_democracy.
reg_logistic <-
glm(
satisfaction_democracy ~ trust_parliament + party + political_orientation,
family = binomial(link = "logit"),
data = allbus_2021_cda_2
)
summary(reg_logistic)
##
## Call:
## glm(formula = satisfaction_democracy ~ trust_parliament + party +
## political_orientation, family = binomial(link = "logit"),
## data = allbus_2021_cda_2)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.4639 -0.7866 0.4881 0.7956 2.6556
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.95856 0.26770 -7.316 2.55e-13 ***
## trust_parliament 0.67886 0.03802 17.857 < 2e-16 ***
## party2 -0.20142 0.16547 -1.217 0.2235
## party3 -0.69078 0.16135 -4.281 1.86e-05 ***
## party4 -0.27840 0.14632 -1.903 0.0571 .
## party6 -1.32804 0.20658 -6.429 1.29e-10 ***
## party42 -2.44766 0.28420 -8.612 < 2e-16 ***
## party90 -1.23118 0.25110 -4.903 9.44e-07 ***
## party91 -1.41016 0.23566 -5.984 2.18e-09 ***
## political_orientation 0.03852 0.03240 1.189 0.2344
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 3474.9 on 2578 degrees of freedom
## Residual deviance: 2611.9 on 2569 degrees of freedom
## (2763 Beobachtungen als fehlend gelöscht)
## AIC: 2631.9
##
## Number of Fisher Scoring iterations: 5
?family to see how you can
include a cauchit link.
reg_cauchit <-
glm(
satisfaction_democracy ~ trust_parliament + party + political_orientation,
family = binomial(link = "cauchit"),
data = allbus_2021_cda_2
)
summary(reg_cauchit)
##
## Call:
## glm(formula = satisfaction_democracy ~ trust_parliament + party +
## political_orientation, family = binomial(link = "cauchit"),
## data = allbus_2021_cda_2)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.1584 -0.7185 0.5479 0.7551 2.2811
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.07312 0.30902 -6.709 1.96e-11 ***
## trust_parliament 0.73461 0.05343 13.748 < 2e-16 ***
## party2 -0.29184 0.18114 -1.611 0.1072
## party3 -0.68554 0.17267 -3.970 7.18e-05 ***
## party4 -0.29669 0.16647 -1.782 0.0747 .
## party6 -1.45171 0.22357 -6.493 8.40e-11 ***
## party42 -2.99393 0.48022 -6.235 4.53e-10 ***
## party90 -1.15162 0.27056 -4.256 2.08e-05 ***
## party91 -1.62878 0.27738 -5.872 4.30e-09 ***
## political_orientation 0.01956 0.03589 0.545 0.5857
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 3474.9 on 2578 degrees of freedom
## Residual deviance: 2623.6 on 2569 degrees of freedom
## (2763 Beobachtungen als fehlend gelöscht)
## AIC: 2643.6
##
## Number of Fisher Scoring iterations: 6
test = "LRT" in the function we need for this to perform a
likelihood ratio test. What’s your interpretation?
anova(reg_logistic, reg_cauchit, test = "LRT")
## Analysis of Deviance Table
##
## Model 1: satisfaction_democracy ~ trust_parliament + party + political_orientation
## Model 2: satisfaction_democracy ~ trust_parliament + party + political_orientation
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 2569 2611.9
## 2 2569 2623.6 0 -11.754
# Interpretation of difference:
# It seems there is no big difference. We can make the reviewer happy with the
# cauchit models and still keep our main findings.
performance package provides a function for comparing
the performance of different models in terms of their fit.
compare_performance(
reg_logistic,
reg_cauchit,
metrics = c("AIC", "BIC", "R2", "RMSE")
)
## Some of the nested models seem to be identical
## # Comparison of Model Performance Indices
##
## Name | Model | AIC | AIC weights | BIC | BIC weights | RMSE | Tjur's R2 | Nagelkerke's R2
## ------------------------------------------------------------------------------------------------------------
## reg_logistic | glm | 2631.874 | 0.997 | 2690.425 | 0.997 | 0.409 | 0.305 |
## reg_cauchit | glm | 2643.628 | 0.003 | 2702.180 | 0.003 | 0.409 | | 0.380