We will continue to work with the same data as in the previous set of
exercises. By now, you should (hopefully) have saved them as an .rds
file and, hence, be able to easily load them (in case they are not
already/still in your current R workspace/working environment). This
time, in addition to base R and the tidyverse, we also need the janitor
package.
allbus_2021_eda <- readRDS("./data/allbus_2021_eda.rds")
Using base R, print a simple table with the frequencies of the variable
agec. Also include the counts for missing values.
You need to set the argument useNA = "always".
table(allbus_2021_eda$agec, useNA = "always")
##
##    <= 25 years 26 to 30 years 31 to 35 years 36 to 40 years 41 to 45 years 46 to 50 years
##            618           1136           1448           1450            617             32
##           <NA>
##             41
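To see what the useNA argument changes, here is a minimal sketch on a made-up factor. The object toy_age is hypothetical and not part of the ALLBUS data. By default, table() drops missing values; with useNA = "always", an <NA> column is added in any case, even if the variable contains no missing values.

```r
# Hypothetical toy data, not taken from the ALLBUS file
toy_age <- factor(c("young", "old", NA, "young", "old", "old"))

# Default behaviour: missing values are dropped from the table
tab_default <- table(toy_age)

# useNA = "always": an <NA> column is added to the table
tab_always <- table(toy_age, useNA = "always")

tab_default
tab_always
```

A third option, useNA = "ifany", only adds the <NA> column when the variable actually contains missing values.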
Use a combination of base R functions to get the proportions for the
variable sex rounded to four decimal places.
You need to wrap table() into two other functions: one for creating the
proportion table and another one for rounding the decimal places.
round(prop.table(table(allbus_2021_eda$sex)), 4)
##
##       Male     Female Non-binary
##     0.4912     0.5083     0.0006
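The nested call can be hard to read at first, so here is the same idea unpacked into single steps. This is only a sketch on a made-up vector (toy_sex is hypothetical and not part of the ALLBUS data); the functions are the same ones used in the solution.

```r
# Hypothetical toy data, not taken from the ALLBUS file
toy_sex <- c("Male", "Female", "Female", NA, "Male", "Female", "Female")

# Step 1: frequencies (missing values are dropped by default)
counts <- table(toy_sex)

# Step 2: divide each count by the sum of all counts
props <- prop.table(counts)

# Step 3: round to four decimal places
props_rounded <- round(props, 4)

props_rounded

# The nested one-liner performs the same three steps from the inside out
round(prop.table(table(toy_sex)), 4)
```

Reading the nested version from the innermost function outwards gives the order in which the steps are executed.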
Use a function from the dplyr package to get the frequencies and
proportions for the sex variable (without worrying about the number of
decimal places this time).
allbus_2021_eda %>%
  count(sex) %>%
  mutate(proportion = n/sum(n)) %>%
  ungroup()
##          sex    n   proportion
## 1       Male 2614 0.4893298390
## 2     Female 2705 0.5063646574
## 3 Non-binary    3 0.0005615874
## 4       <NA>   20 0.0037439161
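Note that these proportions differ slightly from the base R result for sex: table() drops missing values by default, whereas count() keeps NA as a row of its own, so the two solutions divide by different totals. The sketch below reproduces both sets of numbers from the counts printed in the output; the object names (n_sex, prop_all, prop_valid) and the label "Missing" are chosen here for illustration only.

```r
# Counts copied from the count(sex) output; "Missing" stands for the <NA> row
n_sex <- c(Male = 2614, Female = 2705, `Non-binary` = 3, Missing = 20)

# Denominator includes the missing values (as in the dplyr pipeline)
prop_all <- n_sex / sum(n_sex)

# Denominator excludes the missing values (as table() does by default)
n_valid <- n_sex[names(n_sex) != "Missing"]
prop_valid <- n_valid / sum(n_valid)

round(prop_all, 4)
round(prop_valid, 4)
```

If proportions of valid cases are wanted in the dplyr version, one option would be to add filter(!is.na(sex)) before count().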
Use a function from the janitor package to display the counts and
percentages for the categories in the agec variable. The output should
be rounded to 3 decimal places and include percentage signs.
You need to adapt the tabyl() output using the adorn_pct_formatting()
function.
library(janitor)

allbus_2021_eda %>%
  tabyl(agec) %>%
  adorn_pct_formatting(digits = 3,
                       affix_sign = TRUE)
##            agec    n percent valid_percent
##     <= 25 years  618 11.569%       11.658%
##  26 to 30 years 1136 21.265%       21.430%
##  31 to 35 years 1448 27.106%       27.316%
##  36 to 40 years 1450 27.143%       27.353%
##  41 to 45 years  617 11.550%       11.639%
##  46 to 50 years   32  0.599%        0.604%
##            <NA>   41  0.768%             -