titanic
data from the data
folder.
CSV
. You can use the readr
library and a function that starts with read_...
(of course, you could also use the base R
equivalent if you want to).
library(readr)
titanic <-
read_csv("./data/titanic.csv")
## Rows: 891 Columns: 12
## -- Column specification ------------------------------------------------------------------------------------------
## Delimiter: ","
## chr (5): Name, Sex, Ticket, Cabin, Embarked
## dbl (7): PassengerId, Survived, Pclass, Age, SibSp, Parch, Fare
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
You may have noticed that the function you just used imports factor variables as characters by default. For some analyses, this may not be what we want (for example, if we’re going to use “sex” as a predictor in a regression).
sex
to a factor.
# change column type while importing
titanic <-
read_csv(
"./data/titanic.csv",
col_types = cols(
Sex = col_factor()
)
)
We have already worked with the titanic data quite a bit. Let’s import some other data for a change.
.xlsx
file containing Gapminder data on GDP from the data
folder.
readxl
package for this importing task. The name of the file we need is GDPpercapitaconstant2000US.xlsx
.
library(readxl)
gapminder_GDP <-
read_excel("./data/GDPpercapitaconstant2000US.xlsx")
As you may have noticed, the format of the output of the two importing functions is the same (tibbles in both cases). Sometimes, however, the contents of an Excel file are not that easy to import. We will illustrate this with the help of the Unicorns on Unicycles
dataset. This is what is known about this data according to its creator:
The documents were recently unearthed from a hidden chest in Delft and seem to be written by Rudolphus Hogervorstus, my great great great uncle, in 1681. These documents show that he was a scientist studying the then roaming herds of unicorns in the area around Delft. Unfortunately these animals are extinct now. His work contains multiple tables, carefully written down, documenting the population of unicorns over time in multiple places and related to that the sales and numbers of unicycles in those countries. According to Rudolphus the unicorn populations and unicycles are related “The presence of the cone on the unicorn hints at a very defined sense of equilibrium, it is therefore only natural to assume unicorns ride unicycles”. As part of the archival process these tables were copied, as Rudolphus himself would say: “with the black magic, so vile it could not be discussed for hell would come descent upon us” into satans own spawn: Microsoft Excel.
Source: https://github.com/RMHogervorst/unicorns_on_unicycles
.xlsx
file. As we are not interested in the total_turnover
variable only read in the cell range A1:C43
range = range_definition
(see ?read_excel
).
library(readxl)
unicorn_sales <-
read_excel(
"./data/sales.xlsx",
range = "A1:C43"
)