Essential R packages for data science projects

Leverage the great work of the R community in your projects!

Quick reminder: install and use packages

# install the tidyr package from CRAN
install.packages("tidyr")
# return the installed version of the tidyr package
packageVersion("tidyr")
You can update all your packages in a few clicks using RStudio.
# call a function without loading its package, using the package::function syntax
stringr::str_replace("Hello world!", "Hello", "Hi")
# load a package: this throws an error if the package is not installed
library(stringr)
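Since library() fails on a package that is not installed, it can help to check first. Here is a minimal sketch using base R's requireNamespace(); the helper name missing_packages is hypothetical and not part of any package.

```r
# Hypothetical helper: return the names of the packages that are not installed.
# Note that requireNamespace() loads the namespace of the packages it finds.
missing_packages <- function(pkgs) {
  is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!is_installed]
}

# report which of the packages used in this article still need to be installed
missing_packages(c("tidyr", "stringr"))
```

If the result is not empty, it can be passed to install.packages() to install only what is missing.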

Fetching data

Great R packages usually have a dedicated hex sticker: https://github.com/rstudio/hex-stickers
# read csv data delimited using comma (,)
input_data <- readr::read_csv("./input_data.csv")
# read csv data delimited using semi-colon (;)
input_data <- readr::read_csv2("./input_data.csv")
# read txt data delimited using any other symbol (here ||)
input_data <- readr::read_delim("./input_data.txt", delim = "||")
# read Excel spreadsheets
input_data <- readxl::read_excel("input_data.xlsx", sheet = "page2")
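To try the readers above without a file at hand, here is a small sketch that writes a temporary CSV and reads it back. The sample data and the column names id and name are made up for illustration; declaring col_types is optional but avoids relying on readr's type guessing.

```r
# write a tiny comma-delimited file to a temporary location (made-up sample data)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,name", "1,setosa", "2,virginica"), csv_path)

# declare the expected column types instead of letting readr guess them
input_data <- readr::read_csv(
  csv_path,
  col_types = readr::cols(
    id = readr::col_integer(),
    name = readr::col_character()
  )
)
input_data
```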

Wrangling data

The pipe operator shipped with the magrittr package is a game changer: https://github.com/tidyverse/magrittr
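The pipe passes the value on its left as the first argument of the call on its right, so a chain of calls reads from left to right. A short sketch comparing the two styles, where the variable names values, nested and piped are illustrative only:

```r
library(magrittr)

values <- c(1, 4, 9)
# nested calls are read from the inside out
nested <- round(sqrt(sum(values)), 1)
# piped calls are read from left to right
piped <- values %>% sum() %>% sqrt() %>% round(1)
# both styles compute the same number
identical(nested, piped)
```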
library(magrittr)
# without pipe operator
paste("Hello", "world!")
# with pipe operator
"Hello" %>% paste("world!")
library(dplyr)
# mtcars is a toy data set shipped with base R
# create a column
mtcars <- mtcars %>% mutate(vehicle = "car")
# filter on a column
mtcars <- mtcars %>% filter(cyl >= 6)
# create a column AND filter on a column
mtcars <- mtcars %>%
  mutate(vehicle = "car") %>%
  filter(cyl >= 6)
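As a quick check of the chained version, the sketch below stores the result under a new name (big_engines, a hypothetical name) so the built-in mtcars stays untouched, then counts the remaining rows per number of cylinders with dplyr::count().

```r
library(dplyr)

# store the result under a new name so the built-in mtcars stays untouched
big_engines <- mtcars %>%
  mutate(vehicle = "car") %>%
  filter(cyl >= 6)

# count the remaining rows per number of cylinders
big_engines %>% count(cyl)
```

Only the 6- and 8-cylinder cars remain, which is 21 of the 32 rows in mtcars.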
You may find inspiration in this Top 50 ggplot2 visualizations article: http://r-statistics.co/Top50-Ggplot2-Visualizations-MasterList-R-Code.html

Machine learning

library(dplyr)
# say we want to predict which irises have a big petal width
observations <- iris %>%
  mutate(y = ifelse(Petal.Width >= 1.5, "big", "small")) %>%
  select(-Petal.Width)
# set up a 10-fold cross-validation
train_control <- caret::trainControl(method = "cv",
                                     number = 10,
                                     savePredictions = TRUE,
                                     classProbs = TRUE)
# make it reproducible and train the model
set.seed(123)
model <- caret::train(y ~ .,
                      data = observations,
                      method = "glm",
                      trControl = train_control,
                      metric = "Accuracy")
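Once the model is trained, you will probably want to look at its cross-validated performance and use it to predict. The sketch below repeats the training steps so that it runs on its own. It assumes the caret package is installed, and it converts y to a factor explicitly, which is a precaution added here rather than something the code above does. On this data glm may emit warnings about fitted probabilities, which do not stop the run.

```r
library(dplyr)

# same target as above, stored as a factor with the levels "big" and "small"
observations <- iris %>%
  mutate(y = factor(ifelse(Petal.Width >= 1.5, "big", "small"))) %>%
  select(-Petal.Width)

train_control <- caret::trainControl(method = "cv", number = 10,
                                     savePredictions = TRUE, classProbs = TRUE)
set.seed(123)
model <- caret::train(y ~ ., data = observations, method = "glm",
                      trControl = train_control, metric = "Accuracy")

# cross-validated accuracy and kappa, averaged over the 10 folds
model$results
# predicted classes for the first rows of the training data
predictions <- predict(model, newdata = head(observations))
predictions
```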

Final words

Useful documentation and references

