# Introduction to data science with R and the tidyverse

contact@prabi.fr

![The tidy universe](https://thinkr.fr/wp-content/uploads/2019/07/thinkr_tidyverse-first_header.jpg)

## Acknowledgements

```
If you use the computing resources of LBBE / PRABI, please acknowledge it
in your scientific publications by including the following sentence:
"This work was performed using the computing facilities of the CC LBBE/PRABI."
```

## Prerequisites

- Complete the episode "Prise en main de R à travers RStudio" (Getting started with R using RStudio)

<p><a href="https://www.rstudio.com/resources/webinars/a-gentle-introduction-to-tidy-statistics-in-r/?wvideo=hltkvqscdz"><img src="https://embedwistia-a.akamaihd.net/deliveries/4d5cca91aba7c7aad975ac5838c69217987a9e83.jpg?image_play_button_size=2x&amp;image_crop_resized=960x540&amp;image_play_button=1&amp;image_play_button_color=71a5d4e0" width="400" height="225" style="width: 400px; height: 225px;"></a></p><p><a href="https://www.rstudio.com/resources/webinars/a-gentle-introduction-to-tidy-statistics-in-r/?wvideo=hltkvqscdz">A Gentle Introduction to Tidy Statistics in R - RStudio</a></p>

## Learning R with the "tidyverse first" teaching approach (https://thinkr.fr/pedagogie-de-la-formation-au-langage-r/)

https://www.openscapes.org/blog/2020/10/12/tidy-data/

R's strength lies in the roughly 13,000 packages listed on the CRAN archive (https://cran.r-project.org/web/packages/), but this abundance is also one of its weak points for beginners.

The R language relies on many base packages, functions, operators, and objects that must be learned before getting to the heart of the matter.

https://larmarange.github.io/analyse-R/

We will try to tame R with the help of the new tidyverse order!

![](https://static.wikia.nocookie.net/frstarwars/images/9/9f/Premier_Ordre_base_Starkiller.png/revision/latest/scale-to-width-down/1000?cb=20151108134926)


In 2016, Hadley Wickham proposed a set of R packages that share a coherent, readable, and intuitive grammar for data analysis and data science, covering in particular:

- importing data
- manipulating data
- visualizing data
- modeling data
- exporting data

![](https://juba.github.io/tidyverse/resources/logos/core_packages.png)

These packages are now bundled in the tidyverse package (https://www.tidyverse.org/), which includes:
- ggplot2 (visualization)
- dplyr (data manipulation)
- tidyr (data reshaping)
- purrr (programming)
- readr (data import)
- tibble (data tables)
- forcats (categorical variables)
- stringr (character strings)

https://thinkr.fr/c-est-quoi-le-tidyverse/

First, we will install the `tidyverse` package in R/RStudio:

```R
install.packages("tidyverse")
```

To load the package, call R's `library` function with `tidyverse` as its argument:

```R
library(tidyverse)
#> ── Attaching packages ──────────────────────────────────────────── tidyverse 1.3.1 ──
#>  ggplot2 3.3.5      purrr   0.3.4
#>  tibble  3.1.5      dplyr   1.0.7
#>  tidyr   1.1.4      stringr 1.4.0
#>  readr   2.0.2      forcats 0.5.1
#> ── Conflicts ─────────────────────────────────────────────── tidyverse_conflicts() ──
#> x dplyr::filter() masks stats::filter()
#> x dplyr::lag()    masks stats::lag()
```
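
The conflict messages in the output above mean that, once the tidyverse is attached, a bare `filter()` or `lag()` call refers to the dplyr version. As a minimal sketch (the variable names `setosa` and `moving_avg` are made up for this illustration), the `package::function` prefix makes the choice explicit:

```R
library(dplyr)

# dplyr::filter() keeps the rows of a data frame that match a condition.
setosa <- dplyr::filter(iris, Species == "setosa")
nrow(setosa)

# stats::filter() is unrelated: it applies a linear filter to a numeric series
# (here a 3-point moving average). The prefix is required once dplyr is attached.
moving_avg <- stats::filter(1:5, rep(1 / 3, 3))
moving_avg
```

Writing the prefix is optional for the dplyr versions, but it documents which function is intended when both packages are in play.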

We will use the `iris` dataset, also known as Fisher's Iris or Anderson's Iris (https://fr.wikipedia.org/wiki/Iris_de_Fisher, https://rpubs.com/vidhividhi/irisdataeda), to illustrate the basic features of the tidyverse.

http://cbdm-01.zdv.uni-mainz.de/~galanisl/danalysis/

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/5/56/Kosaciec_szczecinkowaty_Iris_setosa.jpg/440px-Kosaciec_szczecinkowaty_Iris_setosa.jpg" width="100px"> <i>Iris setosa</i><br>
<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/4/41/Iris_versicolor_3.jpg/440px-Iris_versicolor_3.jpg" width="100px"> <i>Iris versicolor</i><br>
<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/9/9f/Iris_virginica.jpg/440px-Iris_virginica.jpg" width="100px"> <i>Iris virginica</i><br>

Overview of the tidyverse principles (https://juba.github.io/tidyverse/06-tidyverse.html)

- tidy data and tibble vs data.frame (see sections 6.3 and 6.4)
https://www.openscapes.org/blog/2020/10/12/tidy-data/
```R
class(iris)
tidyris <- as_tibble(iris)
class(tidyris)
```
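
To make the tidy data idea more concrete, here is a small sketch that reshapes `iris` into a "long" layout with one row per flower and per measurement. The names `flower_id`, `measurement`, and `value_cm` are chosen for this example only. Whether the wide or the long layout counts as "tidy" depends on what you treat as an observation.

```R
library(tibble)
library(tidyr)

tidyris <- as_tibble(iris)

# Add an explicit identifier so each flower stays traceable after reshaping.
with_id <- rowid_to_column(tidyris, var = "flower_id")

# Gather the four measurement columns into name/value pairs.
long_iris <- pivot_longer(
  with_id,
  cols = c(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width),
  names_to = "measurement",
  values_to = "value_cm"
)

dim(long_iris)
```

The long layout is often convenient for ggplot2, where a single `measurement` column can be mapped to a facet or a color.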

# ggplot2

```R
# Histogram of sepal length (bins = 30 is the default, stated explicitly)
ggplot(data = tidyris, aes(x = Sepal.Length)) +
  geom_histogram(bins = 30)

# Sepal length vs. sepal width, with color and point shape mapped to species
ggplot(data = tidyris, aes(x = Sepal.Length, y = Sepal.Width, color = Species, shape = Species)) +
  geom_point() +
  xlab("Sepal Length") +
  ylab("Sepal Width") +
  ggtitle("Sepal Length-Width")

# Petal length vs. petal width, with the same mappings
ggplot(data = tidyris, aes(x = Petal.Length, y = Petal.Width, color = Species, shape = Species)) +
  geom_point() +
  xlab("Petal Length") +
  ylab("Petal Width") +
  ggtitle("Petal Length-Width")
```
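
The same layered grammar applies to other geoms. As an illustrative sketch that is not part of the original material, a boxplot by species only needs a different `geom_*()` layer. The object name `p` is arbitrary.

```R
library(ggplot2)

# Map the species to the x axis and fill color, then draw one box per species.
p <- ggplot(data = iris, aes(x = Species, y = Petal.Length, fill = Species)) +
  geom_boxplot() +
  labs(x = "Species", y = "Petal Length", title = "Petal Length by Species")

# Printing the object renders the plot.
p
```

Storing the plot in a variable makes it possible to add further layers later or to pass it to `ggsave()`.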

# dplyr, the king of data manipulation!

```R
# Select a single row by position: the 10th row
slice(iris, 10)

# Select a range of rows: rows 1 to 10
slice(iris, 1:10)
```
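
Beyond `slice()`, dplyr offers a handful of core verbs. The following sketch shows one possible tour of them on `iris`. The result names and the `Petal.Ratio` column are made up for this example.

```R
library(dplyr)

# filter(): keep the rows that match a condition
long_petals <- filter(iris, Petal.Length > 6)

# select(): keep a subset of columns
petals <- select(iris, Species, Petal.Length, Petal.Width)

# mutate(): add a column computed from existing ones
with_ratio <- mutate(iris, Petal.Ratio = Petal.Length / Petal.Width)

# arrange(): sort rows, here by decreasing sepal length
sorted <- arrange(iris, desc(Sepal.Length))

# group_by() + summarise(): one summary row per group
by_species <- summarise(
  group_by(iris, Species),
  mean_petal_length = mean(Petal.Length)
)
by_species
```

Each verb takes a data frame as its first argument and returns a new one, which is what makes them easy to chain.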

# magrittr, hip hip pipe!

Use the forward pipe operator, `%>%`.
https://magrittr.tidyverse.org/
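
As a minimal sketch of what the pipe does, `x %>% f(y)` is equivalent to `f(x, y)`. The example below writes the same computation twice, once nested and once piped. The names `nested` and `piped` are arbitrary, and dplyr is loaded here because it re-exports `%>%`.

```R
library(dplyr)

# Nested calls have to be read from the inside out...
nested <- summarise(group_by(filter(iris, Sepal.Length > 5), Species), n = n())

# ...while the piped version reads from top to bottom.
piped <- iris %>%
  filter(Sepal.Length > 5) %>%
  group_by(Species) %>%
  summarise(n = n())

# Both forms give the same result.
all.equal(nested, piped)
```

The piped form keeps each step on its own line, which is the layout recommended by the tidyverse style guide.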


# tidyquery

For aficionados of `SQL` (Structured Query Language), tidyquery is made for you, even though it is far from perfect (for example, it does not handle joins across more than three tables).

https://github.com/ianmcook/tidyquery
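
As a hedged sketch, assuming the tidyquery package is installed and that its `query()` function accepts a SQL `SELECT` statement whose `FROM` clause names a data frame in the R session, a grouped aggregate could look like this. The built-in `mtcars` dataset is used here because its column names contain no dots; `result` and `mean_mpg` are names chosen for this example.

```R
library(tidyquery)

# The FROM clause refers to the mtcars data frame available in the session.
result <- query(
  "SELECT cyl, AVG(mpg) AS mean_mpg
     FROM mtcars
    GROUP BY cyl
    ORDER BY cyl"
)
result
```

Under the hood, tidyquery translates the statement into dplyr calls, so the result is an ordinary data frame that can be piped into further tidyverse code.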


# queryparser


# The RStudio cheatsheets

https://www.rstudio.com/resources/cheatsheets/

# Be styleR!

https://style.tidyverse.org/pipes.html

https://juba.github.io/tidyverse/index.html
https://larmarange.github.io/analyse-R/introduction-au-tidyverse.html
https://jcoliver.github.io/learn-r/