# Introduction to data science with R and the tidyverse

contact@prabi.fr

![The tidy universe](https://thinkr.fr/wp-content/uploads/2019/07/thinkr_tidyverse-first_header.jpg)

## Acknowledgements

```
If you use the computing resources of LBBE / PRABI, please acknowledge them
in your scientific publications by including the following sentence:
"This work was performed using the computing facilities of the CC LBBE/PRABI."
```

## Prerequisites

- Complete the episode "Prise en main de R à travers RStudio" (getting started with R through RStudio)
- Complete the "ex-data-filter" tutorial

```R
# run_tutorial() comes from the learnr package, which must be installed
learnr::run_tutorial("ex-data-filter", package = "learnr")
```

## For the overachievers!

<p><a href="https://www.rstudio.com/resources/webinars/tidyverse-visualization-manipulation-basics/?wvideo=jhnn2k6w3g"><img src="https://embedwistia-a.akamaihd.net/deliveries/bb809dff0e80d61ec5f3b1b5e2f870f4410fb38d.jpg?image_play_button_size=2x&amp;image_crop_resized=960x540&amp;image_play_button=1&amp;image_play_button_color=71a5d4e0" width="400" height="225" style="width: 400px; height: 225px;"></a></p><p><a href="https://www.rstudio.com/resources/webinars/tidyverse-visualization-manipulation-basics/?wvideo=jhnn2k6w3g">Tidyverse visualization manipulation basics - RStudio</a></p>

## Learning R with the "tidyverse first" teaching approach (https://thinkr.fr/pedagogie-de-la-formation-au-langage-r/)

https://www.openscapes.org/blog/2020/10/12/tidy-data/

R's strength is the roughly 13,000 packages listed on the CRAN archive (https://cran.r-project.org/web/packages/), but that abundance is also one of its weaknesses for beginners.

The R language relies on many base packages, functions, operators, and objects that must be learned before getting to the heart of the matter.

https://larmarange.github.io/analyse-R/

We will try to tame R with the help of the new tidyverse order!

![](https://static.wikia.nocookie.net/frstarwars/images/9/9f/Premier_Ordre_base_Starkiller.png/revision/latest/scale-to-width-down/1000?cb=20151108134926)


In 2016, Hadley Wickham proposed a set of R packages that share a consistent, readable, and intuitive grammar for data analysis / data science, covering in particular:

- data import
- data manipulation
- data visualization
- data modelling
- data export

![](https://juba.github.io/tidyverse/resources/logos/core_packages.png)

These packages are now bundled in the tidyverse package (https://www.tidyverse.org/), which includes:
- ggplot2 (visualization)
- dplyr (data manipulation)
- tidyr (data reshaping)
- purrr (programming)
- readr (data import)
- tibble (data tables)
- forcats (categorical variables)
- stringr (character strings)
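
To make the role of each package more concrete, here is a minimal sketch that passes the built-in `iris` dataset through several of them in turn. It assumes the tidyverse is already installed; the variable names (`csv_path`, `flowers`, `flowers_long`, `flower_means`, `col_maxima`) are illustrative only.

```R
library(tidyverse)

# readr: export iris to a temporary CSV file, then import it again
csv_path <- tempfile(fileext = ".csv")
write_csv(iris, csv_path)
flowers <- read_csv(csv_path, show_col_types = FALSE)  # returns a tibble

# forcats + stringr: turn Species into a factor with upper-case labels
flowers <- flowers %>%
  mutate(Species = fct_relabel(factor(Species), str_to_upper))

# tidyr: reshape the four measurement columns into a long format
flowers_long <- flowers %>%
  pivot_longer(cols = -Species, names_to = "measure", values_to = "cm")

# dplyr: average each measurement per species
flower_means <- flowers_long %>%
  group_by(Species, measure) %>%
  summarise(mean_cm = mean(cm), .groups = "drop")

# purrr: apply a function to every numeric column
col_maxima <- map_dbl(select(flowers, where(is.numeric)), max)
```

Each step returns a tibble (or, for `map_dbl()`, a named numeric vector), so the intermediate results can be printed and inspected one by one.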

https://thinkr.fr/c-est-quoi-le-tidyverse/

First, install the `tidyverse` package in R/RStudio:

```R
install.packages("tidyverse")
```

To load the package, call R's `library` function with `tidyverse` as its argument:

```R
library(tidyverse)
#> ── Attaching packages ──────────────────────────────────────────── tidyverse 1.3.1 ──
#> ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
#> ✓ tibble  3.1.5     ✓ dplyr   1.0.7
#> ✓ tidyr   1.1.4     ✓ stringr 1.4.0
#> ✓ readr   2.0.2     ✓ forcats 0.5.1
#> ── Conflicts ─────────────────────────────────────────────── tidyverse_conflicts() ──
#> x dplyr::filter() masks stats::filter()
#> x dplyr::lag()    masks stats::lag()
```
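
The "Conflicts" lines in the startup message mean that, once the tidyverse is loaded, a bare call to `filter()` or `lag()` resolves to the dplyr version. The sketch below shows one way to install the package only when it is missing, and how the masked `stats` function remains reachable through its namespace prefix. The variable names are illustrative.

```R
# Install the tidyverse only when it is missing, then load it
if (!requireNamespace("tidyverse", quietly = TRUE)) {
  install.packages("tidyverse")
}
library(tidyverse)

# After loading, a bare filter() refers to dplyr::filter(),
# which keeps the rows of a data frame that match a condition
long_sepals <- filter(iris, Sepal.Length > 7)

# The masked function is still available through its namespace:
# stats::filter() applies a linear filter (here a 3-point moving average)
moving_avg <- stats::filter(1:5, rep(1 / 3, 3))
```

Writing `dplyr::filter()` or `stats::filter()` explicitly is a simple way to remove any ambiguity in scripts that rely on both.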

We will use the `iris` dataset, also known as Fisher's Iris or Anderson's Iris (https://fr.wikipedia.org/wiki/Iris_de_Fisher, https://rpubs.com/vidhividhi/irisdataeda), to illustrate the basic features of the tidyverse.

http://cbdm-01.zdv.uni-mainz.de/~galanisl/danalysis/

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/5/56/Kosaciec_szczecinkowaty_Iris_setosa.jpg/440px-Kosaciec_szczecinkowaty_Iris_setosa.jpg" width="100px"> <i>Iris setosa</i><br>
<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/4/41/Iris_versicolor_3.jpg/440px-Iris_versicolor_3.jpg" width="100px"> <i>Iris versicolor</i><br>
<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/9/9f/Iris_virginica.jpg/440px-Iris_virginica.jpg" width="100px"> <i>Iris virginica</i><br>

Overview of the tidyverse principles (https://juba.github.io/tidyverse/06-tidyverse.html)

- tidy data and tibble vs. data.frame (see chapters 6.3 and 6.4)
https://www.openscapes.org/blog/2020/10/12/tidy-data/
```R
class(iris)     # "data.frame"
# Convert the base data.frame into a tibble
tidyris <- as_tibble(iris)
class(tidyris)  # "tbl_df" "tbl" "data.frame"
```
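
Beyond the class name, a tibble behaves differently from a plain data.frame in a few everyday situations. The following sketch illustrates two of them (column subsetting and partial name matching); the variable names are illustrative.

```R
library(tibble)

tidyris <- as_tibble(iris)

# A tibble is still a data frame, so base R functions keep working
still_df <- is.data.frame(tidyris)

# Printing a tibble shows only the first rows, with the type of each column
print(tidyris)

# Extracting one column with [ , ]: a data.frame drops to a vector,
# whereas a tibble always stays a tibble
df_col <- iris[, "Species"]
tbl_col <- tidyris[, "Species"]

# Partial name matching with $: a data.frame silently completes "Spec"
# to "Species", a tibble returns NULL (with a warning)
df_partial <- iris$Spec
tbl_partial <- suppressWarnings(tidyris$Spec)
```

The stricter behaviour of tibbles tends to surface typos in column names early instead of letting them pass silently.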

# ggplot2

```R
# Histogram of sepal length
ggplot(data = tidyris, aes(x = Sepal.Length)) + geom_histogram()

# Sepal width against sepal length, with one colour and shape per species
ggplot(data = tidyris, aes(x = Sepal.Length, y = Sepal.Width, color = Species, shape = Species)) +
  geom_point() +
  xlab("Sepal Length") +
  ylab("Sepal Width") +
  ggtitle("Sepal Length-Width")

# Same plot for the petals
ggplot(data = tidyris, aes(x = Petal.Length, y = Petal.Width, color = Species, shape = Species)) +
  geom_point() +
  xlab("Petal Length") +
  ylab("Petal Width") +
  ggtitle("Petal Length-Width")
```
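
A ggplot is an ordinary R object built up layer by layer, so it can be stored, extended, and saved. Here is a small sketch along those lines, using a boxplot rather than the plot types above; the object names (`p`, `p_points`, `png_path`) and the output size are arbitrary choices.

```R
library(ggplot2)

# Boxplot of petal length for each species, stored in a variable
p <- ggplot(data = iris, aes(x = Species, y = Petal.Length, fill = Species)) +
  geom_boxplot() +
  labs(x = "Species", y = "Petal length (cm)", title = "Petal length by species")

# Add a second layer on top of the stored plot: the individual observations
p_points <- p + geom_jitter(width = 0.2, alpha = 0.5)

# Save the result to a temporary PNG file
png_path <- tempfile(fileext = ".png")
ggsave(png_path, plot = p_points, width = 6, height = 4)
```

Typing `p_points` at the console displays the plot in the RStudio "Plots" pane.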

# dplyr, the king of data manipulation!

```R
# Select a single row by position (here the 10th row)
slice(iris, 10)
# Select a range of rows (here rows 1 to 10)
slice(iris, 1:10)
```
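
`slice()` selects rows by position; the other core dplyr verbs follow the same pattern of taking a data frame as first argument and returning a new one. A short sketch on `iris` (the result names are illustrative):

```R
library(dplyr)

# filter(): keep the rows that match a condition
setosa <- filter(iris, Species == "setosa")

# select(): keep columns by name or with a helper such as starts_with()
petals <- select(iris, Species, starts_with("Petal"))

# arrange(): sort rows, here by decreasing sepal length
longest_first <- arrange(iris, desc(Sepal.Length))

# mutate(): add a column computed from existing ones
with_ratio <- mutate(iris, Petal.Ratio = Petal.Length / Petal.Width)

# group_by() + summarise(): one summary row per group
species_summary <- summarise(
  group_by(iris, Species),
  n = n(),
  mean_petal_length = mean(Petal.Length)
)
```

None of these calls modifies `iris` itself; each result has to be assigned to a name to be kept.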

# magrittr, hip hip pipe!

Use the forward pipe operator, `%>%`:
https://magrittr.tidyverse.org/
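
The pipe passes the result of its left-hand side as the first argument of the function on its right-hand side, which lets a sequence of operations be read from top to bottom instead of from the inside out. As a sketch, here is the same computation written both ways (the names `nested` and `piped` are illustrative):

```R
library(dplyr)  # dplyr re-exports the %>% operator from magrittr

# Without the pipe: nested calls, read from the innermost outwards
nested <- head(
  arrange(
    summarise(group_by(iris, Species), mean_sepal = mean(Sepal.Length)),
    desc(mean_sepal)
  ),
  1
)

# With the pipe: the same steps, in the order they are executed
piped <- iris %>%
  group_by(Species) %>%
  summarise(mean_sepal = mean(Sepal.Length)) %>%
  arrange(desc(mean_sepal)) %>%
  head(1)
```

Both versions return the species with the largest mean sepal length; only the way the code is written differs.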


# tidyquery

For aficionados of `SQL` (Structured Query Language): although it is far from perfect (for example, it does not handle joins involving three or more tables), tidyquery is made for you!

https://github.com/ianmcook/tidyquery
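
As a rough sketch of what this looks like in practice, the example below counts the rows of `iris` per species with an SQL query and with the equivalent dplyr code. It assumes that the tidyquery package is installed and that its `query()` and `show_dplyr()` functions behave as described in the package README; the result names are illustrative.

```R
# Assumption: tidyquery is installed, e.g. with install.packages("tidyquery")
library(tidyquery)
library(dplyr)

sql <- "SELECT Species, COUNT(*) AS n FROM iris GROUP BY Species"

# Run the SQL query against the iris data frame
by_sql <- query(sql)

# The same result written directly with dplyr
by_dplyr <- iris %>%
  group_by(Species) %>%
  summarise(n = n())

# Print the dplyr code that tidyquery generates for the query
show_dplyr(sql)
```

Comparing the output of `show_dplyr()` with hand-written dplyr code can be a convenient way to move from SQL habits to the tidyverse grammar.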


# queryparser


# The RStudio cheatsheets

https://www.rstudio.com/resources/cheatsheets/

# Be styleR!

https://style.tidyverse.org/pipes.html

https://juba.github.io/tidyverse/index.html
https://larmarange.github.io/analyse-R/introduction-au-tidyverse.html
https://jcoliver.github.io/learn-r/