# Introduction to data science with R and the tidyverse

contact@prabi.fr

![The tidy universe](https://thinkr.fr/wp-content/uploads/2019/07/thinkr_tidyverse-first_header.jpg)

## Acknowledgements

```
If you use the computing resources of LBBE / PRABI, please acknowledge them
in your scientific publications by including the following sentence:
"This work was performed using the computing facilities of the CC LBBE/PRABI."
```

## Prerequisites

- Complete the episode "Prise en main de R à travers RStudio" (getting started with R through RStudio)

## For the eager beavers!

<p><a href="https://www.rstudio.com/resources/webinars/a-gentle-introduction-to-tidy-statistics-in-r/?wvideo=hltkvqscdz"><img src="https://embedwistia-a.akamaihd.net/deliveries/4d5cca91aba7c7aad975ac5838c69217987a9e83.jpg?image_play_button_size=2x&amp;image_crop_resized=960x540&amp;image_play_button=1&amp;image_play_button_color=71a5d4e0" width="400" height="225" style="width: 400px; height: 225px;"></a></p><p><a href="https://www.rstudio.com/resources/webinars/a-gentle-introduction-to-tidy-statistics-in-r/?wvideo=hltkvqscdz">A Gentle Introduction to Tidy Statistics in R - RStudio</a></p>

## Learning R with the "tidyverse first" approach (https://thinkr.fr/pedagogie-de-la-formation-au-langage-r/)

https://www.openscapes.org/blog/2020/10/12/tidy-data/

R's strength is the roughly 13,000 packages listed on the CRAN archive (https://cran.r-project.org/web/packages/), but that abundance is also one of its weak points for beginners.

The R language relies on many base packages, functions, operators and objects that have to be learned before getting to the heart of the matter.

https://larmarange.github.io/analyse-R/

We will try to tame R with the help of the new tidyverse order!

![](https://static.wikia.nocookie.net/frstarwars/images/9/9f/Premier_Ordre_base_Starkiller.png/revision/latest/scale-to-width-down/1000?cb=20151108134926)


In 2016, Hadley Wickham proposed a set of R packages that share a coherent, readable and intuitive grammar for data analysis / data science, covering in particular:

- data import
- data manipulation
- data visualization
- data modelling
- data export

![](https://juba.github.io/tidyverse/resources/logos/core_packages.png)

These packages are now grouped under the tidyverse package (https://www.tidyverse.org/), which includes:
- ggplot2 (visualization)
- dplyr (data manipulation)
- tidyr (data reshaping)
- purrr (programming)
- readr (data import)
- tibble (data tables)
- forcats (categorical variables)
- stringr (character strings)

https://thinkr.fr/c-est-quoi-le-tidyverse/

First, we will install the `tidyverse` package in R/RStudio:

```R
install.packages("tidyverse")
```

To load the package, call R's `library` function with `tidyverse` as its argument:

```R
library(tidyverse)
#> ── Attaching packages ──────────────────────────────────────────── tidyverse 1.3.1 ──
#>  ggplot2 3.3.5      purrr   0.3.4
#>  tibble  3.1.5      dplyr   1.0.7
#>  tidyr   1.1.4      stringr 1.4.0
#>  readr   2.0.2      forcats 0.5.1
#> ── Conflicts ─────────────────────────────────────────────── tidyverse_conflicts() ──
#> x dplyr::filter() masks stats::filter()
#> x dplyr::lag()    masks stats::lag()
```
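
The Conflicts section of that startup message means that, once the tidyverse is attached, a bare `filter()` or `lag()` call resolves to the dplyr version rather than the one from stats. A minimal sketch of how either version can still be called explicitly with the `::` operator; the variable names and the moving-average example are illustrative and not part of the original course material:

```R
library(tidyverse)

# After attaching the tidyverse, a bare filter() is dplyr's row filter
setosa <- filter(iris, Species == "setosa")
nrow(setosa)

# The masked stats version remains reachable through its namespace:
# here, a centered 3-point moving average over a short numeric series
smoothed <- stats::filter(1:10, rep(1 / 3, 3))
smoothed
```

Writing `dplyr::filter()` explicitly is also possible, and can make scripts easier to read when both packages are in play.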

We will use the `iris` dataset, also known as Fisher's Iris or Anderson's Iris (https://fr.wikipedia.org/wiki/Iris_de_Fisher, https://rpubs.com/vidhividhi/irisdataeda), to illustrate the basic features of the tidyverse.

http://cbdm-01.zdv.uni-mainz.de/~galanisl/danalysis/

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/5/56/Kosaciec_szczecinkowaty_Iris_setosa.jpg/440px-Kosaciec_szczecinkowaty_Iris_setosa.jpg" width="100px"> <i>Iris setosa</i><br>
<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/4/41/Iris_versicolor_3.jpg/440px-Iris_versicolor_3.jpg" width="100px"> <i>Iris versicolor</i><br>
<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/9/9f/Iris_virginica.jpg/440px-Iris_virginica.jpg" width="100px"> <i>Iris virginica</i><br>

Overview of the tidyverse principles (https://juba.github.io/tidyverse/06-tidyverse.html)

- tidy data and tibble vs data.frame (see sections 6.3 and 6.4)
https://www.openscapes.org/blog/2020/10/12/tidy-data/
```R
# iris ships with base R as a plain data.frame
class(iris)
# as_tibble() (tibble package, attached by the tidyverse) converts it to a tibble
tidyris <- as_tibble(iris)
class(tidyris)
```
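
A tibble is still a data.frame, but with a few stricter and more predictable behaviours. A minimal sketch of two visible differences, rebuilding the `tidyris` object so the block runs on its own (the names `df_col` and `tbl_col` are illustrative):

```R
library(tidyverse)

tidyris <- as_tibble(iris)

# A tibble keeps the data.frame class and adds its own classes on top
class(tidyris)

# Single-column subsetting with [ : a data.frame drops to a plain vector,
# while a tibble always returns another tibble
df_col  <- iris[, "Sepal.Length"]
tbl_col <- tidyris[, "Sepal.Length"]
is.numeric(df_col)
is_tibble(tbl_col)

# Printing a tibble shows only the first rows, plus the type of each column
tidyris
```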

# ggplot2

```R
# Histogram of sepal length
ggplot(data = tidyris, aes(x = Sepal.Length)) + geom_histogram()

# Sepal length vs sepal width, colored and shaped by species
ggplot(data = tidyris, aes(x = Sepal.Length, y = Sepal.Width, color = Species, shape = Species)) +
  geom_point() +
  xlab("Sepal Length") +
  ylab("Sepal Width") +
  ggtitle("Sepal Length-Width")

# Petal length vs petal width, colored and shaped by species
ggplot(data = tidyris, aes(x = Petal.Length, y = Petal.Width, color = Species, shape = Species)) +
  geom_point() +
  xlab("Petal Length") +
  ylab("Petal Width") +
  ggtitle("Petal Length-Width")
```
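
The plots above are all built the same way: a `ggplot()` call sets the data and the aesthetic mapping, then layers and labels are added with `+`. As a rough sketch of that pattern, a plot can also be stored in an object and extended step by step; the boxplot and the object name `p` are illustrative choices, not part of the original material:

```R
library(tidyverse)

tidyris <- as_tibble(iris)

# Start from the data and the aesthetic mapping shared by every layer
p <- ggplot(data = tidyris, aes(x = Species, y = Petal.Length, fill = Species))

# Add a boxplot layer, then axis labels and a title, one "+" at a time
p <- p +
  geom_boxplot() +
  xlab("Species") +
  ylab("Petal Length") +
  ggtitle("Petal Length by Species")

# Printing the object draws the plot
p
```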

# dplyr, the king of data manipulation!

```R

# Select a single row by position (the 10th row)
slice(iris, 10)
# Select a range of rows (rows 1 to 10)
slice(iris, 1:10)

```
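
`slice()` picks rows by position. The other core dplyr verbs follow the same convention, taking the data as first argument and a description of what to do next. A short sketch on the iris data; the result names and the `Petal.Ratio` column are illustrative and not part of the dataset:

```R
library(tidyverse)

tidyris <- as_tibble(iris)

# filter(): keep the rows that match a condition
long_sepals <- filter(tidyris, Sepal.Length > 7)

# select(): keep columns by name
petals <- select(tidyris, Species, Petal.Length, Petal.Width)

# mutate(): add a column computed from existing ones
# (Petal.Ratio is an illustrative name, not a column of iris)
with_ratio <- mutate(petals, Petal.Ratio = Petal.Length / Petal.Width)

# arrange(): sort rows, here by decreasing petal length
sorted <- arrange(petals, desc(Petal.Length))

# summarise() on grouped data: one row per species
by_species <- summarise(group_by(tidyris, Species),
                        mean_petal_length = mean(Petal.Length))
by_species
```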

# magrittr, hip hip pipe!

Use the forward pipe operator, `%>%`.
https://magrittr.tidyverse.org/
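
The pipe passes the value on its left as the first argument of the function on its right, so `x %>% f(y)` is equivalent to `f(x, y)`. A minimal sketch comparing nested calls with the piped form on the iris data; the threshold and the `mean_width` column name are illustrative:

```R
library(tidyverse)

# Nested calls have to be read from the inside out
nested <- arrange(
  summarise(group_by(filter(iris, Sepal.Length > 5), Species),
            mean_width = mean(Sepal.Width)),
  desc(mean_width)
)

# The same steps with %>% read from top to bottom
piped <- iris %>%
  filter(Sepal.Length > 5) %>%
  group_by(Species) %>%
  summarise(mean_width = mean(Sepal.Width)) %>%
  arrange(desc(mean_width))

piped
```

Both forms compute the same result; the piped version simply lists the steps in the order they are applied.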


# tidyquery

For aficionados of `SQL` (Structured Query Language), tidyquery is made for you, even though it is far from perfect (for example, it does not handle joins across more than three tables)!

https://github.com/ianmcook/tidyquery


# queryparser


# The RStudio cheatsheets

https://www.rstudio.com/resources/cheatsheets/

# Be styleR!

https://style.tidyverse.org/pipes.html

https://juba.github.io/tidyverse/index.html
https://larmarange.github.io/analyse-R/introduction-au-tidyverse.html
https://jcoliver.github.io/learn-r/