R4DS visualized with step charts

For this week's #TidyTuesday, we look at the statistics of the R4DS community using step charts. On the menu: trend, correlation, and a mini-dashboard to visualize the data.

CONTEXT

R4DS is an online community for everyone who works with R and wants to improve their skills. Earlier this month, its founder, Jesse Mostipak, gave a talk at the useR! 2019 conference in Toulouse (France). You can join the community on Slack.

OBJECTIVES

1) Explore the data with inspectdf
2) Visualize how the number of members has grown since the community was created, using a step chart
3) Visualize member activity since the community was created, also with a step chart
4) Build a mini-dashboard from my two charts to create a viz that conveys my message


IMPORT

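# Packages inferred from the functions called in this post
library(tidyverse)   # readr, dplyr, ggplot2
library(lubridate)   # quarter()
library(inspectdf)   # inspect_cor(), show_plot()
library(zoo)         # scale_x_yearqtr()

r4ds_members <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-07-16/r4ds_members.csv")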
FALSE Parsed with column specification:  
FALSE cols(  
FALSE   .default = col_double(),  
FALSE   date = col_date(format = "")  
FALSE )
FALSE See spec(...) for full column specifications.



EXPLORE

glimpse(r4ds_members)
## Observations: 678  
## Variables: 21  
## $ date                                 <date> 2017-08-27, 2017-08-28, 20…  
## $ total_membership                     <dbl> 1, 1, 1, 1, 1, 188, 284, 32…  
## $ full_members                         <dbl> 1, 1, 1, 1, 1, 188, 284, 32…  
## $ guests                               <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, …  
## $ daily_active_members                 <dbl> 1, 1, 1, 1, 1, 169, 225, 21…  
## $ daily_members_posting_messages       <dbl> 1, 0, 1, 0, 1, 111, 110, 96…  
## $ weekly_active_members                <dbl> 1, 1, 1, 1, 1, 169, 270, 30…  
## $ weekly_members_posting_messages      <dbl> 1, 1, 1, 1, 1, 111, 183, 21…  
## $ messages_in_public_channels          <dbl> 4, 0, 0, 0, 1, 252, 326, 20…  
## $ messages_in_private_channels         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, …  
## $ messages_in_shared_channels          <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, …  
## $ messages_in_d_ms                     <dbl> 1, 0, 0, 0, 0, 119, 46, 71,…  
## $ percent_of_messages_public_channels  <dbl> 0.8000, 0.0000, 0.0000, 0.0…  
## $ percent_of_messages_private_channels <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, …  
## $ percent_of_messages_d_ms             <dbl> 0.2000, 0.0000, 0.0000, 0.0…  
## $ percent_of_views_public_channels     <dbl> 0.2857, 1.0000, 1.0000, 1.0…  
## $ percent_of_views_private_channels    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, …  
## $ percent_of_views_d_ms                <dbl> 0.7143, 0.0000, 0.0000, 0.0…  
## $ name                                 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, …  
## $ public_channels_single_workspace     <dbl> 10, 10, 11, 11, 12, 12, 12,…  
## $ messages_posted                      <dbl> 35, 35, 37, 38, 66, 1101, 1…
summary(r4ds_members)
##       date            total_membership  full_members        guests   
##  Min.   :2017-08-27   Min.   :   1.0   Min.   :   1.0   Min.   :0    
##  1st Qu.:2018-02-12   1st Qu.: 978.2   1st Qu.: 978.2   1st Qu.:0    
##  Median :2018-07-31   Median :1605.0   Median :1605.0   Median :0    
##  Mean   :2018-07-31   Mean   :1567.8   Mean   :1567.8   Mean   :0    
##  3rd Qu.:2019-01-16   3rd Qu.:2142.8   3rd Qu.:2142.8   3rd Qu.:0    
##  Max.   :2019-07-05   Max.   :3029.0   Max.   :3029.0   Max.   :0    
##  daily_active_members daily_members_posting_messages weekly_active_members  
##  Min.   :  1.00       Min.   :  0.00                 Min.   :  1.0          
##  1st Qu.: 63.00       1st Qu.:  6.00                 1st Qu.:206.0          
##  Median : 88.00       Median : 11.00                 Median :239.0          
##  Mean   : 91.39       Mean   : 13.24                 Mean   :249.7          
##  3rd Qu.:110.00       3rd Qu.: 16.00                 3rd Qu.:307.8          
##  Max.   :258.00       Max.   :111.00                 Max.   :525.0          
##  weekly_members_posting_messages messages_in_public_channels  
##  Min.   :  1.00                  Min.   :  0.00               
##  1st Qu.: 35.00                  1st Qu.:  9.25               
##  Median : 48.00                  Median : 19.00              
##  Mean   : 52.16                  Mean   : 28.46               
##  3rd Qu.: 59.00                  3rd Qu.: 35.00               
##  Max.   :278.00                  Max.   :326.00               
##  messages_in_private_channels messages_in_shared_channels messages_in_d_ms  
##  Min.   : 0.000               Min.   :0                   Min.   :  0.00    
##  1st Qu.: 0.000               1st Qu.:0                   1st Qu.:  1.00    
##  Median : 0.000               Median :0                   Median :  4.00    
##  Mean   : 1.718               Mean   :0                   Mean   : 13.05    
##  3rd Qu.: 0.000               3rd Qu.:0                   3rd Qu.: 12.00    
##  Max.   :75.000               Max.   :0                   Max.   :227.00    
##  percent_of_messages_public_channels percent_of_messages_private_channels  
##  Min.   :0.0000                      Min.   :0.0000                        
##  1st Qu.:0.5840                      1st Qu.:0.0000                        
##  Median :0.8000                      Median :0.0000                        
##  Mean   :0.7248                      Mean   :0.0305                        
##  3rd Qu.:0.9444                      3rd Qu.:0.0000                        
##  Max.   :1.0000                      Max.   :1.0000                        
##  percent_of_messages_d_ms percent_of_views_public_channels  
##  Min.   :0.0000           Min.   :0.2726                    
##  1st Qu.:0.0345           1st Qu.:0.9115                    
##  Median :0.1595           Median :0.9519                    
##  Mean   :0.2270           Mean   :0.9285                    
##  3rd Qu.:0.3478           3rd Qu.:0.9744                    
##  Max.   :1.0000           Max.   :1.0000                    
##  percent_of_views_private_channels percent_of_views_d_ms      name    
##  Min.   :0.000000                  Min.   :0.00000       Min.   :0    
##  1st Qu.:0.000000                  1st Qu.:0.02235       1st Qu.:0    
##  Median :0.000000                  Median :0.04170       Median :0    
##  Mean   :0.009773                  Mean   :0.06176       Mean   :0    
##  3rd Qu.:0.006450                  3rd Qu.:0.07433       3rd Qu.:0    
##  Max.   :0.267400                  Max.   :0.72170       Max.   :0    
##  public_channels_single_workspace messages_posted  
##  Min.   :10.0                     Min.   :   35    
##  1st Qu.:15.0                     1st Qu.:20543    
##  Median :19.0                     Median :33828    
##  Mean   :17.8                     Mean   :32936    
##  3rd Qu.:21.0                     3rd Qu.:40104    
##  Max.   :27.0                     Max.   :59627
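
The summary() output above shows no NA rows, which can also be confirmed with an explicit per-column count. Here is a minimal sketch in base R; the toy data frame is made up for illustration (with one NA added on purpose), and in the post it would be r4ds_members.

```r
# Toy data frame, made up for illustration; in the post this would be r4ds_members
toy <- data.frame(
  date = as.Date(c("2017-08-27", "2017-08-28", "2017-08-29")),
  total_membership = c(1, 1, NA),   # one missing value added on purpose
  guests = c(0, 0, 0)
)

# Number of missing values in each column
na_counts <- colSums(is.na(toy))
na_counts

# Storage class of each column, to check that dates are dates and counts are numeric
col_types <- sapply(toy, function(col) class(col)[1])
col_types
```

A column of zeros in na_counts is what "no missing data" looks like with this check.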

There are no missing values in the data, and the variables are stored in the right format. To visualize the correlations between variables quickly, I had to reduce the number of variables and keep only those that seemed most relevant.

r4ds_cor<-r4ds_members %>%
     select('total_membership', 'daily_active_members', 'daily_members_posting_messages', 'weekly_active_members', 'weekly_members_posting_messages', 'messages_posted') %>%
     inspect_cor() %>%
     show_plot()

As expected, the correlation between the number of messages posted and the number of members is very strong…
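
To put a number on that kind of relationship, the Pearson coefficient can be computed directly with base R's cor(). The sketch below uses made-up values, not the real R4DS data. Both columns are counts that keep growing over time, which is the situation where a strong positive correlation is expected.

```r
# Toy values, made up for illustration (not the real R4DS data)
toy <- data.frame(
  total_membership = c(100, 200, 300, 400, 500),
  messages_posted  = c(1100, 2050, 3200, 3900, 5100)
)

# Pearson correlation (the default method of cor())
r <- cor(toy$total_membership, toy$messages_posted)
round(r, 3)
```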



PREPARE

# Quarterly membership: sum the daily totals within each quarter (in thousands)
r4ds<-r4ds_members %>%
     select('date','total_membership','messages_posted') %>%
     mutate(quarter=quarter(date, with_year = TRUE)) %>%   # e.g. 2017.3 = 2017 Q3
     group_by(quarter) %>%
     summarise(tot_m_1000=sum(total_membership/1000)) %>%
     filter(!quarter %in% 2019.3)   # drop 2019 Q3: the data stop on 2019-07-05

# First and last quarters, marked with a point on the chart
r4ds_point<-r4ds %>%
     filter(quarter %in% c(2017.3, 2019.2))

# Same aggregation for daily active members
r4ds_active<-r4ds_members %>%
     select('date','total_membership','daily_active_members') %>%
     mutate(quarter=quarter(date, with_year = TRUE)) %>%
     group_by(quarter) %>%
     summarise(active_1000=sum(daily_active_members/1000)) %>%
     filter(!quarter %in% 2019.3)

r4ds_point_active<-r4ds_active %>%
     filter(quarter %in% c(2017.3, 2019.2))
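
To make the quarterly aggregation concrete, here is a small sketch on made-up data that spans two quarters. It computes the same sum as the code above and, for comparison only, the membership on the last day of each quarter. The last_1000 column is my own addition for illustration and is not part of the original analysis: summing daily totals gives member-days per quarter, whereas the last value gives the head count at the end of the quarter.

```r
library(dplyr)
library(lubridate)

# Toy data, made up for illustration: four daily rows spanning two quarters
toy <- tibble(
  date = as.Date(c("2018-03-30", "2018-03-31", "2018-04-01", "2018-04-02")),
  total_membership = c(1000, 1010, 1020, 1030)
)

by_quarter <- toy %>%
  mutate(quarter = quarter(date, with_year = TRUE)) %>%   # 2018.1, 2018.2
  group_by(quarter) %>%
  summarise(
    sum_1000  = sum(total_membership / 1000),    # what the post computes: member-days / 1000
    last_1000 = last(total_membership) / 1000    # hypothetical alternative: end-of-quarter head count
  )

by_quarter
```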



VISUALIZE

# Chart
gg1<-ggplot(data=r4ds, aes(x = quarter, y=tot_m_1000))  
gg1<-gg1 + geom_step(linetype=5, color="#A9A9A9", size=2.5)  
gg1<-gg1 + geom_step(data=r4ds_active, aes(x = quarter, y=active_1000),linetype=5, color="#A9A9A9", size=2.5)
gg1<-gg1 +  geom_rect(data=r4ds,
                     mapping=aes(xmin=2018.1,xmax=2018.4,ymin=0,ymax=Inf),
                     fill='#01A7C2',alpha=0.05)  
gg1<-gg1 + geom_point(data=r4ds_point,
                      mapping=(aes(x=quarter,y=tot_m_1000)),
                      color="#A9A9A9", size=5)
gg1<-gg1 + geom_point(data=r4ds_point_active,
                      mapping=(aes(x=quarter,y=active_1000)),
                      color="#A9A9A9", size=5)
# adjust the axes
gg1<-gg1 + scale_x_yearqtr(breaks = seq(from = min(r4ds$quarter), to = max(r4ds$quarter), by = 0.25),
                    format = "%Y-%q")  
gg1<-gg1 + scale_y_continuous(breaks=seq(0,300,50), limits = c(0, 300))  
# hide the legend
gg1<-gg1 + theme(legend.position="none") 
# modify the theme
gg1<-gg1 +theme(panel.border = element_blank(),
                panel.background = element_blank(),
                plot.background = element_blank(),
                panel.grid.major.y= element_blank(),
                panel.grid.major.x= element_blank(),
                panel.grid.minor = element_blank(),
                axis.line.x = element_line(color="#A9A9A9"),
                axis.line.y = element_line(color="#A9A9A9"),
                axis.ticks= element_blank())  
# add the titles
gg1<-gg1 + labs(title="",
                subtitle=" ",
                y="Members (x1000)",
                 x=" ")
gg1<-gg1 + theme(plot.title= element_text(hjust=0,size=15, color="#A9A9A9", face="bold"),
                 plot.subtitle = element_text(hjust=0,size=12, color="#A9A9A9"),
                 axis.title.y  = element_text(hjust=1,size=12, color="#A9A9A9", angle=90),
                 axis.title.x  = element_blank(),
                 axis.text.y   = element_text(hjust=0.5, size=10, color="#A9A9A9"),
                  axis.text.x   = element_text(hjust=0.5, size=10, color="#A9A9A9"))  
# add the labels
gg1<-gg1 + annotate(geom="text", x=2019.2,y=270, label="Total", color="#A9A9A9", size=5, hjust=0.5,vjust=0, fontface="bold")
gg1<-gg1 + annotate(geom="text", x=2019.2,y=18, label="Active", color="#A9A9A9", size=5, hjust=0.5,vjust=0, fontface="bold")
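
Stripped of its styling, the chart above comes down to a geom_step() layer plus a geom_point() layer on the first and last values. Here is a minimal sketch with made-up quarterly values; the column names and numbers are assumptions for illustration only.

```r
library(ggplot2)

# Toy quarterly values, made up for illustration
toy <- data.frame(
  quarter = c(2018.00, 2018.25, 2018.50, 2018.75),
  members = c(90, 140, 190, 230)
)

p <- ggplot(toy, aes(x = quarter, y = members)) +
  # "hv" (the default) draws the horizontal segment first, then the vertical jump
  geom_step(direction = "hv", color = "#A9A9A9") +
  # mark only the first and last observations
  geom_point(data = toy[c(1, nrow(toy)), ], color = "#A9A9A9", size = 5)

p
```

Setting direction = "vh" instead would draw the jump before the flat segment, which shifts where each change appears to happen.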



PREPARE

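# Note: r4ds is reassigned here; gg1 keeps the data it was built with
r4ds<-r4ds_members %>%
     # messages_posted is cumulative, so the daily count is the difference with the previous row.
     # dplyr::lag() replaces the original shift() call, whose package is not stated
     # (data.table::shift() has the same default one-row lag).
     mutate(daily_message=messages_posted-dplyr::lag(messages_posted)) %>%
     filter(daily_message>0 & daily_message<5000) %>%   # drop empty days and extreme values
     mutate(activity=(daily_message/daily_active_members)) %>%
     mutate(quarter=quarter(date, with_year = TRUE)) %>%
     group_by(quarter) %>%
     summarise(active=sum(activity)) %>%
     filter(!quarter %in% 2019.3)

# First and last quarters, marked with a point on the chart
r4ds_point_activity<-r4ds %>%
     filter(quarter %in% c(2017.3, 2019.2))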
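
The key step in this preparation is turning a cumulative counter into daily counts. Here is a small sketch on made-up rows, using dplyr::lag() for the one-row shift; the values are illustrative, not taken from the analysis.

```r
library(dplyr)

# Toy cumulative counts, made up for illustration
toy <- tibble(
  date = as.Date("2017-08-27") + 0:4,
  messages_posted = c(35, 35, 37, 38, 66),      # running total
  daily_active_members = c(1, 1, 1, 1, 2)
)

daily <- toy %>%
  mutate(daily_message = messages_posted - lag(messages_posted)) %>%  # first row becomes NA
  filter(daily_message > 0) %>%                                       # drops the NA row and zero-message days
  mutate(activity = daily_message / daily_active_members)             # messages per active member

daily
```

Because filter() discards rows where the condition is NA, the first day, which has no previous value, disappears without any extra handling.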



VISUALIZE

# Chart
gg2<-ggplot(data=r4ds, aes(x = quarter, y=active))  
gg2<-gg2 + geom_step(linetype=5, color="#A9A9A9", size=2.5)  
gg2<-gg2 +  geom_rect(data=r4ds,
              mapping=aes(xmin=2018.1,xmax=2018.4,ymin=0,ymax=Inf),
              fill='#01A7C2',alpha=0.05)  
gg2<-gg2 + geom_point(data=r4ds_point_activity,
                      mapping=(aes(x=quarter,y=active)),
                      color="#A9A9A9", size=5)
# adjust the axes
gg2<-gg2 + scale_x_yearqtr(breaks = seq(from = min(r4ds$quarter), to = max(r4ds$quarter), by = 0.25),
                    format = "%Y-%q")  
gg2<-gg2 + scale_y_continuous(breaks=seq(0,100,25), limits = c(0, 100))  
# hide the legend
gg2<-gg2 + theme(legend.position="none")
# modify the theme
gg2<-gg2 +theme(panel.border = element_blank(),
                panel.background = element_blank(),
                plot.background = element_blank(),
                panel.grid.major.y= element_blank(),
                panel.grid.major.x= element_blank(),
                panel.grid.minor = element_blank(),
                axis.line.x = element_line(color="#A9A9A9"),
                axis.line.y = element_line(color="#A9A9A9"),
                axis.ticks= element_blank())
# add the titles
gg2<-gg2 + labs(title="",
                subtitle=" ",
                y="Daily messages per active member",
                 x=" ")
gg2<-gg2 + theme(plot.title    = element_text(hjust=0,size=15, color="#A9A9A9", face="bold"),
                 plot.subtitle = element_text(hjust=0,size=12, color="#A9A9A9"),
                 axis.title.y  = element_text(hjust=1,size=12, color="#A9A9A9", angle=90),
                 axis.title.x  = element_blank(),
                 axis.text.y   = element_text(hjust=0.5, size=10, color="#A9A9A9"),
                  axis.text.x   = element_text(hjust=0.5, size=10, color="#A9A9A9"))

Here is the result:

[Figure: step chart]
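
The code above builds gg1 and gg2 but does not show how they were assembled into the mini-dashboard. One possible way, assuming the patchwork package (the post does not say which tool was used), is sketched below with two stand-in plots built from made-up data; the title is a placeholder.

```r
library(ggplot2)
library(patchwork)   # assumption: the post does not name the package it used

# Toy data, made up for illustration
toy <- data.frame(
  quarter  = c(2018.00, 2018.25, 2018.50, 2018.75),
  members  = c(90, 140, 190, 230),
  activity = c(60, 45, 30, 25)
)

# Stand-ins for gg1 and gg2
left  <- ggplot(toy, aes(quarter, members))  + geom_step()
right <- ggplot(toy, aes(quarter, activity)) + geom_step()

# "|" places the plots side by side; plot_annotation() adds a shared title
dashboard <- (left | right) +
  plot_annotation(title = "Placeholder title for the two charts")

dashboard
```

With the real objects, the same layout would be (gg1 | gg2); using "/" instead of "|" stacks the charts vertically.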

Want to know more about my process? Listen to the podcast episode in which I explain the thinking that led to this result.
