For this week's #TidyTuesday, I built a scatter plot to bring out a trend in the data on the comics and animations catalogued on MyAnimeList.net.
CONTEXT
This site gives its users a list-style system for organizing and rating comics and animations according to their tastes, and it provides a large database on animations and comics. The site claims to have 4.4 million animations and 775,000 comics. In 2015, the site received 120 million visitors per month.
OBJECTIVES
1) Visualize the relationship between the popularity of animations and their score.
IMPORT
library(tidyverse)  # loads readr, dplyr (glimpse, select, filter, %>%) and ggplot2, all used below
tidy_anime <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-04-23/tidy_anime.csv")
## Parsed with column specification:
## cols(
##   .default = col_character(),
##   animeID = col_double(),
##   episodes = col_double(),
##   airing = col_logical(),
##   start_date = col_date(format = ""),
##   end_date = col_date(format = ""),
##   score = col_double(),
##   scored_by = col_double(),
##   rank = col_double(),
##   popularity = col_double(),
##   members = col_double(),
##   favorites = col_double()
## )
## See spec(...) for full column specifications.
EXPLORE
glimpse(tidy_anime)
## Observations: 77,911
## Variables: 28
## $ animeID <dbl> 1, 1, 1, 1, 1, 1, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6…
## $ name <chr> "Cowboy Bebop", "Cowboy Bebop", "Cowboy Bebop", "…
## $ title_english <chr> "Cowboy Bebop", "Cowboy Bebop", "Cowboy Bebop", "…
## $ title_japanese <chr> "カウボーイビバップ", "カウボーイビバップ", "カウボーイビバップ", "カウボーイビバップ…
## $ title_synonyms <chr> "[]", "[]", "[]", "[]", "[]", "[]", "[\"Cowboy Be…
## $ type <chr> "TV", "TV", "TV", "TV", "TV", "TV", "Movie", "Mov…
## $ source <chr> "Original", "Original", "Original", "Original", "…
## $ producers <chr> "Bandai Visual", "Bandai Visual", "Bandai Visual"…
## $ genre <chr> "Action", "Adventure", "Comedy", "Drama", "Sci-Fi…
## $ studio <chr> "Sunrise", "Sunrise", "Sunrise", "Sunrise", "Sunr…
## $ episodes <dbl> 26, 26, 26, 26, 26, 26, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ status <chr> "Finished Airing", "Finished Airing", "Finished A…
## $ airing <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, …
## $ start_date <date> 1998-04-03, 1998-04-03, 1998-04-03, 1998-04-03, …
## $ end_date <date> 1999-04-02, 1999-04-02, 1999-04-02, 1999-04-02, …
## $ duration <chr> "24 min per ep", "24 min per ep", "24 min per ep"…
## $ rating <chr> "R - 17+ (violence & profanity)", "R - 17+ (viole…
## $ score <dbl> 8.81, 8.81, 8.81, 8.81, 8.81, 8.81, 8.41, 8.41, 8…
## $ scored_by <dbl> 405664, 405664, 405664, 405664, 405664, 405664, 1…
## $ rank <dbl> 26, 26, 26, 26, 26, 26, 164, 164, 164, 164, 164, …
## $ popularity <dbl> 39, 39, 39, 39, 39, 39, 449, 449, 449, 449, 449, …
## $ members <dbl> 795733, 795733, 795733, 795733, 795733, 795733, 1…
## $ favorites <dbl> 43460, 43460, 43460, 43460, 43460, 43460, 776, 77…
## $ synopsis <chr> "In the year 2071, humanity has colonized several…
## $ background <chr> "When Cowboy Bebop first aired in spring of 1998 …
## $ premiered <chr> "Spring 1998", "Spring 1998", "Spring 1998", "Spr…
## $ broadcast <chr> "Saturdays at 01:00 (JST)", "Saturdays at 01:00 (…
## $ related <chr> "{'Adaptation': [{'mal_id': 173, 'type': 'manga',…
summary(tidy_anime)
## animeID name title_english title_japanese
## Min. : 1 Length:77911 Length:77911 Length:77911
## 1st Qu.: 3052 Class :character Class :character Class :character
## Median :13667 Mode :character Mode :character Mode :character
## Mean :16863
## 3rd Qu.:31452
## Max. :39197
##
## title_synonyms type source
## Length:77911 Length:77911 Length:77911
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## producers genre studio episodes
## Length:77911 Length:77911 Length:77911 Min. : 1.0
## Class :character Class :character Class :character 1st Qu.: 1.0
## Mode :character Mode :character Mode :character Median : 12.0
## Mean : 15.8
## 3rd Qu.: 13.0
## Max. :3057.0
## NA's :987
## status airing start_date
## Length:77911 Mode :logical Min. :1917-01-01
## Class :character FALSE:76528 1st Qu.:2002-09-01
## Mode :character TRUE :1383 Median :2011-01-22
## Mean :2007-03-14
## 3rd Qu.:2015-09-18
## Max. :2019-02-03
## NA's :238
## end_date duration rating
## Min. :1962-02-02 Length:77911 Length:77911
## 1st Qu.:2005-06-02 Class :character Class :character
## Median :2012-06-02 Mode :character Mode :character
## Mean :2009-03-29
## 3rd Qu.:2016-03-02
## Max. :2019-09-02
## NA's :33824
## score scored_by rank popularity
## Min. : 1.000 Min. : 0 Min. : 1 Min. : 1
## 1st Qu.: 6.360 1st Qu.: 597 1st Qu.: 1530 1st Qu.: 1064
## Median : 7.020 Median : 7130 Median : 3685 Median : 3033
## Mean : 6.894 Mean : 43495 Mean : 4557 Mean : 4567
## 3rd Qu.: 7.550 3rd Qu.: 39876 3rd Qu.: 6724 3rd Qu.: 7394
## Max. :10.000 Max. :1107955 Max. :13838 Max. :15474
## NA's :174
## members favorites synopsis background
## Min. : 6 Min. : 0 Length:77911 Length:77911
## 1st Qu.: 1968 1st Qu.: 2 Class :character Class :character
## Median : 18214 Median : 40 Mode :character Mode :character
## Mean : 85051 Mean : 1468
## 3rd Qu.: 88560 3rd Qu.: 413
## Max. :1610561 Max. :120331
##
## premiered broadcast related
## Length:77911 Length:77911 Length:77911
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
The data have already been cleaned and everything is in a ready-to-use format. Watch out for the NAs, though: the summary shows missing values notably in end_date (33,824 rows), episodes (987) and start_date (238).
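One quick way to see where the missing values sit is to count them per column. The sketch below uses a small invented data frame (`toy`, a hypothetical stand-in for `tidy_anime`) so that it runs on its own; the same `colSums(is.na(...))` call should work on the real table.

```r
# Toy stand-in for tidy_anime (values are invented for illustration)
toy <- data.frame(
  name       = c("A", "B", "C", "D"),
  start_date = as.Date(c("1998-04-03", NA, "2011-01-22", "2015-09-18")),
  end_date   = as.Date(c("1999-04-02", NA, NA, "2016-03-02")),
  score      = c(8.81, 7.02, NA, 6.36)
)

# is.na() returns a logical matrix; colSums() counts the TRUEs per column
na_counts <- colSums(is.na(toy))
print(na_counts)
```

Columns with a non-zero count are the ones to filter or handle before plotting.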
PREPARE
# keep the variables needed for the plot, drop incomplete rows, then de-duplicate
data <- tidy_anime %>%
  select(name, start_date, score, rating, popularity) %>%
  filter(!is.na(start_date) & !is.na(score)) %>%
  filter(rating != "None") %>%
  distinct()
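The `distinct()` step matters because, judging by the `glimpse()` output, the same title appears on several rows with a different genre each time (Cowboy Bebop is listed six times). Once `genre` is no longer selected, those rows become exact duplicates. A minimal sketch on invented rows, assuming dplyr is installed (`toy` and `one_per_title` are hypothetical names):

```r
library(dplyr)

# Invented rows mimicking a one-row-per-genre layout (not real data)
toy <- tibble(
  name       = c("Show A", "Show A", "Show A", "Show B"),
  genre      = c("Action", "Comedy", "Drama", "Action"),
  score      = c(8.1, 8.1, 8.1, 6.4),
  popularity = c(39, 39, 39, 449)
)

# Dropping genre leaves identical rows; distinct() keeps one of each
one_per_title <- toy %>%
  select(name, score, popularity) %>%
  distinct()

print(one_per_title)
```

Without this step, each title would be drawn once per genre and would weigh more heavily on the smoothed curve.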
VISUALIZE
# Plot
gg <- ggplot(data = data, aes(x = popularity, y = score))
gg <- gg + geom_point(size = 2, color = alpha("#80FF72", 0.1))
# note: ggplot2 >= 3.4.0 prefers linewidth over size for lines
gg <- gg + geom_smooth(size = 2.5, color = "#E8EBE4")
# adjust the axes
# gg <- gg + scale_y_continuous(breaks = seq(1, 7, 1), limits = c(1, 7))
gg <- gg + scale_x_continuous(breaks = seq(0, 18000, 2000), limits = c(0, 16000))
# modify the legend
gg <- gg + theme(legend.position = "none")
# modify the theme
gg <- gg + theme(panel.border = element_blank(),
                 panel.background = element_rect(fill = "#292E1E", colour = "#292E1E"),
                 plot.background = element_rect(fill = "#292E1E", colour = "#292E1E"),
                 panel.grid.major.y = element_blank(),
                 panel.grid.major.x = element_blank(),
                 panel.grid.minor = element_blank(),
                 axis.line = element_line(size = 1, color = "#E8EBE4", linetype = "solid"),
                 axis.ticks = element_line(size = 0.5, color = "#E8EBE4", linetype = "solid"))
# add the titles
gg <- gg + labs(title = "Is there a relationship between the popularity and the score of animations and comics?",
                subtitle = "It seems that the more popular the animations are, that is, the more people have them in their personal\nlists, the lower the animation's score.",
                y = "Score",
                x = "Popularity")
gg <- gg + theme(plot.title = element_text(hjust = 0, size = 17, color = "#E8EBE4"),
                 plot.subtitle = element_text(hjust = 0, size = 12, color = "#E8EBE4"),
                 axis.title.y = element_text(hjust = 1, size = 12, color = "#E8EBE4"),
                 axis.title.x = element_text(hjust = 0, size = 12, color = "#E8EBE4"),
                 axis.text.y = element_text(hjust = 0.5, size = 10, color = "#E8EBE4"),
                 axis.text.x = element_text(hjust = 0.5, size = 10, color = "#E8EBE4"))
Here is the result:
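To put a number on the trend drawn by the smoothed curve, one option is a rank correlation between the two variables. The sketch below is not part of the original analysis: it uses invented values in a hypothetical `toy` data frame so that it runs on its own, and on the real table you would pass `data$popularity` and `data$score` instead. How to read the sign depends on how MyAnimeList defines `popularity`; in the `glimpse()` output Cowboy Bebop has popularity 39 alongside 795,733 members, which suggests (an assumption, not verified here) that it may be a rank where smaller values mean more popular.

```r
# Invented values for illustration only (not taken from tidy_anime)
toy <- data.frame(
  popularity = c(50, 400, 1500, 3000, 6000, 9000, 12000, 15000),
  score      = c(8.6, 8.1, 7.5, 7.2, 6.8, 6.1, 6.3, 5.4)
)

# Spearman's rho measures a monotonic trend without assuming linearity
rho <- cor(toy$popularity, toy$score, method = "spearman")
print(rho)
```

A value close to -1 or 1 indicates a strong monotonic relationship, while a value near 0 indicates little or none.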
Want to know more about my process? Go listen to the podcast episode in which I explain the thinking that led to this result.