TidyTuesdayYarn

TidyTuesday
Author: Willa Van Liew

Tidy Tuesday (2022) Week 44 Analysis: Ravelry Yarns

Image from Jean-Marc Vieregge via Unsplash.

Background

Ravelry is one of the largest online sites for fiber arts such as knitting and crochet.
I use the platform to find new and exciting patterns to knit for myself and my friends and family.
In October 2022, TidyTuesday shared yarn data from Ravelry with the data science community for analysis.

The code I reviewed for my analysis comes from GitHub user Alice Walsh.

Original Data Set

The data available from Ravelry contain many variables of interest. For my analysis, I chose to look primarily at the average rating, the yarn weight name, and the company that sells each yarn.

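library(tidyverse)  # loads dplyr, tidyr, forcats, and ggplot2

# assumes `yarn` already holds the TidyTuesday Ravelry yarn table
df <- yarn %>%
  select(yarn_company_name, rating_average, yarn_weight_name) %>%
  tidyr::drop_na()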
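To make the cleaning step concrete, here is a small sketch on a made-up tibble. The company names, values, and the extra `yarn_id` column are invented for illustration and are not taken from the Ravelry data. It shows that `tidyr::drop_na()` called with no column arguments drops any row that has a missing value in *any* of the selected columns.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data; not real Ravelry records
toy_yarn <- tibble(
  yarn_company_name = c("Company A", "Company B", NA, "Company C"),
  rating_average    = c(5.00, NA, 4.50, 4.80),
  yarn_weight_name  = c("DK", "Worsted", "Fingering", NA),
  yarn_id           = 1:4   # extra column that select() discards
)

toy_df <- toy_yarn %>%
  select(yarn_company_name, rating_average, yarn_weight_name) %>%
  drop_na()   # no arguments: drop rows with an NA in any remaining column

toy_df
```

Only the "Company A" row survives. Because `yarn_weight_name` is part of the selection, a yarn with a missing weight is dropped even though the weight is not used in the company counts that follow.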

I have been interested in learning more about different yarn companies, so I chose to look at which brands had the most perfectly rated yarns in the data set.

I began by keeping only the yarns with a perfect average rating (5.00) and counting how many such yarns each company had. Some brands had only a few yarns that met that requirement, so I narrowed my scope to companies with more than thirty perfectly rated yarns.

# count perfectly rated yarns per company; keep companies with more than 30
high_ratings <- df %>%
  filter(rating_average == 5.00) %>%
  group_by(yarn_company_name) %>%
  summarize(totalnumber = n()) %>%
  arrange(desc(totalnumber)) %>%
  filter(totalnumber > 30)
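The `group_by()` + `summarize(n())` pattern can also be written with `dplyr::count()`, which groups and counts in one call. The sketch below runs both forms on an invented toy tibble standing in for `df` (not the real data) and checks that they return the same table. The threshold is lowered from 30 to 1 only so the toy example has rows left after filtering.

```r
library(dplyr)

# Hypothetical toy data standing in for `df`
toy_df <- tibble(
  yarn_company_name = c("A", "A", "A", "B", "B", "B", "C"),
  rating_average    = c(5.00, 5.00, 5.00, 5.00, 5.00, 4.20, 5.00)
)

# Original pattern: filter, group, count, sort, threshold
via_summarize <- toy_df %>%
  filter(rating_average == 5.00) %>%
  group_by(yarn_company_name) %>%
  summarize(totalnumber = n()) %>%
  arrange(desc(totalnumber)) %>%
  filter(totalnumber > 1)

# Same result with count(); sort = TRUE orders by the count, descending
via_count <- toy_df %>%
  filter(rating_average == 5.00) %>%
  count(yarn_company_name, name = "totalnumber", sort = TRUE) %>%
  filter(totalnumber > 1)

via_count
```

Both versions keep companies "A" (3 perfectly rated yarns) and "B" (2), in that order; "C" falls below the toy threshold.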

To highlight the top brands in the plot, I created a separate data frame holding only the top three.

# high_ratings is already sorted in descending order, so its first three rows are the top three
top3 <- high_ratings %>%
  head(3)
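Since `high_ratings` is already sorted in descending order, the top three rows can be taken straight from it. The sketch below uses a hypothetical stand-in for `high_ratings` (invented counts) to compare `head()` with `dplyr::slice_max()`. The two differ in how ties are handled: `slice_max()` keeps tied rows by default (`with_ties = TRUE`), so it can return more than three companies.

```r
library(dplyr)

# Hypothetical stand-in for `high_ratings` (already sorted, descending)
toy_high <- tibble(
  yarn_company_name = c("A", "B", "C", "D", "E"),
  totalnumber       = c(90L, 75L, 60L, 60L, 40L)
)

# First three rows of the sorted table
top3_head <- head(toy_high, 3)

# Three largest counts regardless of row order;
# with_ties = TRUE (the default) keeps both companies tied at 60
top3_ties <- slice_max(toy_high, totalnumber, n = 3)

# with_ties = FALSE returns exactly n rows
top3_strict <- slice_max(toy_high, totalnumber, n = 3, with_ties = FALSE)

nrow(top3_head)    # 3
nrow(top3_ties)    # 4
nrow(top3_strict)  # 3
```

If two companies were tied for third place in the real data, `head(3)` would silently leave one of them out of the highlight.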

My final plot shows that ColourMart, Ice Yarns, and Lana Grossa have the most yarns with a perfect 5.00 average rating.

ggplot() +
  # all companies with more than 30 perfectly rated yarns, in grey
  geom_col(data = high_ratings, aes(y = fct_reorder(yarn_company_name, totalnumber), x = totalnumber), fill = "grey") +
  # overlay the top three companies in dark red
  geom_col(data = top3, aes(y = fct_reorder(yarn_company_name, totalnumber), x = totalnumber), fill = "darkred") +
  labs(
    title = "Which yarn company has the most perfectly rated yarns?",
    x = "Number of yarns with a 5.00 average rating",
    y = "Company name"
  ) +
  theme_minimal()
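An alternative way to get the same highlight is to flag the top three companies in a logical column and map that column to `fill` in a single layer, so the bars are drawn once. This is a sketch with hypothetical stand-ins `high_ratings_toy` and `top3_toy` (invented counts), not the data behind the plot above.

```r
library(dplyr)
library(forcats)
library(ggplot2)

# Hypothetical stand-ins for `high_ratings` and `top3`
high_ratings_toy <- tibble(
  yarn_company_name = c("A", "B", "C", "D", "E"),
  totalnumber       = c(90L, 75L, 60L, 45L, 40L)
)
top3_toy <- head(high_ratings_toy, 3)

p <- high_ratings_toy %>%
  # TRUE for companies that also appear in the top-three table
  mutate(is_top3 = yarn_company_name %in% top3_toy$yarn_company_name) %>%
  ggplot(aes(x = totalnumber,
             y = fct_reorder(yarn_company_name, totalnumber),
             fill = is_top3)) +
  geom_col() +
  # map the logical flag to the same two colours; hide the legend
  scale_fill_manual(values = c(`TRUE` = "darkred", `FALSE` = "grey"),
                    guide = "none") +
  labs(x = "Number of yarns with a 5.00 average rating",
       y = "Company name") +
  theme_minimal()

p
```

Keeping everything in one data frame also guarantees that both colours share a single ordering of the company axis.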

References

Müller, Kirill, and Hadley Wickham. 2023. Tibble: Simple Data Frames. https://CRAN.R-project.org/package=tibble.
Wickham, Hadley. 2016. Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org.
———. 2022a. Forcats: Tools for Working with Categorical Variables (Factors). https://CRAN.R-project.org/package=forcats.
———. 2022b. Stringr: Simple, Consistent Wrappers for Common String Operations. https://CRAN.R-project.org/package=stringr.
———. 2022c. Tidyverse: Easily Install and Load the Tidyverse. https://CRAN.R-project.org/package=tidyverse.
Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.
Wickham, Hadley, Winston Chang, Lionel Henry, Thomas Lin Pedersen, Kohske Takahashi, Claus Wilke, Kara Woo, Hiroaki Yutani, and Dewey Dunnington. 2022. Ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics. https://CRAN.R-project.org/package=ggplot2.
Wickham, Hadley, Romain François, Lionel Henry, Kirill Müller, and Davis Vaughan. 2023. Dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr.
Wickham, Hadley, and Lionel Henry. 2023. Purrr: Functional Programming Tools. https://CRAN.R-project.org/package=purrr.
Wickham, Hadley, Jim Hester, and Jennifer Bryan. 2022. Readr: Read Rectangular Text Data. https://CRAN.R-project.org/package=readr.
Wickham, Hadley, Davis Vaughan, and Maximilian Girlich. 2023. Tidyr: Tidy Messy Data. https://CRAN.R-project.org/package=tidyr.