Sport Informatics and Analytics/Pattern Recognition/Using R

From WikiEducator


Introduction

This topic develops issues raised in Pattern Recognition, Theme 2 of this course. It starts a conversation about the use of R in sport analytics.

R is a programming language and a software environment for statistical computing and graphics that is supported by the R Foundation for Statistical Computing.[1]

Kurt Hornik and Friedrich Leisch[2] introduce R in the first issue of R News. The R Core Team provide a brief background report about R in that issue.[3]

There is a detailed description of R on this Wikipedia page.

There is a vibrant R community on Twitter that includes RStudio and R-Ladies Global.

Learning about R

R resources

You can find introductions to R in this resource (including Garrett Grolemund and Hadley Wickham's R for Data Science, Chester Ismay's Getting used to R, RStudio, and R Markdown, and Chester Ismay and Albert Kim's Introduction to Statistical and Data Sciences via R). See, for example, the list of manuals shared by the R Development Core Team, which includes a guide to using R for statistical analysis and graphics. There is a link to a comprehensive set of guides to using R prepared by R-bloggers, and to Garrett Grolemund's RStudio start here resource and R packages. Mark Sellors has provided a Field Guide to the R Ecosystem that is written for a broad, general audience. Stephanie Hicks and Rafael Irizarry[4] provide an example of integrating R in an introduction to data science. Paul Campbell (2018)[5] shares a whirlwind tour of working with data in R. Matt Dancho (2018)[6] produced an interactive cheatsheet for data science workflows with R. R-Ladies Sydney share a range of resources from the group's talks, tutorials and open sharing (see, for example, an introductory unit designed to "lay down the foundational knowledge and skills that will carry you through your R journey"). Data Carpentry provide an example of using R with ecology data. You might find this account of the use of R in a business setting of interest.

Hadley Wickham (2019)[7] has written Mastering Shiny, a guide to Shiny, "a framework for creating web applications using R code". The book complements the available Shiny documentation.

Hasse Walum and Desiree de Leon (2019)[8] have provided a novel approach to learning ggplot2. Their guide includes ways to customise plots.



Download and Install R and RStudio

After you have read about R, we suggest you download and install R and RStudio for your own use. You can find instructions about how to download and install R on this Comprehensive R Archive Network page, and RStudio Desktop on this RStudio page.



Using R in sport contexts

Examples from sport

You can find information about people using R in sport contexts in this bibliography. Note the use of GitHub to share data and code. See, for example, the Scottish Hill Races data (more information about these races is in this resource). Other examples are: Andy Field's analysis of a football referee's behaviour; Stephanie Kovalchik's analysis of tennis performance and of the use of R in sports analytics; Robert Nguyen's discussion of winning margins in Australian Rules football; Alice Sweeting's investigation into the movement sequences of elite and junior elite netball athletes; and Max Marchi and Jim Albert's analysis of baseball. Todd Schneider developed BallR, an interactive shot chart for the NBA that combines R and Shiny and "lets you select a player and season, then creates a customizable chart that shows shot patterns across the court. Additionally, it calculates aggregate statistics like field goal percentage and points per shot attempt, and compares the selected player to league averages at different areas of the court"[9]. Alex Bresler created nbastatR, a package for the analysis of NBA data. Steven Wu and Luke Bornn use R to analyse attacking behaviours in basketball with the help of a secondary data set. In addition to these examples, we present three case studies for your consideration. James Curley provides an example of using R with play-by-play data in football. Hannah Frick and Ioannis Kosmidis (2017)[10] have investigated the use of GPS-enabled tracking devices and heart rate monitors and have shared their trackeR package, which "implements core infrastructure for relevant summaries and visualizations, as well as support for handling units of measurement".[11] FC rSTATS used association football ideas and concepts to introduce R. Jacquie Tran (2018)[12] shared her introduction to sports analytics in R with worked examples from the Winter Olympics and AFLW.
Ryo Nakagawara (2018a,[13] 2018b[14]) shared his visualisations of 2018 FIFA World Cup goal-scoring data. Luke Benz (2018)[15] shared his ncaahoopR package for working with NCAA basketball play-by-play data. Positive Residual (2019)[16] has shared a range of charts, dashboards and apps with a basketball focus. Luis Verde Arregoitia (2019)[17] has used gganimate to provide visualisations of shot distances in the NBA.

Patrick Ward (2019)[18] shared his approach to the analysis of athlete data in applied sport settings. His post includes examples of the R code he uses to evaluate whether an athlete has improved. As he notes, knowing "whether an athlete has or has not improved in some key performance indicator is critical to understanding the success of a prescribed training or rehabilitation program".
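To make the idea of judging improvement concrete, here is a minimal R sketch. It is not Patrick Ward's code: it assumes a "smallest worthwhile change" (SWC) of 0.2 times the between-athlete standard deviation, a convention often used in applied sport science, and it uses hypothetical countermovement jump data.

```r
# Hypothetical baseline countermovement jump heights (cm) for a squad
squad_baseline <- c(38.2, 41.5, 36.9, 44.0, 39.7, 42.3, 40.1, 37.5)

# Assumption: smallest worthwhile change = 0.2 x between-athlete SD
swc <- 0.2 * sd(squad_baseline)

# Classify one athlete's change relative to the SWC
classify_change <- function(pre, post, swc) {
  change <- post - pre
  if (change > swc) {
    "improved"
  } else if (change < -swc) {
    "declined"
  } else {
    "trivial change"
  }
}

classify_change(pre = 39.7, post = 41.0, swc = swc)
```

A fuller analysis would usually also account for the test's typical (measurement) error before concluding that a change is real.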

Mara Averick (2019)[19] shared her analysis of NBA advanced metrics using the nbastatR package (developed and maintained by Alex Bresler). Mara supplies the full code for her analysis.

Martin Frigaard and Peter Spangler (2019)[20] described their analysis of data released by the City of Chicago. Their post provides a detailed account of their use of R.

Mitchell O'Hara-Wild (2019)[21] introduced the tsibbledata package, which provides a diverse collection of tidy time series data sets. Mitchell included twelve data sets in the release. Two of these are of direct relevance here: nyc_bikes and olympic_running. The former contains individual trips for ten NYC Citi Bikes in 2018. The latter contains the fastest running times in women's and men's Olympic races from 100 m to 10,000 m.



Parkrun

Each weekend, Parkrun organisers around the world provide opportunities for people to take part in a 5 km run. The data from these runs are shared on public websites. As a gentle introduction to the use of R in sport contexts, you might like to have a look at the data shared by Keith Lyons (2019)[22]. The data provide information about 385 parkruns completed in under 40 minutes at Braidwood, NSW, in 2018. There is a GitHub repository for the data.
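A first step with data like these is usually converting finish times to numbers. The sketch below assumes that times are stored as "mm:ss" strings. The actual file, column names and format in the Braidwood repository may differ, and the times shown here are hypothetical.

```r
# Hypothetical parkrun finish times in "mm:ss" format (not the Braidwood data)
finish_times <- c("21:47", "25:03", "28:30", "33:12", "38:59", "24:10")

# Convert "mm:ss" strings to seconds
to_seconds <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) as.numeric(p[1]) * 60 + as.numeric(p[2]), numeric(1))
}

secs <- to_seconds(finish_times)

# Average pace in minutes per km over the 5 km course
pace_min_per_km <- secs / 60 / 5

summary(secs / 60)          # finish times in minutes
round(pace_min_per_km, 2)
```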



Australian rules football

Mladen Jovanović

In 2015, Mladen Jovanović[23] analysed the data shared in the Pattern Recognition introduction in this unit. He cleaned the data, produced a new .csv file and shared his analysis of the data. He provides a step-by-step guide that uses a variety of visualisations to bring a raw data file to life.



fitzRoy package

Jacquie Tran (2019)[24] shared her use of the CRAN package fitzRoy, developed by James Day. The package aims to provide a set of functions to access Australian rules football data and includes tools to process and clean the data. The package also provides access to women's Australian rules (AFLW) data (see vignette). Jacquie provided an example of the use of the package and shared how to install the package, explore the data and prepare the data. Her worked examples use game attendance data.



Netball

Movement sequences

In 2017, Alice Sweeting completed her PhD titled Discovering the Movement Sequences of Elite and Junior Elite Netball Athletes[25]. Alice wrote a number of posts on her Sport Statistics R Sweet blog about her introduction to R and her use of R to visualise data. You might find her insights helpful as you explore R for use in your work contexts. See, for example, Alice's introduction to R and her basic analysis of athlete load[26] and her discussion of k-means clustering.[27]
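For readers new to k-means clustering, a minimal base R sketch follows. The session variables and values are hypothetical and are not taken from Alice's posts. The data are simulated so that two clusters are clearly separated.

```r
# Hypothetical training sessions: total distance (m) and high-intensity running (m)
set.seed(42)
sessions <- data.frame(
  total_distance = c(rnorm(10, 4000, 300), rnorm(10, 7000, 300)),
  high_intensity = c(rnorm(10, 200, 40),  rnorm(10, 650, 40))
)

# Standardise the variables so that neither dominates the distance metric
scaled <- scale(sessions)

# k-means with two clusters; nstart > 1 reduces sensitivity to the random starting centres
fit <- kmeans(scaled, centers = 2, nstart = 25)

sessions$cluster <- fit$cluster
table(sessions$cluster)

# Cluster means on the original scale
aggregate(cbind(total_distance, high_intensity) ~ cluster, data = sessions, FUN = mean)
```

In practice the number of clusters is not known in advance. Comparing solutions for several values of `centers` (for example, by their within-cluster sum of squares, `fit$tot.withinss`) is a common starting point.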



Association football

Performance outcomes

Thomas Loridan used R to build predictive models of performance outcomes in association football. He shared his models on his blog site, téouch analytics[28]. We suggest you explore Thomas's posts on his blog to get a feel for how someone passionately interested in association football uses their professional insights (in Thomas's case, as a catastrophe risk modeler[29]) to model performance. In 2017, Thomas wrote six posts to share his modelling process: feature engineering[30]; assessing feature importance[31]; building and testing a model[32]; tuning a prediction model[33]; betting on football[34]; and next steps[35]. Thomas's aim in writing his six posts was "to provide the sort of basic concepts I would have liked to read about when I started with this hobby"[36].
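One idea behind betting with a prediction model can be illustrated with a few lines of arithmetic: compare a model's probability for an outcome with the probability implied by the bookmaker's odds. The odds and model probabilities below are hypothetical, and normalising the implied probabilities is only one of several ways to remove the bookmaker's margin. For decimal odds and a model probability p, the expected profit per unit stake is p * (odds - 1) - (1 - p), which simplifies to p * odds - 1.

```r
# Hypothetical decimal odds for home win, draw and away win
odds <- c(home = 2.10, draw = 3.40, away = 3.60)

# Implied probabilities sum to more than 1 because of the bookmaker's margin
implied   <- 1 / odds
overround <- sum(implied) - 1

# Simple normalisation to remove the margin (one of several methods)
fair <- implied / sum(implied)

# Hypothetical model probabilities for the same match
model_p <- c(home = 0.52, draw = 0.26, away = 0.22)

# Expected profit per unit stake: p * odds - 1
expected_value <- model_p * odds - 1
round(expected_value, 3)

# Outcomes where the model sees positive expected value
names(expected_value)[expected_value > 0]
```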



Social network analysis

Robbie Wilson and his colleagues (2017)[37] investigated whether skill or athleticism predicts individual variation in the match performance of football players. They report their use of the CRAN package igraph to quantify individual player performance and connectedness within a team. Data were gathered from ten games. (For an extended discussion of network visualisation, see Sam Tyner, François Briatte and Heike Hofmann (2017)[38].)
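The notion of connectedness can be sketched without extra packages. The example below uses a hypothetical passing matrix and base R to compute each player's weighted degree (passes made plus passes received). It is not the authors' analysis. With igraph, graph_from_adjacency_matrix() and strength() offer a comparable calculation on a graph object.

```r
players <- c("GK", "DF", "MF", "FW")

# passes[i, j] = number of passes from player i to player j (hypothetical)
passes <- matrix(c(
  0, 5, 2, 0,
  3, 0, 8, 1,
  1, 6, 0, 7,
  0, 1, 4, 0
), nrow = 4, byrow = TRUE, dimnames = list(players, players))

out_strength <- rowSums(passes)   # passes made
in_strength  <- colSums(passes)   # passes received
total        <- out_strength + in_strength

# Each pass involves two players, so dividing by 2 * total passes gives shares summing to 1
involvement <- total / (2 * sum(passes))
sort(involvement, decreasing = TRUE)
```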



Data sets

James Curley is the author of the engsoccerdata[39] R package that is available on CRAN. The package provides a repository of complete football data sets and includes data from English football from 1888 onwards. In 2015, James created a number of tutorials for the data sets:

One example of James using his data sets is a 2014 article written for FiveThirtyEight on the 13,475 games (in 126 years) that resulted in a 0-0 scoreline.
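Counting goalless games is a simple filtering task. The sketch uses a small hypothetical results table. Its hgoal and vgoal column names are an assumption modelled on engsoccerdata's england data frame, so check the package documentation before applying the code to the real data.

```r
# Hypothetical results table; column names assumed to mirror engsoccerdata's england data
results <- data.frame(
  Season  = c(1888, 1888, 1889, 1889, 1890, 1890),
  home    = c("Team A", "Team B", "Team C", "Team D", "Team A", "Team C"),
  visitor = c("Team B", "Team C", "Team D", "Team A", "Team D", "Team B"),
  hgoal   = c(2, 0, 0, 3, 0, 1),
  vgoal   = c(1, 2, 0, 4, 0, 1)
)

goalless <- results$hgoal == 0 & results$vgoal == 0

sum(goalless)                     # number of 0-0 games
mean(goalless)                    # proportion of all games
table(results$Season[goalless])   # 0-0 games per season
```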



Expected goals

Ewen[40] shared his understatr package and the ways in which he uses R to collect data about expected goals. Ewen shares his open source scripts in his GitHub repository.
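Expected goals (xG) values are probabilities, so a few standard calculations follow directly from them. The sketch uses hypothetical per-shot values and assumes that shots are independent, which is a simplification.

```r
# Hypothetical per-shot xG values for one team in a match
shot_xg <- c(0.05, 0.12, 0.76, 0.03, 0.31, 0.08)

# Team xG is the sum of the per-shot probabilities
team_xg <- sum(shot_xg)

# Assuming independent shots, P(at least one goal) = 1 - prod(1 - p)
p_score <- 1 - prod(1 - shot_xg)

# Exact distribution of goals scored, built up one shot at a time
goal_dist <- 1
for (p in shot_xg) {
  goal_dist <- c(goal_dist, 0) * (1 - p) + c(0, goal_dist) * p
}
names(goal_dist) <- 0:length(shot_xg)

team_xg
p_score
round(goal_dist, 3)
```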

The Last Man Analytics (2019)[41] provided a tutorial on getting started with free StatsBomb event data.



English Premier League predictions

Ben Torvaney (2019)[42] shared his approach to predicting English Premier League outcomes in the 2018-2019 season. He used Monte Carlo simulation to do so. You might find it interesting to look at Ben's approach and his use of his R package regista.
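The core of a Monte Carlo match simulation can be sketched in a few lines. This is a simplified independent Poisson model with hypothetical scoring rates, not Ben's regista code. The Dixon-Coles model referred to in his post adds a correction for low-scoring results.

```r
set.seed(2019)
n_sims <- 10000

# Hypothetical expected goals for each side; in a fitted model these would come
# from team attack and defence ratings plus home advantage
lambda_home <- 1.6
lambda_away <- 1.1

home_goals <- rpois(n_sims, lambda_home)
away_goals <- rpois(n_sims, lambda_away)

outcome <- ifelse(home_goals > away_goals, "home",
                  ifelse(home_goals < away_goals, "away", "draw"))

probs <- prop.table(table(factor(outcome, levels = c("home", "draw", "away"))))
round(probs, 3)
```

Repeating this for every remaining fixture and adding up the simulated points is one way to build a distribution of final league positions.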



Plotting football outputs

Ben Torvaney (2019)[43] shared ggsoccer, a package for pitch mapping in football, in a GitHub repository; ggsoccer is also available via CRAN. As Ben indicated, ggsoccer is one of a number of pitch mapping tools. Three other examples shared by Ben (each on GitHub) are soccermatics, SBpitch and fc.rstats.



Cricket

cricketr and yorkr

Tinniam Ganesh is the author of the cricketr[44] and yorkr[45] R packages, which are available on CRAN. Tinniam has produced two books about his use of R to analyse cricket[46]. For an example of Tinniam's use of cricketr, see his analysis of all-rounders[47]. For an example of Tinniam's use of yorkr, see his analysis of IPL games in 2017[48].
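cricketr and yorkr compute many performance measures. Two of the most common can be written directly in base R. The innings below are hypothetical, and the code does not use either package's functions. Batting average divides runs by dismissals (not innings), and strike rate is runs per 100 balls faced.

```r
# Hypothetical innings for one batter; dismissed = FALSE means not out
innings <- data.frame(
  runs      = c(45, 102, 0, 67, 23, 88),
  balls     = c(60, 130, 3, 70, 40, 95),
  dismissed = c(TRUE, FALSE, TRUE, TRUE, TRUE, FALSE)
)

# Batting average: total runs divided by the number of dismissals
batting_average <- sum(innings$runs) / sum(innings$dismissed)

# Strike rate: runs scored per 100 balls faced
strike_rate <- 100 * sum(innings$runs) / sum(innings$balls)

round(c(average = batting_average, strike_rate = strike_rate), 2)
```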



Basketball

Expected possession values and interactive visualisation of shots

Daniel Cervone, Alex D'Amour, Luke Bornn and Kirk Goldsberry (2014)[49] "propose a framework for using optical player tracking data to estimate, in real time, the expected number of points obtained by the end of a possession". They share a data sample and R code on GitHub[50] for those interested in exploring their model and results.

Todd Schneider (2016)[51] built BallR, a tool that uses the Shiny framework to provide an interactive resource that "enables you to select a player and season, then creates a customizable chart that shows shot patterns across the court". The tool calculates aggregate statistics (such as field goal percentage and points per shot attempt) and compares individual players (with data back to 1996) to league averages in different areas of the court. His blog post[52] leads the reader through the use of BallR.
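The kind of aggregate statistics that BallR reports can be sketched in base R. The shot log, zone labels and league averages below are all hypothetical, and the code is not taken from BallR.

```r
# Hypothetical shot log for one player
shots <- data.frame(
  zone  = c("restricted", "restricted", "mid-range", "mid-range",
            "three", "three", "three", "restricted"),
  value = c(2, 2, 2, 2, 3, 3, 3, 2),
  made  = c(TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE)
)

# Points scored on each attempt (0 for a miss)
shots$points <- shots$value * shots$made

# Field goal percentage and points per attempt by zone
by_zone <- aggregate(cbind(made, points) ~ zone, data = shots, FUN = mean)
names(by_zone) <- c("zone", "fg_pct", "pts_per_attempt")

# Hypothetical league-average points per attempt in each zone
league <- data.frame(zone = c("mid-range", "restricted", "three"),
                     league_pts = c(0.80, 1.20, 1.05))

comparison <- merge(by_zone, league, by = "zone")
comparison$diff_vs_league <- comparison$pts_per_attempt - comparison$league_pts
comparison
```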

Luis Verde Arregoitia (2019a)[53] shared his workflow for using grid-based richness metrics to visualise shots. His post includes all the code he used to do this. For an example of Luis's gganimate approach to shot data, see Luis Verde Arregoitia (2019b)[54].



March Madness

Each year, the NCAA's women's and men's basketball championships are the focus of intensive analysis. The championships are single-elimination tournaments. In 2019, Kaggle hosted machine learning competitions for the women's and men's tournaments. Both competitions had two parts: the use of past results to build and test models; and the forecasting of outcomes in tournament play. Participants in the competitions had access to historical and play-by-play data.

The scale of the task facing participants in the machine learning challenges was noted by Neil Greenberg (2019)[55]:

If you want a sure thing in your men’s NCAA tournament pool, you’ll need to fill out the 9,223,372,036,854,775,808 brackets necessary to guarantee a winner.

Despite this, there is enormous interest in predicting the outcomes of the tournaments. For an example of how this prediction can take place in R, you might find it informative to read Sam Firke's (2019)[56] approach to basketball prediction and Dan Brooks and Keith Folsom's (2016)[57] project. In doing so, you might like to consider Michael Lopez and Gregory Matthews' (2014)[58] discussions of building a predictive model after their experiences of winning the 2014 Kaggle machine learning competition.
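The figure in Neil Greenberg's quote can be checked with a little arithmetic. The sketch assumes a 64-team bracket (ignoring the First Four play-in games) in which every game has two possible winners.

```r
# A 64-team single-elimination bracket has 32 + 16 + 8 + 4 + 2 + 1 games
games <- sum(2^(5:0))
games                              # 63

# Each game has two possible winners, so the number of distinct brackets is 2^63
n_brackets <- 2^games
sprintf("%.0f", n_brackets)        # "9223372036854775808"

# If every game were a coin flip, the chance that a single bracket is perfect
1 / n_brackets
```

Real games are not coin flips, which is why better-than-chance models matter, but even a strong model leaves the chance of a perfect bracket extremely small.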



Tennis

Monitoring fatigue

Stephanie Kovalchik uses R to provide data-driven stories about men's and women's tennis. One example of her work is her analysis of tennis players' fatigue levels. Stephanie introduces her methodology in a discussion of serving in men's tennis[59]. In a second post[60], she used data from the 2017 US Open to investigate whether fatigue is cumulative, as evidenced by service speeds during games. We recommend you read both articles on fatigue. Stephanie's approach uses insights gained from baseball[61] and a dose-response model[62] fitted with the R package drc. You might also find it interesting to read about Stephanie's use of R, and to explore the data she used, at an R day in Cape Town, South Africa, in 2018[63][64] and at the useR! conference in Brisbane in the same year.[65]
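As a much simpler stand-in for a dose-response model, a linear regression can show the general idea of relating serve speed to accumulated workload. The data are simulated and the linear form is an assumption. Stephanie's analysis used the drc package, whose drm() function fits non-linear dose-response curves.

```r
set.seed(7)

# Hypothetical first-serve speeds (km/h) against the number of games
# the player has already played in the tournament
games_played <- rep(seq(0, 60, by = 10), each = 5)
serve_speed  <- 195 - 0.08 * games_played + rnorm(length(games_played), sd = 2)

# Simple linear model: change in serve speed per game played
fit <- lm(serve_speed ~ games_played)
coef(fit)

# Estimated change in serve speed per 10 additional games played
10 * coef(fit)[["games_played"]]
```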



Salaries in sport

How much do you get paid?

Jacquie Tran[66] used questionnaire data collected by Aiden Oakley to investigate salary levels for sport science and strength and conditioning coaches in association football in the United Kingdom. The data aggregated responses from 100 respondents. Jacquie used R to analyse and visualise these data mindful of the sensitivity required in such reporting. Her csv file for this analysis was made available on Github. We recommend you have a look at Jacquie's account of the process she used with R for these data.
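One common way to respect respondents' privacy is to withhold summaries for small groups. The sketch below assumes a threshold of five responses and uses hypothetical salaries. It is not Jacquie's code or data.

```r
# Hypothetical salary responses (GBP per year)
salaries <- data.frame(
  role   = c(rep("Sport scientist", 6), rep("S&C coach", 5), rep("Head of performance", 2)),
  salary = c(24000, 26000, 28000, 30000, 27000, 32000,
             25000, 29000, 31000, 27500, 33000,
             55000, 62000)
)

min_group_size <- 5   # assumed threshold below which results are withheld

groups <- split(salaries$salary, salaries$role)

summary_by_role <- data.frame(
  role = names(groups),
  n    = vapply(groups, length, integer(1)),
  # Report the median only when the group is large enough; otherwise NA
  median_salary = vapply(groups, function(x) {
    if (length(x) >= min_group_size) median(x) else NA_real_
  }, numeric(1)),
  row.names = NULL
)
summary_by_role
```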



Strava

Strava

Marcus Volz has used R to analyse and visualise Strava data. His GitHub account provides details of his R package strava. David Smith's post (2018)[67] about Marcus's strava package provided a number of examples of the use of the package in R and of visualisation with ggplot2.
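GPS tracks from apps such as Strava are sequences of latitude and longitude points. A minimal sketch of turning such points into distance uses the haversine formula. The track is hypothetical, and the packages mentioned above may compute distance differently.

```r
# Great-circle distance in metres between pairs of points (haversine formula)
haversine_m <- function(lat1, lon1, lat2, lon2, radius = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 + cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin guards against rounding pushing a slightly above 1
  2 * radius * asin(sqrt(pmin(1, a)))
}

# Hypothetical track points (latitude, longitude in decimal degrees)
track <- data.frame(
  lat = c(-35.4440, -35.4450, -35.4462, -35.4475),
  lon = c(149.8000, 149.8012, 149.8025, 149.8031)
)

# Distance between each consecutive pair of points, then the total
n <- nrow(track)
step_m   <- haversine_m(track$lat[-n], track$lon[-n], track$lat[-1], track$lon[-1])
total_km <- sum(step_m) / 1000

round(step_m, 1)
total_km
```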



Olympic medals

Data scraping

Tyler Rinker (2018)[68] shared the process he used to scrape data from the Wikipedia page for the 2018 Winter Olympics "and subsequently used the tools of the tidyverse to get the data in a format in which they can be analyzed". Note Tyler's concluding remarks:

A disadvantage of using web data as a source, is that the layout of the data might change. My pipeline broke several times, because changes were made to the wiki. Because of this, it is not assured this code will run in future times. For this example I kept the pipeline live, because I wanted to do this blog post including the scraping. However, it would have saved me a good deal of trouble if I stored the set in a csv the first time I had a proper version of the data.[69]
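The advice to store the data once a clean version is available can be written as a small caching function. scrape_medal_table() is a hypothetical placeholder that returns stand-in values; a real version would contain the web scraping code.

```r
# Hypothetical placeholder for the scraping step (returns stand-in values)
scrape_medal_table <- function() {
  data.frame(country = c("A", "B"), gold = c(3L, 1L), stringsAsFactors = FALSE)
}

# Read from the cached CSV if it exists; otherwise scrape and write the cache
get_medal_table <- function(cache_file) {
  if (file.exists(cache_file)) {
    return(read.csv(cache_file, stringsAsFactors = FALSE))
  }
  medals <- scrape_medal_table()
  write.csv(medals, cache_file, row.names = FALSE)
  medals
}

cache_file <- file.path(tempdir(), "medals_2018.csv")
first  <- get_medal_table(cache_file)   # scrapes and writes the cache
second <- get_medal_table(cache_file)   # reads the cached CSV
```

Once the cache exists, later runs no longer depend on the web page keeping its layout.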

Tyler (2014)[70] used scraping methods for the Sochi Winter Olympics too. You might find it interesting to trace back Tyler's approach to a StackOverflow discussion about the 2012 Summer Olympic Games' medal table as an example of how open sharing supports personal learning.



Extreme skiing and snowboarding

Collecting data

Matthew Oldach (2018)[71] shared the process he used to gather data about performances in the Freeride World Tour from 1996 to 2018. Matt collected data from Twitter with the twitteR package and by web scraping the Freeride World Tour rankings page. We suggest you look at Matt's transparent account of his process if you are considering creating your own database of performances in a sport of your choice. His R code is shared on GitHub.



Baseball

Analysing baseball data

Bill Petti (2015) [72] has provided a comprehensive introduction to the use of R to analyse baseball data sets. His R code is shared on Github and includes an introduction to the use of R. He is the author of the CRAN package baseballr.



NFL

Fantasy football

Isaac Petersen[73] shared his NFL data and the ways in which he uses R to calculate fantasy NFL projections using a 'wisdom of the crowd' approach. Isaac's scripts are made available as open source scripts and are shared in his GitHub repository.
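Combining several projections is the basic idea behind a 'wisdom of the crowd' approach. The sketch uses hypothetical projections from four unnamed sources and reports the mean alongside the median and standard deviation. It does not reproduce Isaac's weighting or data.

```r
# Hypothetical fantasy-point projections for three players from four sources
projections <- data.frame(
  player = rep(c("Player 1", "Player 2", "Player 3"), each = 4),
  source = rep(c("S1", "S2", "S3", "S4"), times = 3),
  points = c(18.2, 20.1, 17.5, 19.0,
             12.4, 11.8, 25.0, 12.9,
              8.3,  9.1,  8.8,  7.9)
)

by_player <- split(projections$points, projections$player)

crowd <- data.frame(
  player = names(by_player),
  mean   = sapply(by_player, mean),
  median = sapply(by_player, median),   # less sensitive to one outlying source
  sd     = sapply(by_player, sd),       # disagreement between sources
  row.names = NULL
)
crowd
```

For Player 2, one source's high projection pulls the mean well above the median. This is one reason to look at more than a single summary.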



Ice hockey

NWHL play-by-play data

Jake Flancer[74] shared his nwhlR package for working with National Women's Hockey League play-by-play data. The package has five functions: scraping game ids (by date and from the schedule); scraping play-by-play data; and compiling play-by-play data (into player and team summaries).



Visualising data with R

One of the options you have with R is to visualise your data. R has a number of functions and libraries to support your visualisations.

If you would like to explore the potential of R to visualise data, you might find Remko Duursma, Jeff Powell and Glenn Stone's (2017)[75] introduction to learning R very helpful. Their Chapter 4 refers explicitly to visualizing data and the use of RStudio and includes discussion of: scatterplot; bar plot; histogram; curves; pie chart; box and whisker plot; and symbols.
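Most of the chart types listed above are available in base R graphics. The sketch draws four of them from simulated athlete data and writes them to a temporary PDF file. The variable names and values are hypothetical.

```r
# Simulated data for 20 athletes: jump height (cm), sprint time (s) and position
set.seed(1)
jump     <- round(rnorm(20, mean = 40, sd = 4), 1)
sprint   <- round(4.2 - 0.02 * (jump - 40) + rnorm(20, sd = 0.05), 2)
position <- factor(sample(c("Back", "Forward"), 20, replace = TRUE))

out_file <- file.path(tempdir(), "base_plots.pdf")
pdf(out_file)
par(mfrow = c(2, 2))   # 2 x 2 grid of plots on one page

plot(jump, sprint, pch = 19, main = "Scatterplot",
     xlab = "Jump height (cm)", ylab = "Sprint time (s)")
barplot(table(position), main = "Bar plot")
hist(jump, main = "Histogram", xlab = "Jump height (cm)")
boxplot(jump ~ position, main = "Box and whisker plot", ylab = "Jump height (cm)")

invisible(dev.off())
```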

A powerful visualisation tool in R is ggplot2[76].

ggplot2 was inspired by Leland Wilkinson's (1999) The Grammar of Graphics[77] and is available as a CRAN package in R and RStudio.

Edwin Chen (2012)[78] provides "a bare-bones introduction to ggplot2" that "assumes no knowledge of R". A definitive introduction to ggplot2 is provided by Hadley Wickham (2016)[79].

An introduction to R and ggplot2

Analysing Scottish Hill Race Data with R provides a basic introduction to R and ggplot2. This resource uses the data published by Anthony Atkinson in 1986[80] and discusses some of the uses made of these data subsequently in the R literature. ggplot2 is presented as a visualisation resource that has implications for the discussions in this course about audiences and messages. Kieran Healy (2017)[81] shared a practical introduction to R and ggplot2, as did John MacKintosh (2016)[82]. You might also find Garrett Grolemund's introduction to the tidyverse of interest. For additional discussion, see the section on ggplot2 in the Visualisation topic. See too Sam Tyner, François Briatte and Heike Hofmann's (2017)[83] introduction to network visualisation with ggplot2, which includes examples from American football and bike sharing. See also Chris Fry's (2015)[84] discussion of graphing in R to visualise field hockey data. Asmae Toumi (2018)[85] shared her insights into using R and ggplot2 and provided her NHL data[86] to accompany her presentation at the 2018 MIT Sloan workshop.
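The Scottish hill races data are available in R as the hills data frame in the MASS package, which ships with standard R installations (dist in miles, climb in feet, time in minutes). A minimal sketch of a regression of record time on distance and climb follows.

```r
# Scottish hill races data from the MASS package
hills <- MASS::hills

# Record time (minutes) modelled on distance (miles) and climb (feet)
fit <- lm(time ~ dist + climb, data = hills)
summary(fit)$coefficients

# Races with the largest standardised residuals are worth a closer look
head(sort(abs(rstandard(fit)), decreasing = TRUE), 3)
```

The same data can be plotted with ggplot2, for example `ggplot(hills, aes(dist, time)) + geom_point()`.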

For generic discussions about ggplot2, you might find Emil Hvitfeldt's (2018)[87] post and Danielle Navarro's (2019)[88] presentation on ggplot2 and gganimate helpful.



R as an ePortfolio resource

Using some of the functionality of R

R has some interesting resources to help you build your ePortfolio for the course. Blogdown provides a blog platform for you to use. Alison Hill (2017) shared this guide to Blogdown. Larie Byrd (2018)[89] used Blogdown for her post about machine learning. David Robinson (2017)[90] provided advice to aspiring data scientists about blogging about their work. If you are contemplating adding GIFs to your Blogdown posts, you might find this article of interest. Maëlle Salmon (2018)[91] discussed the connections that could be made between blogging and the R community. For a wider discussion of why you might find blogging helpful in your development, see William Koehrsen's (2018)[92] post.

This brief discussion of computer vision, for example, made use of the R Markdown functionality in RStudio to create an open educational resource hosted by RPubs. If you would like to extend your use of R, you might find Yihui Xie's guides to authoring books and technical documents with R Markdown and bookdownplus of interest. Jenny Bryan has a comprehensive guide to Markdown too. You might find the learnr package of interest if you are considering creating interactive tutorials from R Markdown documents. SportSciData (2019a,[93] 2019b[94]) shared a detailed account of using Markdown to create interactive reports. Part 1 (2019a)[95] looked at interactive reports. Part 2 (2019b)[96] looked at visualisation.

If you use Google Docs, you might find Aleszu Bajak's tutorial[97] on how to convert a Google Doc into R Markdown helpful. Jenny Bryan and Lucy McGowan have created the googledrive CRAN package.

This section of the course has its own Markdown document stored on RPubs as an example of the use of some basic Markdown functionality. Neil Collins published a number of guides to R Markdown[98] on his blog.

You might consider using Shiny, an R package that enables you to build interactive web apps straight from R to share your data. Martin Monkman (2017)[99][100], for example, shared a Shiny app for per-game baseball data (from 1901 to 2016) and Scott Davis (2018)[101] used Shiny to share data from the 2018 NBA basketball final series. For a discussion of integrating RMarkdown and Shiny, you might like to have a look at Chris Berndsen's (2018)[102] video introduction.

If you are thinking of writing papers about your work, you might find Przemysław Biecek and Marcin Kosiński's (2017)[103][104] discussion of an R package, archivist, of interest. Archivist is a package for managing, recording and restoring data analysis results transparently and contributes to the reproducibility of your research. You might find Nan Xiao's work of interest too. He has discussed processes for persistent reproducible research using a range of R tools.[105][106]

James Turnbull (2018)[107] discussed the ways in which documentation acts as a gateway for open source activity.

Matti Vuorre and James Curley (2018)[108] have discussed curating research assets in their report of using the Git version control system.

Alice Sweeting (2019)[109] has provided an example of how Markdown can be used to create a variety of resources to share practice openly.



References

  1. Hornik, Kurt; Leisch, Friedrich (26 November 2015). "R FAQ". https://cran.r-project.org/doc/FAQ/R-FAQ.html#What-is-R_003f. Retrieved 9 February 2016.
  2. Hornik, Kurt; Leisch, Friedrich (1 January 2001). "Editorial". R-project. https://www.r-project.org/doc/Rnews/Rnews_2001-1.pdf. Retrieved 9 February 2016.
  3. The R Core Team (1 January 2001). "What is R?". R-project. https://www.r-project.org/doc/Rnews/Rnews_2001-1.pdf. Retrieved 9 February 2016.
  4. Hicks, Stephanie; Irizarry, Rafael (2016). "A Guide to Teaching Data Science". https://arxiv.org/ftp/arxiv/papers/1612/1612.07140.pdf.
  5. Campbell, Paul (September 2018). "A whirlwind tour of working with data in R". https://paulc91.github.io/intro_to_r/#1. Retrieved 23 September 2018.
  6. Dancho, Matt (4 November 2018). "New R cheatsheet: data science workflow with R". https://www.business-science.io/learning-r/2018/11/04/data-science-r-cheatsheet.html. Retrieved 5 November 2018.
  7. Wickham, Hadley (August 2019). "Mastering Shiny". https://mastering-shiny.org/. Retrieved 14 August 2019.
  8. Walum, Hasse; De Leon, Desiree (August 2019). "Introduction". https://tinystats.github.io/teacups-giraffes-and-statistics/02_bellCurve.html. Retrieved 15 August 2019.
  9. Schneider, Todd (2016). "BallR: Interactive NBA Shot Charts with R and Shiny". https://toddwschneider.com/posts/ballr-interactive-nba-shot-charts-with-r-and-shiny/. Retrieved 18 October 2017.
  10. Frick, Hannah; Kosmidis, Ioannis (2017). "trackeR: Infrastructure for Running and Cycling Data from GPS-Enabled Tracking Devices in R". Journal of Statistical Software 82 (7).
  11. Frick, Hannah; Kosmidis, Ioannis (2017). "trackeR: Infrastructure for Running and Cycling Data from GPS-Enabled Tracking Devices in R". Journal of Statistical Software 82 (7): 1.
  12. Tran, Jacquie (15 February 2018). "Sport analytics in R". https://jacquietran.neocities.org/acu-gcpa-2018-02/presentation.html. Retrieved 15 February 2018.
  13. Nakagawara, Ryo (4 July 2018). https://datascienceplus.com/visualize-the-world-cup-with-r-part-1-recreating-goals-with-ggsoccer-and-ggplot2/. Retrieved 8 August 2018.
  14. Nakagawara, Ryo (6 August 2018). https://www.r-bloggers.com/animating-the-goals-of-the-world-cup-comparing-the-old-vs-new-gganimate-and-tweenr-api/. Retrieved 8 August 2018.
  15. Benz, Luke. https://github.com/lbenz730/ncaahoopR. Retrieved 8 August 2018.
  16. Positive Residual (2019). "Portfolio". https://positiveresidual.com/. Retrieved 7 January 2019.
  17. Arregoitia, Luis (January 2019). "Animate shot distances for NBA games". https://luisdva.github.io/rstats/bball-shots/. Retrieved 7 January 2019.
  18. Ward, Patrick (20 January 2019). "A Simple Approach to Analyzing Athlete Data in Applied Sports Science". http://optimumsportsperformance.com/blog/testing-syntax-highlighter-evolved/. Retrieved 21 January 2019.
  19. Averick, Mara (27 February 2019). "NBA Advanced Metrics". http://rpubs.com/maraaverick/470388. Retrieved 28 February 2019.
  20. Frigaard, Martin; Spangler, Peter (7 May 2019). "Exploring Chicago rideshare data in R". http://www.storybench.org/exploring-chicago-rideshare-data/. Retrieved 9 May 2019.
  21. O'Hara-Wild, Mitchell (17 June 2019). "Introducing tsibbledata". https://www.mitchelloharawild.com/blog/tsibbledata/. Retrieved 15 June 2019.
  22. Lyons, Keith (5 January 2019). "Braidwood Showground Parkruns 2018". https://keithlyons.me/blog/2019/01/05/braidwood-showground-parkruns-2018/. Retrieved 5 January 2019.
  23. Jovanović, Mladen (13 March 2015). "AFL Data Analysis Report". http://complementarytraining.net/wp-content/uploads/2015/03/AFL_Analysis.html. Retrieved 26 March 2016.
  24. Tran, Jacquie (12 January 2019). "Getting to know the fitzRoy package (AFL game statistics)". https://underthehood.jacquietran.com/2019/01/12/getting-to-know-the-fitzroy-package-afl-game-statistics/. Retrieved 13 January 2019.
  25. Sweeting, Alice (2017). "Discovering the Movement Sequences of Elite and Junior Elite Netball Athletes" (PhD). Institute of Sport, Exercise and Active Living, Victoria University, Melbourne, Australia. http://trove.nla.gov.au/work/227110648?q&versionId=249204357. Retrieved 18 July 2017.
  26. Sweeting, Alice (11 June 2016). "Introduction to R and A Basic Analysis of Athlete Load". https://sportstatisticsrsweet.wordpress.com/2016/06/. Retrieved 18 July 2017.
  27. Sweeting, Alice (29 January 2018). "k-means Clustering in R". https://sportstatisticsrsweet.wordpress.com/2018/01/29/k-means-clustering-in-r/. Retrieved 30 January 2018.
  28. Loridan, Thomas. "téouch analytics". https://teouchanalytics.wordpress.com/. Retrieved 8 September 2017.
  29. Loridan, Thomas. "Google Scholar Profile". https://scholar.google.com.au/citations?user=VVRMn3cAAAAJ&hl=en. Retrieved 8 September 2017.
  30. Loridan, Thomas. "Episode 1: feature engineering (and some data to play with)". https://teouchanalytics.wordpress.com/2017/07/08/episode-1-feature-engineering-and-some-data-to-play-with/. Retrieved 8 September 2017.
  31. Loridan, Thomas. "Episode 2: Assessing feature importance". https://teouchanalytics.wordpress.com/2017/07/10/episode-2-assessing-feature-importance/. Retrieved 8 September 2017.
  32. Loridan, Thomas. "Episode 3: Building and testing a predictive model". https://teouchanalytics.wordpress.com/2017/07/13/episode-3-building-and-testing-a-predictive-model/. Retrieved 8 September 2017.
  33. Loridan, Thomas. "Episode 4: Tuning a football predictive model with caret". https://teouchanalytics.wordpress.com/2017/07/18/tuning-a-football-prediction-model-with-caret/. Retrieved 8 September 2017.
  34. Loridan, Thomas. "Episode 5: how to bet on football using a prediction model". https://teouchanalytics.wordpress.com/2017/07/21/episode-5-how-to-bet-on-football-using-a-prediction-model/. Retrieved 8 September 2017.
  35. Loridan, Thomas. "Episode 6: where to from here?". https://teouchanalytics.wordpress.com/2017/08/04/episode-6-where-to-from-here/. Retrieved 8 September 2017.
  36. Loridan, Thomas. "Episode 6: where to from here?". https://teouchanalytics.wordpress.com/2017/08/04/episode-6-where-to-from-here/. Retrieved 8 September 2017.
  37. Wilson, Robbie et al. (2017). "Skill not athleticism predicts individual variation in match performance of soccer players". Proceedings of the Royal Society B: Biological Sciences 284 (1869).
  38. Tyner, Sam; Briatte, François; Hofmann, Heike (2017). "Network Visualization with ggplot2". The R Journal 9 (1).
  39. Curley, James. "Introducing engsoccerdata". https://github.com/jalapic/engsoccerdata. Retrieved 8 November 2017.
  40. Ewen. https://ewen.io/2018/12/10/understatr/. Retrieved 12 December 2018.
  41. "#15: Getting Started with Free StatsBomb Event Data – xG Shot Map Tutorial". 16 June 2019. https://thelastmananalytics.home.blog/2019/06/16/15-getting-started-with-free-statsbomb-event-data-xg-shot-map-tutorial/. Retrieved 18 June 2019.
  42. Torvaney, Ben (1 January 2019). https://stats-and-snakeoil.herokuapp.com/2019/01/01/predicting-the-premier-league-with-dixon-coles/. Retrieved 12 December 2018.
  43. Torvaney, Ben (6 August 2019). "ggsoccer". https://github.com/Torvaney/ggsoccer. Retrieved 7 August 2019.
  44. Ganesh, Tinniam. "Introducing cricketr! : An R package to analyze performances of cricketers". https://gigadom.wordpress.com/2015/07/04/introducing-cricketr-a-r-package-to-analyze-performances-of-cricketers/. Retrieved 25 October 2017.
  45. Ganesh, Tinniam. "The making of cricket package yorkr – Part 1". https://gigadom.wordpress.com/2016/03/05/the-making-of-cricket-package-yorkr-part-1-2/. Retrieved 25 October 2017.
  46. Ganesh, Tinniam. "More book, more cricket! 2nd edition of my books now on Amazon". https://gigadom.wordpress.com/2017/03/26/more-book-more-cricket-2nd-edition-of-my-books-now-on-amazon/. Retrieved 25 October 2017.
  47. Ganesh, Tinniam. "cricketr sizes up legendary All-rounders of yesteryear". https://gigadom.wordpress.com/2016/09/10/cricketr-sizes-up-legendary-all-rounders-of-yesteryear/. Retrieved 25 October 2017.
  48. Ganesh, Tinniam. "Analysis of IPL T20 matches with yorkr templates". https://gigadom.wordpress.com/2017/03/04/analysis-of-ipl-t20-matches-with-yorkr-templates/. Retrieved 25 October 2017.
  49. Cervone, Daniel et al. (4 August 2014). "A Multiresolution Stochastic Process Model for Predicting Basketball Possession Outcomes". https://arxiv.org/pdf/1408.0777.pdf. Retrieved 21 November 2017.
  50. Cervone, Daniel. "EPVDemo". https://github.com/dcervone/EPVDemo. Retrieved 21 November 2017.
  51. Schneider, Todd (8 March 2016). "BallR: Interactive NBA Shot Charts with R and Shiny". http://toddwschneider.com/posts/ballr-interactive-nba-shot-charts-with-r-and-shiny/. Retrieved 4 April 2018.
  52. Schneider, Todd (8 March 2016). "BallR: Interactive NBA Shot Charts with R and Shiny". http://toddwschneider.com/posts/ballr-interactive-nba-shot-charts-with-r-and-shiny/. Retrieved 4 April 2018.
  53. Verde Arregoitia, Luis (14 February 2019). "Quantifying point overlap for NBA shot chart data". https://luisdva.github.io/rstats/nba-overlap/. Retrieved 27 February 2019.
  54. Verde Arregoitia, Luis (9 January 2019). "Animate shot distances for NBA games". https://luisdva.github.io/rstats/bball-shots/. Retrieved 27 February 2019.
  55. Greenberg, Neil (18 March 2019). "2019 NCAA tournament: The perfect bracket to win your March Madness pool". https://www.washingtonpost.com/sports/2019/03/18/ncaa-tournament-perfect-bracket-win-your-march-madness-pool/. Retrieved 19 March 2019.
  56. Firke, Sam (18 March 2019). "Predicting March Madness". https://github.com/sfirke/predicting-march-madness. Retrieved 19 March 2019.
  57. Brooks, Dan; Folsom, Keith (11 May 2016). "Predicting March Madness". https://rstudio-pubs-static.s3.amazonaws.com/180553_8d12f96839b74f4aa3b562beb54dff25.html. Retrieved 20 March 2019.
  58. Lopez, Michael; Matthews, Gregory (30 November 2014). "Building an NCAA men's basketball predictive model and quantifying its success". https://arxiv.org/abs/1412.0248. Retrieved 19 March 2019.
  59. Kovalchik, Stephanie (13 October 2017). "Measuring Match Fatigue". http://on-the-t.com/2017/10/13/fatigue-effects/. Retrieved 9 December 2017.
  60. Kovalchik, Stephanie (20 October 2017). "Is Fatigue Cumulative?". http://on-the-t.com/2017/10/20/cumulative-fatigue-effects/. Retrieved 9 December 2017.
  61. Burris, Kyle (7 September 2017). "Relief-Fatigue". https://github.com/burrisk/Relief-Fatigue. Retrieved 9 December 2017.
  62. Ritz, Christian et al. (2015). "Dose-Response Analysis Using R". PLoS ONE 10(12).
  63. Kovalchik, Stephanie (18 March 2018). "Cape Town celebrates R and tennis data science at satRday". http://on-the-t.com/2018/03/16/satrday-capetown/. Retrieved 24 March 2018.
  64. Kovalchik, Stephanie (18 March 2018). "satRday". https://github.com/skoval/satRday. Retrieved 24 March 2018.
  65. Kovalchik, Stephanie (10 July 2018). "Material from 2018 UseR Conference: Statistical Models for Sport in R". https://github.com/skoval/UseR2018. Retrieved 24 July 2018.
  66. Tran, Jacquie (2 January 2018). "How much do you get paid? Part I - An initial exploration". http://underthehood.jacquietran.com/2018/01/02/how-much-do-you-get-paid-part-1/. Retrieved 3 January 2018.
  67. Smith, David (23 January 2018). http://blog.revolutionanalytics.com/2018/01/strava-visualization.html. Retrieved 24 January 2018.
  68. Thoen, Edwin (20 March 2018). "Building the Olympics blog: tidy data preparation". https://edwinth.github.io/olympics-dataprep/. Retrieved 22 March 2018.
  69. Thoen, Edwin (20 March 2018). "Building the Olympics blog: tidy data preparation". https://edwinth.github.io/olympics-dataprep/. Retrieved 22 March 2018.
  70. Rinker, Tyler (9 February 2014). "Sochi Olympic Medals". https://trinkerrstuff.wordpress.com/2014/02/09/sochi-olympic-medals-2/. Retrieved 22 March 2018.
  71. Oldach, Matthew (8 May 2018). "Analyzing extreme skiing and snowboarding in R: Freeride World Tour 1996–2018". https://medium.com/@MattOldach_65321/analyzing-extreme-skiing-and-snowboarding-in-r-freeride-world-tour-1996-2018-ffde401fb3ae. Retrieved 10 May 2018.
  72. Petti, Bill (21 September 2015). "A Short(-ish) Introduction to Using R Packages for Baseball Research". https://www.fangraphs.com/tht/a-short-ish-introduction-to-using-r-for-baseball-research/. Retrieved 2 June 2018.
  73. Petersen, Isaac. "Fantasy Football Analytics". https://fantasyfootballanalytics.net/. Retrieved 6 September 2018.
  74. https://github.com/jflancer/nwhlR. Retrieved 12 December 2018.
  75. Duursma, Remko; Powell, Jeff; Stone, Glenn (28 August 2017). https://www.westernsydney.edu.au/__data/assets/pdf_file/0011/830909/Rnotes_20170828_web.pdf. Retrieved 26 November 2017.
  76. Wickham, Hadley (2011). "ggplot2". WIREs Computational Statistics 3(2): 180-185.
  77. Wickham, Hadley (2007). http://ggplot2.org/resources/2007-past-present-future.pdf. Retrieved 26 November 2017.
  78. Chen, Edwin (17 January 2012). http://blog.echen.me/2012/01/17/quick-introduction-to-ggplot2/. Retrieved 26 November 2017.
  79. Wickham, Hadley (2016). ggplot2: Elegant Graphics for Data Analysis. Berlin: Springer.
  80. Atkinson, Anthony (1986). "Comment: Aspects of Diagnostic Regression Analysis". Statistical Science 1(3): 379-402.
  81. Healy, Kieran (2017). "Data Visualization for Social Science: A practical introduction with R and ggplot2". http://socviz.co/index.html. Retrieved 9 December 2017.
  82. MacKintosh, John (16 May 2016). "Intro to ggplot2". https://cdn.rawgit.com/johnmackintosh/ggplot2_demo/a18cc631/pres.html#1. Retrieved 22 February 2018.
  83. Tyner, Sam; Briatte, François; Hofmann, Heike (2017). "Network Visualization with ggplot2". The R Journal 9(1).
  84. Fry, Chris (9 April 2015). "Graphing in R". https://chrisfryperformanceanalyst.wordpress.com/2015/04/09/graphing-in-r/. Retrieved 21 February 2018.
  85. Toumi, Asmae (February 2018). "R for data visualization". https://docs.google.com/presentation/d/1f5PGhzkW0ouqvtow9JbnpNe9AKATKXJac5CLV7JSWbU/edit#slide=id.gc6f90357f_0_0. Retrieved 25 February 2018.
  86. Toumi, Asmae (February 2018). "R for data visualization". https://drive.google.com/drive/folders/1A-yoLHJ7VJHlo0QL28LMDg0CGogF6xeq. Retrieved 25 February 2018.
  87. Hvitfeldt, Emil (12 June 2018). "ggplot2 trial and error - US trade data". https://www.hvitfeldt.me/2018/06/ggplot2-trial-and-error-us-trade-data/. Retrieved 14 June 2018.
  88. Navarro, Danielle (6 April 2019). "Data visualisation in R". https://djnavarro.github.io/satrdayjoburg/. Retrieved 7 April 2019.
  89. Byrd, Larie (8 February 2018). "The First (and Namesake) Post: Is It Cake?". https://aczane.netlify.com/2018/02/08/the-first-and-namesake-post-is-it-cake/. Retrieved 10 February 2018.
  90. Robinson, David (14 November 2017). "Advice to aspiring data scientists: start a blog". http://varianceexplained.org/r/start-blog/. Retrieved 15 February 2018.
  91. Salmon, Maelle (15 March 2018). "Get on your soapbox!". http://www.masalmon.eu/rladiesct/slides#1. Retrieved 16 March 2018.
  92. Koehrsen, William (11 August 2018). "The most important part of a data science project is writing a blog post". https://towardsdatascience.com/the-most-important-part-of-a-data-science-project-is-writing-a-blog-post-50715f37833a. Retrieved 15 August 2018.
  93. SportSciData (4 April 2019). "How to Create Interactive Reports with R Markdown Part I: Data Tables". https://www.sportscidata.com/2019/04/04/how-to-create-interactive-reports-with-r-markdown-part-i/. Retrieved 16 April 2019.
  94. SportSciData (12 April 2019). "How to Create Interactive Reports in R Markdown Part II: Data Visualisation". https://www.sportscidata.com/2019/04/12/using-data-visualisation-in-r-markdown/. Retrieved 16 April 2019.
  95. SportSciData (4 April 2019). "How to Create Interactive Reports with R Markdown Part I: Data Tables". https://www.sportscidata.com/2019/04/04/how-to-create-interactive-reports-with-r-markdown-part-i/. Retrieved 16 April 2019.
  96. SportSciData (12 April 2019). "How to Create Interactive Reports in R Markdown Part II: Data Visualisation". https://www.sportscidata.com/2019/04/12/using-data-visualisation-in-r-markdown/. Retrieved 16 April 2019.
  97. Bajak, Aleszu (25 August 2017). "How to convert a Google Doc to RMarkdown and publish on Github pages". http://www.storybench.org/convert-google-doc-rmarkdown-publish-github-pages/. Retrieved 15 November 2017.
  98. Collins, Neil. "How to Create Reports In R Markdown I: Data Tables". https://www.sportscidata.com/2019/04/04/how-to-create-interactive-reports-with-r-markdown-part-i/. Retrieved 17 June 2019.
  99. Monkman, Martin. "Per-game run scoring by league". https://monkmanmh.shinyapps.io/MLBrunscoring_shiny/. Retrieved 17 February 2018.
  100. Monkman, Martin (26 March 2017). "Updated Shiny app". https://bayesball.blogspot.com.au/2017/03/updated-shiny-app.html. Retrieved 17 February 2018.
  101. Davis, Scott (9 June 2018). "NBA Finals Gamecast Summary". https://sdavis.shinyapps.io/NBAFinals/. Retrieved 10 June 2018.
  102. Berndsen, Chris (8 March 2018). "Introduction to RMarkdown and Shiny". https://youtu.be/O04l-LpmoE8. Retrieved 13 March 2018.
  103. Biecek, Przemysław; Kosiński, Marcin (2017). "archivist: An R Package for Managing, Recording and Restoring Data Analysis Results". Journal of Statistical Software 82(11). doi:10.18637/jss.v082.i11.
  104. Biecek, Przemysław (14 December 2017). "archivist: Boost the reproducibility of your research". http://smarterpoland.pl/index.php/2017/12/boost-the-reproducibility-of-your-research-with-archivist/. Retrieved 16 December 2017.
  105. Xiao, Nan (20 May 2017). "Persistent Reproducible Reporting with Docker and R". https://nanx.me/talks/#talk-chinar-2017. Retrieved 31 July 2018.
  106. Xiao, Nan (30 July 2018). "liftr: an R Package for Persistent Reproducible Research". https://nanx.me/talks/#talk-jsm-2018. Retrieved 31 July 2018.
  107. Turnbull, James (August 2018). "Documentation as a gateway to open source". https://increment.com/documentation/documentation-as-a-gateway-to-open-source/. Retrieved 10 August 2018.
  108. Vuorre, Matti; Curley, James (11 April 2018). "Curating Research Assets: A Tutorial on the Git Version Control System". Advances in Methods and Practices in Psychological Science. https://doi.org/10.1177/2515245918754826.
  109. Sweeting, Alice (29 January 2019). "A little about me…". https://sportstatisticsrsweet.rbind.io/#about. Retrieved 29 January 2019.