Sport Informatics and Analytics/Pattern Recognition/Using R

Introduction
This topic develops issues raised in Pattern Recognition, Theme 2 of this course. It starts a conversation about the use of R in sport analytics.

R is a programming language and a software environment for statistical computing and graphics that is supported by the R Foundation for Statistical Computing.

Kurt Hornik and Friedrich Leisch introduce R in the first edition of the R Newsletter. The R Core Team provide a brief background report about R in that newsletter.

There is a detailed description of R on this Wikipedia page.

There is a vibrant R community on Twitter that includes RStudio and RLadies Global.

Using R in sport contexts
{{IDevice Positive Residual (2019) has shared a range of charts, dashboards and apps with a basketball focus. Luis Verde Arregoitia (2019) has used gganimate to provide visualisations of shot distances in the NBA.
 * theme=Line
 * type=Reading
 * title=Examples from sport
 * body=You can find information about people using R in sport contexts in this bibliography. Note the use of GitHub to share data and code. See, for example, the Scottish Hill Races data (more information about these races in this resource). Other examples are: Andy Field's analysis of a football referee's behaviour; Stephanie Kovalchik's analysis of tennis performance and of the use of R in sports analytics; Robert Nguyen's discussion of winning margins in Australian Rules football; Alice Sweeting's investigation into the movement sequences of elite and junior elite netball athletes; and Max Chapman and Jim Albert's analysis of baseball. Todd Schneider developed an interactive shot chart for the NBA, BallR, that combined R and Shiny and "lets you select a player and season, then creates a customizable chart that shows shot patterns across the court. Additionally, it calculates aggregate statistics like field goal percentage and points per shot attempt, and compares the selected player to league averages at different areas of the court". Alex Bresler created a package for the analysis of NBA data, nbastatR. Steven Wu and Luke Bornn use R to analyse attacking behaviours in basketball with the help of a secondary data set. In addition to these examples, we present three case studies for your consideration. James Curley provides an example of using R with play-by-play data in football. Hannah Frick and Ioannis Kosmidis (2017) have investigated the use of GPS-enabled tracking devices and heart rate monitors and have shared their trackeR package that "implements core infrastructure for relevant summaries and visualizations, as well as support for handling units of measurement". FC rSTATS used association football ideas and concepts to introduce R. Jacquie Tran (2018) shared her introduction to sports analytics in R with worked examples from the Winter Olympics and AFLW.
Ryo Nakagawara (2018a, 2018b) shared his visualisation of 2018 FIFA World Cup goal scoring data. Luke Benz (2018) shared his ncaahoopR package for working with NCAA basketball play-by-play data.

Patrick Ward (2019) shared his approach to the analysis of athlete data in applied sport settings. His post includes examples of the R code he uses to analyse such data, and notes that evaluating "whether an athlete has or has not improved in some key performance indicator is critical to understanding the success of a prescribed training or rehabilitation program".

Mara Averick (2019) shared her analysis of NBA advanced metrics using the nbastatR package (developed and maintained by Alex Bresler). Mara supplies her full code for her analysis.

Martin Frigaard and Peter Spangler (2019) described their analysis of data released by the City of Chicago. Their post provides a detailed account of their use of R.

Mitchell O'Hara-Wild (2019) introduced the tsibbledata package, which provides a diverse collection of data sets for use with tidy time series tools. Mitchell included twelve sets of data in his release. Two of these are of direct relevance here: nyc_bikes and olympic_running. The former contains individual trips for ten NYC Citi Bikes in 2018. The latter contains the fastest running times for the women's and men's 100m to 10000m races at the Olympics.

Mark Padgham (2019) released bikedata, an R package for downloading and aggregating data from public bicycle hire, or bike share, systems. He noted "The bikedata package aims to enable ready importing of data from all systems which provide it, and will be expanded on an ongoing basis as more systems publish open data". In 2019, data from eleven systems were available. There is a vignette to exemplify the use of bike data. }}
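
The aggregate statistics quoted above in the description of BallR (field goal percentage and points per shot attempt) can be sketched in base R. This is a minimal illustration only and is not BallR's own code: the shots data frame, its column names and all of its values are hypothetical.

```r
# Hypothetical shot log: one row per attempt (made-up values, for illustration only)
shots <- data.frame(
  zone   = c("paint", "paint", "paint", "midrange", "midrange",
             "three", "three", "three"),
  made   = c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE),
  points = c(2, 2, 2, 2, 2, 3, 3, 3)   # value of the shot if it is made
)

# Points actually scored on each attempt (0 for a miss)
shots$scored <- ifelse(shots$made, shots$points, 0)

# Summaries by court zone
attempts <- tapply(shots$made,   shots$zone, length)  # number of attempts
fg_pct   <- tapply(shots$made,   shots$zone, mean)    # proportion of shots made
pps      <- tapply(shots$scored, shots$zone, mean)    # points per shot attempt

shot_summary <- data.frame(
  zone            = names(fg_pct),
  attempts        = as.vector(attempts),
  fg_pct          = as.vector(fg_pct),
  points_per_shot = as.vector(pps)
)
print(shot_summary)
```

A comparison with league averages would follow the same pattern: compute the same summaries over every player's shots and subtract them zone by zone.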

Visualising data with R
One of the options you have with R is to visualise your data. R has a number of functions and libraries to support your visualisations.

If you would like to explore the potential of R to visualise data, you might find Remko Duursma, Jeff Powell and Glenn Stone's (2017) introduction to learning R very helpful. Their Chapter 4 refers explicitly to visualizing data and the use of RStudio and includes discussion of: scatterplot; bar plot; histogram; curves; pie chart; box and whisker plot; and symbols.
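
As a minimal sketch of four of the plot types listed above, the following uses only the graphics functions that ship with R (plot, barplot, hist and boxplot). The athletes data frame and its values are hypothetical and are not taken from Duursma, Powell and Stone's text.

```r
# Hypothetical match data for ten athletes (made-up values, for illustration only)
athletes <- data.frame(
  squad      = rep(c("junior", "senior"), each = 5),
  distance_m = c(5200, 5600, 4900, 6100, 5800, 6400, 7000, 6600, 7200, 6900),
  sprints    = c(12, 15, 10, 18, 16, 19, 24, 21, 25, 22)
)

op <- par(mfrow = c(2, 2))   # arrange the four plots in a 2 x 2 grid

# Scatterplot: relationship between two numeric variables
plot(athletes$distance_m, athletes$sprints,
     xlab = "Distance covered (m)", ylab = "Sprints", main = "Scatterplot")

# Bar plot: number of athletes in each squad
barplot(table(athletes$squad), ylab = "Athletes", main = "Bar plot")

# Histogram: distribution of a single numeric variable
hist(athletes$distance_m, xlab = "Distance covered (m)", main = "Histogram")

# Box and whisker plot: a numeric variable split by group
boxplot(distance_m ~ squad, data = athletes,
        ylab = "Distance covered (m)", main = "Box and whisker plot")

par(op)   # restore the previous graphics settings
```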

A powerful visualisation tool in R is ggplot2.

ggplot2 was inspired by Leland Wilkinson's (1999) The Grammar of Graphics. It is available from CRAN as an R package and can be used in both R and RStudio.

Edwin Chen (2012) provides "a bare-bones introduction to ggplot2" that "assumes no knowledge of R". A definitive introduction to ggplot2 is provided by Hadley Wickham (2016).
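
To give a flavour of the layered approach that ggplot2 takes, here is a minimal sketch. It assumes the ggplot2 package is installed, and the races data frame and its values are hypothetical. It is not drawn from either of the introductions cited above.

```r
library(ggplot2)  # assumes ggplot2 is installed, e.g. via install.packages("ggplot2")

# Hypothetical hill race results (made-up values, for illustration only)
races <- data.frame(
  distance_km = c(4, 6, 10, 12, 16, 20),
  climb_m     = c(200, 350, 500, 900, 700, 1200),
  time_min    = c(24, 38, 62, 95, 98, 150)
)

# Map variables to aesthetics, then add layers: points plus a linear trend line
p <- ggplot(races, aes(x = distance_km, y = time_min)) +
  geom_point(aes(size = climb_m)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "Distance (km)", y = "Winning time (min)", size = "Climb (m)",
       title = "Hypothetical hill race times")

print(p)
```

Each call joined with + adds one component to the plot, so a layer can be swapped or removed without rewriting the rest.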