Sport Informatics and Analytics/Introductions

Introduction
This is the first theme in our course. We are keen to introduce you to our approach to the open sharing of educational resources and to present some of the ideas central to a shift from "not invented here" to "proudly borrowed from there".

Let them eat cake
Mine Cetinkaya-Rundel recommends a backward design approach to sharing course information. This is an exciting idea for those interested in non-linear learning. When this course was first written, it had a specific institutional audience in mind. Since that time a great deal has changed.

We have explored Mine's ideas for this course and have an example of the approach using bicycle hire data. You can find the slide presentation at this link.

This theme

 * Explores our approach to open sharing and the narratives of such an approach.
 * Introduces people, perspectives, products and processes in sport informatics and analytics.
 * Draws attention to an Informatik tradition and its links with sport informatics.
 * Discusses the emergence of sport analytics.
 * Explores microlearning.

In addition to this introduction, the course includes these topics as part of this theme:
 * Ethical issues
 * Communities of practice
 * Capstone

Evidence
We are mindful that throughout this course we must be sensitive to what is to count as evidence, how we record data, our objectivity in the analysis of performance (including how we address objective reality), and evidence-based practice. We need to be clear too about how we use the terms reproducibility and replicability in our research and practice.

Thomas Kelly provides an introduction to the concept of evidence and notes: ‘Evidence’ is hardly a philosopher's term of art: it is not only, or even primarily, philosophers who routinely speak of evidence, but also lawyers and judges, historians and scientists, investigative journalists and reporters, as well as the members of numerous other professions and ordinary folk in the course of everyday life.

Kevin Gray, amongst others, points out that not all evidence is equal and that it can differ in quantity and quality. He raises a fundamental issue for anyone involved in sport informatics and analytics: Some results can be calculated precisely or are determined by rules. Others can be estimated probabilistically with statistics and machine learning tools. However, decision-makers are often confronted with situations in which they must rely on their gut.

We encourage you to reflect on the decisions you make about evidence (including the contents of this course) as you analyse performance. These may be decisions that are:
 * Deterministic (results can be calculated precisely or are determined by rules)
 * Probabilistic (estimated probabilistically with statistics and machine learning tools)
 * Intuitive (reliance on 'gut instinct')

These three approaches are interconnected in informatics and analytics through the ethical decisions we make about our practice, and they inform our work as analysts "to reconcile conflicting ideas while still producing something useful".
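
The difference between the first two kinds of decision in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the match results are invented, the points rule (3 for a win, 1 for a draw, 0 for a loss) is assumed, and the function names league_points and win_probability_interval are hypothetical. The interval uses a simple normal approximation, which is rough for a sample this small.

```python
import math

# Hypothetical results for one team: 'W' win, 'D' draw, 'L' loss.
results = ["W", "W", "D", "L", "W", "D", "W", "L", "W", "W"]

# Assumed scoring rule: 3 points for a win, 1 for a draw, 0 for a loss.
POINTS = {"W": 3, "D": 1, "L": 0}


def league_points(results):
    """Deterministic: the total follows exactly from the rule."""
    return sum(POINTS[r] for r in results)


def win_probability_interval(results, z=1.96):
    """Probabilistic: estimate the win probability with an approximate interval.

    Returns (estimate, lower, upper) using a normal approximation,
    clipped to the range 0 to 1.
    """
    n = len(results)
    p = results.count("W") / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


points = league_points(results)
estimate, lower, upper = win_probability_interval(results)

print(f"Points (deterministic): {points}")
print(f"Win probability (probabilistic): {estimate:.2f} "
      f"(approx. 95% interval {lower:.2f} to {upper:.2f})")
```

The points total is fixed by the rule, whereas the win probability is an estimate with a wide interval after only ten matches. Deciding how much weight to give that interval is where the third, intuitive, kind of decision often enters.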

A good starting point for our reflections about evidence is Kevin Gray's observation "humans frequently misconstrue conjecture as evidence. We also readily reject evidence that contradicts our opinions, and cherry-pick data and analytics to support decisions we’ve already made".

As we consider what is to count as evidence, it might be helpful to revisit W. Edwards Deming's (1975) paper to contemplate how we assign probability to evidence. In the paper, Deming distinguishes between enumerative ("an estimate of the number of units of a frame that belong to a specified class") and analytical ("a basis for action on the cause-system or the process, in order to improve product of the future") approaches. He adds: The basic supposition here is that any statistical investigation is carried out for purposes of action. New knowledge modifies existing knowledge.

As David Yarrow and Matthias Kranke (2016) indicate, such action is not exclusively objective and value neutral. They suggest that a critical, interdisciplinary, performative understanding of statistics enables "an unpacking of the socio-material mechanisms through which data-heavy analytical technologies shape processes of valuation, commercialisation and regulation" in sport. This understanding recognises, as Jeff Leek (2017) suggests, that "data analysis is not purely computational and algorithmic — it is a human behaviour" and that when we share evidence we must be conscious of the narrative we use to discuss our findings.

Galit Shmueli (2010) proposes that when we construct narratives we must distinguish between explanation and prediction. (See also, her discussion of description (2018) in statistical modelling.)

We might reflect also on the contextual intelligence we bring to our practice of observing and analysing performance in sport. This reflection could include 'good enough practices in scientific computing', the role of a data analyst as an artist or wanderer, our relationship to data humanism and an awareness of confirmation bias.

Martin Fowler (2015) has written about the volume of data that is now available. He identified the appearance of a data lake as an idea "to have a single store for all of the raw data that anyone in an organization might need to analyze". The data lake stores raw data, in whatever form the data source provides. James Dixon (2010) introduced the concept of a data lake when he observed "... the traditional solutions we have created a concept called the Data Lake to describe an optimal solution". He added that the "contents of the data lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples".

Martin Fowler (2015) observed: It is important that all data put in the lake should have a clear provenance in place and time. Every data item should have a clear trace to what system it came from and when the data was produced. The data lake thus contains a historical record.
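
The idea of provenance in place and time can be made concrete with a small Python sketch. It is not a description of any particular data lake product: the class names LakeRecord and DataLake, the source names and the sample payloads are all hypothetical. The sketch stores raw data in whatever form the source provides, and refuses any item that does not say which system it came from and when it was produced.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, List, Optional


@dataclass(frozen=True)
class LakeRecord:
    """One raw item plus its provenance (hypothetical structure)."""
    source_system: str     # where the data came from
    produced_at: datetime  # when the data was produced
    payload: Any           # raw data, stored as the source provided it


class DataLake:
    """A toy in-memory store that insists on provenance for every item."""

    def __init__(self) -> None:
        self._records: List[LakeRecord] = []

    def put(self, source_system: str, produced_at: datetime, payload: Any) -> LakeRecord:
        if not source_system:
            raise ValueError("every item needs a source system")
        if produced_at.tzinfo is None:
            raise ValueError("produced_at must be timezone-aware")
        record = LakeRecord(source_system, produced_at, payload)
        self._records.append(record)
        return record

    def history(self, source_system: Optional[str] = None) -> List[LakeRecord]:
        """Return records in order of production, optionally for one source."""
        selected = [
            r for r in self._records
            if source_system is None or r.source_system == source_system
        ]
        return sorted(selected, key=lambda r: r.produced_at)


lake = DataLake()
# Invented examples: a raw GPS string and a video-tagging event.
lake.put("gps_tracker", datetime(2024, 3, 2, 10, 5, tzinfo=timezone.utc),
         "player=7;lat=-35.28;lon=149.13")
lake.put("video_tagging", datetime(2024, 3, 2, 10, 0, tzinfo=timezone.utc),
         {"event": "kick-off", "half": 1})

for record in lake.history():
    print(record.produced_at.isoformat(), record.source_system, record.payload)
```

Because each record keeps its source and production time, the store can be read back as a historical record, whatever order the items arrived in.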

Informatik and informatics
In this course, we acknowledge the connection between informatik and informatics. In your reading, you will come across a number of terms used to describe how we live with information in a digital age.

Karl Steinbuch used the term informatik in a 1957 paper ("die automatische Datenverarbeitung wir nennen sie heute informatik", that is, "automatic data processing, which we today call informatik"), and it became the German term for computer science. For more information about the emergence of the Informatik tradition you might like to look at Daniel Link and Martin Lames' (2009) paper.

In 1962, Philippe Dreyfus created the French term l'informatique as a combination of information and automatique. In 1966, L'Académie française defined 'l'informatique' as: Science du traitement rationnel, notamment par des machines automatiques, de l'information considérée comme le support des connaissances humaines et des communications dans les domaines techniques, économiques et sociaux (the science of the rational processing, notably by automatic machines, of information considered as the medium of human knowledge and of communications in the technical, economic and social fields). In the same year that Philippe Dreyfus introduced l'informatique, Walter Bauer, Werner Frank, Richard Hill and Frank Wagner formed the Informatics company in the United States of America, to contribute to the "science of information handling".

In 1963, F.E. Temnikov produced a paper titled Informatika. Three years later, A.I. Mikhailov, A.I. Chernyi and R.S. Gilyarevski used the word Informatika as the name for the theory of scientific information.

Each of these terms, created in its own cultural context, described activities that "are essentially the everyday activities that have been enacted throughout history and across cultures: selecting, communicating, discovering, recording, organising, problem-solving, deciding and learning".

Daniel Link and Martin Lames (2009) provide a detailed account of the origins of sport informatics in Germany. They note that: The term covers all activities at the interface of computer science and sport science, ranging from simple tools for handling data and controlling sensors on to the modelling and simulation of complex sport-related phenomena.

Examples of where the informatik tradition has led researchers and practitioners can be found in Daniel Memmert and Dominik Raabe's (2017) Revolution im Profifußball.

Arnold Baca (2006) provides an introduction to the emergence of Sportinformatik.

Sport analytics
In the last two decades there has been a gradual change in how we refer to the observation, recording and analysis of performance in sport. We tend to hear and read less about notational analysis now and talk more about analytics. This indicates an important change in the community of practice that analyses performance in sport. Jay Coleman (2012) identifies some of the 'players' in sports analytics research. Bill Gerard (2015) has provided an overview of this change in the community. Felix Lebed (2017) locates this change in the context of the discipline of analytics. Erin Wasserman and her colleagues (2018) provide an overview of the fundamentals of sport analytics. Jacquie Tran (2019) has shared her macro view of sports analytics.

In 2005, the Journal of Quantitative Analysis in Sports appeared "as the first academic journal dedicated to statistical analysis in sports". There was an announcement in 2019 for the Journal of Sport Analytics as "a new high-quality research journal that aims to be the central forum for the discussion of practical applications of sports analytics research, serving team owners, general managers, coaches, fans, and academics".

Benjamin Alamar and Vijay Mehrotra (2011) define sport analytics as: the management of structured historical data, the application of predictive analytic models that utilize that data, and the use of information systems to inform decision makers and enable them to help their organizations in gaining a competitive advantage on the field of play.

Their definition has three components: data management; predictive models; and information systems.

Thomas Davenport and Jeanne Harris (2007) proposed that analytics are a subset of business intelligence. They defined analytics as: The extensive use of data, statistical and quantitative analysis, explanatory and predictive models, and fact-based management to drive decisions and actions. The analytics may be input for human decisions or may drive fully automated decisions. (2007:7)

In 2014, Chris Anderson proposed that sport analytics is: The discovery, communication, and implementation of actionable insights derived from structured information in order to improve the quality of decisions and performance in an organization.

Chris's definition refers to actionable insights. These are also a component of Adam Cooper's (2012) wide-ranging definition: Analytics is the process of developing actionable insights through problem definition and the application of statistical models and analysis against existing and/or simulated future data.

More recently, Bill Gerard (2016) argues for "a narrow definition of sports analytics" as the analysis of tactical data to support tactics-related sporting decisions. He suggests "this narrow definition captures the uniqueness and the innovatory nature of sports analytics as the analysis of tactical performance data."

Felix Lebed (2017) has extended the discussion about sport analytics through "the prism of the complexity approach to all human subjects of games playing, training, coaching and managing".

Patrick Ward, Johann Windt and Thomas Kempton (2019) draw attention to business intelligence opportunities for sport scientists "to develop systematic analysis frameworks to enhance performance within their organisation". These opportunities combine data collection and organisation, analytic models to drive insight and interface through communication.

Rasmus Jørnø and Karsten Gynther (2018) discuss actionable insights (in learning analytics). You might find their paper of interest as you explore the relationships between observation, analysis and decision-support in sport analytics.

Video signpost
Our Introductions theme is presented by Trent Hopkinson.

Resources
The theme overview provides a framework to our approach to Sport Informatics and Analytics.

There is a slide presentation.

There is a mind map for this theme that includes resources up to 2015. For more recent resources (2016 onward) see this site.

There is more background information about Informatics and Analytics on this wiki.

There are some video suggestions. (See slides 2-4)

Daniel Link's (2009) presentation Interdisciplinarity in Sport Informatics.

There are some additional resources.