Sport Informatics and Analytics/Pattern Recognition/Knowledge Discovery

From WikiEducator


Introduction

This topic explores how we can extract useful information and actionable insights from sport data.

A variety of labels have been used to characterise processes that extract useful information from data. These include "data mining, knowledge extraction, information discovery, information harvesting, data archaeology, and data pattern processing"[1].

Gregory Piatetsky-Shapiro [2] introduced the term knowledge discovery in a report of a workshop in 1989 that brought together practitioners from "expert systems, machine learning, intelligent databases, knowledge acquisition, case-based reasoning and statistics"[3]. The report of the workshop concluded "knowledge discovery in databases is an idea whose time has come"[4].

William Frawley, Gregory Piatetsky-Shapiro, and Christopher Matheus (1992)[5] provided one of the earliest overviews of knowledge discovery in databases. They defined knowledge discovery in databases (KDD) as:

Knowledge discovery is the nontrivial extraction of implicit, previously unknown, and potentially useful information from data. Given a set of facts (data) F, a language L, and some measure of certainty C, we define a pattern as a statement S in L that describes relationships among a subset Fs of F with a certainty c, such that S is simpler (in some sense) than the enumeration of all facts in Fs. A pattern that is interesting (according to a user-imposed interest measure) and certain enough (again according to the user’s criteria) is called knowledge. The output of a program that monitors the set of facts in a database and produces patterns in this sense is discovered knowledge.[6]

They added "Patterns are interesting when they are novel, useful, and non-trivial to compute"[7].
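The definition above can be made concrete with a small sketch. It is an illustration only, not code from Frawley, Piatetsky-Shapiro, and Matheus: the shot records are invented, the `Pattern` class and `is_knowledge` function are hypothetical names, certainty is assumed to be the share of facts in Fs for which the statement holds, and the interest score is simply supplied by the user.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Fact = Dict[str, Any]  # one record in F, for example one shot in a match


@dataclass
class Pattern:
    statement: str                      # S, expressed in some language L
    applies_to: Callable[[Fact], bool]  # selects the subset Fs of F
    holds_for: Callable[[Fact], bool]   # the relationship that S asserts

    def certainty(self, facts: List[Fact]) -> float:
        """c: share of the facts in Fs for which the statement holds (an assumption)."""
        subset = [f for f in facts if self.applies_to(f)]
        if not subset:
            return 0.0
        return sum(1 for f in subset if self.holds_for(f)) / len(subset)


def is_knowledge(pattern: Pattern, facts: List[Fact], interest: float,
                 min_interest: float, min_certainty: float) -> bool:
    """A pattern counts as knowledge when it clears both user-set thresholds."""
    return (interest >= min_interest
            and pattern.certainty(facts) >= min_certainty)


# Invented data for illustration only.
shots = [
    {"zone": "box", "goal": True},
    {"zone": "box", "goal": False},
    {"zone": "box", "goal": True},
    {"zone": "box", "goal": True},
    {"zone": "outside", "goal": False},
    {"zone": "outside", "goal": False},
]

pattern = Pattern(
    statement="Shots taken inside the box result in a goal",
    applies_to=lambda f: f["zone"] == "box",
    holds_for=lambda f: f["goal"],
)

print(pattern.statement, pattern.certainty(shots))
print(is_knowledge(pattern, shots, interest=0.8,
                   min_interest=0.5, min_certainty=0.7))
```

The single statement stands in for the four box-shot records it summarises, which is the sense in which a pattern is simpler than enumerating the facts in Fs. Whether it is novel or useful remains a judgement for the user, which is why the interest score is passed in rather than computed.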

In 1996, Usama Fayyad, Gregory Piatetsky-Shapiro, and Padhraic Smyth provided "an overview of this emerging field, clarifying how data mining and knowledge discovery in databases are related both to each other and to related fields, such as machine learning, statistics, and databases"[8]. Their paper distinguishes KDD from data mining. They note:

In our view, KDD refers to the overall process of discovering useful knowledge from data, and data mining refers to a particular step in this process. Data mining is the application of specific algorithms for extracting patterns from data[9].

They argue that KDD is a process and data mining is a step within that process. The derivation of useful knowledge from data requires:

  • data preparation
  • data selection
  • data cleaning
  • incorporation of appropriate prior knowledge
  • proper interpretation of the results of data mining[10]
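The steps listed above can be sketched as a small pipeline in which data mining is only one function among several. This is a minimal illustration under stated assumptions, not the authors' method: the match records are invented, every function name is hypothetical, the ordering of the steps is one plausible choice, and the "mining" step is a simple per-team average rather than a real data mining algorithm.

```python
from statistics import mean

# Invented match records for illustration only.
raw = [
    {"team": "A", "shots": 12, "goals": 2, "weather": "dry"},
    {"team": "A", "shots": None, "goals": 1, "weather": "wet"},
    {"team": "B", "shots": 8, "goals": 0, "weather": "dry"},
    {"team": "B", "shots": 15, "goals": 3, "weather": "dry"},
    {"team": "A", "shots": 10, "goals": 1, "weather": "wet"},
]


def select(records, fields):
    """Data selection: keep only the variables relevant to the question."""
    return [{k: r[k] for k in fields} for r in records]


def clean(records):
    """Data cleaning: drop records with missing values."""
    return [r for r in records if all(v is not None for v in r.values())]


def apply_prior_knowledge(records, min_shots):
    """Prior knowledge: assume a rate from very few shots is not meaningful."""
    return [r for r in records if r["shots"] >= min_shots]


def prepare(records):
    """Data preparation: derive the variable that will be mined."""
    return [{**r, "conversion": r["goals"] / r["shots"]} for r in records]


def mine(records):
    """Data mining: here, simply the mean conversion rate per team."""
    teams = sorted({r["team"] for r in records})
    return {t: mean(r["conversion"] for r in records if r["team"] == t)
            for t in teams}


def interpret(patterns):
    """Interpretation: turn numbers into statements a coach can read."""
    return [f"Team {t} converted {rate:.1%} of its shots"
            for t, rate in patterns.items()]


selected = select(raw, ["team", "shots", "goals"])
cleaned = clean(selected)
filtered = apply_prior_knowledge(cleaned, min_shots=10)
prepared = prepare(filtered)
patterns = mine(prepared)

for line in interpret(patterns):
    print(line)
```

Only `mine` corresponds to data mining in the sense used by Fayyad, Piatetsky-Shapiro, and Smyth. Changing any of the surrounding steps, for example the `min_shots` threshold, changes what the mining step can find.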

Usama Fayyad, Gregory Piatetsky-Shapiro, and Padhraic Smyth provide the conceptual and practical foundation for the KDD process in sport contexts. They propose:

KDD focuses on the overall process of knowledge discovery from data, including how the data are stored and accessed, how algorithms can be scaled to massive data sets and still run efficiently, how results can be interpreted and visualized, and how the overall man-machine interaction can usefully be modeled and supported[11].

Twenty years after the publication of their paper there is still a tendency to regard data mining and KDD as interchangeable terms. During this unit we have used the term analytics as a shorthand for KDD.

Our discussion of analytics used this definition:

The discovery, communication, and implementation of actionable insights derived from structured information in order to improve the quality of decisions and performance in an organization.

As we develop our KDD skills this activity will include unstructured data too. Whatever is included, it will be part of a process that the literature of the 1990s foresaw.

Sport examples

We present two examples here for your consideration.

Chris Anderson and David Sally discuss the potential of an analytics approach to association football in their study of The Numbers Game[12].

In the introduction to their book, they write:

The clue to analytics is in the name. To make (those) numbers mean something, to learn something from them, they must be analysed. The key, for those at the vanguard of what some have called a data 'revolution' and what we think of as football's reformation, is to work out what they need to be counting, and to discover why, exactly, what they are counting counts.[13]

Their book explores the analytics process and raises important empirical and methodological issues for this course.

A Numbers Game?

Read the introductory chapter in The Numbers Game, Football for Sceptics - The Counter(s) Reformation. Does their suggestion resonate with your experience of sport?

A storm is gathering in football. It is one that will wash away old certainties and change the game we know and love. It will be a game we view more analytically, more scientifically, where we do not accept what we have always been taught, but where we always ask why. The game will look the same, but the way we think about it will be almost unrecognizable[14].



The second example presented here is the paper written in 1997 by Inderpal Bhandari and his colleagues at the IBM TJ Watson Research Center. The paper is titled Advanced Scout: Data Mining and Knowledge Discovery in NBA Data. In the paper, they report their analysis of data gathered by a software program, Advanced Scout, that "seeks out and discovers interesting patterns in game data"[15]. We have chosen this paper to connect with the spirit of the literature of the time. The editor of the journal that accepted the paper was Gregory Piatetsky-Shapiro.

Another coach on the team?

Read Inderpal Bhandari and his colleagues' paper.

  • Can you see any parallels with the generic discussion in Usama Fayyad, Gregory Piatetsky-Shapiro, and Padhraic Smyth's paper[16]?
  • Mindful of your consideration of the Audiences and Messages theme of this course, how might you go about sharing insights from the data you have gathered to offer the team another coach as suggested by Bob Salmi in the paper?



References

  1. Fayyad, Usama; Piatetsky-Shapiro, Gregory; Smyth, Padhraic (1996). "From Data Mining to Knowledge Discovery in Databases". AI Magazine 17 (3): 39. http://www.aaai.org/ojs/index.php/aimagazine/article/download/1230/1131/.
  2. Piatetsky-Shapiro, Gregory (1990). "Knowledge Discovery in Real Databases: A Report on the IJCAI-89 Workshop". AI Magazine 11 (5): 68-70. https://www.aaai.org/ojs/index.php/aimagazine/article/download/873/791. Retrieved 1 March 2016.
  3. Piatetsky-Shapiro, Gregory (1990). "Knowledge Discovery in Real Databases: A Report on the IJCAI-89 Workshop". AI Magazine 11 (5): 68. https://www.aaai.org/ojs/index.php/aimagazine/article/download/873/791. Retrieved 1 March 2016.
  4. Piatetsky-Shapiro, Gregory (1990). "Knowledge Discovery in Real Databases: A Report on the IJCAI-89 Workshop". AI Magazine 11 (5): 70. https://www.aaai.org/ojs/index.php/aimagazine/article/download/873/791. Retrieved 1 March 2016.
  5. Frawley, William; Piatetsky-Shapiro, Gregory; Matheus, Christopher (1992). "Knowledge Discovery in Databases: An Overview". AI Magazine 13 (3): 57-70. http://www.aaai.org/ojs/index.php/aimagazine/article/viewFile/1011/929. Retrieved 29 February 2016.
  6. Frawley, William; Piatetsky-Shapiro, Gregory; Matheus, Christopher (1992). "Knowledge Discovery in Databases: An Overview". AI Magazine 13 (3): 58. http://www.aaai.org/ojs/index.php/aimagazine/article/viewFile/1011/929.
  7. Frawley, William; Piatetsky-Shapiro, Gregory; Matheus, Christopher (1992). "Knowledge Discovery in Databases: An Overview". AI Magazine 13 (3): 58. http://www.aaai.org/ojs/index.php/aimagazine/article/viewFile/1011/929.
  8. Fayyad, Usama; Piatetsky-Shapiro, Gregory; Smyth, Padhraic (1996). "From Data Mining to Knowledge Discovery in Databases". AI Magazine 17 (3): 37-54. http://www.aaai.org/ojs/index.php/aimagazine/article/download/1230/1131/.
  9. Fayyad, Usama; Piatetsky-Shapiro, Gregory; Smyth, Padhraic (1996). "From Data Mining to Knowledge Discovery in Databases". AI Magazine 17 (3): 39. http://www.aaai.org/ojs/index.php/aimagazine/article/download/1230/1131/.
  10. Fayyad, Usama; Piatetsky-Shapiro, Gregory; Smyth, Padhraic (1996). "From Data Mining to Knowledge Discovery in Databases". AI Magazine 17 (3): 39. http://www.aaai.org/ojs/index.php/aimagazine/article/download/1230/1131/.
  11. Fayyad, Usama; Piatetsky-Shapiro, Gregory; Smyth, Padhraic (1996). "From Data Mining to Knowledge Discovery in Databases". AI Magazine 17 (3): 39ff. http://www.aaai.org/ojs/index.php/aimagazine/article/download/1230/1131/.
  12. Anderson, Chris; Sally, David (2013). The numbers game: why everything you know about football is wrong. London: Penguin.
  13. Anderson, Chris; Sally, David (2013). The numbers game: why everything you know about football is wrong. London: Penguin. pp. np.
  14. Anderson, Chris; Sally, David (2013). The numbers game: why everything you know about football is wrong. London: Penguin. pp. np.
  15. Bhandari, Inderpal et al. (1997). "Advanced Scout: Data Mining and Knowledge Discovery in NBA Data". Data Mining and Knowledge Discovery 1 (1): 121. http://www.cse.unr.edu/~sushil/class/ml/papers/local/nba.pdf. Retrieved 1 March 2016.
  16. Fayyad, Usama; Piatetsky-Shapiro, Gregory; Smyth, Padhraic (1996). "From Data Mining to Knowledge Discovery in Databases". AI Magazine 17 (3): 37-54. http://www.aaai.org/ojs/index.php/aimagazine/article/download/1230/1131/.