Introduction
This topic explores how we can extract useful information and actionable insights from sport data.

A variety of labels has been used to characterise processes that extract useful information from data. These include "data mining, knowledge extraction, information discovery, information harvesting, data archaeology, and data pattern processing".

Gregory Piatetsky-Shapiro introduced the term "knowledge discovery" in a report of a workshop in 1989 that brought together practitioners from "expert systems, machine learning, intelligent databases, knowledge acquisition, case-based reasoning and statistics". The report of the workshop concluded "knowledge discovery in databases is an idea whose time has come".

William Frawley, Gregory Piatetsky-Shapiro, and Christopher Matheus provided one of the earliest overviews of knowledge discovery in databases (KDD) in 1992. They defined it as follows: "Knowledge discovery is the nontrivial extraction of implicit, previously unknown, and potentially useful information from data. Given a set of facts (data) F, a language L, and some measure of certainty C, we define a pattern as a statement S in L that describes relationships among a subset Fs of F with a certainty c, such that S is simpler (in some sense) than the enumeration of all facts in Fs. A pattern that is interesting (according to a user-imposed interest measure) and certain enough (again according to the user’s criteria) is called knowledge. The output of a program that monitors the set of facts in a database and produces patterns in this sense is discovered knowledge."

They added "Patterns are interesting when they are novel, useful, and non-trivial to compute".
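The formal definition above can be made concrete with a small sketch. Everything in it is an illustrative assumption rather than part of the 1992 paper: the `Pattern` class, the `is_knowledge` helper, the toy match records, and the choice to measure certainty as the share of the subset Fs for which the statement holds. It is one possible reading of the definition, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Fact = Dict[str, str]  # one record in the set of facts F (hypothetical layout)


@dataclass
class Pattern:
    statement: str                      # S, written here in plain English
    applies_to: Callable[[Fact], bool]  # picks out the subset Fs of F
    holds_for: Callable[[Fact], bool]   # True when a fact agrees with S

    def certainty(self, facts: List[Fact]) -> float:
        """Certainty c, taken here as the share of Fs that agrees with S."""
        subset = [f for f in facts if self.applies_to(f)]
        if not subset:
            return 0.0
        return sum(1 for f in subset if self.holds_for(f)) / len(subset)


def is_knowledge(pattern: Pattern, facts: List[Fact], min_certainty: float,
                 is_interesting: Callable[[Pattern], bool]) -> bool:
    """A pattern counts as knowledge when it meets the user's own criteria."""
    return is_interesting(pattern) and pattern.certainty(facts) >= min_certainty


# Invented toy data: which side scored first and how the match ended.
matches = [
    {"scored_first": "home", "winner": "home"},
    {"scored_first": "home", "winner": "home"},
    {"scored_first": "home", "winner": "draw"},
    {"scored_first": "away", "winner": "away"},
    {"scored_first": "home", "winner": "home"},
]

pattern = Pattern(
    statement="If the home team scores first, the home team wins",
    applies_to=lambda f: f["scored_first"] == "home",
    holds_for=lambda f: f["winner"] == "home",
)

# Four matches fall in Fs and three of them agree with S.
print(pattern.certainty(matches))
print(is_knowledge(pattern, matches, min_certainty=0.7,
                   is_interesting=lambda p: True))
```

The single statement summarises four match records, which is the sense in which S is simpler than enumerating all the facts in Fs. Whether it becomes knowledge depends entirely on the thresholds the user supplies.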

In 1996, Usama Fayyad, Gregory Piatetsky-Shapiro, and Padhraic Smyth provided "an overview of this emerging field, clarifying how data mining and knowledge discovery in databases are related both to each other and to related fields, such as machine learning, statistics, and databases". Their paper distinguishes KDD from data mining. They note: "In our view, KDD refers to the overall process of discovering useful knowledge from data, and data mining refers to a particular step in this process. Data mining is the application of specific algorithms for extracting patterns from data." In other words, KDD is a process and data mining is one step within that process. The derivation of useful knowledge from data requires:
 * data preparation
 * data selection
 * data cleaning
 * incorporation of appropriate prior knowledge
 * proper interpretation of the results of data mining

Usama Fayyad, Gregory Piatetsky-Shapiro, and Padhraic Smyth provide the conceptual and practical foundation for the KDD process in sport contexts. They propose: "KDD focuses on the overall process of knowledge discovery from data, including how the data are stored and accessed, how algorithms can be scaled to massive data sets and still run efficiently, how results can be interpreted and visualized, and how the overall man-machine interaction can usefully be modeled and supported."

Twenty years after the publication of their paper there is still a tendency to regard data mining and KDD as interchangeable terms. During this unit we have used the term analytics as a shorthand for KDD.
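The distinction between KDD as a process and data mining as one step within it can be sketched in code. The example below is a minimal illustration under stated assumptions: the player records, the field names, the function names, the 30-minute threshold used as "prior knowledge", and the ordering of the steps are all hypothetical and are not taken from the 1996 paper.

```python
from collections import defaultdict
from statistics import mean

# Invented raw records; None marks a missing tracking value.
RAW_RECORDS = [
    {"player": "A", "position": "midfield", "minutes": 90, "distance_km": 11.2},
    {"player": "B", "position": "midfield", "minutes": 90, "distance_km": None},
    {"player": "C", "position": "defence", "minutes": 90, "distance_km": 9.8},
    {"player": "D", "position": "midfield", "minutes": 45, "distance_km": 5.9},
    {"player": "E", "position": "defence", "minutes": 10, "distance_km": 1.5},
]


def select(records):
    """Data selection: keep only the fields relevant to the question."""
    fields = ("position", "minutes", "distance_km")
    return [{k: r[k] for k in fields} for r in records]


def clean(records):
    """Data cleaning: drop records with missing values."""
    return [r for r in records if all(v is not None for v in r.values())]


def apply_prior_knowledge(records, min_minutes=30):
    """Prior knowledge (assumed): very short appearances give unreliable rates."""
    return [r for r in records if r["minutes"] >= min_minutes]


def prepare(records):
    """Data preparation: convert distance to a per-90-minute rate."""
    return [
        {"position": r["position"],
         "km_per_90": r["distance_km"] * 90 / r["minutes"]}
        for r in records
    ]


def mine(records):
    """Data mining: the single pattern-extraction step, here a mean by group."""
    by_position = defaultdict(list)
    for r in records:
        by_position[r["position"]].append(r["km_per_90"])
    return {pos: mean(vals) for pos, vals in by_position.items()}


def interpret(summary):
    """Interpretation: turn the mined numbers into statements a coach can read."""
    return [f"{pos}: about {avg:.1f} km per 90 minutes"
            for pos, avg in sorted(summary.items())]


def run_kdd(records):
    """The overall process: mining is only one call among several."""
    return mine(prepare(apply_prior_knowledge(clean(select(records)))))


summary = run_kdd(RAW_RECORDS)
for line in interpret(summary):
    print(line)
```

Only `mine` corresponds to data mining in the sense used above. The other functions are the surrounding steps that make its output trustworthy and usable.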

Our discussion of analytics used this definition: "The discovery, communication, and implementation of actionable insights derived from structured information in order to improve the quality of decisions and performance in an organization." As we develop our KDD skills this activity will include unstructured data too. Whatever is included, it will be part of a process that the literature of the 1990s foresaw.