Sport Informatics and Analytics/Pattern Recognition/SQL

From WikiEducator
Jump to: navigation, search

Introduction

This page introduces Structured Query Language (SQL). There is a wikipedia page on the topic of SQL

An increasing number of job descriptions in sport analytics are requiring experience of SQL and related command and programming skills.[1]

SQL enables the user to communicate with databases. A database is "a tool for collecting and organizing information".[2] Rahul Batra (2018) points out that the most prevalent database model is the relational database. Steve Suehring (2002)[3] pointed out that such databases are speedy, accurate and thorough. They make for the flexible reporting of information.

Chad Brooks (2014) reports:

The SQL programming language was first developed in the 1970s by IBM researchers Raymond Boyce and Donald Chamberlin. The programming language, was known then as SEQUEL.[4]

Relational Databases

In 1970, Edgar Codd[5] published his seminal paper on a relational model of data for large shared data banks. Edgar's paper "is concerned with the application of elementary relation theory to systems which provide shared access to large banks of formatted data".[6] Edgar's use of the term relation in the paper is used in its mathematical sense.

Edgar adds:

The adoption of a relational model of data, as described, permits the development of a universal data sub-language based on an applied predicate calculus.[7]

In section 1.3 of his paper, Edgar observes:

it is proposed that most users should interact with a relational model of the data consisting of a collection of time-varying relationships (rather than relations). Each user need not know more about any relationship than itsname together with the names of its domains (role qualified whenever necessary).[8]

SQL

There is a wikipedia page on the topic of SQL

In 1979, Relational Software released its own version of the SQL language, Oracle V2. Since that time the American National Standards Institute and the International Standards Organization have regarded the SQL language the standard language in relational database communication.

Edgar Codd (1979)[9] published his account of extending the database relational model to capture more meaning. In it, he confirmed:

A database system is fully relational if it supports: (1) the structural aspects of the relational model; (2) the insert-update-delete rules; (3) a data sublanguage at least as powerful as the relational algebra, even if all facilities the language may have for iterative loops and recursion were deleted from that language.

The 1981 ACM Turing Award was presented to Edgar Codd. At the Award lecture, Edgar presented a paper titled Relational Database: A Practical Foundation for Productivity.[10]

In his paper, Edgar looked back at database systems and observed:

Looking back at the database management systems of the late sixties, we may readily observe that there was no sharp distinction between the programmer's (logical) view of the data and the (physical) representation of data in storage.[11]

He added:

installation of these systems was otten agonizingly slow, due to the large amount of time spent in 1earning about the systems and in planning the organization of the data at both logical and physical levels, prior to database activation.[12]

Edgar described the research motivatiuon for relational data models.[13] He the objective was to provide "a sharp and clear boundary between the logical and physical aspects of database management". Edgar defined three objectives of the research:

  • data independence
  • communicability
  • set-processing

He also asked why call the database relational , why not call it tabular? His answer was:

There are two reasons: (1) At the time the relational model was introduced, many people in data processing felt that a relation (or relationship) among two or more objects must be represented by a linked data structure (so the name was selected to counter this misconception); (2) Tables are at a lower level of abstraction than relations, since they give the impression that positional (array-type) addressing is applicable (which is not true of n-ary relations), and they fail to show that the information content of a table is independent of row order. Nevertheless, even with these minor flaws, tables are the most important conceptual representation of relations, because they are universally understood.[14]

Edgar concluded his paper with this observation:

We have presented a series of arguments to support the claim that relational database technology offers dramatic improvements in productivity both for end users and for application programmers. The arguments center on the data independence, structural simplicity, and relational processing defined in the relational model and implemented in relational database management systems.[15]

Roman Orac (2019)[16] discussed five common mistakes when writing SQL queries.

Bryan Feng (2019)[17] discussed how to learn SQL by using Excel as a bridge.

MySQL

There is a wikipedia page on the topic of MySQL.

MySQL is an open source, relational database management system.[18] It is shared under the terms of the GNU General Public License.It is mostly written in C and C++. MySQL was developed in 1994 by David Axmark and Michael Widenius. In an intereview, Michael was quoted:

I have some talents in writing working C code and know a little bit about databases. Some years ago (1994) I put these talents to some use and wrote the MySQL database on top of my old database project that I started in 1982. David Axmark (the second founder of MySQL) and I released it 'open source like' (this was before open source) in 1995 because we wanted do some good and we thought that 'at least we wouldn't get less business by doing this'. The end result did way exceed our expectations.[19]

The first version of MySQL appeared on 23 May 1995.In 2018-2019, 143 developers contributed new code to MySQL.[20]

MySQL is available for different operating systems on a variety of computer architectures.[21]

Chad Brooks (2014) noted:

Rather than trying to write an SQL for their own databases, many companies use a database management system that has SQL already built in to it. Developed and distributed by Oracle, MySQL is one of the most popular SQL database management systems currently available. The software is an open source version, which means it can be downloaded and used for free.[22]

Steve Suehring (2002)[23] suggested that what MySQL did best with:

  • Web applications
  • Enterprise-level applications
  • Open-source support
  • Low overhead
  • Large table size
  • Stability

Examples from sport

Motion data

Janez Pers and his colleagues (2005) [24] sought to develop an "inexpensive and flexible way to analyze large amounts of sport motion data, which are generated by automated motion tracking methods and complemented with manual annotations". Their paper uses a database of 100 squash plays to "detect certain kinds of play, activities, certain predefined scenarios and to generate various statistics about these activities – without hard-coding them in the executable code". Their paper provides examples of SQL queries for sport analysis and they provide a tutorial on aspects of SQL language.

References

  1. Lyons, Keith (14 October 2019). "Player Insights Scientists". https://keithlyons.me/blog/2019/10/14/player-insights-scientists/. Retrieved 15 October 2019.
  2. Microsoft. "Database basics". https://support.office.com/en-us/article/database-basics-a849ac16-07c7-4a31-9948-3c8c94a7c204. Retrieved 15 October 2019.
  3. Suehring, Steve (2002). "Advantages of using databases". http://justpain.com/eBooks/Databases/MySQL/MySQL%20Bible.pdf. Retrieved 16 October 2019.
  4. Brooks, Chad (21 January 2014). "What is SQL?". https://www.businessnewsdaily.com/5804-what-is-sql.html. Retrieved 15 October 2019.
  5. Codd, Edgar (1970). "A Relational Model of Data for Large Shared Data Banks". Communications of the ACM 59 (13(6)): 377-387.
  6. Codd, Edgar (1970). "A Relational Model of Data for Large Shared Data Banks". Communications of the ACM 59 (13(6)): 377-387.
  7. Codd, Edgar (1970). "A Relational Model of Data for Large Shared Data Banks". Communications of the ACM 59 (13(6)): 377-387.
  8. Codd, Edgar (1970). "A Relational Model of Data for Large Shared Data Banks". ACM Transactions on Database System 59 (13(6)): 377-387.
  9. Codd, Edgar (1979). "Extending the Database Relational Model to Capture More Meaning". Communications of the ACM 4(4): 397-434.
  10. Codd, Edgar (9 November 1981). "Relational Database: A Practical Foundation for Productivity". http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.613.4121&rep=rep1&type=pdf. Retrieved 15 October 2019.
  11. Codd, Edgar (9 November 1981). "Relational Database: A Practical Foundation for Productivity". http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.613.4121&rep=rep1&type=pdf. Retrieved 15 October 2019.
  12. Codd, Edgar (9 November 1981). "Relational Database: A Practical Foundation for Productivity". http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.613.4121&rep=rep1&type=pdf. Retrieved 15 October 2019.
  13. Codd, Edgar (9 November 1981). "Relational Database: A Practical Foundation for Productivity". http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.613.4121&rep=rep1&type=pdf. Retrieved 15 October 2019.
  14. Codd, Edgar (9 November 1981). "Relational Database: A Practical Foundation for Productivity". http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.613.4121&rep=rep1&type=pdf. Retrieved 15 October 2019.
  15. Codd, Edgar (9 November 1981). "Relational Database: A Practical Foundation for Productivity". http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.613.4121&rep=rep1&type=pdf. Retrieved 15 October 2019.
  16. Orac, Roman (11 November 2019). "5 mistakes when writing SQL queries". https://towardsdatascience.com/5-mistakes-when-writing-sql-queries-b34ce31fcbe4. Retrieved 16 November 2019.
  17. Feng, Bryan (22 December 2019). "Learn SQL By Using Excel As A Bridge". https://towardsdatascience.com/learn-sql-by-using-excel-as-a-bridge-30cea4d314e0. Retrieved 23 December 2019.
  18. MySQL. "MySQL Editions". https://www.mysql.com/products/. Retrieved 15 October 2019.
  19. Open Source Relrease (13 March 2009). "Five Questions With Michael Widenius". http://www.opensourcereleasefeed.com/interview/show/five-questions-with-michael-widenius-founder-and-original-developer-of-mysql. Retrieved 16 October 2019.
  20. Blackduck. "Very large, active development team". https://www.openhub.net/p/mysql/factoids#FactoidTeamSizeVeryLarge. Retrieved 15 October 2019.
  21. Suehring, Steve (2002). "Introducing MySQL". http://justpain.com/eBooks/Databases/MySQL/MySQL%20Bible.pdf. Retrieved 16 October 2019.
  22. Brooks, Chad (21 January 2014). "What is SQL?". https://www.businessnewsdaily.com/5804-what-is-sql.html. Retrieved 15 October 2019.
  23. Suehring, Steve (2002). "What MySQL Does Best". http://justpain.com/eBooks/Databases/MySQL/MySQL%20Bible.pdf. Retrieved 16 October 2019.
  24. Pers et al, Janez (2005). "Analysis and pattern detection on large amounts of annotated sport motion data using standard SQL". Proceedings of the 4th International Symposium on Image and Signal Processing and Analysis, 2005: 339-344.