Sport Informatics and Analytics/Pattern Recognition/Python

From WikiEducator

Introduction

This topic develops issues raised in Pattern Recognition, Theme 2 of this course. It starts a conversation about the use of Python, a dynamic, general-purpose programming language[1], in sport analytics.

Guido van Rossum compiled a history of Python in blog posts written between 2009 and 2013[2].

In this blog, I will shine the spotlight on Python's history. In particular, how Python was developed, major influences in its design, mistakes made, lessons learned, and future directions for the language.[3]

There is a detailed description of Python on this Wikipedia page.

A growing number of sport analytics practitioners combine Python and R in their work.[4] We introduce you to both languages in this course. Whether you choose one language or combine the two, your decision can be guided by these questions posed by Kevin Markham[5]:

  • Do you have experience of programming in other languages?
  • Are you working in academia or industry?
  • Machine learning or statistical learning?
  • Gentle introduction?
  • Data exploration focus?
  • Data cleaning focus?
  • Data visualisation priorities?

Ana Crisan observes of these questions:

If you are capable of thinking about your problem in abstract terms (and thinking statistically is, to me, generalized and abstract terms), then the choice of language has little baring. While there is always some fussing about to get accustomed to a language's nuances, and there are some benefits of one languages over another, the truth is if you know how to analyze data you know how to find what you need within a language.[6]

In this course we encourage you to keep an open mind about the tools you use to analyse performance but are keen to support your use of Python and R.

Florencia D'Andrea (2019)[7] has helped with some of these choices by publishing Python tutorial resources that include an R bootcamp. Oleksii Kharkovyna (2019)[8] has shared a beginner's guide to Python for data science. Costas Andreou (2020)[9] describes how to use the xlwings Python library to connect with Excel VBA so that data can be passed between the two platforms.

Learning Python

If you are new to programming, there are lots of open resources to help you on your learning journey.

Python.org has an introduction to Python that informally introduces the reader to the basic concepts and features of the Python language and system. The authors suggest that after the introduction "you will be able to read and write Python modules and programs, and you will be ready to learn more about the various Python library modules".[10]

Ola Sendecka[11] has created a series of video tutorials that you might find to be a gentle introduction.

See, for example, a guide to the Command Line


Her second video introduces Python.


Other tutorials in the series (Coding is for Girls) include:

  • Python basics
  • Dictionaries
  • Comparisons
  • Saving files and 'if' statements
  • Functions
  • User input and dealing with errors
  • Documentation
  • Reading and writing files
  • Reading data from csv file

You might also be interested in the Django Girls tutorial, a step-by-step introduction that leads to the production of your own website.

Ben Hamner[12] and Elena Kirzhner[13], among others, have provided introductions to visualisation with Python.

Al Sweigart[14] has made available an online introduction to Python for beginners.

FC Python provides an example of using association football to introduce Python basics.

Using Python in sport contexts

Researchers in sport have used Python in a variety of contexts. Some of them are presented here as case studies.

Case study 1: team pursuit cycling

Tracking changeovers

In 2010, Jun Burden and his colleagues[15] reported their use of Python to develop an automated video tracking system for team pursuit races. Their paper outlines how they used Python to provide detailed analysis of changeovers in race contexts.



Case study 2: basketball

Mapping Kobe Bryant's shots.

In 2016, Joe Fox, Ryan Menzies and Armand Emamdjomeh[16] produced an interactive visualisation for the Los Angeles Times of Kobe Bryant's 30,699 shots in NBA games. In a subsequent article, Joe Fox[17] shared how they undertook the project and the use they made of Python. The Python code used is available on GitHub.

Ryan Davis (2019)[18] provided six NBA tutorials on GitHub: matching player identifications between different data sources; determining players on court at the start of each period; finding endpoints on stats.nba.com; analyzing play-by-play data; a play-by-play parser; and regularized adjusted plus minus (RAPM).



Case study 3: NFL

Predicting game outcomes.

FiveThirtyEight curates an NFL prediction game. It assesses every NFL game and publishes forecasts showing the chance each team has to win based on Elo ratings. The system estimates each team’s skill level from the final score and location of each game. The prediction game requires participants to make a probabilistic forecast for each matchup in the NFL. The Python code used in the prediction game is available on GitHub. This includes:

  • Historical NFL scores back to 1920 with FiveThirtyEight's Elo win probabilities for each game.
  • Code to generate the Elo win probabilities contained in the data.
  • Code to evaluate alternative forecasts against Elo using the historical data and the rules of the game.
  • Game schedule and results from the 2017-18 season.
  • Anonymized reader forecasts for completed 2017-18 games.
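
To make the idea of an Elo win probability concrete, here is a minimal sketch of the standard Elo logistic curve and rating update. It is not FiveThirtyEight's code. The function names, the K-factor of 20 and the optional home-advantage bonus are illustrative assumptions, and the sketch ignores margin of victory and any other adjustments a published model might make.

```python
def elo_win_probability(rating_a, rating_b, home_advantage=0.0):
    """Chance that team A beats team B under the standard Elo logistic curve.

    home_advantage is a hypothetical rating bonus added to team A.
    """
    diff = (rating_a + home_advantage) - rating_b
    return 1.0 / (1.0 + 10 ** (-diff / 400.0))


def update_ratings(rating_a, rating_b, a_won, k=20.0, home_advantage=0.0):
    """Return both teams' new ratings after one game (k is an assumed K-factor)."""
    expected_a = elo_win_probability(rating_a, rating_b, home_advantage)
    actual_a = 1.0 if a_won else 0.0
    shift = k * (actual_a - expected_a)
    # Whatever team A gains, team B loses, so the rating total is conserved.
    return rating_a + shift, rating_b - shift


# Two evenly rated teams: a coin flip before the game, then A wins.
print(elo_win_probability(1500, 1500))
print(update_ratings(1500, 1500, a_won=True))
```

A forecast made this way can be compared with an alternative forecast game by game, which is the kind of evaluation the prediction game invites.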



Case study 4: web scraping

Using Python to collect information from web sites

Wikipedia observes that web scraping "is the process of automatically mining data or collecting information from the World Wide Web". Python is particularly well suited to web scraping. In this case study we use two examples of data scraping. The first is Daniel Forsyth's account of exploring NBA data. In his discussion, Daniel focused on collecting "the distance the shot was taken from, the distance of the closest defender, the number of dribbles taken before the shot was taken, and the amount of time the player possessed the ball before shooting".

The second is Daniel Kim's example of scraping NBA player statistics from an established basketball website.

Daniel Kim scrapes the data with a combination of Python and BeautifulSoup (a Python library for pulling data out of HTML and XML files). He shares his step-by-step code as well as an all-in-one notebook version.
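
The core step in this kind of scraping is turning an HTML table into rows of data. The sketch below illustrates that step only and is not Daniel Kim's code. So that it runs without installing anything, it swaps BeautifulSoup for Python's built-in html.parser module, and it parses a small invented table rather than fetching a real page. The table contents, the class name TableParser and the function parse_table are all hypothetical.

```python
from html.parser import HTMLParser

# Invented sample markup standing in for a downloaded statistics page.
SAMPLE_HTML = """
<table id="per_game">
  <tr><th>Player</th><th>PTS</th></tr>
  <tr><td>Player A</td><td>27.4</td></tr>
  <tr><td>Player B</td><td>18.9</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every table cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        # Only keep text that sits inside a cell.
        if self._in_cell:
            self._row[-1] += data.strip()

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_table(html):
    """Return one dictionary per data row, keyed by the header cells."""
    parser = TableParser()
    parser.feed(html)
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


players = parse_table(SAMPLE_HTML)
print(players)
```

With a real site you would first download the page, check that the site's terms allow scraping, and then pass the page text to a parser in the same way.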

For further discussion of the use of Python, you might like to have a look at Viraj Parekh's discussion of data wrangling to construct a datapipe for NBA data and Noah Gift's exploration of individual player performance.



Case study 5: bicycle journeys

Using Binder to share bicycle data

Tim Head has combined his knowledge of Python coding with his interest in sharing data. One example of his work is the Binder notebook (2017) that looks at "how the total number of cyclists varies across the year and what the distribution of cyclists looks like on an average day" in Zurich in 2016. We recommend that you look at the way Tim integrates data and code. If you find it of interest, you might respond to Tim's challenge: based on the weekend example he shares, can you show what the distribution of cyclists looks like during a working week?
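
One way to approach the working-week challenge is to average the counts for each hour of the day, keeping weekdays and weekends apart. The sketch below shows that grouping with the standard library only. It is not taken from Tim's notebook: the readings are invented, and the names READINGS and hourly_profile are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Invented hourly counter readings: (timestamp, cyclists counted).
READINGS = [
    ("2016-06-03 08:00", 120),  # Friday
    ("2016-06-03 17:00", 150),
    ("2016-06-04 08:00", 30),   # Saturday
    ("2016-06-04 17:00", 60),
    ("2016-06-06 08:00", 140),  # Monday
    ("2016-06-06 17:00", 130),
]


def hourly_profile(readings, weekend):
    """Average count per hour of day, for weekend days or for working days."""
    by_hour = defaultdict(list)
    for stamp, count in readings:
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M")
        is_weekend = when.weekday() >= 5  # Monday is 0, so 5 and 6 are the weekend
        if is_weekend == weekend:
            by_hour[when.hour].append(count)
    return {hour: mean(counts) for hour, counts in sorted(by_hour.items())}


print("Working week:", hourly_profile(READINGS, weekend=False))
print("Weekend:", hourly_profile(READINGS, weekend=True))
```

Plotting the two profiles side by side would show whether commuting peaks appear on working days but not at weekends.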



Case study 6: association football

FC Python

FC Python shares resources for learning basic Python, programming and data skills through association football examples. The open course introduces Python basics, data analysis and visualisations. There is an FC Python blog that discusses a range of issues related to Python.



Pandas

Stephen Fordham (2019)[19] has identified eight useful pandas features for data-set handling. The data-set Stephen uses is International football results from 1872 to 2019.[20] Stephen observed of his post:

I have demonstrated a variety of methods and their parameters that can be intuitively used to investigate a data-set. There are multiples ways of doing each of these steps presented, but I stick to the methods shown to aid readability.[21]

Mart Jürisoo says of his dataset:

This dataset includes 40,838 results of international football matches starting from the very first official match in 1872 up to 2019. The matches range from FIFA World Cup to FIFI Wild Cup to regular friendly matches. The matches are strictly men's full internationals and the data does not include Olympic Games or matches where at least one of the teams was the nation's B-team, U-23 or a league select team.[22]



Case study 7: sleep patterns

Markov Chain Monte Carlo

William Koehrsen is an ultramarathon runner. He has used his activity tracker to monitor the times when he falls asleep and wakes up. In a discussion of these data, William used Python to model when he falls asleep, when he wakes up and how long he sleeps. He concluded his discussion of his approach with this observation: "Data science is about constantly adding tools to your repertoire and the most effective way to do that is to find a problem and get started!"
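
To give a feel for the Markov Chain Monte Carlo idea named in this case study, here is a toy Metropolis sampler. It is not William's model. It assumes an invented record in which a runner was asleep by 22:00 on 7 of 10 nights, and it samples the plausible values of the underlying probability under a uniform prior. The function names, step size and number of steps are illustrative assumptions.

```python
import math
import random


def log_posterior(p, asleep, nights):
    """Log of the unnormalised posterior for probability p (uniform prior)."""
    if not 0.0 < p < 1.0:
        return -math.inf
    return asleep * math.log(p) + (nights - asleep) * math.log(1.0 - p)


def metropolis(asleep, nights, steps=20000, step_size=0.2, seed=0):
    """Random-walk Metropolis sampler; returns samples after a 10% burn-in."""
    rng = random.Random(seed)
    current = 0.5
    current_lp = log_posterior(current, asleep, nights)
    samples = []
    for _ in range(steps):
        proposal = current + rng.gauss(0.0, step_size)
        proposal_lp = log_posterior(proposal, asleep, nights)
        # Accept with probability min(1, posterior ratio).
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples[steps // 10:]


samples = metropolis(asleep=7, nights=10)
print("Estimated probability of being asleep by 22:00:", sum(samples) / len(samples))
```

The spread of the samples, not just their average, is the useful output: it shows how uncertain the estimate remains after only ten nights of data.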



References

  1. van Rossum, Guido (13 January 2009). "Introduction and Overview". https://python-history.blogspot.com.au/2009/01/introduction-and-overview.html. Retrieved 13 October 2017.
  2. van Rossum, Guido (13 January 2009). "The History of Python". https://python-history.blogspot.com.au/. Retrieved 13 October 2017.
  3. van Rossum, Guido (13 January 2009). "Introduction and Overview". https://python-history.blogspot.com.au/2009/01/introduction-and-overview.html. Retrieved 13 October 2017.
  4. Koehrsen, Will (28 December 2017). "Random Forest in Python". https://towardsdatascience.com/random-forest-in-python-24d0893d51c0. Retrieved December 2019.
  5. Markham, Kevin (2 February 2005). "Should you teach Python or R for data science?". http://www.dataschool.io/python-or-r-for-data-science/. Retrieved 13 October 2017.
  6. Crisan, Ana (2 February 2005). "Comment: Should you teach Python or R for data science?". http://www.dataschool.io/python-or-r-for-data-science/. Retrieved 13 October 2017.
  7. D'Andrea, Florencia (2019). "Tutorials". https://github.com/flor14/tutorials/blob/master/README.md. Retrieved 25 October 2019.
  8. Kharkovyna, Oleksii (17 July 2019). "A Beginner’s Guide to Python for Data Science". https://towardsdatascience.com/a-beginners-guide-to-python-for-data-science-60ef022b7b67. Retrieved 15 December.
  9. Andreou, Costas (20 January 2020). "How to Supercharge Excel With Python". https://towardsdatascience.com/how-to-supercharge-excel-with-python-726b0f8e22c2. Retrieved 29 January 2020.
  10. Python Software Foundation (12 September 2014). "The Python Tutorial". https://docs.python.org/3/tutorial/. Retrieved 13 October 2017.
  11. Ossowski, Annabell (8 October 2017). "Your Django Story: Meet Ola Sendecka". http://blog.djangogirls.org/post/97295303273/your-django-story-meet-ola-sendecka. Retrieved 13 October 2017.
  12. Hamner, Ben (2017). "Python Data Visualizations". https://www.kaggle.com/benhamner/python-data-visualizations. Retrieved 13 December 2017.
  13. Kirzhner, Elena (6 December 2017). "Python Data Visualization — Comparing 5 Tools". https://codeburst.io/overview-of-python-data-visualization-tools-e32e1f716d10. Retrieved 13 December 2017.
  14. Sweigart, Al. "Automate the boring stuff with Python". https://automatetheboringstuff.com/. Retrieved 6 February 2018.
  15. Burden, Jun et al. (2010). "Tracking a single cyclist during a team changeover on a velodrome track with Python and OpenCV". Procedia Engineering 2(2): 2931-2935.
  16. Fox, Joe; Menzies, Ryan; Emamdjomeh, Armand (14 April 2016). "Every shot Kobe Bryant ever took. All 30,699 of them". http://graphics.latimes.com/kobe-every-shot-ever/. Retrieved 13 October 2017.
  17. Fox, Joe (19 April 2016). "How we mapped Kobe's 30,699 shots". http://www.latimes.com/visuals/graphics/la-g-kobe-how-we-did-it-20160419-snap-htmlstory.html. Retrieved 13 October 2017.
  18. Davis, Ryan (6 August 2019). "NBA Data Processing Tutorials". https://github.com/rd11490/NBA_Tutorials. Retrieved 7 August 2019.
  19. Fordham, Stephen (6 October 2019). "8 Useful P". https://towardsdatascience.com/8-useful-pandas-features-for-data-set-handling-753e9d8ba8ff. Retrieved 11 October 2019.
  20. Jürisoo, Mart (20 April 2018). "International football results from 1872 to 2019". https://www.kaggle.com/martj42/international-football-results-from-1872-to-2017. Retrieved 20 April 2018.
  21. Fordham, Stephen (6 October 2019). "8 Useful P". https://towardsdatascience.com/8-useful-pandas-features-for-data-set-handling-753e9d8ba8ff. Retrieved 11 October 2019.
  22. Jürisoo, Mart (20 April 2018). "International football results from 1872 to 2019". https://www.kaggle.com/martj42/international-football-results-from-1872-to-2017. Retrieved 20 April 2018.