A few thoughts on the peer evaluation proposal.
First off -- Akash, belated congratulations on being accepted for this year's GSoC on WikiEducator. I was travelling offshore at the time the announcement was made.
Reading your document, a few thoughts come to mind:
- I think it's appropriate to have each learner "opt in" at the individual task level for peer review rather than the course level. In the MOOC environment some learners may only be interested in parts of a course. Opting in at the task level will avoid the problem of learners who leave the course without completing their peer-evaluation tasks.
- Mapping to rubrics is an excellent idea - perhaps a link to a wiki page. Here is an example of a rubric, which is too detailed for a shorter task - but illustrates how grades can be associated with different criteria. The model of specifying a rubric will create flexibility without overcomplicating the coding interface.
- I would suggest that the course designer specifies the weightings for grade versus participation rather than the system pre-determining this weighting.
- If learners opt in at the task level - we will need to think of a mechanism to aggregate the participation for all tasks within a course.
- Should we decide on a maximum number of categories - or is this indefinite?
- If we have categories, we should also have the ability to apply weightings to the categories, where the weightings of all categories sum to 100%.
- What do you mean by 3 grades from a scale of 1-10? Do you mean, for example A, B and C?
- Your point is well made about the problems of using the average score where there are outliers. My gut feel is that an absolute deviation, for example 1.5 points, is going to be problematic for a small number of responses. Perhaps the standard deviation or a z-score might be a better measure for reliability. I think we should consult with a statistician for advice on the best measure.
- The other idea I have is that there is research to support the validity and reliability of self-evaluations when compared with teacher evaluations. There are also sound pedagogical reasons for implementing self-evaluations. It would be a good idea to implement a self-evaluation option; decisions on "counting" or "flagging" problem evaluations, where the rater is not being fair, could then be based on deviations from the self-evaluation.
- I think random assignment will help guard against "gaming" the system. If only 3 people are evaluating, discarding outliers may not generate reliable results - the sample is too small. Perhaps using broad scale categories like "Poor", "Satisfactory", and "Excellent" could help mitigate the problem. Ultimately, if there is a major difference in the grades, it would be better to flag the grade as problematic, requiring review by the instructor or some other mechanism.
- Deadlines are important. I suggest we have a mechanism to email reminders that a peer evaluation is due. Open learners are notorious for forgetting deadlines ;-)
- For the prototype phases - I suggest that we apply peer grading for formative assessment (i.e. feedback on the learning process) rather than contribution to final grades. Most institutions would be uncomfortable with using peer evaluation at this time as a contribution to final grades; however, many institutions would recognise the participation metrics as part of the final grade.
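To make the small-sample concern about outliers concrete, here is a minimal Python sketch. It assumes three peer scores on a 1-10 scale; the function names `z_scores` and `needs_review` and the 3-point spread threshold are hypothetical, chosen only for illustration. With three raters, no z-score computed from the sample standard deviation can exceed roughly 1.15 in absolute value, so a conventional cut-off of 2 would never discard anything. Flagging a wide spread for instructor review is one possible alternative, not a statistically vetted method.

```python
from statistics import mean, stdev


def z_scores(scores):
    """Return the z-score of each peer score, using the sample standard deviation."""
    mu = mean(scores)
    sigma = stdev(scores)  # needs at least two scores
    if sigma == 0:
        # All raters agree, so nobody deviates.
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def needs_review(scores, max_spread=3):
    """Flag a submission when the highest and lowest peer scores differ
    by more than max_spread points (a hypothetical threshold)."""
    return max(scores) - min(scores) > max_spread


peer_scores = [2, 9, 9]  # one rater strongly disagrees with the other two
print([round(z, 2) for z in z_scores(peer_scores)])  # [-1.15, 0.58, 0.58]
print(needs_review(peer_scores))                     # True
```

Even though one rater is 7 points away from the others, the z-score stays near 1.15, which illustrates why outlier removal is unreliable with three evaluations and why flagging for review may be the safer fallback.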
This is a great start! Looking forward to this development.
9. Rather than random assignment (which might be problematic with course dropout rates) an alternative might be a karma system:
- that values being among the first n people to evaluate a submission (to reduce the chance of some submissions not being reviewed at all)
- that reduces the karma earned for
- each review you do of a particular peer (reduces grading your friends)
- each review you do of a particular assignment (discourages someone trying to review everyone)
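The karma idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: the class name `KarmaTracker`, the base value, the early-reviewer bonus, and the halving factor are all hypothetical and would need tuning; nothing here reflects an existing WikiEducator implementation.

```python
from collections import Counter


class KarmaTracker:
    """Hypothetical karma bookkeeping for peer reviews."""

    def __init__(self, early_n=3, base=10.0, early_bonus=5.0, decay=0.5):
        self.early_n = early_n          # first n reviewers of a submission get a bonus
        self.base = base                # karma for any review
        self.early_bonus = early_bonus  # extra karma for early reviewers
        self.decay = decay              # multiplier applied per repeated review
        self.reviews_per_submission = Counter()  # (author, assignment) -> count
        self.reviews_of_peer = Counter()         # (reviewer, author) -> count
        self.reviews_of_assignment = Counter()   # (reviewer, assignment) -> count
        self.karma = Counter()                   # reviewer -> total karma

    def record_review(self, reviewer, author, assignment):
        """Record one review and return the karma it earned."""
        submission = (author, assignment)
        earned = self.base
        # Reward being among the first n people to review this submission.
        if self.reviews_per_submission[submission] < self.early_n:
            earned += self.early_bonus
        # Earn less for each earlier review of the same peer ...
        earned *= self.decay ** self.reviews_of_peer[(reviewer, author)]
        # ... and for each earlier review of the same assignment.
        earned *= self.decay ** self.reviews_of_assignment[(reviewer, assignment)]

        self.reviews_per_submission[submission] += 1
        self.reviews_of_peer[(reviewer, author)] += 1
        self.reviews_of_assignment[(reviewer, assignment)] += 1
        self.karma[reviewer] += earned
        return earned


tracker = KarmaTracker()
print(tracker.record_review("ana", "ben", "task1"))  # 15.0 (early, first review)
print(tracker.record_review("ana", "ben", "task2"))  # 7.5  (same peer again)
print(tracker.record_review("ana", "cal", "task1"))  # 7.5  (same assignment again)
```

A multiplicative decay is just one choice; a subtractive penalty or a hard cap on reviews per assignment would serve the same purpose.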
Peer evaluation works well in cohort-based courses, and we implemented it in a social science course some time back. Our approach was to form groups of 10 students, ask each of them to assess the others' contributions on a scale of 1-10 using a set of guidelines, and average the scores. We made it compulsory for each member of the group to respond to the 9 other submissions. For the final grade/mark, we gave 50% weight to the peer assessment and the remaining 50% to teacher evaluation, which included criteria related to the number of posts by an individual on others' work, and the quality of contribution.
All the best, Aakash. --Missan 04:39, 12 May 2014 (UTC)
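The cohort scheme described above can be sketched as a short calculation. The function name `final_mark` is hypothetical, and the sketch assumes the teacher evaluation is expressed on the same 1-10 scale as the peer scores, which the description does not state.

```python
from statistics import mean


def final_mark(peer_scores, teacher_score, peer_weight=0.5, expected_reviews=9):
    """Combine the averaged peer scores with the teacher's score.

    peer_scores: scores (1-10) from the other members of a group of 10.
    teacher_score: teacher evaluation, assumed here to be on the same 1-10 scale.
    """
    if len(peer_scores) != expected_reviews:
        raise ValueError(f"expected {expected_reviews} peer scores, got {len(peer_scores)}")
    peer_average = mean(peer_scores)
    return peer_weight * peer_average + (1 - peer_weight) * teacher_score


scores_from_group = [7, 8, 6, 9, 7, 8, 7, 6, 8]  # the 9 other group members
print(round(final_mark(scores_from_group, teacher_score=8), 2))  # 7.67
```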
Thank you for sharing your valuable experience. In my opinion such an approach would work well for a course which is well moderated and has a relatively low number of students. The problem with it could be that students within the same group may be friends, or become so during the course, and give very good grades to everyone they are to evaluate. Also, in some courses it may not be possible for the teacher to look at all submissions.
2. I agree. Mapping to rubrics will also let us ask better questions than asking for a grade on a scale of 1-10, which should improve reliability, as suggested by Mika. For example, for grading a blog post on open education, we could ask questions such as whether it discusses scalability (Yes/No), whether the content is well researched (Yes/No), etc. I think that with such questions the participation metrics can be measured accurately and reliably by peer evaluation. For measuring quality, a question such as "Quality of the post: A. Unsatisfactory = 1-4; B. Acceptable = 5-6; C. Excellent = 7-10" may be asked, as you suggested.
3. I think it would also be good to provide a default and let the course instructor modify it if needed. Many instructors may not want to spend a lot of time configuring the system or thinking about what a good distribution is, but would prefer to use it as is.
- Yes, we would need such a mechanism. We need something which would estimate the overall participation in the course based on individual tasks. In my opinion this would not be trivial. There might be some learners who participate in, say, activities 1 and 3, and others only in activity 5 (suppose it is the last activity of the course). Now, the last activity may be the only major one in the course, the others being trivial. The learners who participated only in the last activity may have done so because they had some previous experience, and for that course it is sufficient participation. How we decide on the participation of these two types of learners is an important question. We would need to think about such issues at an individual course level and judge how to aggregate the participation.
4-5. There should be a limited number of these, with weightings that add up to 100%, and they should be specified by the course instructor/moderator. But, instead of categories being rated from 1-10, we could also have a whole activity judged by, say, 10-15 questions of the Yes/No or "Poor"/"Satisfactory"/"Excellent" form.
6. Assuming that n people evaluate an assignment, we would have n answers/points for a particular category of an activity. This would not be the case if something like the karma system were used.
7. I agree it is naive to use an absolute deviation of 1.5 points.
8. We could have self-evaluation along with peer evaluation. The workshop module of Moodle requires both to be done. In it, if the self grade is close enough to the peer grade then the self-evaluated grade is assigned; otherwise there is some form of mathematical calibration of the grade. Now, we could also have the teacher's grade for a subset of the submissions for each activity. Everyone would be able to flag both self and peer grades based on deviations.
11. Starting off with formative assessment would be great.
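As a rough illustration of points 2 and 4-5, the following sketch scores one evaluation against a rubric whose category weightings must add up to 100%. The names `LEVELS` and `rubric_score`, the example categories, and the mapping of "Satisfactory" to half credit are assumptions made for this example only; an instructor would define the actual rubric and weightings.

```python
# Fraction of a category's weight awarded for each answer (hypothetical mapping).
LEVELS = {
    "No": 0.0,
    "Yes": 1.0,
    "Poor": 0.0,
    "Satisfactory": 0.5,
    "Excellent": 1.0,
}


def rubric_score(answers, weights):
    """Return a score out of 100 for one peer evaluation.

    answers: category -> answer label (a key of LEVELS)
    weights: category -> weighting in percent, set by the instructor
    """
    if abs(sum(weights.values()) - 100) > 1e-9:
        raise ValueError("category weightings must add up to 100%")
    if set(answers) != set(weights):
        raise ValueError("every weighted category needs exactly one answer")
    return sum(weights[c] * LEVELS[answers[c]] for c in weights)


# Example rubric for a blog post on open education (categories are illustrative).
weights = {"scalability": 20, "well_researched": 30, "quality": 50}
answers = {"scalability": "Yes", "well_researched": "No", "quality": "Satisfactory"}
print(rubric_score(answers, weights))  # 45.0
```

Keeping the weightings in a plain mapping means a default set can be shipped with the system and overridden per course without changing the scoring code.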