Comments from Brian
Comments from Mika in response to Brian
To add to Brian's ideas: if you have a clear rubric for evaluators to follow, you'll improve reliability. Eventually, you could even build a library of sample work for each task, exemplifying each level of rating, so that evaluators can calibrate themselves. But if all you do is ask people to evaluate on a 10-point scale, you're going to get a lot of variability.
Response to Mika's comments
Agreed: the system will need to cater for clear rubrics, with criteria and a specification of what is required at each grade level, to improve reliability. I also think using broad performance bands, e.g. Unsatisfactory = 1-4, Acceptable = 5-6 and Excellent = 7-10, would mitigate some of these challenges. There is always an element of subjectivity when humans express value judgements, even in rigorous systems. I also think it's pedagogically sound to incorporate a self-evaluation. Should the system detect a significant deviation between the peer evaluations and the self-evaluation, those evaluations could be flagged by the system.
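To make the banding and flagging ideas concrete, here is a minimal sketch in Python. The band boundaries are the ones suggested above; the deviation threshold of 3 points and the function names `band` and `should_flag` are hypothetical choices for illustration, not part of any existing system. "Significant deviation" is assumed here to mean the self-score differs from the mean peer score by more than the threshold.

```python
from statistics import mean

# Band boundaries from the example above: Unsatisfactory = 1-4,
# Acceptable = 5-6, Excellent = 7-10.
BANDS = [
    (1, 4, "Unsatisfactory"),
    (5, 6, "Acceptable"),
    (7, 10, "Excellent"),
]

# Assumed threshold (hypothetical): flag when the self-score differs from
# the mean peer score by more than this many points on the 10-point scale.
DEVIATION_THRESHOLD = 3.0


def band(score: int) -> str:
    """Return the performance band for an integer score from 1 to 10."""
    for low, high, label in BANDS:
        if low <= score <= high:
            return label
    raise ValueError(f"score out of range: {score}")


def should_flag(self_score: int, peer_scores: list[int],
                threshold: float = DEVIATION_THRESHOLD) -> bool:
    """Flag a submission whose self-evaluation deviates from the
    average peer evaluation by more than `threshold` points."""
    if not peer_scores:
        return False  # no peer evaluations yet, so nothing to compare
    return abs(self_score - mean(peer_scores)) > threshold


if __name__ == "__main__":
    print(band(6))                       # Acceptable
    print(should_flag(9, [4, 5, 3]))     # True: peer mean is 4, gap of 5
    print(should_flag(7, [6, 8, 7]))     # False: peer mean is 7, no gap
```

A flag here only marks a submission for a human to look at; it does not decide which evaluation is "right".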
Also, when peer evaluation is used to help scale feedback for formative assessment, it is possible to focus on more objectively verifiable criteria, for example "did the post meet the minimum word count?" (bearing in mind that we could automate this kind of criterion later in the process) or "did the post respond to the three questions?".
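A rough sketch of how these two kinds of criterion might be handled, assuming a hypothetical minimum of 300 words: the word count is checked automatically, while a yes/no criterion such as "did the post respond to the three questions?" is decided from peer answers. The strict-majority rule and the names `meets_minimum_word_count` and `criterion_met` are assumptions for illustration only.

```python
# Hypothetical minimum; the real figure would come from the course rubric.
MIN_WORDS = 300


def word_count(text: str) -> int:
    """Count whitespace-separated tokens as words (a rough proxy:
    a hyphenated word counts once, a stray '-' counts as a word)."""
    return len(text.split())


def meets_minimum_word_count(text: str, minimum: int = MIN_WORDS) -> bool:
    """Automatable criterion: does the post reach the minimum length?"""
    return word_count(text) >= minimum


def criterion_met(peer_answers: list[bool]) -> bool:
    """Peer-assessed yes/no criterion, treated as met when a strict
    majority of peers answer yes (an assumed aggregation rule)."""
    return sum(peer_answers) * 2 > len(peer_answers)


if __name__ == "__main__":
    post = "word " * 320
    print(meets_minimum_word_count(post))        # True
    print(criterion_met([True, True, False]))    # True
    print(criterion_met([True, False]))          # False (a tie is not a majority)
```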
In the OCL4Ed course, for instance, we have specified a minimum number of substantive blog posts in order to qualify for certification of participation. Sometimes learners submit the wrong URL for their posts. At present, checking the URLs is a manual process that does not scale. Peer evaluation could help scale the implementation of this kind of participation metric.
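Some of the wrong URLs could perhaps be caught before peers ever see them. The sketch below, using Python's standard `urllib.parse`, only checks the form of a submitted URL: that it uses http(s), has a host, optionally matches the learner's registered blog host, and points at a specific page rather than a bare home page. The function name `looks_like_post_url` and the `blog_host` parameter are hypothetical. It cannot tell whether the post exists or is substantive, which is where peer evaluation would still be needed.

```python
from typing import Optional
from urllib.parse import urlparse


def looks_like_post_url(url: str, blog_host: Optional[str] = None) -> bool:
    """Rough form check on a submitted blog-post URL (hypothetical helper).

    Returns False for URLs without an http(s) scheme or host, for URLs on a
    different host than the learner's registered blog (if one is given), and
    for bare home pages with no path beyond "/".
    """
    parsed = urlparse(url.strip())
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return False
    # urlparse lowercases .hostname, so compare against a lowercased host.
    if blog_host is not None and parsed.hostname != blog_host.lower():
        return False
    return parsed.path not in ("", "/")


if __name__ == "__main__":
    print(looks_like_post_url("https://myblog.example.org/2014/05/week-1"))  # True
    print(looks_like_post_url("myblog.example.org/week-1"))                  # False
    print(looks_like_post_url("https://myblog.example.org/"))                # False
```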