Talk:Peer Evaluation

Thread title	Replies	Last modified
End of project review	1	04:53, 19 August 2014
Rubrics for Peer Evaluation	5	23:34, 3 June 2014
Feedback on user interface Mockups	3	03:23, 21 May 2014
Goals?	2	10:56, 20 May 2014
Comments from Brian	5	20:36, 19 May 2014
A few thoughts on the peer evaluation proposal.	4	17:06, 14 May 2014
Peer Evaluation	1	04:10, 9 May 2014

End of project review

Some thoughts...

Nothing to do with coding, but with wording: Peer Evaluations are a great pedagogical activity for any course, and this is important. The current wording makes peer evaluation sound like a problem: "For courses where there are a large number of learners manual grading by instructors will not be possible. Some types of activities can be automatically graded but it also may not be possible for tasks which may not have a very clear deliverable. Peer Evaluation is a possible scalable solution which may be used in most circumstances."

Vtaylor (talk)‎

Thanks a lot for sharing your thoughts. I agree with your feedback. The wording of the introduction has now been changed to: "Peer evaluation provides a scalable solution for assessment of activities. In the process learners are asked to submit their work and then evaluate the work of their peers. It is generally facilitated through the use of rubrics. It is of vital importance in courses where there are a large number of learners and manual grading by instructors is not possible. Wikieducator Peer Evaluation is a minimalistic tool that can be used for student/learner peer review and self evaluation. It can be integrated to wiki content or can be used as a standalone tool. It is very simple to setup the tool for customized rubrics."

Akash Agarwal (talk)‎

Rubrics for Peer Evaluation

Great start. The screen mockups really help.

Peer Evaluations need more guidance.

Including a simple rubric with 3-4 criteria and 3-4 descriptions of completion should be accommodated even if the instructor can elect not to use them.
The instructor should be able to define the rubric. Having some pre-made rubrics would demonstrate good practice.
The evaluation form could/should be integrated into the response - not a separate screen.

Peer Evaluation is a learning opportunity for the evaluator and the instructor as well as the learner whose work is being reviewed.

Peer groups - some courses are using peer groups or teams, either learner-formed or assigned. Then the evaluations are within the peer group. There is more likelihood that the identity of the evaluator would be important and visible.

Vtaylor (talk)‎

Thank you for your views on rubrics. It is true that Peer Evaluation needs more guidance. I will include these in the first prototype version.

Although peer groups would promote learning and collaboration, they may not be ideal for evaluation. Learners within a peer group may tend to give everyone the same kind of grade; for example, the members of a particular group may be friends and give one another very high grades.

Akash Agarwal (talk)‎

I have had excellent experiences with peer evaluation and believe that if the task to be marked in this way is designed with the peer evaluation in mind you can have great outcomes. I think that personal reflective assignments are very important and relevant to learning and there is no problem with using peer evaluation so long as the reflection is structured more clearly. The reflection task example used to guide this project could, I think, be improved: rather than one blanket open-ended question, it needs to be broken down into smaller pieces with rubric/marks to be allocated for each by the peer evaluator. For example, we have used the CARL framework for some time (and I cannot put my finger on the academic reference for this right this moment!), which runs as follows. Context: what happened? Action: what did you do? Reflection: what happened next? Were you effective? Did you get the result you intended? How did you handle any unexpected outcomes? Learning: insight into your strengths and weaknesses and plans for action emerging from this learning experience.

As you can imagine, each of these questions above has the potential to have a sliding scale of marks assigned to them via a rubric which provides guidance and reduces the need for subjective judgement on the success of the task/learning.

In summary, I think the design of the task is very important, and it can make or break peer evaluation. In designing tools for it I think we need to have model assignments and model criteria/rubrics that go with them.

On a slightly different tack, relating to the distribution of marks: I like to have a mix of criteria/marks for outcome AND for process. Sometimes students work hard but still don't get a good outcome. Sometimes students wing it with a pretty loose process but know enough to get a good outcome. If we want to help students establish good life-long learning skills and self-directed learning skills then I think it is useful to have a series of questions/criteria/marks focussing on the PROCESS of the task rather than just on the outcome. For example, you can ask students to submit information about the "how" of their process, eg: Did you allow yourself sufficient time to undertake this task? How many sources of information did you consider? If you got stuck, did you seek help and if so from whom? What feedback did you get from your co-learners and how did you incorporate this into your work? How does this learning relate to your prior experience at work or in your personal life? Extra marks can be allocated in the rubrics by peer evaluators relatively simply.

In regards to variation in marks, the kinds of methods discussed seem reasonable. I am very keen on the "rating the feedback" mechanism, however. One common frustration from students feeling hard done by peer marking is "I feel like they did not read my submission properly. They said I didn't address X but it was right there on page Y". It would be great if those receiving feedback could rate the feedback and if a summary/aggregate of that was sent back to the evaluator. It might make these students allocate a little more time and care to their evaluations. If we have evaluators who are regularly getting low scores, could we remove them from the pool of assessors? Similarly, if a student had more than one unfair piece of feedback, could they request via the system to have an additional evaluator assigned instead?

Slambert (talk)‎

Thanks a lot for taking the time to share your valuable experiences and thoughts. These will help in taking the project a step further in the right direction.

Regarding the 'rating the feedback' mechanism: in addition to letting learners provide the feedback, we plan to have a system which tries to judge the credibility of the students with regard to the accuracy of their evaluations. Not only do we plan to let the learners give feedback on the evaluations and flag them if they think they are incorrect or improperly done, we are also thinking of having a karma system where the credibility of the evaluators is measured over time; the weighting of the evaluations could then be distributed on that basis. For example, an evaluation by someone who evaluates the same person every time would be given less importance than one by someone who tends to evaluate a wider section of the learners. This system will give the learners an extra incentive to do the peer evaluation task the right way and also provide a mechanism to detect learners who try to take advantage of the system.
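
As a rough illustration of how such weighting might work, here is a minimal sketch that combines peer scores using a per-evaluator credibility weight. The function name and the idea of a numeric "credibility" value are assumptions made for this example; how credibility would actually be measured is still open.

```python
def weighted_score(evaluations):
    """Combine peer scores, weighting each by the evaluator's credibility.

    `evaluations` is a list of (score, credibility) pairs. The credibility
    value is a hypothetical non-negative "karma" weight, assumed here only
    for illustration.
    """
    total_weight = sum(cred for _, cred in evaluations)
    if total_weight == 0:
        raise ValueError("no evaluation carries any weight")
    return sum(score * cred for score, cred in evaluations) / total_weight


# Two credible evaluators give 8; a low-credibility evaluator gives 2.
print(weighted_score([(8, 1.0), (8, 1.0), (2, 0.25)]))
```

With equal weights this reduces to the ordinary average, so the first prototype could start with all weights set to 1 and introduce credibility later.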

Akash Agarwal (talk)‎

Hi Sarah,

Appreciate your feedback.

Output guidelines like "What happened? Action: what did you do? Reflection: what happened next? Were you effective? Did you get the result you intended? How did you handle any unexpected outcomes?" are extremely valuable to guide and support learners with preparing learning reflections, but I don't think they're useful as evaluation criteria.

It would be difficult for a peer to reliably evaluate a learner response to "did you get the result you intended?"

You're right - the design of the task is critically important - but that's something the course developer / designer is responsible for and difficult to integrate into the peer evaluation technology engine. If academics design unreliable evaluation criteria, the peer evaluation outputs will be unreliable.

Agreed - the PROCESS of the task is more important than the outcome -- but harder to evaluate from a peer evaluation perspective. I think we need to be realistic and not expect a peer evaluation tool to evaluate the process. The peer evaluators will see the outputs of the process -- not the process itself. Candidly, we can only ask peers to evaluate what they can realistically observe.

As Akash has indicated, he is planning to incorporate an "appeal" feature where the learner can flag evaluations which they believe are not fair or accurate. I think it would be useful to incorporate a text field for the learner to state why the evaluation is not a fair and accurate representation of their work.

The design of the underlying mathematical model to deal with all the dimensions of reliability and excluding problematic scores etc is a complex challenge - particularly when dealing with small cohort enrolments because inferential models would not be reliable in this context.

My own feeling is that the design needs to be incremental and realistic. The GSoC project is working to a tight deadline - getting the basic functionality of the peer evaluation technology is more important than refining the mathematical model as a first step. If learners are able to flag questionable evaluations - that's sufficient for the first iteration in my view. Perfecting the mathematical model is the next incremental step.

Mackiwg (talk)‎

Hi Valerie,

We have developed prototype rubric examples which accommodate customisable and multiple criteria with corresponding descriptions for the different grade levels.

Would you be in a position to provide additional examples with corresponding rubrics to assist with the design? That would help us tremendously.

Agreed - where possible integrating the criterion descriptors and requirements for each grade level in the response form would be ideal.

Appreciate the feedback - thanks!

Mackiwg (talk)‎

Feedback on user interface Mockups

Akash, thanks for your effort in preparing the interface Mockups - this is extremely valuable for educators (like myself) to think systematically about the design of the peer evaluation system.

Mockup for submitting / registering an assignment

In the OERu nomenclature, it is better to use the concept "E-learning Activity" (rather than assignment). The concept assignment is typically used for summative assessment, that is the assignments learners are required to submit for formal academic credit.
We need to think about the behaviour in cases where a user does not have a current WikiEducator session - for example a message and link to the user that they must login.
Drawing on my experience from the OCL4Ed courses, a major challenge with inexperienced blog users is that they register the wrong url (for example providing the link to the homepage rather than the individual blog post or providing the link to the editors view rather than the published blog post.) The design could "automate" some checks to minimise these errors, for example:
1. Requiring users to add a unique tag to the post - if the tag is identified, it could pre-populate the url.
2. The URL field could filter for known errors (eg link to home page or edit view of known blog links.)
3. Would it be useful / appropriate to incorporate some feature for the user to "check" the link before checking the "Opt in for evaluation" button?
I assume the agree to Peer Evaluation link will document the "terms of service and conditions" for the peer evaluation system. We need to cover a number of legal aspects here.
Do we need to cater for an option for learners to "opt-out" of peer evaluation at any time? My gut feel is that this should be an option, as in the case of providing user autonomy to unsubscribe from an email list.
I think that it's advisable for us to incorporate a self-evaluation option in the system. If a submission is flagged as a self evaluation item, would it be appropriate to trigger the self-evaluation items immediately after the user has submitted to opt-in to the evaluation?
This mockup is a good example of a mix of activities. For example, we would not expect a value judgement on a personal learning reflection, so the evaluation items would be restricted to yes / no validations. Conversely, a number of activities in the OCL4Ed example would present opportunities for qualitative value judgements. I will make a selection of OCL4Ed activities and develop rubrics which we can use for prototyping.

Screen listing assigned evaluations

Would it be useful to include the due date and time for the evaluations, with automated conversion to the local time zone? For example: "Please evaluate the following submissions for 2nd learning reflection by 15 June 2014 at <with widget for local time zone>."
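
A minimal sketch of the conversion, assuming the deadline is stored in UTC and the learner's time zone is known. The function name, the 09:00 UTC deadline and the fixed UTC+12 offset are all illustrative; a real deployment would more likely use named zones (for example via Python's `zoneinfo` module) so that daylight saving is handled.

```python
from datetime import datetime, timedelta, timezone


def due_date_for_learner(due_utc, learner_tz):
    """Format a timezone-aware UTC deadline in the learner's own time zone."""
    if due_utc.tzinfo is None:
        raise ValueError("due_utc must be timezone-aware")
    local = due_utc.astimezone(learner_tz)
    return local.strftime("%d %B %Y at %H:%M (UTC%z)")


# Hypothetical deadline: 15 June 2014, 09:00 UTC.
due = datetime(2014, 6, 15, 9, 0, tzinfo=timezone.utc)
# A fixed offset stands in for a real time zone to keep the sketch dependency-free.
utc_plus_12 = timezone(timedelta(hours=12))
print(due_date_for_learner(due, utc_plus_12))
```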

Evaluation submission screen

This is a proverbial debate among educators in the wiki community. Users with limited experience in browsing online content frequently get lost (eg they forget that they have a back button on the browser ;-)) Some learning designers would recommend that the link to the post to be evaluated should open in a new window or tab so that the evaluation form remains open. Something for us to think about - I'm not sure what the best solution is.
Regarding the validation item (Is the content of the post in response to the activity concerned):
- I would consider adding an optional comment field for the submitter to indicate the reason why it is not related, eg incorrect url.
- If the user answers No to the first validation question, the remaining items are redundant. Should we consider a behaviour where the remaining evaluation items only appear after the post is validated as a response to the activity?
In cases which are supported with a rubric, we need to provide a link to the evaluation rubric which specifies the criteria.
To improve completion of all the fields for a "valid record" I would suggest making the evaluation items required fields (comments optional). If not -- we will need to think about what constitutes a valid evaluation.
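
The form behaviour suggested in the points above could be sketched roughly as follows: the validation question gates the remaining items, rating items are required, and comments stay optional. The field names and the dictionary shape are hypothetical, chosen only for this example.

```python
def validate_evaluation(form, rating_items):
    """Return a list of problems; an empty list means the record is valid.

    `form` is a dict of submitted values and `rating_items` names the
    required rating fields. Both shapes are assumptions for this sketch.
    """
    is_response = form.get("is_response_to_activity")
    if is_response is None:
        return ["validation question not answered"]
    if is_response is False:
        # The remaining items are redundant; a reason comment is optional.
        return []
    problems = []
    for item in rating_items:
        if form.get(item) is None:
            problems.append(f"missing rating: {item}")
    return problems


print(validate_evaluation({"is_response_to_activity": True, "clarity": 4},
                          ["clarity", "relevance"]))
```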

Do we need Mockup screens for:

The learner's view of the ratings and comments from peers
- An issue which will require considerable debate and discussion is the transparency of ratings. That is: should the evaluee see the identity of the evaluator?, should submissions remain anonymous and the user is provided with aggregated scores? etc. I think the best strategy is to prototype and get feedback from real learners using an online survey.
- Do we provide the capability for the evaluee to flag "spam" comments? This has been a big problem with some of the xMOOC experiences.

This is progressing well. The mockups have certainly helped us think about prospective issues and hopefully we can get a few decisions before delving into the code - or at least designing the code engine in a way that is flexible enough to tweak as we gain a better understanding of peer evaluation in the OERu context.

Mackiwg (talk)‎

Some thoughts about Wayne's feedback...

Provide the option of a "Review-by date" - good idea to have some time structure. It helps to see several evaluations at one time for comparison.

Learner's view - The learner should see each complete evaluation. Perhaps the identity of the evaluator should be at the discretion of the evaluator or the instructor.

Some courses provide links to all learners' submissions and the cumulative score they received. Good opportunity to view what is considered "best."

Evaluation assignment - jumping ahead to who gets which submissions to evaluate. This is a really interesting problem, so everyone gets "good" evaluations. One of the courses made everyone do a practice evaluation before they gave you real ones to do.

Vtaylor (talk)‎

Thank you for your thoughts. The evaluation assignment is a really interesting problem. One way to do it is random assignment, which is straightforward and addresses most of the challenges in MOOCs. An alternative approach is a karma system where, instead of being assigned evaluations, learners are free to evaluate whomever they want. There would be an algorithm under which, say, they get more "karma points" for the first few submissions they evaluate. A simple mockup for selecting submissions to evaluate based on this. Also, they would get more points for evaluating the submissions which have the fewest evaluations so far. There could be a system in place where their "karma points" are reduced if they evaluate the same person for different activities, suggesting that they are evaluating their friends. Although this approach would need serious thought, it would help not only in judging the credibility of the evaluations but also in calibrating them and making them more accurate. Further, if these points could be carried forward across courses it would open more opportunities.
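
To make the karma idea a little more concrete, here is a minimal sketch of how points for a single evaluation might be computed. Every constant (base points, bonuses, penalty) and the function name are invented for illustration; the actual values would need the "serious thought" mentioned above.

```python
from collections import Counter


def karma_for_evaluation(evaluator_history, evaluee, evaluations_received,
                         base=10, early_bonus=5, early_count=3,
                         scarcity_bonus=5, repeat_penalty=4):
    """Points earned for one new evaluation (all constants are illustrative).

    evaluator_history: evaluees this learner has already evaluated.
    evaluations_received: evaluations the target submission has so far.
    """
    points = base
    if len(evaluator_history) < early_count:
        points += early_bonus                      # first few evaluations earn more
    # The bonus shrinks as the submission collects more evaluations.
    points += scarcity_bonus // (1 + evaluations_received)
    repeats = Counter(evaluator_history)[evaluee]
    points -= repeat_penalty * repeats             # discourage rating the same person
    return max(points, 0)


print(karma_for_evaluation([], "learner_b", 0))
print(karma_for_evaluation(["learner_b"] * 3, "learner_b", 4))
```

Clamping at zero means a learner never loses points already earned, which is one possible design choice among several.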

Akash Agarwal (talk)‎

Thank you for the detailed feedback. Some of my personal thoughts about your comments and questions, in the same order:

Mockup for submitting / registering an assignment:

I will keep this in mind.
Yes, or we could redirect them to the login page and then back once they are logged in.
Prepopulating the URLs would be possible if unique tags are used for each user and activity. A simple way to avoid wrong URLs could be to show the content of the given URL and ask the user whether the content is correct after checking the opt-in for evaluation button.
Yes, I had thought of it as a page that would mainly explain what peer evaluation is, including the requirements and workings. It will also need to contain the Terms of Service and associated legal aspects which I'm not much aware of.
I agree that there should be this option in case the learner later decides that he does not want to spend time on evaluation or does not want to be judged.
I did not include self evaluation in the UI mockups. We could ask the learner to evaluate his own activity in addition to evaluating the peers. We could then use it both for the grade and to improve the "karma system".
Some prototype activities would be very useful for the project. There will be a lot to learn and iterate from using it in some activities of the OCL4Ed course.

Screen listing assigned evaluations: Yes, I think displaying the due dates and times in the user's time zone would help learners in completing the evaluations in time.

Evaluation submission screen:

I think we could simply open the link in a new tab and perhaps display a pop-up message so that they do not get lost.
If the user checks No, the remaining items should not appear and instead we could show a comment box asking the reason.
We could provide a link, or if it is short enough we could display it along with the questions itself.
I agree.

Do we need Mockup screens for:

Yes, we need to give some thought to the transparency of ratings. I am not sure whether we should show the evaluator's identity, but we do need to show the detailed evaluation in order to get feedback on it and for the evaluee to flag it in case he disagrees with it. I agree that prototyping and then asking real learners would be the best strategy.
Yes, I agree that we should provide it.

In my opinion, the best people to answer and debate some of these and other aspects of Peer Evaluation would be OERu educators and course moderators/supervisors who have experience with MOOCs or plan to conduct some in the future. A response from more OERu educators about some of these aspects would be extremely valuable to the project.

Akash Agarwal (talk)‎

Goals?

It would help my understanding of the possible technical solutions to understand the pedagogical goals of "peer evaluation" in the OERu.

Are educators interested in the scores given to other students? The comments?
Is the goal to drive engagement, by almost forcing students to read and respond to other students?
1. And if so, do they expect the student to revise a particular submission prior to the end of the course?
How should evaluations done by people outside the student cohort be weighted?
1. If this is part of the Academic Volunteers International initiative, how is the "community" valued?
2. If a former student of a particular course reviews current work, how is their opinion valued? (And does karma carry over?)
Is participating as an evaluator ever a requirement for completing the course? (certificate of completion? being granted credit by a partner institution?)
Is it useful to be able to report instances of suspected plagiarism?

JimTittsler (talk)‎

A few thoughts relating to the pedagogical aspects raised by Jim above:

Generally speaking, I think the learner (Evaluee) is more interested in the scores and comments than the educator for formative and learning support reasons. For courses operating at scale there may be too many ratings for educators to consider meaningfully. I do think educators would be interested in aggregated results eg number of ratings submitted, average ratings etc. This data would also be of interest to the learner group, so we need to think about how aggregated results are reported. A "live" feed of aggregated stats would be a nice feature. I can imagine scenarios where educators would be interested in the scores, eg a) Dealing with student appeals where they disagree with the ratings in cases where the quantum of the evaluation contributes to final achievement score. b) Cases where the system flags ratings deemed to be questionable.
From my perspective, the goal is to offer a range of options in OERu courses to improve the learning experience, taking into account that OERu learners participate for a range of different reasons. I don't think that learners who are popping in out of self interest should be "forced" to participate - peer evaluation should be an "opt in" component of the types of assessment and certification. For example, learners interested in receiving certification for participation could be required to participate.
Ideally, I would like to see a system which can accommodate evaluations done by individuals outside of the current course. Perhaps the system needs to identify "assigned raters" - those which the system assigns for the evaluation and "non-assigned raters" - then we can decide at a later stage how we implement or recognise these ratings.
I would recommend that the system flags an option whereby participating as an evaluator is a participation requirement for forms of credentialing. There may be instances where courses may not require participation as evaluator as a requirement.
I think it would be valuable to flag instances of suspected plagiarism -- however, we need to think carefully about privacy rights - particularly in cases where the alleged plagiarism is not validated. If we incorporate this kind of assessment -- my gut feel is that it needs to be confidential.

Mackiwg (talk)‎

Comments from Brian

Cumulative calibration

Congratulations Akash on this.

Can we get our hands on an existing specification for a peer assessment system that could be used as a starting point - perhaps a detailed description of the Moodle one? I have used it and found it fairly good. The one thing I think could be usefully added is some measure of competence of reviewers.

Off the top of my head I can think of 3 measurements that would be needed to be kept for each reviewer.

1. A measure of their tendency to err on the high or low side. This could be as simple as a percentage or number between -100 and +100 (when grading out of 100).

2. A measure of the variability of their assessments. I'm not a statistician so I'm not sure how this might be expressed, but the idea is to quantify their tendency to vary in their inaccuracy. A reviewer with a high error tendency but low variability is easier to adjust for than one with high variability.

3. Finally, some measure of our confidence in the above 2 measurements. Opportunities will constantly arise for modification of a learner's calibration scores above. As the calibration scores are tweaked, each new piece of information carries less weight. A way to handle that might be that each type of calibration has a particular score, eg I might be calibrated against 3 of my peers assessing the same work. That might have a confidence score of 5. A tutor, perhaps at random, might grade the same piece of work and my ability be calibrated against that with a score of 10. I might have my score compared to another student who has already been calibrated by a tutor and be calibrated accordingly, and this might have a value of 3. After these 3 calibrations, the confidence of my calibration might be scored at 5+10+3 = 18. If another tutor calibrates me, their calibration would be weighted with the existing one in a ratio of 18 to 10. Feedback on reviews might also be included. This is not well thought out but I hope you get the idea.
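
The three measurements above could be kept per reviewer along the following lines. This is only a sketch under stated assumptions: bias is taken as the confidence-weighted mean of (reviewer score minus reference score), variability as the standard deviation of those errors, and the class and method names are invented. The weights 5, 10 and 3 are the illustrative values from the comment, and the example scores are made up.

```python
from dataclasses import dataclass, field
from statistics import pstdev


@dataclass
class ReviewerCalibration:
    """Running calibration for one reviewer (a sketch, not a specification)."""
    errors: list = field(default_factory=list)  # reviewer score minus reference score
    confidence: float = 0.0                     # accumulated evidence weight
    bias: float = 0.0                           # confidence-weighted mean error

    def update(self, reviewer_score, reference_score, weight):
        """Fold in one comparison; `weight` is the confidence of the reference."""
        error = reviewer_score - reference_score
        self.errors.append(error)
        # Old estimate and new evidence are mixed in the ratio confidence : weight.
        self.bias = ((self.bias * self.confidence + error * weight)
                     / (self.confidence + weight))
        self.confidence += weight

    @property
    def variability(self):
        """Spread of the errors (population standard deviation)."""
        return pstdev(self.errors) if len(self.errors) > 1 else 0.0


cal = ReviewerCalibration()
cal.update(70, 62, weight=5)    # compared with three peers
cal.update(75, 70, weight=10)   # compared with a tutor
cal.update(58, 55, weight=3)    # compared with a tutor-calibrated student
print(cal.confidence, cal.bias, round(cal.variability, 2))
```

Because each update is mixed in proportion to the accumulated confidence, later pieces of evidence move the bias estimate less, which matches the intent that "each new piece of information carries less weight".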

In terms of the algorithm (and I think I've mentioned this before) I think it would be efficient if a tutor could grade an assignment and learners who assessed the same piece of work would be calibrated from this (added confidence 5?). Then their grades of other assignments would be compared with other learners and their ability calibrated (confidence 3?) - this could then be repeated and the next calibration might have a confidence addition of 1. I hope you get the idea.

The idea of assigning a confidence measure to calibration would work best over many assignments so it may be necessary to have a mechanism for transferring between courses.

You would have to be sure that the algorithm did not end up doing silly things like getting into a recursive loop particularly with positive (negative?) feedback.

I can immediately think of a silly outcome where the learner could end up with a score greater than a tutor. Maybe tutors have a score of 80 and learners move asymptotically towards that score. (Perhaps we will be able to prove that some learners can be more reliable than tutors. Now there's a challenge: have some measure of the ability of tutors built in as well.)

Apologies for the stream of consciousness. I do believe that peer assessment will prove to be the most powerful tool in our arsenal eventually for cutting the cost of accredited education. I may be wrong.

Brian

Mackiwg (talk)‎

Comments from Mika in response to Brian

Evaluation reliability

To add to Brian's ideas: if you have a clear rubric for evaluators to follow, you'll improve the reliability. You could even eventually build a library, for each task, of sample work exemplifying different levels of rating, so that everyone can calibrate themselves. But if all you do is ask people to evaluate on a 10-point scale, you're going to get a lot of variability.

Mika

Mackiwg (talk)‎

Response to Mika's comments

Agreed -- the system will need to cater for clear rubrics with criteria and specifications of what is required for each grade level to improve reliability. I also think using broad bands of performance, eg Unsatisfactory = 1 - 4; Acceptable = 5 - 6 and Excellent = 7 - 10, would mitigate some of these challenges. There is always an element of subjectivity when humans are expressing value judgements, even in rigorous systems. I also think that it's pedagogically sound to incorporate a self-evaluation. Should the system detect significant deviations of the peer-evaluations when compared with the self-evaluations, these evaluations could be flagged by the system.
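
A small sketch of both ideas, assuming whole-number scores from 1 to 10. The band boundaries are the ones suggested above; the use of the median of the peer scores and the 3-point threshold for a "significant" deviation are assumptions made only for illustration.

```python
from statistics import median

# Band boundaries as suggested in the comment above.
BANDS = [(1, 4, "Unsatisfactory"), (5, 6, "Acceptable"), (7, 10, "Excellent")]


def band(score):
    """Map a whole-number 1-10 score to a broad performance band."""
    for low, high, label in BANDS:
        if low <= score <= high:
            return label
    raise ValueError(f"score out of range: {score}")


def flag_against_self(self_score, peer_scores, threshold=3):
    """Flag when the peer median deviates from the self-evaluation.

    `threshold` (in points) is a hypothetical cut-off chosen for this sketch.
    """
    return abs(median(peer_scores) - self_score) >= threshold


print(band(6), flag_against_self(9, [4, 5, 6]))
```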

Also, in the case of using peer-evaluation to assist with scaling feedback for formative assessment it is possible to focus on more objectively verifiable criteria, for example "did the post meet the minimum word count" (taking into account that we could automate this kind of criterion later in the process) or "did the post respond to the three questions".

In the OCL4Ed course, for instance, we have specified a minimum number of substantive blog posts in order to qualify for certification of participation. Sometimes the learners submit the wrong url for their posts. At this time checking the urls is a manual process and is not scalable. Peer evaluation could assist in scaling the implementation of this kind of participation metric.
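
Some of the wrong-url cases could perhaps be caught automatically before a peer ever sees the submission. The sketch below is a heuristic only: the function name is invented, and the edit-view markers are illustrative guesses rather than a vetted list, so they may misfire on some legitimate post addresses.

```python
from urllib.parse import urlparse


def check_post_url(url):
    """Return warnings for common mistakes when registering a blog post URL.

    The edit-view markers below are illustrative guesses, not a vetted list.
    """
    parts = urlparse(url.strip())
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return ["not a valid web address"]
    warnings = []
    if parts.path in ("", "/"):
        warnings.append("looks like a blog home page, not an individual post")
    edit_markers = ("/wp-admin/", "post.php", "action=edit", "/edit")
    target = parts.path + "?" + parts.query
    if any(marker in target for marker in edit_markers):
        warnings.append("looks like an editor view, not the published post")
    return warnings


print(check_post_url("https://example.org/"))
print(check_post_url("https://example.org/2014/05/my-reflection/"))
```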

Mackiwg (talk)‎

Response to Brian's feedback

Brian - thanks for that feedback - valuable ideas and foundations we need to incorporate into the design of the first step in the project.

An important facet of this opportunity for comment is to gain sufficient information to help Akash develop specifications for the first prototype. We follow an incremental design approach at the OER Foundation whereby we focus on small steps of implementable code following a learn by doing approach.

The idea of developing mathematical models to determine the "confidence of the reviewer" over time is important, similar to Jim's concept of developing a karma system.

Once we get to the point of implementing a specification for the first prototype, I would like to make a call to OERu partners to identify one or two statisticians or applied mathematicians who could advise on decisions for the mathematical model for the first iteration.

We will also need to be realistic - the GSoC project only allows for 3 months of coding and it's unlikely that we are going to be able to build a solution which is going to address all the complexities associated with peer evaluation systems. However, I guess that we will be able to build something useful as a first step towards the next iteration.

Mackiwg (talk)‎

A colleague just sent me a link to this paper which is relevant: "Tuned Models of Peer Assessment in MOOCs" I have not got around to reading it but thought it would be useful so I'm posting it immediately. http://www.stanford.edu/~cpiech/bio/papers/tuningPeerGrading.pdf

Chris Piech Stanford University piech@cs.stanford.edu Jonathan Huang Stanford University jhuang11@stanford.com Zhenghao Chen Coursera zhenghao@coursera.org Chuong Do Coursera cdo@coursera.org Andrew Ng Coursera ng@coursera.org Daphne Koller Coursera koller@coursera.org

Brian

Brianmmulligan (talk)‎

Thanks for posting the link to that paper!

As a technician, it suggests several elements that need consideration:

having students review submissions that have been reviewed by "experts" (ground truth) which is a variation on Mika's comment about a library of sample works
partitioning reviewers by native language in an attempt to remove that bias
recording "time spent grading" a submission is challenging in a distributed environment like the OERu courses that have been offered to date
- (Their "sweet spot" of 20 minutes spent grading an assignment sounds like a significant time commitment for our mOOC assignments.)
if karma is used, it may be necessary to factor in the marks an evaluator has received, not just those he has given (and had commented on)
a large discrepancy in scores might signal the need to add additional reviewers of a particular submission
how to present scores in a meaningful way especially if there are different weights being applied, or some evaluations are discarded, etc. in an environment where individual evaluations are open

JimTittsler (talk)‎

A few thoughts on the peer evaluation proposal.

First off, Akash, belated congratulations on being accepted for this year's GSoC on WikiEducator. I was travelling offshore at the time the announcement was made.

Reading your document, a few thoughts come to mind:

I think it's appropriate to have each learner "opt in" at the individual task level for peer review rather than the course level. In the MOOC environment some learners may only be interested in parts of a course. Opting in at the task level will avoid problems of folk who leave the course and do not complete their peer-evaluation tasks.
Mapping to rubrics is an excellent idea - perhaps a link to a wiki page. Here is an example of a rubric, which is too detailed for a shorter task - but illustrates how grades can be associated with different criteria. The model of specifying a rubric will create flexibility without overcomplicating the coding interface.
I would suggest that the course designer specifies the weightings for grade versus participation rather than the system pre-determining this weighting.
- If learners opt in at the task level - we will need to think of a mechanism to aggregate the participation for all tasks within a course.
Should we decide on a maximum number of categories - or is this indefinite?
If we have catagories, we should also have the ability to apply weightings to the categories where the sum of the weightings of all categories = 100%.
What do you mean by 3 grades from a scale of 1-10? Do you mean, for example A, B and C?
Your point is well made about the problems of using the average score where there our outliers. My gut feel is that an absolute deviation, for example 1.5 points is going to be problematic for a small number of responses. Perhaps the standard deviation or a z-score might be a better measure for reliability. I think we should consult with a statistician for advice on the best measure.
The other idea I have is that there is research to support the validity and reliability of self-evaluations when compared with the teacher evaluations. There are also sound pedagogical reasons for implementing self evaluations. It would be a good idea to implement a self-evaluation option and then decisions on "counting" or "flagging" problem evaluations where the rater is not being fair could be based on deviations from the self-evaluation.
I think random assignment will mitigate against "gaming" the system. If only 3 people are evaluating discarding outliers may not generate reliable results - -the sample is too small. Perhaps using a broad scale parameters like "Poor" "Satisfactory", and "Excellent" could help or mitigate the problem. Ultimately, if there is a major difference in the grades, it would be better to flag the grade as problematic requiring review by the instructor or some other mechanism.
Deadlines are important. I suggest we have a mechanism to email reminders that a peer evaluation is due. Open learners are notorious in forgetting deadlines ;-)
For the prototype phases - I suggest that we apply peer grading for formative assessment (i.e. feedback on the learning process) rather than contribution to final grades. Most institutions would be uncomfortable with using peer evaluation at this time as a contribution to final grades, however many institutions would recognise the participation metrics as part of the final grade.
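
As a rough illustration of the category weighting suggestion above, a score could be combined from instructor-specified weights like this. The function name `weighted_score`, the category names and the numbers are all made up for the example.

```python
def weighted_score(scores, weights):
    """Combine per-category scores using instructor-specified weights.

    scores  -- dict: category name -> score (same scale for every category)
    weights -- dict: category name -> weight in percent; must sum to 100
    """
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same categories")
    if abs(sum(weights.values()) - 100) > 1e-9:
        raise ValueError("category weights must sum to 100%")
    return sum(scores[c] * weights[c] for c in weights) / 100


# Hypothetical rubric with three weighted categories.
weights = {"content": 50, "structure": 30, "references": 20}
scores = {"content": 8, "structure": 6, "references": 10}
print(weighted_score(scores, weights))
```

Rejecting weights that do not sum to 100% at configuration time keeps the resulting score on the same scale as the individual category scores.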

This is a great start! Looking forward to this development.

Mackiwg (talk)‎

9. Rather than random assignment (which might be problematic with course dropout rates) an alternative might be a karma system:

that values being among the first n people to evaluate a submission (to reduce the chance of some submissions not being reviewed at all)
that reduces the karma earned for
- each review you do of a particular peer (reduces grading your friends)
- each review you do of a particular assignment (discourages someone trying to review everyone)
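
A minimal sketch of how such karma rules might be computed is given here. Everything in it is an assumption for illustration: the function name `karma_for_review`, the base value, the early-review bonus and the halving factor are placeholders, not agreed values.

```python
def karma_for_review(reviewer, author, assignment, position, history,
                     base=10.0, early_bonus=5.0, first_n=3, decay=0.5):
    """Karma earned for one review (all constants are hypothetical).

    position -- 1 for the first review of this submission, 2 for the second, ...
    history  -- earlier reviews as (reviewer, author, assignment) tuples
    """
    # Earlier reviews by this reviewer of the same peer / the same assignment.
    same_peer = sum(1 for r, a, _ in history if r == reviewer and a == author)
    same_assignment = sum(1 for r, _, s in history
                          if r == reviewer and s == assignment)

    karma = base
    if position <= first_n:
        karma += early_bonus          # reward being among the first n reviewers
    karma *= decay ** same_peer        # discourages grading your friends
    karma *= decay ** same_assignment  # discourages reviewing everyone
    return karma


history = [("ann", "bob", "task1")]
print(karma_for_review("ann", "carl", "task2", position=1, history=[]))
print(karma_for_review("ann", "bob", "task2", position=5, history=history))
```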

JimTittsler (talk)‎

Peer evaluation works well in cohort-based courses, and we implemented it in a social science course some time back. Our approach was to make groups of 10 students, ask each of them to assess the others' contributions on a scale of 1-10 using a set of guidelines, and average the scores. We made it compulsory for each member of the group to respond to the 9 other responses. For the final grade/mark, we gave 50% weight to the peer assessment and the remaining 50% to teacher evaluation, which included criteria related to the number of posts by an individual on others' work and the quality of contribution.
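
The arithmetic of this 50/50 scheme can be sketched as below. It assumes the teacher evaluation is expressed on the same 1-10 scale as the peer scores, which the description above does not state; the function name `final_mark` and the sample scores are invented.

```python
def final_mark(peer_scores, teacher_score, peer_weight=0.5):
    """Average the peer scores, then blend them with the teacher's score."""
    peer_average = sum(peer_scores) / len(peer_scores)
    return peer_weight * peer_average + (1 - peer_weight) * teacher_score


# Scores from the nine other members of a group of ten (made-up numbers).
peer_scores = [7, 8, 6, 9, 7, 8, 7, 6, 8]
print(round(final_mark(peer_scores, teacher_score=8), 2))
```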

All the best, Aakash. --Missan 04:39, 12 May 2014 (UTC)

Missan (talk)‎

Thank you for sharing your valuable experience. In my opinion such an approach would work well for a course which is well moderated and has a relatively low number of students. The problem could be that students in the same group may be friends, or become friends during the course, and give very good grades to everyone they evaluate. Also, in some courses it may not be possible for the teacher to look at all submissions.

Akash Agarwal (talk)‎

2. I agree. Mapping to rubrics will also let us ask better questions than asking for a grade on a scale of 1-10, which should improve reliability, as suggested by Mika. For example, for grading a blog post on open education, questions like whether it discusses scalability (Yes/No), whether the content is well researched (Yes/No), etc. can be asked. I think that using such questions the participation metrics can be accurately and reliably measured by peer evaluation. For measuring quality, a question like "quality of the post: A. Unsatisfactory = 1 - 4; B. Acceptable = 5 - 6; C. Excellent = 7 - 10" may be asked, as you suggested.
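
For illustration only, the two kinds of question could be handled along these lines. The band boundaries are the ones quoted above; the function names `quality_band` and `checklist_fraction` and the sample answers are assumptions.

```python
def quality_band(score):
    """Map a 1-10 quality score to one of the three bands."""
    if not 1 <= score <= 10:
        raise ValueError("score must be between 1 and 10")
    if score <= 4:
        return "Unsatisfactory"
    if score <= 6:
        return "Acceptable"
    return "Excellent"


def checklist_fraction(answers):
    """Fraction of Yes answers in a Yes/No checklist (question -> bool)."""
    return sum(answers.values()) / len(answers)


# Hypothetical answers from one reviewer for one blog post.
answers = {"Discusses scalability": True, "Content is well researched": False}
print(quality_band(5), checklist_fraction(answers))
```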

3. I think it would also be good to provide a default and let the course instructor modify it if needed. Many instructors may not want to spend a lot of time configuring the system or thinking about what a good distribution is, and would prefer to use it as is.

Yes, we would need such a mechanism. We need something which would predict the overall participation in the course based on individual tasks. In my opinion this would not be trivial. There might be some learners who participate in, say, activities 1 and 3, and others who participate only in activity 5 (suppose it is the last activity of the course). Now, the last activity may be the only major one in the course, with the others being trivial. The learners who participated in the last activity only may have done so because they had some previous experience, and for that course it is sufficient participation. How we decide on the participation of these two types of learners is an important question. We would need to think about such issues at an individual course level and judge how to aggregate the participation.

4-5. There should be a limited number of these that add up to 100%, and they should be specified by the course instructor/moderator. But instead of categories being rated from 1-10, we could also have a whole activity judged by, say, 10-15 questions of the Yes/No or "Poor"/"Satisfactory"/"Excellent" form.

6. Assuming that n people evaluate an assignment, we would have n answers/points for a particular category of an activity. This would not be the case if something like the karma system were used.

7. I agree it is naive to use an absolute deviation of 1.5 points.
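
A small experiment suggests that z-scores have their own small-sample problem. With the sample standard deviation, no score among n can lie more than (n - 1)/sqrt(n) standard deviations from the mean, which is about 1.155 for n = 3, so a common cutoff such as |z| > 2 could never fire with three reviewers. The sketch below uses only Python's standard `statistics` module; the helper name `z_scores` and the sample scores are made up.

```python
import math
import statistics


def z_scores(scores):
    """z-score of each peer score, using the sample standard deviation."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    if sd == 0:
        return [0.0 for _ in scores]  # all reviewers gave the same score
    return [(s - mean) / sd for s in scores]


# Three reviewers, one of whom looks like an obvious outlier.
scores = [8, 8, 1]
print([round(z, 3) for z in z_scores(scores)])

# Largest |z| any single score can reach in a sample of this size.
n = len(scores)
print(round((n - 1) / math.sqrt(n), 3))
```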

8. We could have Self Evaluation along with Peer Evaluation. The workshop module of Moodle requires both to be done. In it, if the self grade is close enough to the peer grade then the self-evaluated grade is assigned; otherwise there is some form of mathematical calibration of the grade. We could also have the teacher's grade for a subset of the submissions for each activity. Everyone could then flag both self and peer grades based on deviations.
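
A very rough sketch of the "close enough" idea is shown below. It is not Moodle's actual algorithm: the `tolerance` value, the use of the median of the peer grades, and the choice to fall back to the peer grade and flag the submission (instead of calibrating) are all assumptions for illustration.

```python
import statistics


def reconcile(self_grade, peer_grades, tolerance=1.0):
    """Return (grade, flagged) for one submission.

    If the self grade is within `tolerance` of the median peer grade, the
    self grade is accepted. Otherwise the median peer grade is used and the
    submission is flagged for review.
    """
    peer = statistics.median(peer_grades)
    if abs(self_grade - peer) <= tolerance:
        return self_grade, False
    return peer, True


print(reconcile(8, [7, 8, 9]))   # self grade agrees with peers
print(reconcile(10, [5, 6, 6]))  # self grade far above peers
```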

11. Starting off with formative assessment would be great.

Akash Agarwal (talk)‎

Peer Evaluation

Can software like Letsgeddit be used?

Sebastian Panakal (talk)‎

In my view, geddit is more of a tool for conventional courses, used to get real-time feedback about how a lesson is going and an idea of the students' level of understanding. It mainly allows the instructor to privately get the views of the learners. I don't see it as directly relevant to peer evaluation or to the context of massive online courses. Please shed more light on this.

Akash Agarwal (talk)‎
