Good peer reviews can be very effective at finding bugs -- but bad peer reviews can be nearly useless. I teach a course that involves a semester-long software project, and I found it really difficult to get students to do proper peer reviews. Basically, they were going through the motions but didn't produce much (i.e., didn't find many bugs). The emphasis of the materials I provided them was checklists for what to look for, with little formality in reporting results, because I wanted to minimize paperwork. That didn't work. Teams were only finding about 10% of their bugs in reviews, which is just not enough to be worth the effort.
So one year I changed things and made them use a variant of the spreadsheet below to report results. The results were dramatic. The first semester I used the spreadsheet, defects found via peer review went from about 10% to about 50% across the class, and have stayed there ever since. (Also, the effort profile changed dramatically for the better, but that's a topic for another posting.) In my experience, finding about 50% of bugs in peer review is about right. So, while I'm not claiming ultimate academic rigor for this study, this experience suggests that adopting a spreadsheet like this can be effective at improving the quality of peer reviews.
Here is the spreadsheet (click to download the .XLS file):
The first few lines record the project name, date, artifact (e.g., the file name of the code being reviewed), and the names of the reviewers present. The "# Bugs" field is filled out later. The artifact author is intentionally omitted because the records are about finding bugs, not blaming the author. The issue rows are a place to record issues found in the review in free text, usually only one or two sentences apiece.
The status fields are left blank during the review. However, within 24 hours after the review has taken place, the author of the item being reviewed needs to update each status to "Fixed," "Not Fixed," or "Not a Bug." The idea here is that if a bug is easy to fix, the burden of recording and fixing it is minimal -- it stays in the spreadsheet. But if a bug is too hard to fix immediately, it is marked "Not Fixed" and must be entered as a formal bug report into the bug tracking system (Bugzilla or otherwise). Some items turn out not to be bugs, and it is OK to record them as such (e.g., a feature request or a misunderstanding by the reviewers). When the author has updated the statuses, the "# Bugs" line is updated to reflect the number of bugs actually found, and that number is rolled up to a project tracking spreadsheet.
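To make the tallying rule concrete, here is a minimal Python sketch of a review record and the "# Bugs" count. The field names, project name, and issue text are all made up for illustration; the only thing taken from the process above is the status convention ("Fixed" and "Not Fixed" both count as bugs found, "Not a Bug" does not):

```python
# Minimal sketch of one review record and the "# Bugs" tally.
# "Fixed" and "Not Fixed" both count as real bugs found in the review;
# "Not a Bug" (e.g., a feature request or reviewer misunderstanding) does not.

def count_bugs(issues):
    """Count issues whose status indicates a real bug was found."""
    return sum(1 for issue in issues if issue["status"] in ("Fixed", "Not Fixed"))

review = {
    "project": "Elevator Controller",   # hypothetical project name
    "artifact": "door_control.c",       # hypothetical file under review
    "issues": [
        {"text": "Off-by-one in floor index loop", "status": "Fixed"},
        {"text": "Missing timeout on door close",  "status": "Not Fixed"},  # also filed in bug tracker
        {"text": "Request: add express mode",      "status": "Not a Bug"},  # feature request
    ],
}

review["num_bugs"] = count_bugs(review["issues"])
print(review["num_bugs"])  # 2
```

The "Not Fixed" item stays in the count here and additionally gets a formal bug report in the tracking system, as described above.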
This last piece, rolling up the "# Bugs" number to a higher level of visibility, is crucial. In my course I have the Teaching Assistants track the number of bugs found weekly for every project team and make sure they ask hard questions if the answers are consistently zero. Really, that's all it took. Apparently, if the teams know someone is watching, they'll look a bit harder to find bugs, and once they do, the process bootstraps into fairly effective reviews with minimal help from the course staff. It's worked several years in a row for me -- almost as if a light switch had been flipped for the students in my class. Results have been pretty consistent since we started using the spreadsheet, at about 50% of bugs found in peer reviews across dozens of teams. It should be noted that we peer review not only code but also tests, statecharts, sequence diagrams, and other artifacts for our projects, and the payoff in finding bugs early has been unmistakable. Of course I need to make a "Your Mileage May Vary" disclaimer here, but it's worked for me.
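The weekly roll-up itself is trivial to automate. Here is a small sketch of the idea, with made-up team names and counts; the one rule it encodes from the process above is flagging teams that consistently report zero bugs so someone knows to ask hard questions:

```python
# Sketch of the weekly roll-up: tally "# Bugs" per team across review
# sessions and flag teams reporting zero, so the course staff (or a
# team lead in industry) knows where to ask hard questions.

weekly_bug_counts = {
    "Team A": [3, 5, 2],   # hypothetical weekly totals
    "Team B": [0, 0, 0],
    "Team C": [4, 0, 1],
}

def teams_to_question(counts):
    """Return teams whose reviews have found no bugs at all so far."""
    return [team for team, weeks in counts.items() if sum(weeks) == 0]

print(teams_to_question(weekly_bug_counts))  # ['Team B']
```

The point is not the code but the visibility: a consistently zero row is a signal about the review process, not evidence of bug-free work.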
I'd also be interested in hearing stories from industry teams about simple ways to make reviews more effective. Ideally, each team gets solid training on a review process along with help on soft skills for review leaders. But realistically, a lot of the time a bunch of engineers are just tossed into a room and told to make it happen. Knowing tricks that help is not a bad thing.