Study 3 (Phase 1): Grammar
Study 3 was conducted in two phases. The first phase consisted of a replication of the first two studies in a third domain, one requiring knowledge of clear and decisive rules and facts: grammar. People may differ in the worth they assign to American Standard Written English (ASWE), but they do agree that such a standard exists, and they differ in their ability to produce and recognize written documents that conform to that standard. Thus, in Study 3 we asked participants to complete a test assessing their knowledge of ASWE. We also asked them to rate their overall ability to recognize correct grammar, how their test performance compared with that of their peers, and finally how many items they had answered correctly on the test. In this way, we could see if those who did poorly would recognize that fact.
Method
Participants. Participants were 84 Cornell University undergraduates who received extra credit toward their course grade for taking part in the study.
Procedure. The basic procedure and primary dependent measures were similar to those of Study 2. One major change was that of domain. Participants completed a 20-item test of grammar, with questions taken from a National Teacher Examination preparation guide (Bobrow et al., 1989). Each test item contained a sentence with a specific portion underlined. Participants were to judge whether the underlined portion was grammatically correct or should be changed to one of four different rewordings displayed.
After completing the test, participants compared their general ability to “identify grammatically correct standard English” with that of other students from their class on the same percentile scale used in the previous studies. As in Study 2, participants also estimated the percentile rank of their test performance among their student peers, as well as the number of individual test items they had answered correctly.
Results and Discussion
As in Studies 1 and 2, participants overestimated their ability and performance relative to objective criteria. On average, participants’ estimates of their grammar ability (M percentile = 71) and performance on the test (M percentile = 68) exceeded the actual mean of 50, one-sample ts(83) = 5.90 and 5.13, respectively, ps < .0001. Participants also overestimated the number of items they answered correctly, M = 15.2 (perceived) versus 13.3 (actual), t(83) = 6.63, p < .0001. Although participants’ perceptions of their general grammar ability were uncorrelated with their actual test scores, r(82) = .14, ns, their perceptions of how their test performance would rank among their peers were correlated with their actual score, albeit to a marginal degree, r(82) = .19, p < .09, as was their direct estimate of their raw test score, r(82) = .54, p < .0001.
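The one-sample t tests above compare a group's mean percentile estimate with the chance value of 50. A minimal sketch of that computation, using hypothetical percentile estimates rather than the study's actual data:

```python
import math

def one_sample_t(xs, mu0):
    """t statistic for H0: the population mean of xs equals mu0."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)  # sample variance (n - 1 denominator)
    return (mean - mu0) / math.sqrt(var / n)          # (mean - mu0) / standard error

# Hypothetical self-estimated percentiles from 8 participants (illustrative only)
estimates = [71, 68, 75, 60, 66, 80, 72, 64]
t = one_sample_t(estimates, 50.0)  # test against the expected mean percentile of 50
print(round(t, 2))  # 8.66
```

A large positive t, as here, indicates that the group's mean self-estimate sits reliably above the 50th percentile, the logically required average for percentile rankings.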
As Figure 3 illustrates, participants scoring in the bottom quartile grossly overestimated their ability relative to their peers. Whereas bottom-quartile participants (n = 17) scored in the 10th percentile on average, they estimated their grammar ability and performance on the test to be in the 67th and 61st percentiles, respectively, ts(16) = 13.68 and 15.75, ps < .0001. Bottom-quartile participants also overestimated their raw score on the test by 3.7 points, M = 12.9 (perceived) versus 9.2 (actual), t(16) = 5.79, p < .0001.
As in previous studies, participants falling in other quartiles overestimated their ability and performance much less than did those in the bottom quartile. However, as Figure 3 shows, those in the top quartile once again underestimated themselves. Whereas their test performance fell in the 89th percentile among their peers, they rated their ability to be in the 72nd percentile and their test performance in the 70th percentile, ts(18) = -4.73 and -5.08, respectively, ps < .0001. Top-quartile participants did not, however, underestimate their raw score on the test, M = 16.9 (perceived) versus 16.4 (actual), t(18) = 1.37, ns.
Study 3 (Phase 2): It Takes One to Know One

Thus far, we have shown that people who lack the knowledge or wisdom to perform well are often unaware of this fact. We attribute this lack of awareness to a deficit in metacognitive skill. That is, the same incompetence that leads them to make wrong choices also deprives them of the savvy necessary to recognize competence, be it their own or anyone else’s.
Figure 3. Perceived grammar ability and test performance as a function of actual test performance (Study 3).
We designed a second phase of Study 3 to put the latter half of this claim to a test. Several weeks after the first phase of Study 3, we invited the bottom- and top-quartile performers from this study back to the laboratory for a follow-up. There, we gave each group the tests of five of their peers to “grade” and asked them to assess how competent each target had been in completing the test. In keeping with Prediction 2, we expected that bottom-quartile participants would have more trouble with this metacognitive task than would their top-quartile counterparts.
This study also enabled us to explore Prediction 3, that incompetent individuals fail to gain insight into their own incompetence by observing the behavior of other people. One of the ways people gain insight into their own competence is by comparing themselves with others (Festinger, 1954; Gilbert, Giesler, & Morris, 1995). We reasoned that if the incompetent cannot recognize competence in others, then they will be unable to make use of this social comparison opportunity. To test this prediction, we asked participants to reassess themselves after they had seen the responses of their peers. We predicted that despite seeing the superior test performances of their classmates, bottom-quartile participants would continue to believe that they had performed competently.
In contrast, we expected that top-quartile participants, because they have the metacognitive skill to recognize competence and incompetence in others, would revise their self-ratings after the grading task. In particular, we predicted that they would recognize that the performances of the five individuals they evaluated were inferior to their own, and thus would raise their estimates of their percentile ranking accordingly. That is, top-quartile participants would learn from observing the responses of others, whereas bottom-quartile participants would not.
In making these predictions, we felt that we could account for an anomaly that appeared in all three previous studies: Despite the fact that top-quartile participants were far better calibrated than were their less skilled counterparts, they tended to underestimate their performance relative to their peers. We felt that this miscalibration had a different source than the miscalibration evidenced by bottom-quartile participants. That is, top-quartile participants did not underestimate themselves because they were wrong about their own performances, but rather because they were wrong about the performances of their peers. In essence, we believe they fell prey to the false-consensus effect (Ross, Greene, & House, 1977). In the absence of data to the contrary, they mistakenly assumed that their peers would tend to provide the same (correct) answers as they themselves had, an impression that could be immediately corrected by showing them the performances of their peers. By examining the extent to which competent individuals revised their ability estimates after grading the tests of their less competent peers, we could put this false-consensus interpretation to a test.
Method
Participants. Four to six weeks after Phase 1 of Study 3 was completed, we invited participants from the bottom (n = 17) and top (n = 19) quartiles back to the laboratory in exchange for extra credit or $5. All agreed and participated.
Procedure. On arriving at the laboratory, participants received a packet of five tests that had been completed by other students in the first phase of Study 3. The tests reflected the range of performances that their peers had achieved in the study (i.e., they had the same mean and standard deviation), a fact we shared with participants. We then asked participants to grade each test by indicating the number of questions they thought each of the five test-takers had answered correctly.

Table 1
Self-Ratings (Percentile Scales) of Ability and Performance on Test Before and After Grading Task for Bottom- and Top-Quartile Participants (Study 3, Phase 2)

Rating                     Before    After    Difference    Actual
Bottom quartile
  Percentile ability         66.8     63.2        -3.5        10.1
  Percentile test score      60.5     65.4         4.9        10.1
  Raw test score             12.9     13.7         0.8         9.2
Top quartile
  Percentile ability         71.6     77.2         5.6*       88.7
  Percentile test score      69.5     79.7        10.2**      88.7
  Raw test score             16.9     16.6        -0.3        16.4

* p < .05. ** p < .01.
After this, participants were shown their own test again and were asked to re-rate their ability and performance on the test relative to their peers, using the same percentile scales as before. They also re-estimated the number of test questions they had answered correctly.
Results and Discussion
Ability to assess competence in others. As predicted, participants who scored in the bottom quartile were less able to gauge the competence of others than were their top-quartile counterparts. For each participant, we correlated the grade he or she gave each test with the actual score the five test-takers had attained. Bottom-quartile participants achieved lower correlations (mean r = .37) than did top-quartile participants (mean r = .66), t(34) = 2.09, p < .05. For an alternative measure, we summed the absolute miscalibration in the grades participants gave the five test-takers and found similar results, M = 17.4 (bottom quartile) vs. 9.2 (top quartile), t(34) = 2.49, p < .02.
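Both accuracy measures described above can be computed per participant from two short lists: the grades that participant assigned to the five tests and the tests' actual scores. A minimal sketch with hypothetical grades and scores (the numbers below are illustrative, not the study's data):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical data for one grader: actual scores of the five tests,
# and the grades this participant assigned to them
actual = [7, 10, 13, 16, 19]
grades = [12, 11, 14, 15, 16]

r = pearson_r(grades, actual)  # grading accuracy as a correlation
miscalibration = sum(abs(g - a) for g, a in zip(grades, actual))  # summed absolute error
print(round(r, 2), miscalibration)  # 0.92 11
```

On the correlation measure, this hypothetical grader tracks the ordering of performances well (r near .9) while still misjudging the absolute scores by 11 points in total, which is why the two measures can diverge.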
Revising self-assessments. Table 1 displays the self-assessments of bottom- and top-quartile performers before and after reviewing the answers of the test-takers shown during the grading task. As can be seen, bottom-quartile participants failed to gain insight into their own performance after seeing the more competent choices of their peers. If anything, bottom-quartile participants tended to raise their already inflated self-estimates, although not to a significant degree, all ts(16) < 1.7.
With top-quartile participants, a completely different picture emerged. As predicted, after grading the test performance of five of their peers, top-quartile participants raised their estimates of their own general grammar ability, t(18) = 2.07, p = .05, and their percentile ranking on the test, t(18) = 3.61, p < .005. These results are consistent with the false-consensus effect account we have offered. Armed with the ability to assess competence and incompetence in others, participants in the top quartile realized that the performances of the five individuals they evaluated (and thus their peers in general) were inferior to their own. As a consequence, top-quartile participants became better calibrated with respect to their percentile ranking. Note that a false-consensus interpretation does not predict any revision for estimates of one’s raw score, as learning of the poor performance of one’s peers conveys no
information about how well one has performed in absolute terms. Indeed, as Table 1 shows, no revision occurred, t(18) < 1.

Summary. In sum, Phase 2 of Study 3 revealed several effects of interest. First, consistent with Prediction 2, participants in the bottom quartile demonstrated deficient metacognitive skills. Compared with top-quartile performers, incompetent individuals were less able to recognize competence in others. We are reminded of what Richard Nisbett said of the late, great giant of psychology, Amos Tversky: “The quicker you realize that Amos is smarter than you, the smarter you yourself must be” (R. E. Nisbett, personal communication, July 28, 1998). This study also supported Prediction 3, that incompetent individuals fail to gain insight into their own incompetence by observing the behavior of other people. Despite seeing the superior performances of their peers, bottom-quartile participants continued to hold the mistaken impression that they had performed just fine. The story for high-performing participants, however, was quite different. The accuracy of their self-appraisals did improve. We attribute this finding to a false-consensus effect. Simply put, because top-quartile participants performed so adeptly, they assumed the same was true of their peers. After seeing the performances of others, however, they were disabused of this notion, and thus they improved the accuracy of their self-appraisals. Thus, the miscalibration of the incompetent stems from an error about the self, whereas the miscalibration of the highly competent stems from an error about others.