Study 4: Competence Begets Calibration
Our central argument is that incompetent individuals lack the metacognitive skills that enable them to tell how poorly they are performing.
As a result, they come to hold inflated views of their performance and ability.
We have shown that incompetents:
- are unaware of their deficient abilities (Studies 1 through 3)
- show deficient metacognitive skills (Study 3).
The best acid test of our proposition, however, is to manipulate competence and see if this improves metacognitive skills and thus the accuracy of self-appraisals (Prediction 4).
This would:
- enable us to speak directly to causality
- help rule out the regression effect alternative account discussed earlier.
If the incompetent overestimate themselves because their test scores are very low (the regression effect), then manipulating competence after they take the test should have no effect.
If instead it takes competence to recognize competence, then manipulating competence should enable the incompetent to recognize that they have performed poorly.
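To see why, consider a simulated illustration of the regression account (invented data, not the study's; all names are ours): if perceived and actual scores are equally noisy readouts of the same underlying skill, the bottom quartile on actual scores will appear to overestimate even when self-insight is fully intact.

```python
# Simulated illustration of the regression-effect account (invented data).
import numpy as np

rng = np.random.default_rng(2)
skill = rng.normal(50, 15, size=10_000)             # latent ability
actual = skill + rng.normal(0, 10, size=10_000)     # noisy test score
perceived = skill + rng.normal(0, 10, size=10_000)  # equally noisy self-estimate

# Selecting the bottom quartile by ACTUAL score also selects for unlucky
# noise, so perceived minus actual is positive on average even though
# self-perception tracks skill exactly as well as the test does.
bottom = actual <= np.quantile(actual, 0.25)
print(perceived[bottom].mean() - actual[bottom].mean())  # reliably > 0
```

Training administered after the test leaves this purely statistical overestimation untouched, which is what makes the manipulation a test between the two accounts.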
This suggests that the way to make incompetent individuals realize their own incompetence is to make them competent.
In Study 4, that is precisely what we set out to do.
We gave participants a test of logic based on the Wason selection task (Wason, 1966) and asked them to assess themselves.
We then gave half of the participants a short training session designed to improve their logical reasoning skills.
Finally, we tested the metacognitive skills of all participants by asking them to indicate which items they had answered correctly and which incorrectly (after McPherson & Thomas, 1989) and to rate their ability and test performance once more.
We predicted that training would provide incompetent individuals with the metacognitive skills needed to realize that they had performed poorly and thus would help them realize the limitations of their ability.
Specifically, we expected that the training would:
- (a) improve the ability of the incompetent to evaluate which test problems they had answered correctly and which incorrectly
- (b) in the process, reduce the miscalibration of their ability estimates.
Method
Participants were 140 Cornell University undergraduates from a single human development course who earned extra credit toward their course grades for participating.
Data from 4 additional participants were excluded because they failed to complete the dependent measures.
Participants completed the study in groups of 4 to 20 individuals.
On arriving at the laboratory, participants were told that they would be given a test of logical reasoning as part of a study of logic.
The test contained 10 problems based on the Wason selection task (Wason, 1966).
Each problem described 4 cards (e.g., A, 7, B, and 4) and a rule about the cards (e.g., “If the card has a vowel on one side, then it must have an odd number on the other”).
Participants then were instructed to indicate which card or cards must be turned over in order to test the rule (see Footnote 4).
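For concreteness, here is a minimal sketch of the selection logic for the example above (our illustration, not part of the study materials): the only cards worth turning over are those whose hidden side could falsify the rule.

```python
# Which cards must be flipped to test "if vowel on one side, then odd
# number on the other"? Only cards that could falsify the rule matter:
# a visible vowel (the hidden number might be even) or a visible even
# number (the hidden letter might be a vowel).

def must_turn_over(cards):
    vowels = set("AEIOU")
    picks = []
    for face in cards:
        if face in vowels:                           # antecedent true: check consequent
            picks.append(face)
        elif face.isdigit() and int(face) % 2 == 0:  # consequent false: check antecedent
            picks.append(face)
    return picks

print(must_turn_over(["A", "7", "B", "4"]))  # ['A', '4']
```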
After taking the test, participants were asked to rate their logical reasoning skills and performance on the test relative to their classmates on a percentile scale. They also estimated the number of problems they had solved correctly.
Next, 70 randomly selected participants were given a short logical-reasoning training packet.
Modeled after work by Cheng and her colleagues (Cheng, Holyoak, Nisbett, & Oliver, 1986), this packet described techniques for testing the veracity of logical syllogisms such as the Wason selection task.
The remaining 70 participants encountered an unrelated filler task that took about the same amount of time (10 min) as did the training packet.
Afterward, participants in both conditions completed a metacognition task in which they went through their own tests and indicated which problems they thought they had answered correctly and which incorrectly.
Participants then re-estimated the total number of problems they had answered correctly and compared themselves with their peers in terms of their general logical reasoning ability and their test performance.
Results and Discussion
Pretraining self-assessments.
Prior to training, participants displayed a pattern of results strikingly similar to that of the previous three studies.
Participants overall overestimated their logical reasoning ability (M percentile = 64) and test performance (M percentile = 61) relative to their peers, paired ts(139) = 5.88 and 4.53, respectively, ps < .0001. Participants also overestimated their raw score on the test, M = 6.6 (perceived) versus 4.9 (actual), t(139) = 5.95, p < .0001. As before, perceptions of raw test score, percentile ability, and percentile test score correlated positively with actual test performance, rs(138) = .50, .38, and .40, respectively, ps < .0001.
Once again, individuals scoring in the bottom quartile (n = 37) were oblivious to their poor performance. Although their score on the test put them in the 13th percentile, they estimated their logical reasoning ability to be in the 55th percentile and their performance on the test to be in the 53rd percentile. Although neither of these estimates was significantly greater than 50, ts(36) = 1.49 and 0.81, they were considerably greater than their actual percentile ranking, ts(36) > 10, ps < .0001. Participants in the bottom quartile also overestimated their raw score on the test. On average, they thought they had answered 5.5 problems correctly. In fact, they had answered an average of 0.3 problems correctly, t(36) = 10.75, p < .0001.
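In computational terms, the percentile and miscalibration measures used throughout can be sketched as follows (simulated data; variable names are illustrative assumptions, not the authors' code):

```python
# Sketch of the calibration analysis on simulated data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
raw = rng.integers(0, 11, size=140)            # raw scores on the 10-item test
perceived_pct = rng.uniform(30, 90, size=140)  # self-rated percentile

# Actual percentile rank of each participant among the 140 test takers.
actual_pct = stats.rankdata(raw, method="average") / len(raw) * 100

# Paired t test: do perceived percentiles exceed actual ones overall?
t, p = stats.ttest_rel(perceived_pct, actual_pct)

# Mean overestimation within the bottom quartile of raw scores.
bottom = raw <= np.quantile(raw, 0.25)
print(perceived_pct[bottom].mean() - actual_pct[bottom].mean())
```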
As Figure 4 illustrates, the level of overestimation once again decreased with each step up the quartile ladder. As in the previous studies, participants in the top quartile underestimated their ability. Whereas their actual performance put them in the 90th percentile, they thought their general logical reasoning ability fell in the 76th percentile and their performance on the test in the 79th percentile, ts(27) < -3.00, ps < .001.
Top-quartile participants also underestimated their raw score on the test (by just over 1 point), but given that they all achieved perfect scores, this is hardly surprising.
Impact of training.
Our primary hypothesis was that training in logical reasoning would turn the incompetent participants into experts, thus providing them with the skills necessary to recognize the limitations of their ability. Specifically, we expected that the training packet would (a) improve the ability of the incompetent to monitor which test problems they had answered correctly and which incorrectly and, thus, (b) reduce the miscalibration of their self-impressions.
Scores on the metacognition task supported the first part of this prediction. To assess participants' metacognitive skills, we summed the number of questions each participant accurately identified as correct or incorrect, out of the 10 problems. Overall, participants who received the training packet graded their own tests more accurately (M = 9.3) than did participants who did not receive the packet (M = 6.3), t(138) = 7.32, p < .0001, a difference even more pronounced when looking at bottom-quartile participants exclusively, Ms = 9.3 versus 3.5, t(36) = 7.18, p < .0001. In fact, the training packet was so successful that those who had originally scored in the bottom quartile were just as accurate in monitoring their test performance as were those who had initially scored in the top quartile, Ms = 9.3 and 9.9, respectively, t(30) = 1.38, ns. In other words, the incompetent had become experts.
Footnote 4. The correct answer in the example above is A and 4.
Figure 4. Perceived logical reasoning ability and test performance as a function of actual test performance (Study 4).
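The metacognitive accuracy score described above reduces to a simple matching count; a minimal sketch (illustrative data, not the authors' code):

```python
# One point per problem where self-grading matches reality (range 0-10).
def metacognition_score(actual_correct, judged_correct):
    return sum(a == j for a, j in zip(actual_correct, judged_correct))

# Example: a participant who solved 4 problems but believes they solved 7.
actual = [1, 0, 1, 0, 1, 0, 0, 1, 0, 0]
judged = [1, 1, 1, 0, 1, 1, 0, 1, 1, 0]
print(metacognition_score(actual, judged))  # 7 of 10 items graded accurately
```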
To test the second part of our prediction, we examined the impact of training on participants' self-impressions in a series of 2 (training: yes or no) × 2 (pre- vs. postmanipulation) × 4 (quartile: 1 through 4) mixed-model analyses of variance (ANOVAs). These analyses revealed the expected three-way interactions for estimates of general ability, F(3, 132) = 2.49, p < .07, percentile score on the test, F(3, 132) = 8.32, p < .001, and raw test score, F(3, 132) = 19.67, p < .0001, indicating that the impact of training on self-assessment depended on participants' initial test performance. Table 2 displays how training influenced the degree of miscalibration participants exhibited for each measure.
To examine these interactions in greater detail, we conducted two sets of 2 (training: yes or no) × 2 (pre- vs. postmanipulation) ANOVAs. The first looked at participants in the bottom quartile, the second at participants in the top quartile. Among bottom-quartile participants, we found the expected interactions for estimates of logical reasoning ability, F(1, 35) = 6.67, p < .02, percentile test score, F(1, 35) = 14.30, p < .002, and raw test score, F(1, 35) = 41.0, p < .0001, indicating that the change in participants' estimates of their ability and test performance depended on whether they had received training.
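With two groups and two time points, each training × time interaction is equivalent to an independent-samples t test on pre-to-post change scores (for two groups, F = t²); a sketch with invented numbers patterned loosely on the means reported below:

```python
# Interaction in a 2 (group) x 2 (pre/post) mixed design, tested as a
# t test on change scores. All values are simulated, not the study's data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
pre_trained    = rng.normal(55, 10, size=19)
post_trained   = rng.normal(44, 10, size=19)  # trained estimates drop
pre_untrained  = rng.normal(55, 10, size=18)
post_untrained = rng.normal(55, 10, size=18)  # untrained stay put

change_trained = post_trained - pre_trained
change_untrained = post_untrained - pre_untrained

# A significant t here corresponds to a significant interaction F.
t, p = stats.ttest_ind(change_trained, change_untrained)
print(t, p)
```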
As Table 2 depicts, participants in the bottom quartile who had received training (n = 19) became more calibrated in every way.
Before receiving the training packet, these participants believed that their ability fell in the 55th percentile, that their performance on the test fell in the 51st percentile, and that they had answered 5.3 problems correctly. After training, these same participants thought their ability fell in the 44th percentile, their test in the 32nd percentile, and that they had answered only 1.0 problems correctly. Each of these changes from pre- to posttraining was significant, t(18) = -2.53, -5.42, and -6.05, respectively, ps < .03.
To be sure, participants still overestimated their logical reasoning ability, t(18) = 5.16, p < .0001, and their performance on the test relative to their peers, t(18) = 3.30, p < .005, but they were considerably more calibrated overall and were no longer miscalibrated with respect to their raw test score, t(18) = 1.50, ns. No such increase in calibration was found for bottom-quartile participants in the untrained group (n = 18). As Table 2 shows, they initially reported that both their ability and score on the test fell in the 55th percentile, and did not change those estimates in their second set of self-ratings, all ts < 1. Their estimates of their raw test score, however, did change, but in the wrong direction.
Table 2
Self-Ratings in Percentile Terms of Ability and Performance for Trained and Untrained Participants (Study 4)