Performance evaluation systems are critical for evaluating, motivating, and rewarding employees, but companies are hard-pressed to find a system that achieves these goals and promotes fair evaluations. Inconsistencies and biases in the evaluation process can leave employees dissatisfied and demotivated, especially if undeserving employees are rewarded and recognized while more deserving employees are left empty-handed.
These challenges are particularly pronounced in professional settings where objective measures of performance can be hard to capture and performance evaluations are more subjective. Subjectivity can allow inconsistencies and biases to creep into performance ratings. Differences between supervisors contribute to these inconsistencies, as one supervisor’s rating of five might be someone else’s three. Some supervisors also show favoritism, inflate ratings, or use inconsistent standards for different employees. This can be true even in organizations that have moved away from formal end-of-year evaluations to more-frequent and informal feedback sessions.
So how can inconsistencies and biases be minimized or eliminated? How can the fairness of the system be improved? Are there ways of strengthening the link between performance and rewards?
One approach some companies have taken is to use calibration committees, which are generally composed of higher-level supervisors. These committees adjust the ratings supervisors give employees, in an effort to improve consistency. To understand the role of these committees in performance evaluation systems, we collaborated with a multinational organization to study its use of calibration committees over a three-year period.
The organization’s evaluation process starts with supervisors subjectively rating employees’ performance. The ratings of employees for each level or cohort are then passed to a calibration committee, which is composed of supervisors and other higher-level managers. The calibration committee meets to achieve a common understanding of the types of achievements and contributions that warrant various performance ratings. Based on this understanding, the committee then determines whether to adjust individual performance ratings. Once the committee has determined the final ratings, supervisors hold meetings with employees to discuss their ratings.
It may seem counterintuitive to allow a committee to adjust the ratings of employees its members generally do not observe firsthand. But even though supervisors may have better information than calibration committees about the performance of individual employees, they do not know how the ratings they assign compare with ratings given by other supervisors. The committee has this macro-level knowledge, which enables it to assess ratings across all supervisors and promote greater consistency in performance ratings.
Our study, which is forthcoming in Management Science, found that the calibration committee adjusted ratings 25% of the time in the organization we studied. Ratings were decreased four times as often as they were increased, implying that roughly 20% of all ratings were lowered and 5% were raised, which lowered the overall average rating. Downward adjustments were also larger than upward adjustments.
Ratings were more likely to be adjusted downward when given by a supervisor who tended to give higher-than-average ratings, while ratings were more likely to be adjusted upward when given by a supervisor who tended to give lower-than-average ratings. This addresses the common concern that some supervisors are more lenient in giving ratings while others have stricter rating standards. Because of the calibration adjustments, the final ratings were more consistent across supervisors.
Not only did the calibration process contribute to improved consistency, but supervisors also modified their rating behavior in response to the process. Interestingly, a supervisor’s reaction depended on the direction of the adjustment. If an employee’s rating was increased by the calibration committee, the supervisor gave a higher rating to that employee in the next period, essentially matching the calibration committee’s adjustment from the previous period. Adjustments that resulted in a lower rating, however, were only partially incorporated into the next period. Supervisors gave a lower rating, but not to the full extent of the calibration committee’s downward adjustment from the previous period.
One of the potential drawbacks of calibrating ratings, which we observed in this organization, is a decrease in the variation in ratings. The committees were more likely to adjust ratings that were higher or lower than average, but did not make as many adjustments to average ratings. This pattern of adjustments resulted in a tighter distribution of ratings, with less differentiation between employees. Such compression can be problematic, as a lack of variation makes it more challenging to identify high performers (for promotion or recognition) and low performers (for additional training or other remedial action).
We also surveyed 220 employees and 47 supervisors to assess their perceptions of the fairness of the performance evaluation system. On average, employees believed the outcome of the evaluation process was fair, but they were not completely satisfied with the system itself, in part because they perceived favoritism to be an issue. We also noted, perhaps not surprisingly, that higher-performing employees reported higher levels of perceived fairness and satisfaction with the performance evaluation system and less perceived favoritism relative to lower-performing employees.
Supervisors generally believed both that the outcomes and the process of evaluations were fair and that favoritism was not an issue. However, they were dissatisfied with the time demands the system imposed.
Complex knowledge-based work is notoriously difficult to measure, often forcing companies to use subjective performance evaluation. While subjective evaluation incorporates many aspects of job performance that are hard to capture, it leaves employees vulnerable to supervisor biases and inconsistencies in rating standards across supervisors that could affect performance ratings. In the organization we studied, we found the use of calibration committees overcame many of these challenges, and the benefits generally outweighed the costs.