The Likert Scale Debate: Reliability & Validity
Introduced by Rensis Likert in 1932 in his work “A Technique for the Measurement of Attitudes,” Likert scales are commonly used in questionnaires, from simple surveys to academic research, to collect opinion data. Since 1932, a great deal of debate has surrounded which features and factors might improve the reliability and validity of the scale. That debate tries to answer one question: how should we design the scale so that respondents give the answer closest to the truth, i.e., their true answer, free of influence or bias?
Minimizing the potential for influence or bias at the individual level would, of course, improve the reliability and validity of the questionnaire's data as a whole. The data would be replicable if the population were surveyed again (reliability), and the data would reflect the true opinions of the population surveyed (validity).
First, should the scale have more points or fewer? And, perhaps the largest dispute: should we include a neutral response, giving respondents the ability to offer no opinion at all? Let’s set the neutral-response debate aside for now and look at the number of points on the scale.
More points or fewer? (Why do we use 4- or 5-point scales rather than 6- or 7-point scales?)
Teaching-Family Model (TFM) standards require Likert scale data to be collected on either a 4-point or a 5-point scale. These two options are equally descriptive: the 5-point scale adds no nuance beyond a neutral response. In other words, a respondent who has an opinion chooses from the same four opinionated options on either scale:
Very dissatisfied | Dissatisfied | Satisfied | Very satisfied
Very dissatisfied | Dissatisfied | Neutral | Satisfied | Very satisfied
Many researchers have argued for using more points to improve reliability and validity. A 7-point scale, for example, looks like this:
Extremely dissatisfied | Moderately dissatisfied | Slightly dissatisfied | Neutral | Slightly satisfied | Moderately satisfied | Extremely satisfied
As we will return to the neutral-option question later, let’s treat the difference between 5- and 7-point scales as equivalent to the difference between 4- and 6-point scales. Likert introduced and advocated for the 5-point scale; since then, many researchers have argued that 7 points may increase reliability and validity. (Seven does, however, appear to be a practical upper limit: some researchers have argued that individuals cannot distinguish between more than seven distinct opinions about a subject (Miller, 1956), and others that reliability does not increase beyond 7 points (Johns, 2010).)
If we were to accept a growing consensus that 6- or 7-point scales are the most reliable and valid options, why might we continue to use 4- or 5-point scales? For one, researchers have also reported higher reliability for 5-point scales in certain contexts (Jenkins & Taber, 1977; Lissitz & Green, 1975; McKelvie, 1978; Remmers & Ewart, 1941). Furthermore, a number of studies suggest that 5-point scales increase response rates and response quality, are less confusing, and reduce respondents’ “frustration level” (Babakus & Mangold, 1992; Devlin et al., 1993; Hayes, 1992). One study explained that the 5-point scale is common and therefore readily comprehensible to respondents, enabling them to express their views accurately (Marton-Williams, 1986).
In the practical context of Teaching-Family Model implementation, the research above strongly supports the use of 4- or 5-point scales, for two reasons:
1. Comprehension
Firstly, TFM standards require agencies to collect the opinions of clients, children, and other non-experts who may not be able to distinguish between, for example, slight satisfaction and moderate satisfaction (on a 6- or 7-point scale) with regard to the questions being asked.
To illustrate: when asking for feedback on an agency’s Facilitative Administration, can we reasonably expect consumer respondents to understand this system of a Teaching-Family Model agency well enough to give an accurate opinion, without frustration, across 6 or 7 different options? A 5-point scale lowers barriers to comprehension and makes it more likely that non-expert respondents will express their true opinion at a reasonable level of detail.
2. Response Rates
Secondly, the Consumer Satisfaction standard requires a relatively high response rate, higher than what might be considered an acceptable sample for research or opinion polling, for two important reasons: all relevant parties should have a regular opportunity to give formal feedback and comment on the agency’s work, and the agency should collect as much specific information and feedback as possible rather than rely only on the basic response data of a representative sample.
High response rates may be achievable on client surveys regardless of the number of points on the scale, for example when responses are collected in person or by interview. However, the standards also require external consumer surveys, for which such a comprehensive response rate is far harder to achieve, and it is there that the higher response rates associated with 5-point scales matter most. (Note that if feedback is collected by interview, some research also suggests that it is much simpler for an interviewer to read out the complete list of scale descriptors with a 5-point scale, thus improving the reliability and validity of responses (Dawes, 2008).)
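Response rate here simply means the share of distributed surveys that come back. As a minimal sketch in Python, with hypothetical counts chosen only to illustrate the gap between in-person client surveys and external consumer surveys (neither the numbers nor the function name comes from TFM standards):

```python
def response_rate(returned: int, distributed: int) -> float:
    """Return the percentage of distributed surveys that were returned."""
    if distributed <= 0:
        raise ValueError("distributed must be a positive count")
    if not 0 <= returned <= distributed:
        raise ValueError("returned must be between 0 and distributed")
    return 100.0 * returned / distributed


# Hypothetical (returned, distributed) counts, for illustration only.
surveys = {
    "clients, collected in person": (18, 20),
    "external consumers, mailed": (9, 30),
}

for group, (returned, distributed) in surveys.items():
    print(f"{group}: {response_rate(returned, distributed):.1f}%")
```

With these made-up counts the sketch prints 90.0% for the in-person client group and 30.0% for the mailed external group.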
So, while research may favor a more descriptive scale, there are strong arguments for continuing to use a 4- or 5-point scale in the practical implementation of the Teaching-Family Model. Since the data are collected for continuous quality improvement or accreditation review, we might also seek the expert opinion of reviewers and administrators: does a 4- or 5-point scale provide enough information to guide TFM implementation?
The Neutral Response Option (Why would an agency choose a 4-point scale over a 5-point scale, or vice versa?)
The choice between a 4-point and a 5-point scale is left to individual agencies accredited by the Teaching-Family Association. The decision should, however, be applied consistently for the sake of comparability across surveys: the choice affects the response data, so results from a survey using a 4-point scale and one using a 5-point scale are not reliably comparable, even though they may appear to be.
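One way to see why the two scales only seem comparable is to place each point on a common 0 to 100 range. The sketch below assumes a simple linear rescaling, 100 × (score - 1) / (points - 1), which is an illustrative assumption rather than a method prescribed by TFM standards; the label lists are the two scales shown earlier, and the function name is hypothetical.

```python
# Label sets for the 4- and 5-point scales described in the text.
FOUR_POINT = ["Very dissatisfied", "Dissatisfied", "Satisfied", "Very satisfied"]
FIVE_POINT = ["Very dissatisfied", "Dissatisfied", "Neutral", "Satisfied", "Very satisfied"]


def to_common_range(score: float, points: int) -> float:
    """Linearly map a score on a 1..points scale onto 0..100 (an assumed rescaling)."""
    if not 1 <= score <= points:
        raise ValueError("score is outside the scale")
    return 100.0 * (score - 1) / (points - 1)


for labels in (FOUR_POINT, FIVE_POINT):
    points = len(labels)
    for score, label in enumerate(labels, start=1):
        print(f"{points}-point  {score}  {label:<18} {to_common_range(score, points):6.1f}")
```

Under this assumption “Satisfied” lands at 66.7 on the 4-point scale but 75.0 on the 5-point scale, and “Dissatisfied” at 33.3 versus 25.0. No rescaling can correct for the behavioral effect of offering a neutral option, which is why consistency matters more than conversion.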
An important consideration in this debate is that TFM standards set the same high criterion whether or not a neutral option is included. On a 4-point scale that criterion is an average score of 3, and on a 5-point scale it is 4; both describe respondents who are, on average, satisfied. One might assume that a neutral option would pull averages slightly below their 4-point equivalents, since there is one more option below the criterion. But the opposite may also be true: the neutral option may draw responses that would otherwise have been negative toward neutrality, raising the average score. Let’s see how this plays out in the research debate.
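The criterion check itself is simple arithmetic: average the responses to a question and compare the mean with 3 (4-point scale) or 4 (5-point scale). A minimal sketch, assuming each response is stored as an integer from 1 to the number of points; the function name and the response lists are hypothetical:

```python
# Criterion from the text: mean of 3 on a 4-point scale, 4 on a 5-point scale.
CRITERION = {4: 3, 5: 4}


def meets_criterion(responses: list[int], points: int) -> tuple[float, bool]:
    """Return the mean score and whether it reaches the criterion for this scale."""
    if points not in CRITERION:
        raise ValueError("only 4- and 5-point scales are handled")
    if not responses:
        raise ValueError("no responses to average")
    if any(not 1 <= r <= points for r in responses):
        raise ValueError("a response is outside the scale")
    mean = sum(responses) / len(responses)
    return mean, mean >= CRITERION[points]


# Hypothetical answers to one question, for illustration only.
print(meets_criterion([4, 3, 3, 2, 4, 3], points=4))  # mean 19/6, about 3.17: meets
print(meets_criterion([5, 4, 3, 3, 4, 4], points=5))  # mean 23/6, about 3.83: falls short
```

In the second list, the two neutral answers (3s) keep the mean below 4; had they been 4s, the mean would be 25/6, about 4.17. This is constructed data, though: whether a neutral option lowers or raises real averages is exactly the open question the research addresses.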
A scale without a neutral option is often criticized for “forcing” respondents to decide whether something is positive or negative, which may reduce reliability and validity where respondents truly have no opinion or are neutral. Likert himself advocated for the neutral option: his research found that the neutral option and 5-choice scale produced the statistically expected normal, or “bell curve,” distribution. Indeed, a strong majority of the research literature supports the use of the neutral option, or simply considers only scales that include one. Some sources may not even consider a scale without a neutral option to be a Likert scale.
However, the arguments for doing away with the neutral response are also compelling. Respondents tend to avoid the cognitive effort required to pick an opinionated answer when reporting their opinion (Krosnick et al., 2002); in other words, neutrality is the path of least resistance, and a neutral option may inaccurately skew results toward neutrality. Respondents may also choose a neutral option out of ambivalence (Bishop, 1987), selecting a neutral answer to avoid the negative feelings associated with their conflicted views on an issue. Choosing between positive and negative feelings requires cognitive effort, and a neutral option makes that effort less likely to occur (Nowlis et al., 2002).
For these reasons and others, some researchers suggest removing the neutral option from Likert scales (Garland, 1991; Krosnick et al., 2002; Kalton et al., 1980). The central argument is that removing it forces respondents to exert cognitive effort, whether they were simply drifting toward neutrality or are strongly ambivalent about a topic. Respondents must then use what they perceive to be the most important aspect of the issue to make a decision (Weijters et al., 2010; Nowlis et al., 2002). From these points, I would speculate that removing the neutral option might also make respondents more likely to comment specifically on why they chose one way or the other, which makes feedback more valuable to Teaching-Family Model agencies.
I believe the key consideration in choosing between a 4-point and a 5-point scale is how relevant the questions are to the respondents. When surveying a large sample of individuals about whom you may know little, it follows that a neutral option should be included to allow for a truly neutral opinion on some of the questions. However, if we can reasonably assume that every question in a survey is highly relevant to the selected respondents (for example, carefully chosen questions about TFM implementation put to stakeholders and consumers of those services), we might lean toward eliminating the neutral option, “forcing” respondents to apply appropriate cognitive effort and provide the most accurate and useful feedback to Teaching-Family Model agencies.
One potentially compelling argument for a 5-point scale is that many other organizations may also use 5-point scales in similar ways. If so, collecting satisfaction data with a neutral option would allow a TFM agency to benchmark its data against non-TFM organizations. I say non-TFM because, based on review data, a 4-point scale appears to be most common for consumer feedback among Teaching-Family Association agencies.
The reliability and validity of opinion data will remain a topic of debate for some time; it is particularly difficult to disentangle the many behavioral factors that influence individuals’ responses. Interestingly, the decision whether to label every option or only the extreme options also has a significant and somewhat ambiguous effect on responses. A scale labeled only at the extremes looks like this:
1 (Very dissatisfied) | 2 | 3 | 4 | 5 (Very satisfied)
Labeling only the extremes leads to more extreme responses, but labeling every option may lead to responses that are more neutral than is perfectly accurate. Given the complexity of the behavioral issues involved in self-reported opinion, it may be advisable simply to work with what is standard, so that new results can be reliably compared with previous ones.
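Because labeling choices and neutral options can shift responses toward the extremes or the middle, an agency comparing survey versions might track how answers are distributed, not just their mean. A minimal sketch, assuming integer responses from 1 to the number of points; the function name and data are hypothetical, and these shares alone cannot say which version is more accurate:

```python
from collections import Counter


def response_profile(responses: list[int], points: int) -> dict[str, float]:
    """Share of responses at the two extremes and, on odd-point scales, at the midpoint."""
    if not responses:
        raise ValueError("no responses")
    if any(not 1 <= r <= points for r in responses):
        raise ValueError("a response is outside the scale")
    counts = Counter(responses)  # missing scale points count as 0
    n = len(responses)
    profile = {"extreme": (counts[1] + counts[points]) / n}
    if points % 2 == 1:  # only odd-point scales have a true midpoint (neutral) option
        profile["midpoint"] = counts[(points + 1) // 2] / n
    return profile


# Hypothetical answers to one question on a 5-point scale.
answers = [5, 4, 3, 3, 1, 5, 4, 3, 2, 5]
print(response_profile(answers, points=5))  # {'extreme': 0.4, 'midpoint': 0.3}
```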
It should also be remembered that while this numerical data provides a valuable indicator to Teaching-Family Model agencies, the value and usefulness of specific, detailed, written feedback cannot be overstated, and all consumer feedback surveys should include open-ended prompting questions that regularly elicit this feedback.