The Length and Verbal Labels Do Not Matter : The Influence of Various Likert-Like Response Formats on Scales’ Psychometric Properties

CÍGLER, Hynek, Petra HUBATKA, David ELEK and Martin TANCOŠ

Basic information

Edition

ITC Conference 2024, 2024

Other information

Language

English

Type of outcome

Presentations at conferences

Country of publisher

Spain

Confidentiality degree

is not subject to a state or trade secret

Organization

Fakulta sociálních studií – Repository

Keywords in English

Likert scale; psychometrics; measurement invariance; measurement; Height Inventory; validity

Links

GA23-06924S, research and development project.
Changed: 28/3/2025 00:50, RNDr. Daniel Jakubík

Abstract

In the original

While the Likert scale is the most commonly used response format to measure personality traits, there is no clear consensus on how the scale’s parameters moderate its performance. In two within-subject experiments, we manipulated the extremity of outer verbal labels and the presence of inner labels in a 5-point Likert-type scale (Study 1, N1 = 1044) and the scale length using 2, 6, and 10 options (Study 2, N2 = 846). We used the Height Inventory, which allows for comparison with the criterion of self-reported height, and replicated the results using a typical psychological measure. In both studies, we assessed the measurement model and criterion validity. We utilized reliability analysis, path analysis, ordinal SEM, invariance analysis, and latent regressions. With more extreme outer labels and longer response scales, responses are slightly more central, impacting raw score variances (and means in skewed scales). With non-extreme labels and longer response scales, observed scores have negligibly higher reliability. Criterion validity of observed scores is only negligibly related to the presence of inner verbal labels. Reliability was higher in the all-labeled variants. We demonstrate that the measurement model can be equated across all experimental conditions, leading to an equivalent, invariant single latent trait with the same population characteristics and association with the criterion. The two-point scales resulted in lower reliability, but their criterion validity seemed unimpacted and could be advantageous in some contexts. The performance of the Likert response scale was stable across the conditions we manipulated, especially if SEM is used instead of raw score analysis. Still, we argue for verbally labeling all points on the scale and for non-extreme labels of endpoints.
