V-Dem has developed innovative methods for aggregating expert judgments in a way that produces valid and reliable estimates of difficult-to-observe concepts. This aspect of the project is critical because many key features of democracy are not directly observable. We continually review our methodology—and occasionally adjust it—with the goal of improving the quality of V-Dem indicators and indices.
Expert-coded data raise concerns regarding comparability across time and space. Rating complex concepts requires judgment, which may vary across experts and cases. Moreover, because even equally knowledgeable experts may disagree, it is imperative to report measurement error to the user. We address these issues using both cutting-edge theory and methods, resulting in valid estimates of concepts relating to democracy.
What we have done so far:
- We recruited over 3,000 country experts to provide their judgments on different concepts and cases.
- We typically gather data from five experts for each observation.
- We ask our experts detailed questions about specific concepts.
- We endeavor to make our questions as clear as possible.
- We use "bridge coding" to facilitate cross-country comparability.
We employ anchoring vignettes to further improve the estimates of expert-level parameters and thus the concepts we measure.
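The intuition behind anchoring vignettes can be illustrated with a small sketch. All experts rate the same hypothetical cases whose relative ordering is fixed by design, so systematic differences in their ratings reveal differences in their personal thresholds rather than in the cases. The expert names, vignette values, and the simple mean-shift adjustment below are invented for illustration; this is not the V-Dem estimation code, which handles thresholds inside the IRT model.

```python
# Hypothetical sketch: recovering expert-specific rating shifts from
# anchoring vignettes. All data and names are invented for illustration.

# Each expert rates the same five fictional vignettes on a 0-4 ordinal scale.
vignette_ratings = {
    "expert_a": [0, 1, 2, 3, 4],  # uses the scale as designed
    "expert_b": [1, 2, 3, 4, 4],  # systematically lenient
    "expert_c": [0, 0, 1, 2, 3],  # systematically strict
}

reference = [0, 1, 2, 3, 4]  # design values of the vignettes

def threshold_offset(ratings, reference):
    """Mean shift of an expert's ratings relative to the vignette design."""
    diffs = [r - t for r, t in zip(ratings, reference)]
    return sum(diffs) / len(diffs)

offsets = {e: threshold_offset(r, reference) for e, r in vignette_ratings.items()}

def adjust(rating, expert):
    """Recentre an expert's country rating before aggregation."""
    return rating - offsets[expert]
```

A lenient expert's ratings are shifted down and a strict expert's shifted up, putting all experts on a more comparable scale before their country ratings are combined.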
A Bayesian Item-Response Theory (IRT) Estimation Strategy
Pemstein et al. (2018) developed a Bayesian Item-Response Theory (IRT) estimation strategy that addresses many of these concerns about expert-coded data, while also providing estimates of the remaining random measurement error.
We use this strategy to convert the ordinal responses experts provide into continuous estimates of the concepts being measured. The basic logic behind these models is that an unobserved latent trait exists, but we are only able to see imperfect manifestations of this trait. By taking all of these manifest items (in our case, expert ratings) together, we are able to provide an estimate of the trait. In the dataset, we present the user with a best estimate of the value for an observation (the point estimate), as well as an estimate of uncertainty (credible intervals, the Bayesian analogue of confidence intervals).
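The idea of combining several noisy ratings into a point estimate with a credible interval can be sketched with a toy model. The example below treats five invented expert ratings of one case as draws from a normal distribution around the latent trait, with a flat prior and a known noise level; the actual V-Dem model is an ordinal IRT model, and all numbers here are illustrative assumptions.

```python
# Toy normal measurement model: combine several noisy expert ratings of one
# case into a point estimate and a 95% credible interval. Illustrative only;
# the V-Dem model is an ordinal Bayesian IRT model, not this.
import math
import statistics

ratings = [2.0, 3.0, 2.0, 2.0, 3.0]  # five experts' ratings (invented)

# With a flat prior and known rating noise sd, the posterior over the latent
# trait is normal with mean = sample mean and sd = noise_sd / sqrt(n).
noise_sd = 1.0
n = len(ratings)
point_estimate = statistics.mean(ratings)
posterior_sd = noise_sd / math.sqrt(n)

# Central 95% credible interval, using the normal quantile 1.96.
lo = point_estimate - 1.96 * posterior_sd
hi = point_estimate + 1.96 * posterior_sd
```

More experts, or experts who agree more closely, shrink the posterior standard deviation and hence the reported uncertainty, which is exactly the behavior the dataset's credible intervals convey to users.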
The IRT models we use allow for the possibility that experts have different thresholds for their ratings. These thresholds are estimated based on patterns in the data, and then incorporated into the final latent estimate. In this way, we are able to correct for the previously discussed concern that one expert's "somewhat" may be another expert's "weakly" (a concept known as Differential Item Functioning). Apart from experts holding different thresholds for each category, we also allow their reliability (in IRT terminology, their "discrimination parameter") to vary idiosyncratically in the IRT models, based on the degree to which they agree with other experts. Experts with higher reliability have a greater influence on concept estimation, accounting for the concern that not all experts are equally expert on all concepts and cases.
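The role of the discrimination parameter can be mimicked, in spirit only, with a simple agreement-based weighting scheme: experts whose ratings sit closer to the consensus receive more weight in the aggregate. The data, the inverse-deviation reliability score, and the weighted average below are invented for illustration; in the actual IRT model, discrimination is estimated jointly with the latent traits rather than computed in this two-step fashion.

```python
# Illustrative sketch of reliability weighting: experts who agree more with
# the consensus get more weight. Invented data; not the V-Dem model, where
# discrimination parameters are estimated inside the Bayesian IRT model.
import statistics

ratings = {
    "expert_a": [2, 3, 2, 3],  # ratings of four cases on a 0-4 scale
    "expert_b": [2, 3, 2, 2],
    "expert_c": [0, 4, 4, 0],  # noisy: often far from the other experts
}

cases = range(4)
consensus = [statistics.mean(ratings[e][c] for e in ratings) for c in cases]

def reliability(expert):
    """Inverse mean absolute deviation from the consensus rating."""
    dev = statistics.mean(abs(ratings[expert][c] - consensus[c]) for c in cases)
    return 1.0 / (1.0 + dev)

weights = {e: reliability(e) for e in ratings}

# Reliability-weighted point estimate for each case.
total = sum(weights.values())
estimates = [
    sum(weights[e] * ratings[e][c] for e in ratings) / total for c in cases
]
```

The noisy expert ends up with the lowest weight, so the aggregate estimates track the two experts who broadly agree, which is the qualitative behavior the discrimination parameter produces in the full model.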
More information is available here:
Coppedge, Michael, John Gerring, Carl Henrik Knutsen, Staffan I. Lindberg, Jan Teorell, Kyle L. Marquardt, Juraj Medzihorsky, Daniel Pemstein, Nazifa Alizada, Lisa Gastaldi, Garry Hindle, Johannes von Römer, Eitan Tzelgov, Yi-ting Wang, and Steven Wilson. 2020. “V-Dem Methodology v10”. Varieties of Democracy (V-Dem) Project.
Maxwell, Laura, Kyle L. Marquardt, and Anna Lührmann. 2018. “The V-Dem Method for Aggregating Expert-Coded Data”. Varieties of Democracy (V-Dem) Project. A short brief on V-Dem methodology.
Pemstein, Daniel, Kyle L. Marquardt, Eitan Tzelgov, Yi-ting Wang, Juraj Medzihorsky, Joshua Krusell, Farhad Miri, and Johannes von Römer. 2020. “The V-Dem Measurement Model: Latent Variable Analysis for Cross-National and Cross-Temporal Expert-Coded Data”. University of Gothenburg, Varieties of Democracy Institute: Working Paper No. 21, 5th edition.
Please refer to the Working Papers section on our website for more papers on our methodology.