V-Dem Methodology

V-Dem has developed innovative methods for aggregating expert judgments in a way that produces valid and reliable estimates of difficult-to-observe concepts. This aspect of the project is critical because many key features of democracy are not directly observable. We continually review our methodology—and occasionally adjust it—with the goal of improving the quality of V-Dem indicators and indices.

Author: Kyle Marquardt

V-Dem uses innovative methods to aggregate expert judgments and thereby produce estimates of important concepts. We use experts because many key features of democracy are not directly observable. For example, it is easy to observe whether or not a legislature has the legal right to investigate an executive. However, assessing the extent to which the legislature actually does so requires evaluation by experts with extensive conceptual and case knowledge.

V-Dem typically gathers data from five experts per country-year observation, using a pool of over 3,700 country experts who provide judgment on different concepts and cases. Experts hail from almost every country in the world, allowing us to leverage diverse opinions.

Despite their clear value, expert-coded data pose multiple problems. Rating concepts requires judgment, which varies across experts and cases; it may also vary systematically across groups of experts. We address these concerns by aggregating expert-coded data with a measurement model, which allows us to account for uncertainty about estimates and for potential biases.

The logic of the V-Dem measurement model is that an unobserved concept exists (e.g. a certain level of academic freedom and freedom of cultural expression) but we only see imperfect manifestations of this concept in the form of the ordinal categories which experts use to code their judgments. Our model converts these manifest items (expert ratings) to a single continuous latent scale and thereby estimates values of the concept.
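This data-generating logic can be illustrated with a small simulation. The sketch below is not the V-Dem model itself: the latent values, thresholds, and noise level are invented for the example. It only shows how a continuous latent trait can produce the ordinal ratings that experts report, via thresholds that cut the latent scale into categories.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical latent trait values for 3 country-years (not real V-Dem data).
latent = np.array([-1.5, 0.0, 2.0])

# Illustrative thresholds cutting the latent scale into 5 ordinal categories (0-4).
thresholds = np.array([-2.0, -0.5, 0.5, 2.0])

def rate(latent_value, noise_sd=0.4):
    """One expert's noisy ordinal rating of a latent value."""
    perceived = latent_value + rng.normal(0.0, noise_sd)
    return int(np.searchsorted(thresholds, perceived))

# Five experts rate each case; the model's task is to recover `latent`
# from tables of ordinal ratings like this one.
ratings = [[rate(z) for _ in range(5)] for z in latent]
```

Inverting this process, going from observed ordinal ratings back to the continuous latent scale, is what the measurement model does.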

In the process, the model algorithmically estimates both how reliable an expert is relative to other experts and how much their perception of the response scale differs from that of other experts. Similarly, we use patterns of overlapping coding – both experts who code multiple countries and experts who code hypothetical cases (anchoring vignettes) – to estimate the degree to which differences in scale perception are systematic across experts who code different sets of cases. Because estimation is iterative, these estimates of reliability and scale perception in turn weight each expert's contribution to the estimate of the unobserved concept.
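The weighting idea can be sketched in miniature. In the actual model, reliability parameters are estimated jointly with the latent trait in a Bayesian framework; in the toy example below they are fixed numbers chosen for illustration, and the aggregation is a simple precision-weighted average rather than the full model.

```python
import numpy as np

# Hypothetical ratings of one case by four experts (0-4 scale).
ratings = np.array([2.0, 2.0, 3.0, 1.0])

# Illustrative reliability weights: higher = more reliable. In the
# actual model these are estimated from the data, not assumed.
reliability = np.array([1.0, 1.0, 0.5, 0.25])

# A reliability-weighted average: more reliable experts pull the
# estimate toward their ratings, less reliable experts count for less.
estimate = np.sum(reliability * ratings) / np.sum(reliability)
```

Here the least reliable expert's low rating is largely discounted, so the weighted estimate sits slightly above the unweighted mean of 2.0.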

In the resulting V-Dem dataset, we present users with a best estimate of the value for an observation (the point estimate), as well as an uncertainty estimate (the credible regions, a Bayesian analogue of confidence intervals). More precisely, the output of the measurement model is an interval-level point estimate of the latent trait that typically varies from –5 to 5, along with its associated measurement error. These estimates are best suited for statistical analysis.
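For a Bayesian model of this kind, the point estimate and credible region are summaries of posterior draws. The sketch below uses simulated normal draws as a stand-in for real model output; the location, scale, and number of draws are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for posterior draws of one latent trait value from the
# measurement model (simulated here, not real model output).
draws = rng.normal(loc=0.8, scale=0.3, size=4000)

# Point estimate: the posterior median.
point_estimate = np.median(draws)

# 68% credible region: the central interval holding 68% of the
# posterior mass (16th to 84th percentile).
lower, upper = np.quantile(draws, [0.16, 0.84])
```

For an approximately normal posterior, this 68% central interval is close to one standard deviation on either side of the median, which is how the dataset's credible regions are constructed.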

However, the interval-level estimates are difficult for some users to interpret substantively. We therefore also provide interval-level point estimates that we have linearly transformed back to the coding scale that experts originally used to code each case. These estimates typically run from 0 to 4; users can refer to the V-Dem codebook to interpret them substantively. Finally, we provide ordinal versions of each variable for applications in which users require ordered categorical values. Each of the latter two data versions is also accompanied by credible regions.
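The two derived versions can be sketched as follows. The coefficients of the linear mapping and the rounding rule below are invented for illustration; V-Dem derives its original-scale and ordinal versions from the measurement model itself, not from an after-the-fact rescaling like this one.

```python
import numpy as np

# Hypothetical latent-scale estimates (model output typically
# varies from roughly -5 to 5).
latent = np.array([-2.5, 0.0, 1.8])

# An illustrative linear transformation onto the 0-4 coding scale;
# the intercept and slope here are made up for the example.
a, b = 2.0, 0.4
osp = a + b * latent  # interval estimates on the original scale

# One simple way to get an ordered categorical version: the nearest
# integer category, clipped to the 0-4 range.
ord_version = np.clip(np.rint(osp), 0, 4).astype(int)
```

The interval version preserves fine-grained differences on the familiar 0–4 scale, while the ordinal version returns each case to a single codebook category.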

The result of this process is a set of versions of each indicator of democratic institutions and concepts, which allow academics and policymakers alike to understand the different features of a polity. The table below summarizes the output we provide to users.

Versions of the V-Dem Indicators

For more information, download the V-Dem Methodology document.

| Suffix | Scale | Description | Recommended use |
|---|---|---|---|
| None | Interval | V-Dem measurement model estimates | Regression analysis |
| _osp | Interval | Linear transformation of the model estimates to the original coding scale | Substantive interpretation of graphs and data |
| _ord | Ordinal | Most likely ordinal value of the model estimates on the original scale | Substantive interpretation of graphs and data |
| _codelow / _codehigh | Interval | One standard deviation above (_codehigh) and below (_codelow) a point estimate | Evaluating differences over time within units |
| _sd | Interval | Standard deviation of the interval estimate | Creating confidence intervals based on user needs |

KEY TERMS

Point Estimate: A best estimate of a concept’s value.

Confidence Intervals: Credible regions whose upper and lower bounds represent a range of probable values for a point estimate. These bounds mark the interval in which the measurement model places 68 percent of the probability mass for each score, which is approximately equivalent to one standard deviation above and below the median.

Significant Differences or Changes: When the upper and lower bounds of the confidence intervals for two point estimates do not overlap, we are confident that the difference between them is not a result of measurement error.
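This overlap rule is straightforward to apply in code. The function below is a minimal illustration, assuming each credible region is given as its lower and upper bound (for example, the _codelow and _codehigh values of two observations).

```python
def credible_regions_overlap(lo1, hi1, lo2, hi2):
    """True if two credible regions [lo1, hi1] and [lo2, hi2] share any values."""
    return lo1 <= hi2 and lo2 <= hi1

# Overlapping regions: the difference may be due to measurement error.
# Non-overlapping regions: we can be confident the scores genuinely differ.
```

For example, regions (0.1, 0.4) and (0.6, 0.9) do not overlap, so the difference between those two point estimates would count as significant by this rule.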

Resources and further reading