Creating Composite Indices From Continuous Variables for Research: The Geometric Mean
Author(s) -
Hertzel C. Gerstein,
Chinthanie Ramasundarahettige,
Shrikant I. Bangdiwala
Publication year - 2021
Publication title -
Diabetes Care
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 6.636
H-Index - 363
eISSN - 1935-5548
pISSN - 0149-5992
DOI - 10.2337/dc20-2446
Subject(s) - construct (python library) , categorical variable , medicine , mace , event (particle physics) , latent variable , myocardial infarction , statistics , econometrics , cardiology , computer science , mathematics , physics , quantum mechanics , conventional pci , programming language
Clinical research focuses on the relationship between one or more independent variables and some dependent variable or outcome chosen to reflect some underlying process. For categorical variables, the research may either be focused on a specific end point such as myocardial infarction (MI) or on an underlying construct such as vascular disease (with MI as just one exemplar). In the latter instance, a composite index such as major adverse cardiovascular event (MACE), defined as either a nonfatal stroke, nonfatal MI, or cardiovascular death, may be used. Composite categorical outcomes such as MACE optimize power by ensuring a high event rate, and the results they yield are generalizable to diseases that are consistent with the underlying construct.
It is therefore surprising that there is no widely used method to combine continuous variables into composite continuous outcomes. Nevertheless, there is a clear need for such a methodology when the underlying construct cannot be easily captured by one measurement. Glucose control is an example of a construct that can be assessed in many ways, including fasting or postprandial plasma glucose, HbA1c, fructosamine, or “time in target.” A composite of two or more of these could provide a better reflection of glucose control than any one alone. Whereas sophisticated statistical techniques such as structural equation modeling (1) can be used to model some underlying construct or latent variable from two or more measurements, a simpler way of combining them into an index that reflects the underlying construct could provide a powerful tool for both researchers and clinicians. Such an approach is described below.
When the same measurements are made using the same scale, the arithmetic mean provides a more precise estimate than any one measurement alone. The challenge arises when different measurements (e.g., heart rate and body temperature) are used to measure some underlying construct (e.g., illness severity) using different scales.
In this instance, an arithmetic mean is nonsensical. This is usually managed by converting the measurements into standardized Z scores (2) after ensuring that they are normally distributed (or are transformed to a normal distribution) (3). Thus, if a person’s heart rate Z score is 1.1 and body temperature Z score is 0.9, the arithmetic mean of those two Z scores may better reflect the construct of illness severity than either Z score alone. Clearly, all component measurements must have the same directional relationship with the underlying construct, and any that have inverse relationships need to be reverse-scored before being combined.
A simpler approach is to calculate the geometric mean of the n measurements being combined. The geometric mean is simply the nth root of the product of the n measurements (Supplementary Fig. 1) and can be understood as a function that reflects the multiplicative relationship between the components. Thus, for the geometric mean of measurements “a” and “b,” something that increases “a” by some relative amount (e.g., two- or threefold) will yield the same geometric mean as something that increases “b” by the same relative amount. Therefore, in contrast to the arithmetic mean, it can be used to combine measurements from scales with different distributional properties and represents an easy-to-calculate composite index of n disparate measures that eliminates the need for standardization before combining them. The geometric mean can therefore reflect many aspects of some underlying construct, is simple to calculate, and reduces the need for complex multivariable analyses.
Limitations (Table 1) include the inability to calculate a geometric mean when either the product of the component variables is a negative number (since the nth root of a negative number is an imaginary number) or when the value of any of the components is 0 (because it would return a geometric
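The two ways of combining measurements on different scales described in this abstract, averaging Z scores and taking the geometric mean, can be illustrated with a minimal Python sketch. The function names and the heart rate and temperature values are hypothetical and chosen only for illustration; the Z scores of 1.1 and 0.9 are the ones quoted in the text. As a simplifying assumption, the sketch rejects every non-positive value, which is stricter than the limitation stated above (a negative product or a component equal to 0).

```python
import math


def geometric_mean(values):
    """Return the n-th root of the product of n strictly positive values."""
    values = list(values)
    if not values:
        raise ValueError("at least one measurement is required")
    if any(v <= 0 for v in values):
        # Assumption of this sketch: reject zero and negative components
        # outright rather than only a negative product.
        raise ValueError("all measurements must be greater than 0")
    # Averaging the logarithms is algebraically the same as taking the
    # n-th root of the product, and avoids overflow for long lists.
    return math.exp(sum(math.log(v) for v in values) / len(values))


def z_score_composite(z_scores):
    """Return the arithmetic mean of already-standardized Z scores."""
    z_scores = list(z_scores)
    return sum(z_scores) / len(z_scores)


if __name__ == "__main__":
    # Hypothetical measurements on different scales.
    heart_rate = 80.0    # beats per minute
    temperature = 37.0   # degrees Celsius

    baseline = geometric_mean([heart_rate, temperature])
    doubled_hr = geometric_mean([2 * heart_rate, temperature])
    doubled_temp = geometric_mean([heart_rate, 2 * temperature])

    # The same relative change in either component moves the index equally.
    print(math.isclose(doubled_hr, doubled_temp))
    print(baseline, doubled_hr, doubled_temp)

    # Z-score alternative, using the Z scores quoted in the text.
    print(z_score_composite([1.1, 0.9]))
```

Doubling either component multiplies the two-component geometric mean by the square root of 2, which is why no standardization is needed before combining. Python's standard library also provides `statistics.geometric_mean` for the same calculation.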