Maria Segui-Gomez
When evaluating injury-related outcomes, there are instances in which we need measures that contain a qualitative judgment regarding the relative importance, to the individual or to society, of the various aspects of function and disability that may be affected by the outcome under evaluation (Patrick and Erickson, 1993). These qualitative judgments are what we will refer to as “preferences.” Imagine, for example, a patient with a serious leg injury who is given the choice of undergoing amputation or leg reconstruction. Even if one could provide precise and valid predictions of the effect of either procedure on a variety of health-related dimensions or attributes (using, for example, any of the existing psychometric-based scales), it is likely that the patient would give more value to some particular outcomes; that is, he or she would “prefer” some outcomes over others. Preference-based measures reflect the relationship between impairment, functional limitation, and disability and the individual’s (or society’s) overall level of well-being and satisfaction with life. These measures combine mortality-, morbidity-, and quality of life-related issues in ways that health status measures based on the psychometric tradition cannot. Decision theory and economic principles provide the theoretical background to build these preference-based measures, which normally result in interval-scale measures where a “preference” of 0.0 is assigned to death, a “preference” of 1.0 is assigned to optimal health, and any number in between reflects the relative position of any given (injury-related) health condition.
Because of the multidisciplinary background of the developers and users of these preference-based measures, many different terms and their corresponding acronyms appear in the literature, which leads to some confusion. For the sake of clarity, throughout the remainder of this paper the following terms will be used interchangeably to refer to the qualitative judgment that individuals or societies place on the relative importance of alternative health states: “preferences,” values, utilities, quality of life (QOL), and health (injury) related quality of life (HRQOL). A different but related concept is derived by multiplying these qualitative judgments by the length of time (or remaining life expectancy) during which the individual (or individuals in society) will live with those conditions. When this is done, we obtain measures of preference-adjusted life years. Here too, multiple terms exist. For example, some researchers use the term quality adjusted life years (QALYs), whereas others refer to well years (WY), Years of Healthy Life (YHL), Quality adjusted life expectancy (QALEs), health adjusted life expectancy (HALEs), or Disability Adjusted Life Years (DALYs). The remainder of this paper will focus on the measurement of the qualitative assessments, not on the computation of “preference”-adjusted life years.
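Although the computation of preference-adjusted life years is outside the scope of this paper, the multiplication described above can be illustrated with a minimal sketch. The function name and the example figures are hypothetical, and the sketch assumes no discounting of future years.

```python
def preference_adjusted_life_years(segments):
    """Sum preference x duration over successive health states.

    `segments` is a list of (preference, years) pairs, where each
    preference lies on the 0.0 (death) to 1.0 (optimal health) scale.
    No discounting is applied (a simplifying assumption).
    """
    total = 0.0
    for preference, years in segments:
        if not 0.0 <= preference <= 1.0:
            raise ValueError("preference must lie between 0.0 and 1.0")
        if years < 0:
            raise ValueError("years must be non-negative")
        total += preference * years
    return total


# Hypothetical example: 2 years at a preference of 0.5,
# followed by 10 years at a preference of 0.75.
example = preference_adjusted_life_years([(0.5, 2), (0.75, 10)])
```

In this hypothetical example the twelve calendar years count as 8.5 preference-adjusted years.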
There are two components to the development of preference-based measures. The first component relates to characterizing the health state under evaluation. The second component relates to the assignment of the “preference” of that state over other health states (including death and optimal health).
The first component, the characterization of the health state, does not differ much from the process described for the psychometric-based health status measures. Here too, dimensions, attributes or levels of functioning need to be defined and individuals need to be mapped into these classification systems. Table 1 summarizes the core concepts or domains of health that have been identified as essential to assist in the measurement of quality of life.
CONCEPTS AND DOMAINS | INDICATORS |
---|---|
Health Perception | Self-rating of health; health concerns, health worry |
SOCIAL RELATIONS | INDICATORS |
Social Relations | Interaction with others; participation in the community |
Usual Social Role | Acute or chronic limitation in usual social role (major activities) of child, student, worker |
Intimacy/Sexual Function | Perceived feelings of closeness; sexual activity and/or problems |
Communication/Speech | Acute or chronic limitations in communication/speech |
PSYCHOLOGICAL FUNCTION | INDICATORS |
Cognitive Function | Alertness; disorientation; problems in reasoning |
Emotional Function | Psychological attitudes and behaviors |
Mood/ Feelings | Anxiety; depression; happiness; worries |
PHYSICAL FUNCTION | INDICATORS |
Mobility | Acute or chronic reduction in mobility |
Physical Activity | Acute or chronic reduction in physical activity |
Self-Care | Acute or chronic reduction in self-care |
IMPAIRMENT/FUNCTIONAL LIMITATION | INDICATORS |
Sensory Function | Vision; hearing |
Symptoms/Impairments | Reports of physical and psychological symptoms, sensations, pain, health problems or feelings not directly observable; or observable evidence of defect or abnormality |
A peculiarity of preference-based measures in this regard is that when these measures are used in cost-effectiveness analyses it is important that the impact of the health status on productivity and leisure activities be included (implicitly or explicitly) among the core concepts (Gold et al., Cost-Effectiveness in Health and Medicine, 1996).
Also, as when developing psychometric-based health status measures, issues that relate to the survey instrument used to identify the health status are important. Whether the survey is administered by an interviewer in person or by telephone or is self-administered, and whether it is in a paper or computerized format, may have an impact on the quantity and quality of the responses. Needless to say, issues that relate to the validity, reliability, and psychometric properties of these scales (e.g., floor effects, ceiling effects) should also be considered, as well as the readability, comprehension, ease of use, and framing of the questions.
The second, and defining, component of these preference-based measures is the development of preferences. In this valuation process, there are a number of issues to consider. The first is how to assign the preferences. Assigning a preference value to each possible health state is commonly referred to as a “holistic” approach. If the number of possible health states is large, the holistic approach may be too cumbersome. In those instances, one could evaluate a few health states and interpolate preferences for the remaining states, or assign preferences to each of the levels within the different domains as well as to the domains themselves, and then mathematically compute the preference value for any particular health state. This last procedure is referred to as a “decomposed” approach, and it is particularly helpful when the number of possible health states is very large. Several mathematical transformations are used in the decomposed approach, the most common being those based on multi-attribute theory.
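To make the decomposed approach concrete, the following minimal sketch combines level scores and domain weights with a simple additive rule. The additive form is an assumption chosen for illustration; published instruments may rely on other functional forms (for example, multiplicative ones). The domain names, level scores, and weights are hypothetical.

```python
def additive_preference(state, level_scores, domain_weights):
    """Additive multi-attribute score for one health state.

    `state` maps each domain to the level observed in that domain.
    `level_scores` maps each domain to a {level: score} table (0.0 to 1.0).
    `domain_weights` maps each domain to its weight; weights must sum to 1.
    """
    if abs(sum(domain_weights.values()) - 1.0) > 1e-9:
        raise ValueError("domain weights must sum to 1")
    return sum(
        domain_weights[domain] * level_scores[domain][level]
        for domain, level in state.items()
    )


# Hypothetical two-domain classification with three levels per domain.
level_scores = {
    "mobility": {1: 1.0, 2: 0.5, 3: 0.0},
    "pain": {1: 1.0, 2: 0.5, 3: 0.0},
}
domain_weights = {"mobility": 0.5, "pain": 0.5}

score = additive_preference({"mobility": 2, "pain": 1}, level_scores, domain_weights)
```

With only six elicited level scores and two weights, this sketch assigns a value to all nine states of the hypothetical classification, which is the practical appeal of the decomposed approach.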
One needs to decide whether to identify the preference for the health status of an individual at one particular point in time, then apply that value to the remainder of the individual’s life, or whether one should develop a “health path” (that is, successive health states over the remaining life) that is assigned a relative value of preference in comparison with alternative paths.
It must also be decided who is going to assign the preferences. Undoubtedly, in the example at the beginning of this paper, in which a patient with a severe lower extremity injury was confronted with the choice of amputation vs. leg-reconstructive surgery, we are most interested in his or her preference. Hence, patients (i.e., individuals with the health condition under valuation) are an obvious source of preferences. This is particularly true in clinical settings, where preferences could be assessed on an individual basis every time they are needed. However, there are instances in which patients are not able to make such valuations (e.g., if the individual has cognitive problems) and instances in which we may want to consider alternative sources of preferences, for example, health professionals or experts. Alternatively, we may want to survey a sample (whether a convenience or a representative sample) of the population to assign preferences to an array of conditions. This latter approach is particularly necessary if we are allocating resources or prioritizing which health states need to be focused on. Needless to say, if we need to elicit preferences from individuals who have no experience of the health state being valued, how we describe the state (e.g., the concepts and domains associated with the hypothetical states described) becomes an even more relevant issue for a valid assessment.
An especially important question is how we are going to ask the respondent to assign the preference. As indicated before, the goal is to obtain preference scores. These preferences must be expressed on an interval scale where death and optimal health are anchor points. A number of techniques can be used in this process. Some of these techniques are derived from the psychometric literature: Rating Scaling (RS), also called Direct Scaling, Category Scaling, or Visual Analog Scaling; paired comparison; and magnitude estimation. The other available techniques are based on economic theory: the Time Trade Off (TTO) and the Standard Gamble (SG). Additional techniques, such as the Person Trade Off (PTO), have also been used. Rating Scaling, Time Trade Off, and Standard Gamble are the most commonly used techniques; they are briefly described below:
Rating Scaling represents the simplest type of technique. The subject being surveyed needs to assign a value between 0 and 1 to the health state under evaluation. The way this is done varies: the subject may be asked to report the number directly, to mark it on a line where 0 and 1 (or 0 and 100) are indicated at the extremes (as in Visual Analog rating), or to fill in a “thermometer-like” scale with the numbers 0 and 1 anchoring the extremes (as in Category Scaling). Because of their simplicity, these techniques are widely used, particularly when interviewing large groups of individuals. Unfortunately, the resulting preference scores may not have interval-scale properties, and they lack the mathematical and logical properties of economic-rooted preference values. In addition, they have been shown to yield preference scores that are too low (i.e., too close to 0) for relatively minor conditions.
Time Trade Off represents the simplest of the economic-based techniques. The subject under interview is asked to choose between living in the health state under evaluation for the remainder of his/her life or living a shorter number of years in an optimal state of health. The preference of the individual is indicated by the number of years that he or she is willing to “trade off” in exchange for perfect health, at the point where he or she is indifferent between giving up those years of life and remaining in the health state under evaluation. The preference score is obtained by dividing the shorter time lived in perfect health by the longer time lived in the health state under evaluation. The preference values elicited using this technique hold all the properties defined by economic theory for preferences (or utilities) only when the subject under interview has no time preference (that is, he or she values a year lived now equally with a year lived in the distant future).
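The Time Trade Off score described above is a simple ratio, as the following minimal sketch shows. The function name and the example durations are hypothetical.

```python
def time_trade_off_score(years_in_state, years_in_optimal_health):
    """Preference score from a Time Trade Off response.

    `years_in_state` is the longer time lived in the health state under
    evaluation; `years_in_optimal_health` is the shorter time in optimal
    health that the subject considers equally desirable.
    """
    if years_in_state <= 0:
        raise ValueError("years_in_state must be positive")
    if not 0 <= years_in_optimal_health <= years_in_state:
        raise ValueError("years_in_optimal_health must lie between 0 and years_in_state")
    return years_in_optimal_health / years_in_state


# Hypothetical response: the subject is indifferent between 20 years in the
# health state under evaluation and 15 years in optimal health.
tto_example = time_trade_off_score(20, 15)
```

In this hypothetical response the subject gives up 5 of 20 years, which yields a preference score of 0.75 for the health state.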
Standard Gamble is the only preference elicitation technique that is guaranteed to comply with the tenets of economic expected utility theory. In its simplest form, the subject is confronted with a scenario in which he or she can live in the health state under evaluation for the remainder of his/her life expectancy or can undergo a procedure which will either restore him/her to optimal health (with a probability pi of this occurring) or kill him or her immediately (with a probability 1-pi of this occurring). The probability pi at which the subject is indifferent between undergoing the procedure and remaining in the health state under evaluation is the preference value of that health state. This technique is the most time consuming to use and requires the interviewee to have some knowledge of probabilities. Although this technique is often considered the gold standard for preference elicitation, it is difficult to use in large population surveys or in surveys of some subgroups of the population (e.g., children, the elderly, or people with mild cognitive impairments).
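The reason the indifference probability can be read directly as the preference value follows from expected utility: with death anchored at 0.0 and optimal health at 1.0, the expected utility of the procedure equals the probability of success. A minimal sketch, with hypothetical function names and a hypothetical response:

```python
def gamble_expected_utility(p_optimal, u_optimal=1.0, u_death=0.0):
    """Expected utility of a procedure that restores optimal health with
    probability `p_optimal` and causes immediate death otherwise."""
    return p_optimal * u_optimal + (1.0 - p_optimal) * u_death


def standard_gamble_score(p_indifference):
    """Preference value of a health state, given the probability at which
    the subject is indifferent between the procedure and the state."""
    if not 0.0 <= p_indifference <= 1.0:
        raise ValueError("probability must lie between 0.0 and 1.0")
    # At indifference, the utility of the state equals the gamble's
    # expected utility, which reduces to the probability itself.
    return gamble_expected_utility(p_indifference)


# Hypothetical response: indifference at a 75% chance of optimal health.
sg_example = standard_gamble_score(0.75)
```

A subject who would accept the procedure only if the chance of success were at least 75% thus assigns the health state a preference value of 0.75.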
Another issue to consider is the timing of the elicitation process: how many times and when are we going to elicit preferences? It is conceivable that preferences change over time (for example, due to accommodation). There are some instances in which determining these changes over time is of great relevance. In other instances, however, we can only assess preferences once, in which case we need to decide when we should do it. Should it be, for example, right after the injury has occurred? Should we wait until the patient has had some time to accommodate to the new state? Most commonly, preferences are assessed only once and in relation to a particular health state, and the resulting preference score is applied to the remaining life expectancy of the individuals suffering that condition. With regard to this timing issue, one must also decide whether preferences need to be elicited each and every time for each subject with a particular health state or whether one could develop a “library” of preferences for all the possible health states, so that the only task that needs to be done every time is the characterization of the health status of the individual (or the population).
The last two issues to keep in mind relate to the meaningfulness and applicability of the identified preferences to the general population. In considering the meaningfulness of the preferences, one must ask not only whether preference scores differ, or even whether they differ with statistical significance, but also whether those differences are of clinical or policy-making importance. In considering the ability to generalize the preferences, particularly when those preferences are derived from a representative sample of a population, the issue to evaluate is whether those preferences can be used in other settings and/or cultures.
With all the issues and possibilities just described, it should not be surprising that different researchers could argue for preference-based measures derived using very different criteria. However, it is agreed that, at a minimum, the ideal health-state classification upon which the quality weights are derived should reflect: (1) the domains that are important to the problem under consideration; and (2) quality weights that are population-based; economic theory-based (particularly if we are dealing with policy decisions); interval-scale; and measured or transformed onto an interval scale where the reference point “death” has a score of 0.0 and the reference point “optimal health” has a score of 1.0. The preferences should, at minimum, incorporate the effects of morbidity on productivity and leisure, and this is particularly true if the measure is to be used in an economic evaluation exercise.
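Where a measure reports scores on a different scale (for example, one on which death is scored 100 and optimal health 0), a linear transformation places them on the 0.0 to 1.0 scale described above. The sketch below is illustrative only; the function name and example scores are hypothetical, and a linear rescaling preserves interval properties only if the original scores already have them.

```python
def rescale_to_unit_interval(score, death_score, optimal_score):
    """Linearly map `score` so that death becomes 0.0 and optimal health 1.0."""
    if death_score == optimal_score:
        raise ValueError("death and optimal health must have different scores")
    return (score - death_score) / (optimal_score - death_score)


# Hypothetical score of 25 on a scale where death = 100 and optimal health = 0.
from_reversed_100 = rescale_to_unit_interval(25, death_score=100, optimal_score=0)

# Hypothetical weight of 0.25 on a scale where death = 1.0 and optimal health = 0.0.
from_reversed_unit = rescale_to_unit_interval(0.25, death_score=1.0, optimal_score=0.0)
```

Both hypothetical scores map to 0.75 on the common scale, which makes measures with reversed anchors directly comparable.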
In this section we briefly introduce seven “preference”-based measures: the European Quality of Life (EuroQoL); Quality of Well-being (QWB), formerly known as the Health Status Indicator and the Index of Well-being; Health Utility Index Mark III (HUI:3); Years of Healthy Life (YHL); Functional Capacity Index (FCI); Medical Outcomes Short Form-36 (SF36) and its preference-based version, the Short Form-6 Dimensions (SF-6D); and the Disability Adjusted Life Years (DALYs). There are other preference-based measures with applicability in injury outcomes research, such as the Distress Index or the Quality of Life and Health Questionnaire. We chose to present only seven measures because they are either widely used in non-injury related settings (EuroQoL, QWB, HUI:3, SF36) or have been developed for, or used in, settings where injury problems are common (YHL, FCI and DALYs) (MacKenzie, in press).
For each measure, we present the domains or attributes included in the health status descriptions; how those compare to the 13 core concepts presented in Table 1; the number of possible health states available; whether one must assess the health status and its preference each time; whether there are preferences available from a library (so that one only needs to assess the health status for each patient or condition); if a library exists, whose preferences were elicited; which technique was used for the elicitation; whether states were assessed individually or using an interpolation or a decomposed approach; the range of the scores produced; and whether they constitute an interval scale. We also present the approximate duration of the survey process.
The above information is presented either in the text below or in Table 2. Further references for each measure are provided at the end. A more detailed discussion of the application of EuroQoL, QWB, and FCI in injury-related settings follows in the next conference presentations. It should be noted that all these measures are in constant evolution; the information summarized below belongs to the measurement versions currently available.
EuroQoL. Uses five domains (Mobility, Self-care, Usual Activity, Pain/Discomfort, and Anxiety/Depression) and three possible levels within each domain to describe a total of 245 possible health states. Used in population settings. A representative sample of UK citizens participated in the preference elicitation process for the development of a library of scores. Direct rating of health states was the elicitation technique, although lately the time trade off technique has been introduced. Preferences for 45 health states were elicited; preferences for the other states were interpolated. Preference scores are interval-based; the death score is 0.0 and the optimal health score is 1.0 (EuroQoL Group, 1990).
QWB. Uses three domains (Mobility, Physical Activity, and Social Activity) and up to 27 symptoms to classify health status. Used in clinical and population settings. A representative sample of the US population participated in the development of a library of preferences. Category Scaling was used in the elicitation process. Preference scores range from 0.0 for death to 1.0 for optimal health (Kaplan et al., 1993).
HUI:3. Uses eight domains (Vision, Hearing, Speech, Ambulation, Dexterity, Emotion, Cognition, and Pain) and five or six levels within domains for a total of 972,000 possible health states. Developed for clinical and population applications. Preference weights for health status are computed using multi-attribute methods. A representative sample of residents of Ontario, Canada, was used in the development of a library. Direct scaling and standard gamble techniques were used in the elicitation process. The preference scores are interval-scaled, death score 0.0, optimal health score 1.0 (Feeny et al., 1996).
YHL. Uses two domains (Self-perceived Health and Role function) and five or six levels within domain for a total of 30 possible health states. Both the domains and the levels are as defined in the USA National Health Interview Survey. This was developed for population applications only. Each possible health state was ranked individually and the scores are available in a library format. Preferences were derived using correspondence analysis and characterizing the best health state as 1, the worst as 0, and the HUI:1 of a comparable state for an intermediate value (Erickson et al., 1995).
FCI. Uses 10 domains (Excretory, Eating, Sexual, Ambulation, Hand & Arm, Bending & Lifting, Visual, Auditory, Speech, and Cognitive) and between three and seven levels within domain for a total of 4,354,560 possible health states. Used in population settings, with a mapping into the Abbreviated Injury Scale (AIS) dictionary available for application in existing datasets. Preference scores for each dimension and dimension/level were derived from a convenience sample that included disabled persons, physicians, and members of the general population, using direct scaling as the elicitation technique. The scores are available in a library. The preference scores range from 100 for death to 0 for optimal health, although it is arguable whether they have interval-scale properties (MacKenzie et al., 1996).
SF-36 (SF-6D). Still under initial development. Uses six domains (Physical Functioning, Role Limitation, Social Functioning, Pain, Mental Health, and Vitality) and between four and six levels within domain for a total of 18,000 possible health states. Dimensions and dimension-levels were derived from the SF-36. This was developed for clinical and population applications. In order to establish a library, each dimension and dimension level was rated for preference in a representative sample of UK residents using standard gamble. Multi-attribute theory was used to compute preferences for health states. A mapping algorithm to transform SF-36 onto SF-6D is currently under development. Preference scores are interval-scaled, with a death score of 0.0 and an optimal health score of 1.0 (Brazier et al., 1998).
DALYs. Identifies seven disability categories that the authors mapped into mortality and morbidity data. Preference scores for each disability category were derived from a convenience sample of international experts using the Person Trade Off elicitation technique. Preference scores, in a library format, are interval-scaled, with a death score of 1.0 and an optimal health score of 0.0 (Murray and Acharya, 1997).
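The number of possible health states quoted for each measure follows from its classification system: if every combination of levels is admissible, the count is the product of the number of levels per domain. The minimal sketch below, in which the function name is hypothetical, reproduces the 30 states of the YHL (five levels on one domain and six on the other). For the EuroQoL, five domains with three levels each give 243 combinations; the total of 245 quoted above presumably includes two states outside this grid (we assume these are unconsciousness and death).

```python
from math import prod


def count_health_states(levels_per_domain):
    """Number of health states when all level combinations are admissible."""
    return prod(levels_per_domain)


# YHL: two domains, with five and six levels respectively.
yhl_states = count_health_states([5, 6])

# EuroQoL: five domains with three levels each (grid only).
euroqol_grid_states = count_health_states([3] * 5)
```

The rapid growth of this product as domains and levels are added is what makes the holistic valuation of every state impractical for the larger classification systems.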
Note: Table 2 Legend
Characteristic | EuroQoL | QWB | HUI:3 | YHL | SF-6D | FCI | DALYs |
---|---|---|---|---|---|---|---|
Domains Contained/Desirable | 6/13 | 4/13 | 7/13 | 2/13 | 8/13 | 8/13 | ? |
Productivity/Leisure | ? | ? | ? | ? | ? | ? | ? |
Survey Formats | SR | IIP, IVT | IIP, IVT, SR | IIP | N/A | IIP, IVT | None |
Languages | E, D, F, N, Sw | E, S | E, Fr | E | E | E | E |
Time for health status assessment (minutes) | 5-10 | 15-20 | 2-10 | 1-2 | N/A | 10-15 | N/A |