Use of a “critical difference” statistical criterion improves the predictive utility of the Health Assessment Questionnaire-Disability Index score in patients with rheumatoid arthritis

Background The Health Assessment Questionnaire-Disability Index (HAQ-DI) is used to assess functional status in rheumatoid arthritis (RA), but the change required for meaningful improvements remains unclear. A minimum clinically important difference (MCID) of 0.22 is frequently used in RA trials. The aim of this study was to determine a statistically defined critical difference for HAQ-DI (HAQ-DI-dcrit) and evaluate its association with therapeutic outcomes. Methods We retrospectively analyzed data from adult German patients with RA enrolled in a multicenter observational trial in which they received adalimumab therapy at the decision of the treating clinician during routine clinical care. The HAQ-DI-dcrit, defined as the minimum change that can be reliably discriminated from random long-term variations in patients on stable therapy, was determined by evaluating intra-individual variation in patient scores. Other outcomes of interest included Disease Activity Score-28 joints and patient-reported pain and fatigue. Results The HAQ-DI-dcrit was calculated as an improvement (decrease) from baseline of 0.68 in a discovery cohort (N = 1645) of RA patients on stable therapy and with moderate disease activity (mean DAS28 [standard deviation] of 4.4 [1.6]). In the full patient cohort (N = 2895), 22.1% of patients achieved a HAQ-DI-dcrit improvement at month 6. Compared with patients with a small improvement in HAQ-DI (decrease of ≥0.22 to < 0.68) or no improvement (< 0.22), patients achieving a HAQ-DI-dcrit at month 6 had better therapeutic outcomes at months 12 and 24, including stable functional improvements. Change in pain was the most important predictor of HAQ-DI improvement during the first 6 months of therapy. Conclusions A HAQ-DI-dcrit of 0.68 is a reliable measure of functional improvement. This measure may be useful in routine clinical care and clinical trials. Trial registration ClinicalTrials.gov NCT01076205.
Registered on February 26, 2010 (retrospectively registered).

Keywords: Health Assessment Questionnaire, Functional ability, Rheumatoid arthritis, Adalimumab

Background
The Health Assessment Questionnaire-Disability Index (HAQ-DI) is considered the gold standard for the assessment of function in patients with rheumatoid arthritis (RA) [1] and is the clinical variable most closely associated with joint replacement, work disability, and mortality [2]. This tool is scored on a scale of 0 (minimum disability) to 3 (maximum disability) and encompasses eight domains of daily living [3]. As a stand-alone measure, the HAQ-DI is frequently used as a primary or secondary endpoint in randomized controlled trials in patients with RA [4], and it is routinely incorporated into the American College of Rheumatology (ACR) improvement criteria as an option for functional assessment [5].
Despite the importance of this tool in measuring physical function, the level of HAQ-DI change required for a clinically important and robust improvement in an individual patient remains unclear. Several studies have evaluated the minimum clinically important difference (MCID) for the HAQ-DI using different methodologies and patient populations. An MCID of 0.22 was determined by Wells et al. on the basis of a single evening of conversations between 40 RA patients with differing functional status [6], and this value has been applied in randomized clinical trials of therapeutic agents [7,8]. At the higher end of the scale, Wolfe et al. determined a "really important difference" of 0.87 in 8931 RA patients aged < 65 years based on subjective measures of functional independence, and a difference of 0.74 based on objective reports of work disability [9]. Other studies have found intermediate values [10][11][12][13]. In addition to the wide variation in HAQ-DI MCIDs, some experts have criticized the methodology used to calculate MCIDs from ordinal patient-reported outcomes such as the HAQ-DI, in which the distances between adjacent raw score points are unequal; conversion to interval scaling based on Rasch model-transformed scales has been recommended as an alternative [14].
We have developed a statistical method for determining thresholds for individual therapeutic responses based on the magnitude of change required to exceed random variation during long-term stable therapy, termed the "critical difference" (d crit) [15,16]. This approach was piloted using the Disease Activity Score-28 joints (DAS28); achievement of the DAS28-d crit, a DAS28 decrease (improvement) of ≥1.8 from baseline, was shown to be a stable and robust indicator of a positive individual therapeutic response in patients with active RA initiating adalimumab therapy [15]. A later study successfully applied the same method to patient-reported outcomes, including pain and fatigue [16].
The observational studies on which the previous reports were based used the Funktionsfragebogen Hannover patient questionnaire as the functional assessment. A subsequent observational study used the HAQ-DI as a measure of self-reported function, thereby allowing us to apply the critical difference methodology to this important assessment. The aim of this study was to determine a statistically defined critical difference for HAQ-DI (HAQ-DI-d crit ) and evaluate its association with therapeutic outcomes. Our data suggest that this criterion would be useful both in the evaluation of individual patients during routine clinical care and as a response criterion in randomized clinical trials.

Study design
This study used data from German patients with RA enrolled in a multicenter observational trial who received adalimumab therapy at the decision of the treating clinician during routine clinical care (ClinicalTrials.gov NCT01076205). Adult patients (≥18 years of age) were required to have a diagnosis of active RA, a clinical indication for treatment with a tumor necrosis factor inhibitor, and no contraindications. Patients included in these analyses were treated between January 12, 2009, and September 14, 2017. All patients were informed of the objectives of the observational study and gave written consent for their voluntary participation in the study and the anonymous use of personal data in statistical analyses. Ethics approval was obtained from the Ethics Commission of the Medical Department of Goethe University, Frankfurt am Main, Germany (No. 122/09).
The discovery cohort, which was used to determine the HAQ-DI-d crit , included only patients who were on stable therapy (no change in adalimumab dose or concomitant therapies) from month 12 to 24 and had HAQ-DI data for the month 12 and 24 visits. The requirement for stable treatment allowed intra-individual fluctuations in outcomes to be distinguished from responses due to alterations in therapy. No other exclusion criteria were applied.
For the full cohort analyses, patients were required to have baseline data for DAS28 and HAQ-DI and month 6 data for HAQ-DI. Patients who were previously treated with adalimumab, were in functional remission (HAQ-DI ≤ 0.5), or had low disease activity (DAS28 ≤ 3.2) at baseline were excluded from these analyses. All patients who met the specified criteria were included in the full cohort analyses.
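The eligibility rules for the two analysis cohorts can be expressed as simple record filters. The sketch below is illustrative only: the `Patient` record and its field names are hypothetical, missing visit data are represented as `None` (consistent with the absence of imputation), and "stable therapy from month 12 to 24" is reduced to a single precomputed flag.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Patient:
    # Hypothetical record layout; field names are illustrative only.
    prior_adalimumab: bool
    das28_m0: Optional[float]
    haq_m0: Optional[float]
    haq_m6: Optional[float]
    haq_m12: Optional[float]
    haq_m24: Optional[float]
    stable_m12_to_m24: bool  # no change in adalimumab dose or concomitant therapy


def in_discovery_cohort(p: Patient) -> bool:
    """Stable therapy from month 12 to 24 and HAQ-DI present at both visits."""
    return p.stable_m12_to_m24 and p.haq_m12 is not None and p.haq_m24 is not None


def in_full_cohort(p: Patient) -> bool:
    """Baseline DAS28 and HAQ-DI plus month 6 HAQ-DI present; exclusions applied."""
    if p.das28_m0 is None or p.haq_m0 is None or p.haq_m6 is None:
        return False
    if p.prior_adalimumab:
        return False
    if p.haq_m0 <= 0.5:    # functional remission at baseline
        return False
    if p.das28_m0 <= 3.2:  # low disease activity at baseline
        return False
    return True
```

Because no further exclusions were applied to the discovery cohort, `in_discovery_cohort` deliberately ignores baseline DAS28 and HAQ-DI.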

Outcomes
The analyses reported here include data up to 24 months. Visits were conducted at baseline (month 0, prior to initiation of adalimumab therapy) and months 3, 6, 12, and 24. Disease activity was assessed by DAS28 [17] and function was assessed by HAQ-DI [3]; for both measures, higher scores indicate greater impairment. At each visit, patients provided self-assessments of pain, fatigue, and global health in the past 7 days on an 11-point categorical scale ranging from 0 (best) to 10 (worst).

Statistical analyses
Statistical analyses were performed with SAS® statistical software (Version 9.4). Summary statistics are presented for demographic and disease characteristics. Missing data were not imputed. Patient numbers varied at different visits because of study discontinuations and missing data for specified outcomes.
The method for determining the HAQ-DI statistically defined critical difference (HAQ-DI-d crit), the minimum change that can be reliably discriminated from random variations in patients on stable therapy, was based on evaluations of intra-individual variation in patients undergoing stable therapy (discovery cohort) between month 12 and month 24, as described previously [15,16]. These evaluations allowed us to determine the long-term reliability of the HAQ-DI over a period of months, rather than its short-term measurement error. Long-term variation is more applicable to real-life patient care, in which disease activity is usually assessed at intervals of several months. Briefly, we adapted the method of Lienert and Raatz [18] to determine a critical difference based on the one-sided 5% z-value of the normal distribution (z = 1.645) in patients on stable therapy from months 12 to 24 after initiation of adalimumab [19]. A one-sided critical difference was calculated because only improvements (decreases) in HAQ-DI were relevant to defining a response. The Pearson correlation and the standard deviation of HAQ-DI scores were used to determine the standard error of measurement, from which the HAQ-DI-d crit was derived. The HAQ-DI-d crit value was then used to evaluate functional improvement in the full cohort of patients initiating adalimumab therapy. Stepwise multiple regression analysis incorporating 29 variables, including demographic characteristics, comorbidities, concomitant treatment, and measures of disease activity, was used to identify predictors for improvement in HAQ-DI at month 6.
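For readers unfamiliar with the Lienert and Raatz approach, a common classical test-theory form of the calculation can be written as below. This is a sketch under stated assumptions: SD denotes the standard deviation of HAQ-DI scores and r_{12,24} the Pearson correlation between the month 12 and month 24 scores of the discovery cohort; exactly which SD (month 12, month 24, or pooled) enters the calculation is an assumption here, not a detail reported above.

```latex
\mathrm{SEM} = SD\,\sqrt{1 - r_{12,24}}, \qquad
d_{\mathrm{crit}} = z_{0.95}\,\sqrt{2}\,\mathrm{SEM}
\approx 1.645\,\sqrt{2}\,SD\,\sqrt{1 - r_{12,24}}
```

The factor √2 reflects that a difference between two scores carries the measurement error of both, and z_{0.95} = 1.645 is the one-sided 5% value, so only decreases of at least d_crit are counted as responses.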

Determination of the critical difference in HAQ-DI
The discovery cohort consisted of 1645 patients who were on stable therapy from month 12 to month 24 after initiation of adalimumab. Seventy-two percent were female, and mean baseline values (SD) were disease duration of 10.9 (9.0) years, DAS28 of 4.4 (1.6), and HAQ-DI of 1.1 (0.72). The HAQ-DI-d crit value in this discovery cohort was determined to be 0.641. Subgroup analyses by baseline characteristics showed that HAQ-DI-d crit values ranged from a low of 0.597 for patients with baseline HAQ-DI < 1 to a high of 0.673 for patients with baseline HAQ-DI ≥ 1 (Table 1). On the basis of this subgroup analysis, we chose a HAQ-DI-d crit value of 0.68 as a conservative value representing a statistically valid individual improvement in HAQ-DI score that exceeded the threshold of random fluctuation.
In the full cohort of patients initiating treatment with adalimumab (all patients who met inclusion/exclusion criteria for the analysis), 522 of 2740 patients (19.1%) achieved a HAQ-DI-d crit improvement (HAQ-DI decrease ≥0.68 from baseline) at 3 months. The HAQ-DI-d crit achievement rates increased slightly during the study to 639 of 2895 (22.1%) at month 6, 544 of 2193 (24.8%) at month 12, and 443 of 1532 (28.9%) at month 24.
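Because missing data were not imputed, each achievement rate uses the patients with HAQ-DI data at that visit as its denominator, which is why the denominators shrink over time. The short check below recomputes the reported percentages from the reported counts; `achievement_rate` is a hypothetical helper.

```python
# Counts reported above: visit month -> (patients achieving HAQ-DI-d crit,
# patients with HAQ-DI data at that visit)
reported = {3: (522, 2740), 6: (639, 2895), 12: (544, 2193), 24: (443, 1532)}


def achievement_rate(achieved, total):
    """Available-case percentage meeting the criterion, one decimal place."""
    return round(100 * achieved / total, 1)


rates = {month: achievement_rate(a, n) for month, (a, n) in reported.items()}
```

This yields 19.1, 22.1, 24.8, and 28.9% for months 3, 6, 12, and 24, matching the text.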

Characteristics of patients by improvement in HAQ-DI at 6 months
The statistically determined HAQ-DI-d crit of 0.68 was higher than many other values used to assess a HAQ-DI response, including the MCID value of 0.22 sometimes used in clinical trials [7,8]. We therefore compared characteristics and outcomes in patient subgroups on the basis of achievement of various HAQ-DI criteria at month 6, the visit at which many specialists decide whether to continue or modify therapy. We could not directly compare patients with a HAQ-DI decrease ≥0.68 with those achieving a decrease ≥0.22 because the latter, less stringent group also included all patients in the HAQ-DI-d crit group. We therefore categorized patients in the full patient cohort (N = 2895 at 6 months) into the following 3 subgroups based on change in HAQ-DI at 6 months: (1) patients achieving a HAQ-DI-d crit improvement (decrease ≥0.68), (2) patients achieving an MCID of 0.22 but less than the HAQ-DI-d crit (HAQ-DI decrease of ≥0.22 to < 0.68; referred to as "small improvement"), and (3) patients with no or minimal HAQ-DI improvement (HAQ-DI decrease < 0.22; referred to as "no improvement"). Because these groups were biased by the functional criteria used to define them, statistical differences between them were not assessed; Table 2 provides descriptive data only. Greater improvements in HAQ-DI at month 6 were more common in younger patients and those with a lower body mass index (BMI) and shorter disease duration (Table 2). The three subgroups had generally comparable DAS28 scores at baseline, although the group with no HAQ-DI improvement had the lowest disease activity. A similar pattern was seen with baseline HAQ-DI values: the group with the greatest HAQ-DI improvement at month 6 had the highest mean baseline HAQ-DI values and the group with no improvement had the lowest.
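The three-way categorization can be sketched as a single function. The names `classify_haq_change`, `HAQ_DCRIT`, and `HAQ_MCID` are hypothetical; the thresholds are those given in the text, and a decrease from baseline counts as improvement.

```python
HAQ_DCRIT = 0.68  # statistically defined critical difference (this study)
HAQ_MCID = 0.22   # frequently used MCID


def classify_haq_change(baseline: float, follow_up: float) -> str:
    """Assign a patient to one of the three month 6 subgroups.

    A decrease in HAQ-DI is an improvement, so change = baseline - follow_up.
    """
    decrease = baseline - follow_up
    if decrease >= HAQ_DCRIT:
        return "dcrit"   # HAQ-DI-d crit improvement
    if decrease >= HAQ_MCID:
        return "small"   # >= 0.22 to < 0.68
    return "none"        # < 0.22, including worsening
```

HAQ-DI scores are averages of category scores and take values in steps of 0.125, which binary floating point represents exactly, so the comparisons are not distorted by rounding. The sketch also makes the floor effect concrete: a patient with a baseline HAQ-DI of 0.625 cannot fall by 0.68 and can at best be classified as "small".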

Association of HAQ-DI change criteria with other outcomes
To explore the predictive value of different levels of HAQ-DI change at month 6 with respect to additional therapeutic response outcomes, such as DAS28, we evaluated outcomes in patients in each of the 3 subgroups at months 12 and 24. During the first 24 months of the observational study, 31.2% of patients withdrew, most commonly because of a lack of effectiveness, and 18.9% were lost to follow-up. As might be expected from responder bias, study withdrawal rates were higher in the subgroup with no HAQ-DI improvement (28% at month 12 and 37% at month 24) than in the group with a small HAQ-DI improvement (18.7% at month 12 and 27.3% at month 24) or HAQ-DI-d crit improvement (15.3% at month 12 and 25.2% at month 24).
Patients who achieved a HAQ-DI-d crit improvement at month 6 consistently showed better outcomes at months 12 and 24 than patients with lower levels of HAQ-DI improvement (Table 3). Differences in outcomes were observed in both mean values and response criteria, including DAS28 remission and DAS28-d crit response (DAS28 improvement ≥1.8 from baseline). For instance, in patients who achieved a HAQ-DI-d crit response at month 6, the rate of DAS28 remission at month 12 was approximately 20 percentage points higher than in patients with a small HAQ-DI improvement and approximately 30 percentage points higher than in patients with no HAQ-DI improvement (DAS28 remission rates of 46.6%, 25.2%, and 17.5%, respectively).
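The differences quoted in this paragraph are absolute differences between the reported remission rates. As a worked check using only those rates, the absolute and relative comparisons are:

```latex
\begin{aligned}
46.6\% - 25.2\% &= 21.4 \text{ percentage points}, & \frac{46.6}{25.2} &\approx 1.85,\\
46.6\% - 17.5\% &= 29.1 \text{ percentage points}, & \frac{46.6}{17.5} &\approx 2.66.
\end{aligned}
```

In relative terms, the month 12 DAS28 remission rate in the HAQ-DI-d crit subgroup was therefore about 1.85 and 2.66 times that of the small improvement and no improvement subgroups, respectively.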

Stability of HAQ-DI changes during therapy
The stability of a therapeutic response in patients remaining on therapy reflects both the continued efficacy of the treatment and the consistency of the response tool. To evaluate the stability of the HAQ-DI-d crit response, we assessed the proportions of patients with a HAQ-DI-d crit response at month 6 who maintained this response at subsequent visits during continued adalimumab therapy. Approximately 70% of patients with a HAQ-DI-d crit response at month 6 also had a HAQ-DI-d crit response at months 12 and 24 (Fig. 1). Most patients who did not sustain the HAQ-DI-d crit response moved into the small improvement category (HAQ-DI decrease from baseline of ≥0.22 to < 0.68). Patients with no improvement also had stable responses; about 70% had no improvement at both subsequent time points. In contrast, only about half of the patients with a small HAQ-DI improvement at month 6 maintained this level of improvement at months 12 (54.7%) and 24 (44.0%). The remaining patients in this subgroup were divided between a HAQ-DI-d crit improvement (about 20%) and no improvement (about 30%).
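The stability analysis amounts to a row-normalized transition table between the month 6 category and the category at a later visit. A minimal sketch, assuming paired category labels such as those produced by a three-way classification of HAQ-DI change and assuming patients without the later HAQ-DI value have already been removed; `transition_proportions` is a hypothetical helper.

```python
from collections import Counter


def transition_proportions(month6, later):
    """Row-normalized transition table between response categories.

    month6, later: equal-length sequences of category labels for the same
    patients. Returns {from_category: {to_category: proportion}}.
    """
    if len(month6) != len(later):
        raise ValueError("sequences must be paired")
    pairs = Counter(zip(month6, later))
    row_totals = Counter(month6)
    return {
        src: {dst: pairs[(s, dst)] / row_totals[src]
              for (s, dst) in pairs if s == src}
        for src in row_totals
    }
```

The diagonal entries of this table correspond to the retention figures in the text (for example, the proportion of month 6 HAQ-DI-d crit responders still responding at month 12), and the off-diagonal entries show where non-retained patients moved.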

Predictors for change in HAQ-DI from month 0 to month 6
A stepwise multiple regression model was used to identify predictors of HAQ-DI improvement during the first 6 months of adalimumab therapy (Table 4). The most important predictor was change in pain, as assessed by a patient-reported 11-point categorical pain scale, between month 0 and month 6; greater improvement in pain was associated with greater improvement in HAQ-DI. A high baseline HAQ-DI score was a positive predictor for improvement in HAQ-DI, but a high baseline pain score was a negative predictor. Other negative predictors included older age, longer disease duration, higher BMI, and higher baseline DAS28. As with pain, greater improvement in DAS28 from month 0 to month 6 was associated with greater improvement in HAQ-DI during this time period.
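To illustrate how a stepwise procedure ranks predictors, the sketch below implements a simplified greedy forward selection that adds, at each step, the candidate variable giving the largest gain in R² and stops when the gain falls below a threshold. This is a swapped-in technique for illustration only: SAS stepwise regression uses significance-level entry and stay criteria and can also remove variables, and the stopping threshold `min_gain` and all names here are hypothetical.

```python
import numpy as np


def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on X (intercept added)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - resid @ resid / ss_tot


def forward_stepwise(X, y, names, min_gain=0.01):
    """Greedy forward selection by R^2 gain (hypothetical stopping rule).

    X: (n_patients, n_candidates) array; y: HAQ-DI change (month 6 - month 0).
    Returns the selected variable names in order of entry and the final R^2.
    """
    selected, remaining, current = [], list(range(X.shape[1])), 0.0
    while remaining:
        gains = [(r_squared(X[:, selected + [j]], y) - current, j) for j in remaining]
        best_gain, best_j = max(gains)
        if best_gain < min_gain:
            break
        selected.append(best_j)
        remaining.remove(best_j)
        current += best_gain
    return [names[j] for j in selected], current
```

The order of entry gives a rough ranking of importance, analogous to change in pain entering first in the model reported here; with real data the 29 candidate variables would form the columns of `X`.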

Discussion
The HAQ-DI is a validated assessment of function that effectively discriminates active treatment from placebo [20] and predicts key RA outcomes, including work disability and mortality [2]. It is frequently used in RA clinical trials, observational studies, and daily patient care, and is considered the gold standard measurement of function in rheumatology [1,4]. Over the years, there have been many approaches to determining a clinically significant improvement in HAQ-DI. Many of these approaches have used anchor-based assessments involving either subjective (eg, patient's view of their overall disease status) or objective (eg, documented work disability) measures [6,9,12,13,19]. Some analyses were based on population-based means [8], whereas others were based on between-patient differences [6,10,12]. HAQ-DI MCIDs range widely in value depending on the specific study, and there is concern about the accuracy of calculations based on an ordinal rather than interval scale [14]. Our approach to determining a valid criterion for HAQ-DI improvement is different from previous efforts: our goal was to establish a change in HAQ-DI that exceeded long-term random fluctuation within an individual patient on stable therapy. Long-term changes encompass short-term measurement variability as well as nonsystematic changes in disease activity during stable therapy. The short-term test-retest reliability of the HAQ-DI is quite high, as indicated by an intraclass correlation of 0.897 (95% confidence interval, 0.855-0.927) for two assessments taken 1 to 2 days apart [21]. However, patients in rheumatology clinical care are typically seen at 3- to 6-month intervals, so long-term variability is more relevant to outcomes observed during clinical care.

Table 4 footnotes: Measured on a categorical scale ranging from 0 (best) to 10 (worst). Because a negative value for HAQ-DI (month 6 − month 0) indicates an improvement, variables with negative coefficients are positive predictors and those with positive coefficients are negative predictors. Higher values on these scales represent greater impairment, so higher values for (month 6 − month 0) correspond to lack of improvement and are a negative predictor for improvement in HAQ-DI.

Fig. 1 Stability of HAQ-DI changes during therapy. Continued achievement of HAQ-DI improvement criteria at months 12 and 24 was evaluated in patient subgroups based on HAQ-DI improvement at month 6. HAQ-DI-d crit improvement was defined as HAQ-DI change from baseline ≥0.68, small improvement as ≥0.22 to < 0.68, and no improvement as < 0.22. Differences in patient numbers from Table 3 are due to the absence of HAQ-DI data in some patients. HAQ-DI, Health Assessment Questionnaire-Disability Index; HAQ-DI-d crit, critical difference for change beyond random variation in the HAQ-DI (decrease ≥0.68 from baseline).
We found that the degree of change required to exceed normal long-term variation in a discovery cohort (N = 1645) on stable therapy with moderate disease activity and a mean disease duration of 10.9 years was a HAQ-DI improvement (decrease) of ≥0.68 points. Of the various MCIDs previously reported, the d crit value is closest to the 0.74 "really important difference" determined from objective reports of work disability [9]. In the full patient cohort (N = 2895 at month 6), 22.1% achieved a HAQ-DI-d crit response at month 6 after initiation of adalimumab therapy. Approximately 70% of patients who achieved a HAQ-DI-d crit response at month 6 retained it at months 12 and 24. The stability of the HAQ-DI-d crit criterion over 18 months is especially noteworthy given that disease-related deterioration in function occurs over time in patients with RA [22]. In contrast, patients in the small improvement subgroup showed considerable variation in HAQ-DI responses at subsequent time points, with some improving and some deteriorating.
Our observation that achievement of a HAQ-DI MCID of 0.22 is in some cases due to random variation, rather than an improvement in function, is in keeping with a previous study by Wolfe et al. involving 50 patients with RA followed over approximately 16 years [23]. That study found that the within-patient HAQ-DI variation between assessments (approximately one per year) was 0.436, only slightly below the between-patient variation of 0.596, and almost twice as large as an MCID of 0.22. It is likely that this extensive within-patient variation contributes to the high rates of HAQ-DI MCID achievement observed in some clinical trials. In one recent study, 43% of patients in the placebo arm of a randomized trial achieved a HAQ-DI MCID of 0.22 at 3 months (prior to being switched to active treatment) [7].
An examination of baseline patient characteristics based on the magnitude of HAQ-DI change at month 6 showed that the subgroup achieving a HAQ-DI-d crit improvement at month 6 had a lower mean age, lower BMI, and shorter disease duration than patients in the subgroups with a small HAQ-DI improvement (between the frequently used MCID of 0.22 and 0.68) or no improvement (< 0.22). Baseline mean HAQ-DI scores were somewhat higher in the HAQ-DI-d crit subgroup than in the other subgroups, perhaps because responder criteria are easier to achieve with high baseline disease activity [24].
Because the derivation of the HAQ-DI-d crit was based on statistical parameters and not on patient-centered anchors, it was critical to evaluate whether a HAQ-DI-d crit response was associated with clinically relevant outcomes. We found that patients achieving a HAQ-DI-d crit response at month 6 not only had higher rates of HAQ-DI remission at months 6 and 12, but also markedly higher rates of DAS28 remission and therapeutic responses for DAS28, pain, fatigue, and patient global health than patients in the other subgroups. Similarly, mean values for the objective assessments of tender and swollen joint counts were lower in the group achieving a HAQ-DI-d crit response. It is perhaps not surprising that a more stringent functional response criterion is associated with better function at later time points. However, the association between the HAQ-DI-d crit criterion and other outcomes, such as DAS28 remission and improvement in patient-reported outcomes, indicates that HAQ-DI-d crit functional improvements are linked to meaningful differences in subsequent patient clinical status compared with the small improvement and no improvement groups.
Using a stepwise regression model, we identified change in pain from month 0 to month 6 as the most important predictor of change in HAQ-DI during the first 6 months of adalimumab therapy; this variable accounted for > 25% of the HAQ-DI change variance observed in this model. High baseline pain was a negative predictor for HAQ-DI improvement. Other studies have likewise reported an impact of pain on function [16,23,25,26]. Pain has been identified as the largest component of HAQ-DI [23] and an explanatory variable for all subdimensions of this functional assessment tool [26]. In addition to being correlated with function, pain is also strongly associated with DAS28; 68% of patients achieving a DAS28 therapeutic response, as assessed by the DAS28-d crit, also achieved a significant improvement in pain [16]. Together, these data suggest that pain is an important driver of therapeutic outcomes. We further identified high baseline HAQ-DI as a positive predictor for improvements in HAQ-DI from month 0 to month 6, likely due to the greater room for improvement in patients with high baseline scores. As others have observed, one of the most important drawbacks of HAQ-DI as a functional assessment is a floor effect, in which patients with low baseline HAQ-DIs cannot experience significant HAQ-DI decreases despite clinical improvement [1].
This study has several important limitations. Although the HAQ-DI-d crit was derived from a large sample, the discovery cohort was limited to German patients treated with adalimumab. Accordingly, patients with different ethnicities or milder or earlier disease may have a different HAQ-DI-d crit threshold than the one reported here. As our data indicate, the HAQ-DI-d crit for patients with baseline HAQ-DI < 1 is 0.597, rather than the higher value we used as a conservative threshold in this study. It is therefore possible that the HAQ-DI-d crit used in this study is too high for patients with milder RA. We hope our statistical methods will be applied to varied groups of patients in other countries to provide insights into variations in HAQ-DI-d crit values in different populations and with different disease severities. In addition, it is important to note that individual patients may experience meaningful benefits with HAQ-DI improvements lower than the statistically determined HAQ-DI-d crit. However, as we have shown in this study, on a population-wide basis lower HAQ-DI improvements may be due to random fluctuation and are unlikely to be as clinically relevant or as stable as a HAQ-DI-d crit response. We acknowledge that patients who initiate treatment with good physical function are not well suited for this measure because of the fairly large change required to achieve a HAQ-DI-d crit response; we excluded patients who were in functional remission (HAQ-DI ≤ 0.5) from our analyses. As noted previously, floor effects (the inability of patients with low baseline HAQ-DIs to experience significant HAQ-DI decreases despite clinical improvement) are an issue with the HAQ-DI, and this tool is not appropriate for detecting change within the range of normal physical function [1].

Conclusions
Our data indicate that the statistically determined HAQ-DI-d crit value of 0.68 represents a robust change in function that can be distinguished from long-term random fluctuation. The clinical relevance of this measure is supported by the association between achievement of a HAQ-DI-d crit and other patient-reported and objective therapeutic outcomes. The stability of this criterion and its ability to reliably predict future functional status distinguish it from other commonly used measures of HAQ-DI improvement that rely on smaller reductions. We hope our study will help extend the utility of HAQ-DI assessments in both randomized clinical trials and daily clinical practice.
Abbreviations ACR: American College of Rheumatology; BMI: Body mass index; DAS28: Disease Activity Score based on 28 joints; DAS28-d crit : Critical difference for change beyond random variation in the DAS28 (improvement of ≥1.8 from baseline); HAQ-DI: Health Assessment Questionnaire-Disability Index; HAQ-DI-d crit : Critical difference for change beyond random variation in the HAQ-DI (decrease ≥0.68 from baseline); MCID: Minimum clinically important difference; RA: Rheumatoid arthritis