Background
Many rheumatoid arthritis (RA) patients who are treated with biological disease-modifying anti-rheumatic drugs (bDMARDs) achieve long periods of low disease activity or remission [1]. However, bDMARDs may also lead to adverse events, call for self-injections or hospital visits, and are expensive [2–4]. Thus, tapering bDMARDs to the lowest effective dose is of great clinical interest and may support the sustainability of the healthcare system as a whole.
The guidelines of the European League against Rheumatism (EULAR) on the management of RA advise considering tapering in patients who are in persistent remission [5]. In addition, numerous clinical trials and reviews provide supportive evidence to also consider tapering in patients with stable low disease activity (LDA) [6, 7]. This is in line with routine clinical practice, as maintaining a satisfactory low level of disease activity with a reduced medication dose is also of value.
The most successful and cost-effective strategy for tapering appears to be “disease activity-guided dose optimization” (DGDO) [8–10]. This means the dose is gradually tapered (usually by increasing the administration interval), until either disease activity flares or the bDMARD is discontinued. Two randomized trials have demonstrated that, using this strategy, 63–80% of patients can taper or even stop their bDMARD [8, 9]. No important difference was observed in the proportion of patients with LDA or remission after 18 months between DGDO and usual care.
However, since DGDO is a “trial and error” approach, flares occur frequently during the tapering process. In the case of a flare, the previously effective dose needs to be reinstated or additional therapy is necessary. Although these short-lived flares do not seem to relevantly affect radiographic progression or long-term disease activity, there is conflicting evidence regarding functional outcome and impact on quality of life [9, 11]. Therefore, it would be beneficial to predict whether, and to what extent, a bDMARD can be tapered in a particular patient without a flare occurring.
Several predictors for successful dose reduction or discontinuation of bDMARDs have been explored [12, 13]. However, these studies only included “baseline predictors” from before the start of the tapering process, and the strength of the evidence for these predictors is limited. Furthermore, “successful tapering” is often defined as reaching a lower bDMARD dose at some time point after the start of tapering, regardless of whether a flare occurred during the tapering process.
Therefore, this study aims to predict the likelihood of a flare occurring during bDMARD tapering at each consecutive dose reduction step. Such a dynamic prediction may be used to optimize the DGDO strategy for bDMARDs for an individual patient, as the decision for a further tapering step can be based on the predicted risk of a flare. This could minimize the number of flares during tapering, while retaining most of the bDMARD dose reduction. To facilitate future implementation of this approach in routine practice, we decided to exclusively use information easily obtainable in regular care.
Discussion
The goal of this study was to develop and validate a flare prediction model to reduce the number of flares during bDMARD tapering, exclusively using data that can easily be obtained in routine care. Our simulation results show that adding our flare prediction model to a DGDO tapering strategy is superior to both routine care and DGDO alone when considering the ratio between the number of flares and the amount of bDMARD dose reduction. To our knowledge, this is the first study that not only develops a dynamic flare prediction model, but also performs an external validation and a subsequent simulation of clinical impact in the context of bDMARD tapering.
As tapering bDMARDs is of great clinical interest, other studies have also investigated predictors in the context of tapering. Several studies and systematic reviews have investigated the predictive value of biomarkers, serum drug levels, or PET scans during bDMARD tapering [12, 20–22]. However, none of these studies showed a clear predictive value of these markers. In addition, the study by Verhoef et al. showed that for a biomarker to be cost-effective during bDMARD tapering, it must be inexpensive and have high sensitivity and specificity [23]. If future studies do show a predictive value of (bio)markers during tapering, these can be included in the prediction model. The added predictive value of such markers and their cost-effectiveness should then be assessed. An important advantage of the current model is that it only includes variables that are routinely collected in RA clinical practice, thereby enhancing feasibility and cost-effectiveness.
A recent review [13] focused on predictors for successful discontinuation, rather than tapering, of bDMARDs. Similar to the current study, they found seropositivity, LDA, disease duration, and CRP/ESR to be possible predictors of value. In addition, they mention physical functioning and ultrasound measures as possible predictors. However, the studies included in this review were often small and too heterogeneous to compare in meta-analysis. Furthermore, only fixed baseline variables were included, rather than performing dynamic predictions using information over time.
Two studies have incorporated such dynamic variables to predict RA disease activity over time [24, 25]. The study by Norgeot et al. [24] found the Clinical Disease Activity Index (CDAI), CRP/ESR, glucocorticoid use, and other DMARD use to be important predictors. However, this study was not performed in the specific context of tapering bDMARDs. The model developed by Vodenčarević et al. [25] does focus specifically on bDMARD tapering. However, this model was developed and validated on the clinical trial data of only 41 patients and may therefore be difficult to extrapolate to routine care. Both of these dynamic prediction models were developed using machine learning techniques. We have previously also explored the potential of a machine learning model similar to that of Vodenčarević et al. [26]. However, we chose to pursue the joint latent class model because its performance was similar and it is more transparent regarding the DAS28 trajectories used and the effects of covariates in the model (i.e., it provides hazard ratios).
A major strength of this study is that the model’s performance was assessed in external validation. There were several significant differences between the routine care patient populations used for developing the model and the DRESS pragmatic trial data used for external validation, regarding baseline characteristics, disease activity, and bDMARD treatment. Despite these differences, the model retained an adequate performance in the external validation, indicating that these differences do not invalidate the model. Another strength is that the clinical impact was evaluated in simulation. In this simulation, successful tapering was not only defined by reaching a lower bDMARD dose, but also by the number of flares during tapering. Furthermore, our model was developed using easily obtainable parameters from routine care EHR data, rather than, e.g., clinical trial data or specific biomarkers [27].
The AUC in cross-validation and external validation (0.76 and 0.68, respectively) may be interpreted as only moderate performance. However, the AUC may not be the most suitable measure to assess the model’s clinical utility. The added value in clinical practice is determined by the effects of prediction-aided treatment on the rate of flares and the amount of bDMARD dose reduction, when compared to the available alternatives. The currently existing alternatives are either continuing the bDMARD at full dose or tapering until a flare occurs in a trial-and-error approach. Our simulation results show that prediction-aided treatment is superior to both of these alternatives regarding the ratio between the number of flares and the amount of bDMARD dose reduction. Therefore, prediction-aided treatment may present the best available bDMARD tapering strategy. This is currently being investigated in the PATIO randomized controlled clinical trial (Dutch Trial Register number NL9798).
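The comparison criterion described above, the number of flares relative to the amount of bDMARD dose reduction, can be illustrated with a minimal sketch. The strategy names mirror the text, but the class, the function, and every number below are hypothetical placeholders chosen for illustration; they are not results of this study, and the handling of a strategy without any dose reduction is an assumed convention.

```python
import math
from dataclasses import dataclass


@dataclass
class StrategyOutcome:
    name: str
    flares: int                # flares during follow-up (hypothetical value)
    dose_reduction_pct: float  # mean reduction vs. full dose, in percent (hypothetical value)


def flares_per_dose_reduction(outcome: StrategyOutcome) -> float:
    """Flares per percentage point of dose reduction; lower is better.

    Assumed convention: a strategy that achieves no dose reduction gets
    an infinite ratio, because no reduction is obtained at any flare cost.
    """
    if outcome.dose_reduction_pct <= 0:
        return math.inf
    return outcome.flares / outcome.dose_reduction_pct


# Hypothetical outcomes for the three strategies named in the text.
outcomes = [
    StrategyOutcome("full-dose continuation", flares=10, dose_reduction_pct=0.0),
    StrategyOutcome("DGDO alone", flares=40, dose_reduction_pct=50.0),
    StrategyOutcome("prediction-aided DGDO", flares=25, dose_reduction_pct=45.0),
]

# Rank strategies by the ratio; the lowest ratio is preferred.
ranked = sorted(outcomes, key=flares_per_dose_reduction)
best = ranked[0]
```

With real simulation output in place of the placeholder values, the same ratio allows strategies with different flare counts and different achieved dose reductions to be compared on one scale.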
Interestingly, the AUC of the prediction model in external validation improved from 0.68 to 0.71 when baseline predictions were removed. This is likely because the model can only function as a “joint” model when longitudinal information is available. The same effect on AUC was observed in the development data, but it was less pronounced there because baseline visits are relatively overrepresented in the DRESS data compared to the development data. As the removal of baseline predictions had almost no effect on the simulation of clinical impact, we chose to retain these predictions. Including disease activity measures prior to the start of tapering could potentially improve the performance of our model, as this would ensure that longitudinal information is available at baseline.
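How excluding baseline predictions can change the AUC may be easier to see in a small sketch. It uses the standard pairwise formulation of the AUC (the probability that a randomly chosen flare visit receives a higher predicted risk than a randomly chosen non-flare visit, with ties counted as one half). The visit records are invented toy data in which baseline predictions are assumed to be uninformative; this is not the evaluation code or data of this study.

```python
def auc(scores, labels):
    """AUC as the fraction of (flare, non-flare) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one flare and one non-flare visit")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy visits: (predicted flare risk, flare observed, is baseline visit).
# Baseline visits share one risk value because no longitudinal
# information is available yet (assumption made for this illustration).
visits = [
    (0.30, 1, True),
    (0.30, 0, True),
    (0.30, 0, True),
    (0.60, 1, False),
    (0.20, 0, False),
    (0.50, 1, False),
    (0.40, 0, False),
]

auc_all = auc([r for r, _, _ in visits], [y for _, y, _ in visits])

follow_up = [v for v in visits if not v[2]]
auc_follow_up = auc([r for r, _, _ in follow_up], [y for _, y, _ in follow_up])
```

In this toy example the AUC is higher once the baseline visits are dropped, and the more baseline visits a data set contains, the more they pull the overall AUC down.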
A challenge in this study was the limited data quality regarding the frequency of DAS28 measurements in the development data. This might also have contributed to the different flare rates and the resulting discrepancy between the optimal cutoff points in the development data and the external validation data from the DRESS trial. When implementing a prediction-aided bDMARD tapering strategy in clinical practice or clinical studies, a treat-to-target (T2T) strategy with regular (e.g., 3-monthly) DAS28 measurements should be used, in line with EULAR recommendations [5]. As the DAS28 measurement frequency in the DRESS trial best reflects these recommendations, the optimal cutoff point found in simulation (i.e., 35%) is likely the most suitable for implementation of the model in clinical practice.
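A prediction-aided tapering decision at a single visit could be sketched as follows, using the 35% cutoff mentioned above. The function name, the abstract dose-step indices, the number of steps, and the choice to hold the dose when the predicted risk equals the cutoff are all assumptions made for illustration; the sketch is not the decision algorithm used in the simulation.

```python
FLARE_RISK_CUTOFF = 0.35  # cutoff reported in the text


def decide_next_step(current_step: int,
                     predicted_flare_risk: float,
                     flared: bool,
                     n_steps: int = 4,
                     cutoff: float = FLARE_RISK_CUTOFF) -> int:
    """Return the dose step to use until the next visit.

    Steps are abstract indices (assumption): 0 is the full dose and
    n_steps - 1 is the lowest step, e.g., discontinuation.
    """
    if not 0.0 <= predicted_flare_risk <= 1.0:
        raise ValueError("predicted_flare_risk must be a probability")
    if flared:
        # Flare: go back to the previous, higher dose step.
        return max(current_step - 1, 0)
    if predicted_flare_risk >= cutoff:
        # Predicted risk too high: hold the current dose.
        return current_step
    # Predicted risk acceptable: taper one step further, if possible.
    return min(current_step + 1, n_steps - 1)
```

For example, `decide_next_step(1, 0.20, flared=False)` moves to step 2, whereas a predicted risk of 0.50 at the same step keeps the patient on step 1.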
Besides the DAS28 measurements, several other parameters were also difficult to extract as structured data from the EHR, such as smoking, concurrent csDMARDs, and erosiveness of disease. We explored imputation to increase the number of these data points, but this did not improve the model’s performance in cross-validation. Improved registration of these parameters and the optimization of free-text mining techniques could allow for future inclusion of these parameters in model development and possibly a better performance. Importantly, the results from external validation are not biased by missing data, since the DRESS data had a standard measurement frequency and very few missing data on disease activity. Therefore, we think our simulation should be an accurate representation of the potential clinical impact of using the model’s predictions as a decision aid added to a DGDO strategy.
Since prediction-aided treatment could reduce the number of flares during bDMARD tapering, patients and physicians may be more willing to start tapering with such a prediction model than without [28]. Furthermore, our prediction model can be used as an add-on to DGDO, retains most of the bDMARD reduction attained by DGDO, and is a low-cost intervention. Therefore, the model might prove to be an even more cost-effective strategy than DGDO alone [10]. The clinical implementation may be relatively straightforward, as it uses only predictors usually available in the EHR.