Introduction
The European Resuscitation Council (ERC) and the European Society of Intensive Care Medicine (ESICM) published joint guidelines for neurological prognostication after cardiac arrest (CA) in 2014 and 2015 [1, 2]. The included algorithm consists of 4 separate steps and was based on the current level of evidence for individual methods and expert opinions about combinations of methods.
According to ERC/ESICM, prediction of neurological recovery cannot be done with high confidence before 72 h after CA and only after confounding factors such as metabolic derangements or effects of residual sedation and muscle-relaxants are excluded (Step 0) [1]. In Step 1, the patient’s best response to painful stimuli is evaluated according to the Glasgow Coma Scale Motor Score (GCS-M) as a screening criterion. If patients either have no motor response to pain or extend the extremities (GCS-M ≤ 2), they will be assessed further whilst patients with at least stereotypic flexor response (GCS-M 3) or better are excluded from further prognostication. In Step 2 the most robust predictors are considered. Outcome is “very likely poor” if a patient has bilaterally absent corneal and pupillary light reflexes, and/or bilaterally absent N20 response on short-latency somatosensory evoked potentials (SSEP). A patient not fulfilling the Step 2 criteria should be re-examined after ≥ 24 h. If GCS-M is still ≤ 2, Step 3 of the algorithm states that outcome will be “likely poor” if there are ≥ 2 pathological findings of the following: “high” serum neuron-specific enolase (NSE) according to locally established cut-off values, unreactive burst-suppression or unreactive status epilepticus on electroencephalography (EEG), generalized oedema on head computed tomography (CT) ≤ 24 h post-arrest or on magnetic resonance imaging (MRI) or early (≤ 48 h) status myoclonus. Since the publication of the ERC/ESICM guidelines, a standardized classification of post-arrest EEG patterns has been suggested [3–6] and two large studies on serum NSE levels have been published [7, 8].
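The stepwise logic described above can be sketched in a few lines of Python. This is only an illustration of the decision flow as summarized in this Introduction, not the guideline text or the code used in this study. The `Findings` container, its field names and the return labels "not assessable", "not prognosticated" and "indeterminate" are hypothetical, and the sketch ignores the ≥ 24 h re-examination between Steps 2 and 3.

```python
from dataclasses import dataclass


@dataclass
class Findings:
    """Hypothetical container for one patient's findings at >= 72 h post-arrest."""
    confounders_excluded: bool                 # Step 0: sedation, relaxants, metabolic causes ruled out
    gcs_m: int                                 # Glasgow Coma Scale Motor Score (1-6)
    reflexes_absent: bool = False              # corneal AND pupillary light reflexes bilaterally absent
    n20_absent: bool = False                   # bilaterally absent N20 on SSEP
    high_nse: bool = False                     # per locally established cut-off
    unreactive_eeg: bool = False               # unreactive burst-suppression or status epilepticus
    generalized_oedema: bool = False           # CT <= 24 h post-arrest, or MRI
    early_status_myoclonus: bool = False       # <= 48 h post-arrest


def prognosticate(f: Findings) -> str:
    """Return a label following Steps 0-3 as summarized in the text."""
    if not f.confounders_excluded:             # Step 0
        return "not assessable"
    if f.gcs_m >= 3:                           # Step 1: screening by motor score
        return "not prognosticated"
    if f.reflexes_absent or f.n20_absent:      # Step 2: most robust predictors
        return "very likely poor"
    step3_findings = sum([
        f.high_nse,
        f.unreactive_eeg,
        f.generalized_oedema,
        f.early_status_myoclonus,
    ])
    if step3_findings >= 2:                    # Step 3: at least two pathological findings
        return "likely poor"
    return "indeterminate"
```

For example, `prognosticate(Findings(True, 2, high_nse=True, generalized_oedema=True))` yields "likely poor", whereas the same findings with a GCS-M of 4 would not proceed past Step 1.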
The aim of this study was to assess the performance of the ERC/ESICM algorithm in a large international cohort of patients comparing predicted neurological outcome with the outcome reported 6 months post-arrest. Additionally, we wanted to identify strengths and weaknesses of the current algorithm, and explore possible modifications.
Methods
This was a retrospective descriptive analysis using data from the international multicentre Target Temperature after Out-of-hospital Cardiac Arrest (TTM) Trial, which randomized 939 adult patients with CA of presumed cardiac cause to targeted temperature management at 33 °C or 36 °C between 2010 and 2013. Rationale, design and results have previously been published [9, 10]. Ethical consent was obtained in each participating country [10]. The TTM-database contains information on clinical data, patient demographics, neurological prognostication, withdrawal of life-sustaining-therapy (WLST) and follow-up at 6 months after CA [11, 12]. Poor neurological outcome was defined as Cerebral Performance Category Scale (CPC) 3–5 (severe cerebral disability, vegetative state or brain death) [13]. GCS-M and clinical seizures were evaluated daily; brain stem reflexes were registered at formal neurological prognostication ≥ 108 h post-arrest [10, 11, 14]. In this study, we used GCS-M on day 4 (72–96 h post-arrest), since this is closest to guideline recommendations [1, 2].
A routine EEG was performed in unconscious patients 48–72 h post-arrest, and if available, SSEP was performed during normothermia and was recommended for patients unconscious between 84 and 108 h after cardiac arrest [12]. Original EEG data were evaluated retrospectively in a blinded fashion using the terminology of the American Clinical Neurophysiology Society [15], and patterns were classified as unreactive burst-suppression or unreactive status epilepticus (abundant rhythmic/periodic discharges) according to ERC/ESICM [2]. In an exploratory analysis, the recently proposed standardized highly malignant EEG patterns were applied [3, 4, 15, 16].
Serum samples were collected at 24, 48 and 72 h after CA, stored in a central biobank and analysed after TTM-trial completion [7]. We defined NSE levels as “high” if ≥ 48 ng/mL at 48 h and/or ≥ 38 ng/mL at 72 h, corresponding to 2% false positive rates for poor outcome as previously published [7]. In a sensitivity analysis, we explored an alternative cut-off of ≥ 33 ng/mL at 48 and/or 72 h, as suggested in previous guidelines [17, 18] and recently used in studies validating the ERC/ESICM algorithm [19, 20]. Both CT and MRI were performed on clinical indication and evaluated for generalized oedema by local radiologists in a pragmatic approach similar to clinical practice. The first available CT was included in this study [21].
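As an illustration of the “and/or” rule for NSE, the primary and alternative cut-offs quoted above could be applied as in the following Python sketch. The function and constant names are hypothetical, and values are assumed to be expressed in the same concentration unit as the cut-offs. Treating a missing sample as not “high” is an assumption that mirrors how missing examinations are handled in the statistical analyses.

```python
from typing import Dict, Optional

# Cut-offs keyed by sampling time in hours after CA (values as quoted in the text).
PRIMARY_CUTOFFS: Dict[int, float] = {48: 48.0, 72: 38.0}
ALTERNATIVE_CUTOFFS: Dict[int, float] = {48: 33.0, 72: 33.0}


def nse_is_high(
    nse_48h: Optional[float],
    nse_72h: Optional[float],
    cutoffs: Dict[int, float] = PRIMARY_CUTOFFS,
) -> bool:
    """Return True if NSE reaches the cut-off at 48 h and/or 72 h.

    A missing sample (None) never counts as "high" (assumption).
    """
    samples = {48: nse_48h, 72: nse_72h}
    return any(
        value is not None and value >= cutoffs[hours]
        for hours, value in samples.items()
    )
```

A patient with 40 at 48 h and 20 at 72 h would not be “high” under the primary cut-offs but would be under the alternative cut-off, which is why the choice of threshold directly affects sensitivity.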
According to trial protocol, WLST was permitted when any of the following criteria were fulfilled: (1) status myoclonus ≤ 24 h post-arrest and bilaterally absent N20 potentials after rewarming, (2) persisting coma (GCS-M ≤ 2) AND bilaterally absent N20 potentials OR a treatment-refractory status epilepticus at ≥ 108 h post-arrest, (3) brain death according to national legislation or (4) ethical reasons (also including treatment-refractory shock or end-stage multiorgan failure) [12]. If applicable, the presumed cause of death was reported by the physician responsible for patient care [12].
Statistical analyses
Continuous variables are expressed as median (interquartile range) and categorical variables as numbers (percentages). Analyses were performed using two different cohorts. Prognostic accuracies of the ERC/ESICM algorithm and explorative variations thereof were calculated in patients examined with GCS-M on day 4 (n = 585). Missing diagnostic examinations were regarded as “negative/non-pathological”, allowing evaluation according to remaining ERC/ESICM criteria similar to clinical practice.
To reduce selection bias, prognostic accuracies of single diagnostic methods and of combinations of methods were calculated using all patients with 6-month outcome registered (n = 933). Only patients actually examined were included when calculating prognostic accuracies. Prognostic accuracies of methods were compared to each other using McNemar’s test.
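For readers unfamiliar with McNemar’s test, the following Python sketch shows one common form, the two-sided exact (binomial) version computed from the discordant pairs. Which variant was used in this study is not stated, so the exact form is an assumption, and the function name is hypothetical. Here `b` is the number of patients classified as pathological by method A only and `c` by method B only, among patients examined with both methods.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the two discordant pair counts."""
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    k = min(b, c)
    # Probability of k or fewer discordant pairs in one direction under
    # the null hypothesis that both directions are equally likely.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

Only discordant pairs enter the calculation; patients in whom both methods agree do not influence the p-value.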
The term “true” was used when predicted outcome and reported outcome were identical, whilst “false” indicated that outcome prediction was contrary to the reported outcome. “Negative” referred to good outcome (CPC 1–2), and “positive” referred to poor outcome (CPC 3–5). For example, “true positive” (TP) indicated a patient in whom both predicted and reported outcomes were poor. 95% confidence intervals were calculated with Wilson’s method. Analyses were performed using R version 3.5.1 (The R Foundation for Statistical Computing) [22].
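The definitions above translate into sensitivity and specificity in the usual way, and Wilson’s score interval can be computed directly from counts. The Python sketch below is illustrative only and is not the R code used for the analyses; the function names are hypothetical, and z = 1.959964 is assumed for a 95% interval.

```python
from math import sqrt
from typing import Tuple


def sensitivity(tp: int, fn: int) -> float:
    """Proportion of poor-outcome patients predicted to have poor outcome."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Proportion of good-outcome patients not predicted to have poor outcome."""
    return tn / (tn + fp)


def wilson_interval(successes: int, n: int, z: float = 1.959964) -> Tuple[float, float]:
    """Wilson score confidence interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)
```

Unlike the simple normal approximation, the Wilson interval remains informative when the observed proportion is 0% or 100%. For instance, `wilson_interval(tn, tn + fp)` with no false positives gives a lower bound below 1 that depends on the number of good-outcome patients, which matters when interpreting a reported specificity of 100%.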
Discussion
Applied to the cohort of a large pragmatic international trial, the ERC/ESICM algorithm predicted poor neurological outcome without false positive predictions and correctly identified 38.7% of poor outcome patients. Despite various exploratory modifications with the same outcome definitions used, specificity remained at 100% in this cohort. No good outcome patient had ≥ 2 pathological findings and only 3% had 1 pathological finding, elevated NSE being the most common.
The ERC/ESICM algorithm failed to identify 60% of the poor outcome patients, among whom the “presumed cause of death” was neurological in approximately 60%. An algorithm intended for identifying cerebral injuries cannot be expected to identify the remaining patients with other causes of death. Maintaining a very high specificity is essential for an algorithm predicting poor outcome, but improved sensitivity is nevertheless desirable. Two single-centre studies recently validated the ERC/ESICM algorithm and both reassuringly also reported 100% specificity, but WLST was permitted according to specified criteria in both studies [19, 20]. The reported sensitivities for the ERC/ESICM algorithm were 18–26% and 32%, respectively [19, 20].
The sensitivity of a prognostic method or an algorithm will vary depending on the extent to which the different prognostic methods are available and which definitions are used for pathological findings, as well as the selection of patients included. In our study, this is illustrated by the low sensitivity of the strictly defined status myoclonus (6.9%) and the relatively high sensitivity of an elevated NSE (60.2–67.3%), frequently analysed due to the common biobank.
The ERC/ESICM guidelines recommend GCS-M 1–2 as a screening criterion [1]. Our results confirm that GCS-M should not be used to make decisions on level-of-care due to limited specificity. Evaluation of persisting coma by motor score may nevertheless be adequate to differentiate between unconscious patients in need of further prognostication and those with a presumed good outcome. In this study, a considerable fraction of patients with GCS-M 3–4 had poor outcome and ≥ 2 pathological prognostic findings, and it may be considered whether the current dichotomization should remain between GCS-M 2–3 in future guideline algorithms.
As supported by our results, false positive findings may occur with all methods currently used for prognostication, emphasizing the importance of a multimodal approach to reduce the risk of overly pessimistic predictions [8, 23–26]. Six patients with single pathological findings were awake and obeying commands on day 4, illustrating that sufficient time for recovery post-arrest is also an important part of neurological prognostication [1, 27].
The ERC/ESICM algorithm permits unimodal prognostication using SSEP, whilst the TTM-trial protocol permitted unimodal prognostication for patients fulfilling specific SSEP or EEG criteria [1, 11]. Applying a stricter multimodal approach, any ≥ 2 pathological findings predicted poor outcome with maximal specificity in unconscious patients irrespective of GCS-M level (Fig. 2c, d). In this multimodal approach, overall sensitivity was slightly decreased and we speculate that sensitivity may have been higher with an increased use of diagnostic methods.
The evidence for the ERC/ESICM algorithm consists largely of studies where withdrawal of therapy was permitted, and hence influence from the self-fulfilling prophecy cannot be excluded [1, 28, 29]. Whilst WLST was common in the TTM-trial, the trial protocol was designed to avoid premature decisions, applying conservative rules for prognostication [11]. The pre-specified TTM-criteria permitting WLST (SSEP, status myoclonus and EEG) partly overlap with the ERC/ESICM recommendations published 2 years after trial completion [2, 11]. Bilaterally absent N20-potentials in combination with early status myoclonus or in isolation were considered predictive of poor outcome in the TTM-trial. Whilst SSEP is considered a very robust method for prognostication after cardiac arrest, the self-fulfilling prophecy may have affected most previous studies [25, 28, 29]. One patient with absent N20-potentials in the TTM-trial awoke before scheduled prognostication [11] and the majority of our patients lacking N20 potentials had ≥ 1 additional prognostic finding indicating severe brain injury. We still cannot exclude the self-fulfilling prophecy affecting our results but find it reassuring that bilaterally absent N20 predicted poor outcome with 100% specificity in two recent studies where WLST was not practiced [30, 31].
When we calculated prognostic accuracies of the ERC/ESICM algorithm using an alternative definition of poor outcome (CPC 4–5), one patient who fulfilled the criteria for “poor outcome likely” and survived with severe cerebral disability (CPC 3) was reclassified. Definitions of poor outcome vary between studies, and survival with CPC 3 includes a wider range of severe cerebral disabilities, some of which may be considered acceptable outcome by patients or caregivers.
Strengths of this study include the large cohort, a conservative and protocolized approach to prognostication and the extensive clinical information available [9, 10]. Results of EEG and NSE have been obtained after the study in a blinded fashion, and all results on clinical tests, SSEPs, CT-examinations and decisions on level-of-care have been presented in separate projects [3, 4, 7, 11, 14, 16, 21].
There are several limitations to this retrospective study. Some of the co-authors who designed the TTM-trial also participated in the ERC/ESICM recommendations for prognostication after cardiac arrest, creating a risk for inherent bias. Clinical neurological examinations in the TTM-trial were performed according to local routines. We acknowledge that an increased focus on neurological examination techniques might have improved prognostic performance, since imprecise testing may be common [32]. Neuroimaging and SSEP were often performed on clinical indication in patients with presumed poor neurological prognosis, likely leading to selection bias. We included the first reported CT-examination despite guidelines only recommending CT ≤ 24 h post-arrest. A stricter application of the 24-h time limit would presumably have reduced overall sensitivity of the algorithm. However, recent studies indicate improved performance of brain CT after 24 h [21, 33, 34]. NSE cut-off values were defined from the same TTM-study cohort [7]. Despite NSE being analysed after trial completion, its prognostic performance may still be indirectly affected by WLST based on other predictors. For statistical reasons, we excluded the 24-h delay between Steps 2 and 3 and only included each prognostic method once in our analyses, which is an approximation of clinical practice where patients are continuously re-examined. In the future, quantitative methods such as pupillometry [35, 36], standardized evaluation of neuroimaging and electrophysiology or novel serum biomarkers [37, 38] might prove themselves valuable additions to the current algorithm.