Scoring systems in the intensive care unit: A compendium
Source of Support: None, Conflict of Interest: None. DOI: 10.4103/0972-5229.130573
Severity scales are important adjuncts of treatment in the intensive care unit (ICU): they are used to predict patient outcome, to compare quality of care and to stratify patients for clinical trials. Even though disease severity scores are not the key elements of treatment, they are an essential part of improving clinical decisions and of identifying patients with unexpected outcomes. Prediction models face many challenges, but their proper application helps in timely decision making and in decreasing hospital cost. In fact, they have become a necessary tool to describe ICU populations and to explain differences in mortality. However, the choice of the severity score scale, index or model should accurately match the event, setting or application, as misapplication of such systems can lead to wasted time, increased cost, unwarranted extrapolations and poor science. This article provides a brief overview of ICU severity scales (along with their predicted death/survival rate calculations) developed over the last 3 decades, including several that have since been revised.
Keywords: Acute physiology and chronic health evaluation, beta-coefficients in scoring systems, intensive care unit scoring systems, probability of death calculation
Assessment of medical treatment outcome began in 1863, when Florence Nightingale first addressed this issue. Initially, outcome prediction in critical illness was based on the subjective judgment of clinicians. The rapid development of intensive care units (ICUs) created the need for quantitative and clinically relevant surrogate outcome measures to evaluate the effectiveness of treatment practices, and scoring systems were developed and applied for this purpose. The outcome of intensive care patients depends on several factors present on the 1st day in the ICU and subsequently on the patient's course in the ICU. For such populations, many scoring systems have been developed but few are used. Several of these systems are known simply by their acronym. A scoring system usually comprises two parts: a score (a number assigned to disease severity) and a probability model (an equation giving the probability of hospital death of the patients). A model refines the ability of scores or scales to be used in comparing various groups of patients for the purpose of treatment, triage or comparative analysis and thus helps in decision making. Such models also allow an increased understanding of the effectiveness of treatment, help optimize the use of hospital resources and hence aid in the development of treatment standards. An accurate scoring model should have high predictive power starting from day one, should not be limited to certain cut-off points and should be calculated according to the well-known and established formula used for such a purpose, with specific β-coefficients. The transformation of the (severity) score into a probability of death in the hospital uses a logistic regression equation. The ideal model should be well validated and well calibrated and should discriminate well. "Validity" is the term usually used to assess the performance of the prediction model by testing in the dataset that was used for model development.
"Calibration" evaluates the degree of correspondence between the estimated probabilities of mortality produced by a model and the actual mortality experienced by the patient population, and can be statistically evaluated using formal goodness-of-fit tests. "Discrimination" refers to the ability of the model to distinguish patients who die from patients who live, based on the estimated probabilities of mortality. Measures of discrimination are sensitivity, specificity, false positive rate, false negative rate, positive predictive power, misclassification rate, area under the receiver operating characteristic curve and concordance. This article provides the reader with a compendium of ICU severity scales along with their predicted death and survival rate calculations, which can be adopted in order to improve decision making, treatment, research and comparative analyses in quality assessment.
In most of the scoring systems, scores are calculated from data collected on the first ICU day: acute physiology and chronic health evaluation (APACHE), simplified acute physiology score (SAPS) and mortality prediction model (MPM). Others are repetitive and collect data every day throughout the ICU stay or for the first 3 days: organ dysfunction and infection system (ODIN), sequential organ failure assessment (SOFA), multiple organ dysfunction score (MODS), logistic organ dysfunction (LOD) model and three-day recalibrating ICU outcomes (TRIOS). Scores can be subjective or objective. Subjective scores are established by a panel of experts who choose the variables and assign a weight to each variable based on their personal opinion, e.g., APACHE II, ODIN and SOFA. In objective scores, variables are selected using logistic regression modeling techniques, together with clinical judgment, to determine ranges and to assign weights, e.g., APACHE III, SAPS II, MPM II, MODS, LOD score (LODS) and TRIOS. The commonly used ICU scoring systems (for the adult population) discussed in this article are: APACHE, SAPS, MODS, SOFA, LODS, MPM II, ODIN, TRIOS and the Glasgow coma scale (GCS).
Many studies have shown the effectiveness of scoring systems in predicting hospital mortality, and most of the available scores are comparable in terms of outcome prediction. Prediction models should, however, be updated periodically to reflect changes in medical practice and case-mix over time. A prospective study by Meyer et al. showed that among patients who were predicted to die by clinical judgment and APACHE II score, more than 40% actually survived. They concluded that no method is reliable for predicting the mortality of surgical ICU patients. This raises the question of what the desirable characteristics of risk-adjusted mortality predictors are, and of how to avoid the confusion between interpreting an estimated probability of mortality and predicting whether a given patient will live or die.
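The difference between an estimated probability and a live-or-die prediction can be made concrete with a small sketch. It is an illustration only: the function names, the 0.5 cut-off and the six patients are hypothetical and come from no cited study. Sensitivity and specificity depend on the chosen cut-off, whereas concordance (the area under the receiver operating characteristic curve) depends only on how the probabilities rank non-survivors against survivors.

```python
from itertools import product


def sensitivity_specificity(probs, died, threshold=0.5):
    """Label a patient 'predicted to die' when prob >= threshold, then compare with outcomes."""
    pairs = list(zip(probs, died))
    tp = sum(1 for p, d in pairs if d and p >= threshold)      # died, predicted to die
    fn = sum(1 for p, d in pairs if d and p < threshold)       # died, predicted to live
    tn = sum(1 for p, d in pairs if not d and p < threshold)   # lived, predicted to live
    fp = sum(1 for p, d in pairs if not d and p >= threshold)  # lived, predicted to die
    return tp / (tp + fn), tn / (tn + fp)


def concordance(probs, died):
    """Share of (non-survivor, survivor) pairs in which the non-survivor has the
    higher estimated probability; ties count as one half. Equals the ROC area."""
    dead = [p for p, d in zip(probs, died) if d]
    alive = [p for p, d in zip(probs, died) if not d]
    wins = sum(1.0 if pd > pa else 0.5 if pd == pa else 0.0
               for pd, pa in product(dead, alive))
    return wins / (len(dead) * len(alive))


# Hypothetical estimated probabilities and outcomes (True = died in hospital).
probs = [0.9, 0.7, 0.4, 0.3, 0.2, 0.1]
died = [True, True, False, True, False, False]

sens, spec = sensitivity_specificity(probs, died)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} AUC={concordance(probs, died):.2f}")
```

In this toy cohort, one patient with an estimated probability of 0.3 dies. A fixed cut-off counts that patient as a misclassification, although the probability itself was never a claim that the patient would survive.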
Developed in 1985 using a database of North American ICU patients, APACHE II [Table 1a and b] is a severity-of-disease classification system. It uses a point score based upon values of 12 routine physiologic measurements (taken during the first 24 h after admission), age and previous health status to provide a general measure of severity of disease. An integer score from 0 to 71 is then computed based on these measurements; higher scores imply more severe disease and a higher risk of death. APACHE II scores can prognostically stratify acutely ill patients and assist investigators in comparing the success of new or differing forms of therapy. If a variable has not been measured, it is assigned zero points. Hospital mortality is predicted using the APACHE II score, the principal diagnostic category with which the patient is admitted to the ICU and whether or not the patient required emergency surgery. The estimated risk of hospital death is calculated using a logistic regression equation with specific beta-coefficients [Table 1a and b]. In a retrospective study of 396 patients, Peter et al. evaluated the performance of the APACHE II score, SAPS II, MPM II and the poisoning severity score (PSS); they found that even in the setting of poisoning, the generic scoring systems APACHE II and SAPS II outperform the PSS. However, the APACHE II score is neither very sensitive nor very specific in terms of mortality prediction. The major limitation of this scoring system is that many patients have several co-morbid conditions, and selecting only one principal diagnostic category may be very difficult.
In addition, the physiological variables are all dynamic and can be influenced by multiple factors, including ongoing resuscitation and treatment; hence, a time bias is present, which is an important consideration when treating patients in the ICU, especially with the recent increased emphasis on early goal-directed therapy. All these factors can lead to a risk of overestimating predicted mortality.
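As a rough sketch of how the APACHE II score is turned into an estimated risk of hospital death, the logistic equation can be written as below. The constants (-3.517, 0.146 per point and 0.603 for emergency surgery) are those commonly quoted for the original APACHE II equation and are an assumption here; they should be checked against Table 1 before any use. The diagnostic category weight has to be looked up from the published table and is passed in as a parameter. The function name is hypothetical.

```python
import math


def apache2_hospital_mortality(score, diagnostic_weight, emergency_surgery=False):
    """Estimated risk of hospital death from an APACHE II score (illustrative sketch)."""
    if not 0 <= score <= 71:
        raise ValueError("APACHE II score must be between 0 and 71")
    # Assumed constants, as commonly quoted for the original equation.
    logit = -3.517 + 0.146 * score + diagnostic_weight
    if emergency_surgery:
        logit += 0.603
    # Logistic transformation: ln(R / (1 - R)) = logit  =>  R = 1 / (1 + e^(-logit))
    return 1.0 / (1.0 + math.exp(-logit))


# diagnostic_weight=0.0 is a placeholder, not the weight of any real category.
risk = apache2_hospital_mortality(score=25, diagnostic_weight=0.0)
print(f"estimated hospital mortality: {risk:.1%}")
```

The result is a group-level probability. For the reasons given above, it is not a prediction of whether an individual patient will live or die.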
The APACHE III prognostic system was designed to refine APACHE II. It consists of two parts: 
APACHE III score, which can provide initial risk stratification for severely ill hospitalized patients within independently defined patient groups
APACHE III predictive equation, which uses APACHE III score and reference data on major disease categories and treatment location immediately prior to ICU admission to provide risk estimates for hospital mortality for individual ICU patients.
APACHE III largely uses the same variables as APACHE II, but the neurological data are collected differently, no longer using the GCS. It adds two particularly important variables: the patient's origin and the lead-time bias. The acute diagnosis is taken into account; one diagnosis must be preferred. The APACHE III scores (evaluated as the most deranged values from the first 24 h in the ICU) vary between 0 and 299 points, including 252 points for the 18 physiological variables, 24 points for age and 23 points for the chronic health status; all variables are chosen to increase the explanatory power of the model.
APACHE IV was gradually developed using day 1 data for 116,209 ICU admissions and using the same variables as APACHE III. New variables added were: mechanical ventilation, thrombolysis, impact of sedation on GCS, re-scaled GCS and PaO2/FiO2 (arterial oxygen tension to fractional concentration of inspired oxygen) ratio.
First described in 1993 by Le Gall et al., SAPS II [Table 2] is used to score the severity of illness of ICU patients. The model includes 17 variables: 12 physiologic variables, age, type of admission and three disease-related variables. As with other scoring systems, the SAPS II score registers the worst value of selected variables within the first 24 h after admission. The SAPS II score can vary between 0 and 163 points (0-116 points for physiological variables, 0-17 points for age and 0-30 points for previous diagnosis). Probability of death is then calculated using logistic regression [Table 2]. However, the discrimination and particularly the calibration of the SAPS II model do not hold up when it is applied to a new population. Therefore, to calculate the standardized mortality ratio or the ICU performance measure, Le Gall et al. recently proposed an expanded model in which six admission variables were added to SAPS II: age, sex, length of the pre-ICU hospital stay, patient location before ICU, clinical category and whether drug overdose was present or not. Probability of death (P) for this expanded model is again calculated using logistic regression.
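For the original SAPS II model (not the expanded model, whose coefficients are not reproduced here), the conversion from score to probability can be sketched as follows. The constants are those commonly quoted for the 1993 equation, logit = -7.7631 + 0.0737 × score + 0.9971 × ln(score + 1), and are an assumption to be verified against Table 2. The function name is hypothetical.

```python
import math


def saps2_hospital_mortality(score):
    """Estimated probability of hospital death from an original SAPS II score (sketch)."""
    if not 0 <= score <= 163:
        raise ValueError("SAPS II score must be between 0 and 163")
    # Assumed constants, as commonly quoted for the original SAPS II equation.
    logit = -7.7631 + 0.0737 * score + 0.9971 * math.log(score + 1)
    return 1.0 / (1.0 + math.exp(-logit))


for score in (20, 40, 60, 80):
    print(score, f"{saps2_hospital_mortality(score):.1%}")
```

Because the calibration of SAPS II drifts in new populations, a probability computed this way is best read as a reference value and not as a locally validated estimate.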
A world-wide database of 19,577 patients was then used to develop SAPS III in 2005. It comprises three parts: chronic variables, acute variables (including sepsis and its characteristics) and physiology. Data are acquired within the 1st h of admission. The calculated probability of ICU and hospital death emerges by adding diagnoses to the model. Recently, Liu et al. developed an electronic SAPS 3 (eSAPS 3), which was tested among 67,889 first-time ICU admissions at 21 hospitals between 2007 and 2011 to predict hospital mortality. This customized eSAPS 3 version was developed in a 40% derivation cohort and tested in a 60% validation cohort; they concluded that eSAPS 3 shows good potential for providing automated risk adjustment in the ICU.
In a 1995 article, Marshall et al. proposed an objective scale to measure the severity of multiple organ dysfunction as an outcome in critical illness and tested these criteria in a population of 692 patients. They developed the MODS [Table 3], which comprises a score based on six organ failures. Each organ is scored from 0 to 4 (maximum total of 24). Hospital mortality is then estimated from the total score [Table 3]. This score correlated in a graded fashion with the ICU mortality rate, both when applied on the first day of ICU admission as a prognostic indicator and when calculated over the ICU stay as an outcome measure. The score showed excellent discrimination, and the study showed that mortality depends not only on the admission score but also on the course of the ICU stay; the score may therefore prove useful as an alternative end point for clinical trials involving critically ill patients.
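The arithmetic of the MODS total is simple enough to sketch. The organ-system names below follow the usual description of the MODS and are an assumption, since Table 3 is not reproduced in this text. The function and constant names are hypothetical, and the mapping from total score to mortality is deliberately left out because it has to come from the table.

```python
# Organ systems assumed for the six MODS components.
MODS_ORGANS = ("respiratory", "renal", "hepatic",
               "cardiovascular", "hematologic", "neurologic")


def mods_total(organ_scores):
    """Sum six per-organ scores (each 0-4) into a MODS total between 0 and 24."""
    missing = set(MODS_ORGANS) - set(organ_scores)
    if missing:
        raise ValueError(f"missing organ scores: {sorted(missing)}")
    for organ in MODS_ORGANS:
        if organ_scores[organ] not in range(5):
            raise ValueError(f"{organ} score must be an integer from 0 to 4")
    return sum(organ_scores[organ] for organ in MODS_ORGANS)


# Hypothetical patient on the first ICU day.
day1 = {"respiratory": 3, "renal": 1, "hepatic": 0,
        "cardiovascular": 2, "hematologic": 1, "neurologic": 2}
print(mods_total(day1))
```

Recomputing the total each day gives the course over the ICU stay that the authors propose as an outcome measure.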
The SOFA system [Table 4] was created in a consensus meeting of the European Society of Intensive Care Medicine in 1994 and further revised in 1996. In 1998, Vincent et al. evaluated the SOFA subjective score on 1449 patients. This score was developed to quantify the severity of a patient's illness based on the degree of organ dysfunction; six organ failures are each scored on a scale of 0-4. One failure plus a respiratory failure indicates the lowest mortality; all the other combinations yield mortality between 65% and 74%. Subsequent analyses have considered the maximal score plus the maximal change and have shown that the latter has a lower prognostic value than the former; the time course of the patient's condition during the entire ICU stay is also taken into account. Although there is no direct conversion of SOFA score to mortality, a rough estimate of mortality risk may be made based on two prospective papers that have been published [Table 4].
Sequential assessment of organ dysfunction during the first few days of ICU admission is a good indicator of prognosis. A prospective study by Bale et al. showed that both the mean and highest SOFA scores are particularly useful predictors of outcome, independent of the initial score, and that a high SOFA score at 48 h of presentation predicts an increased mortality rate. Ferreira et al. determined that, regardless of the initial score, an increase in SOFA score during the first 48 h in the ICU predicts a mortality rate of at least 50%. Vosylius et al. showed that cumulative SOFA scores were better at discriminating outcome than single organ dysfunction scores. In a study published in 2007, Grissom et al. proposed a simplified version of the SOFA score known as the Modified SOFA (MSOFA) score. The MSOFA score eliminates the need for laboratory examinations such as the platelet count and replaces measurements of PaO2/FiO2 and serum bilirubin level with the SpO2/FiO2 ratio (obtained by dividing pulse oximeter saturation by the fraction of inspired oxygen) and clinical examination for jaundice, respectively. Although simpler, this score requires further validation.
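The sequential quantities mentioned above (initial, mean and highest SOFA, and the change over the first 48 h) and the SpO2/FiO2 ratio used by the MSOFA can be sketched as below. The function names are hypothetical. The sketch assumes one total score per calendar day, with index 0 as the admission day and index 2 as roughly 48 h later, and it assumes SpO2 is given in percent and FiO2 as a fraction.

```python
from statistics import mean


def sofa_trend(daily_totals):
    """Summarise a series of daily SOFA totals (each 0-24, six organs scored 0-4)."""
    if not daily_totals or any(not 0 <= s <= 24 for s in daily_totals):
        raise ValueError("expected at least one daily SOFA total between 0 and 24")
    # Assumption: index 0 is the admission day and index 2 is about 48 h later.
    delta_48h = daily_totals[2] - daily_totals[0] if len(daily_totals) >= 3 else None
    return {
        "initial": daily_totals[0],
        "mean": mean(daily_totals),
        "highest": max(daily_totals),
        "delta_48h": delta_48h,
    }


def spo2_fio2_ratio(spo2_percent, fio2_fraction):
    """Pulse oximeter saturation divided by the fraction of inspired oxygen."""
    if not 0.21 <= fio2_fraction <= 1.0:
        raise ValueError("FiO2 must be a fraction between 0.21 and 1.0")
    return spo2_percent / fio2_fraction


print(sofa_trend([6, 8, 9]))       # hypothetical patient whose score rises over 48 h
print(spo2_fio2_ratio(95, 0.5))
```

A positive `delta_48h` corresponds to the rising score that Ferreira et al. associate with higher mortality. The sketch only computes the trend and attaches no mortality figure to it.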
Le Gall et al. initially proposed the LODS [Table 5] in 1996, where 12 variables were tested and six organ failures defined. The model has been tested over time. The difference between the LODS on day 3 and day 1 is highly predictive of the hospital outcome. The LODS was designed to combine measurement of the severity of multiple organ dysfunctions into a single score. The probability of death is then calculated using an equation designed for this purpose [Table 5].
In a prospective multicenter study of 1685 ICU patients, Timsit et al. concluded that daily LOD and SOFA scores showed good accuracy and internal consistency and could be used to adjust severity for events occurring in the ICU. Another prospective study by Kim and Yoon, in 521 consecutive patients admitted to the neurological ICU, showed that both the LODS and the APACHE II score had excellent discrimination but that the LODS had superior calibration; they therefore concluded that the LODS was more stable than the APACHE II scoring system in the neurological ICU setting. However, Maccariello et al. evaluated the performance of the LODS in patients receiving renal replacement therapy and found poor correlation between the LODS and the predicted mortality rate. They attributed this poor correlation to the study population, which was older and rather severely ill, with high frequencies of comorbidity, sepsis, functional capacity impairment and need for mechanical ventilation and vasoactive amines.
First described by Lemeshow et al., MPM II [Table 6] is a model giving the probability of hospital death directly. Four models have been proposed: MPM II at admission and at 24, 48 and 72 h. The initial version of this model was designed to predict mortality at hospital discharge based on data from admission and after the first 24 h in the ICU. Additional models were later developed that included data from 48 and 72 h after admission to the ICU. The model uses chronic health status, acute diagnosis, a few physiological variables and some other variables including mechanical ventilation. The MPM II models at 48 and 72 h use the same variables as MPM II at 24 h and are based on the most deranged values of the preceding 24 h, with different weights, to compute the probabilities of death using logistic regression [Table 6].
Fagon et al. proposed the ODIN system [Table 7] in 1993. It records, within the first 24 h of ICU admission, the presence or absence of dysfunction in six organs plus infection, and it differentiates the prognosis according to the type of failure; the highest mortality rates were found to be associated with hepatic, followed by hematologic and renal, dysfunctions and the lowest with respiratory dysfunction and infection. Taking into account both the number and the type of organ dysfunction, a logistic regression model was then used to calculate individual probabilities of death that depended upon the statistical weight assigned to each ODIN component (in the following order of descending severity: cardiovascular, renal, respiratory, neurologic, hematologic and hepatic dysfunctions and infection).
In 2001, Timsit et al. proposed a composite score, the TRIOS [Table 8], using daily SAPS II and LODS for predicting hospital mortality in ICU patients hospitalized for more than 72 h. Using logistic regression, the probability of hospital mortality can be computed [Table 8]. This TRIOS composite score has excellent statistical qualities and may be used for research purposes.
The GCS [Table 9] is a universal tool for the rapid assessment of an injured patient's consciousness level and a guide to the severity of brain injury. Several studies have shown that there is a good correlation between GCS and neurological outcome. A modified verbal and motor version has been developed to aid in the evaluation of the consciousness level of infants and children [Table 9].
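As a brief sketch of how a GCS total is assembled, the standard adult scale adds an eye response (1-4), a verbal response (1-5) and a motor response (1-6), giving a total of 3 to 15. These component ranges are taken from the standard scale, not from Table 9, which is not reproduced here. The function name is hypothetical, and the modified pediatric version is not covered.

```python
def gcs_total(eye, verbal, motor):
    """Total Glasgow coma scale score (3-15) from its three adult components."""
    for name, value, top in (("eye", eye, 4), ("verbal", verbal, 5), ("motor", motor, 6)):
        if not 1 <= value <= top:
            raise ValueError(f"{name} response must be between 1 and {top}")
    return eye + verbal + motor


# Hypothetical patient: eye 3, verbal 4, motor 6.
print(gcs_total(eye=3, verbal=4, motor=6))
```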
Prediction models face many challenges. Among the desirable characteristics of risk-adjusted mortality predictors are that no lead-time bias should be present and that they should not be affected by whether a patient is hospitalized or not. Albeit imperfect, the existing models are increasingly applied to support timely decision making and to decrease hospital cost. It is also imperative that the choice of the severity score scale, index or model accurately match the event, setting or application, as misapplication of such systems can result in avoidable waste of time, increased cost and incorrect extrapolations and may contribute to mismanagement and death.
[Table 1], [Table 2], [Table 3], [Table 4], [Table 5], [Table 6], [Table 7], [Table 8], [Table 9]