Is Mallampati Class More Consistent and Reliable among Providers When Assessed from Airway Photos?

Accurate prediction of a difficult airway remains a challenge for laryngoscopists and anesthesia providers. Despite the limited sensitivity and specificity of routine preoperative airway tests, many providers still perform them, suggesting that they may still guide and influence airway planning. The Mallampati exam, the most commonly used test, has low sensitivity. Our hypothesis was that digital documentation of the airway exam would improve interobserver reliability and provide more consistent information for airway providers. We obtained written informed consent from 250 patients presenting to the UF Health Shands Presurgical Center to participate in an observational cohort study. Their airway exam was photographed and uploaded into the electronic medical record. The photographs and data extracted from the electronic medical record were reviewed by three independent investigators. Chi-square analyses showed significant differences (p < 0.05) across raters in all measures, indicating that raters had varied assessments and predictions about patients. There were no statistically significant associations (p > 0.05) between Mallampati scores from the preassessment or from reviews of photographs and the method of intubation or the laryngoscopic view observed in the patient. There was also no statistically significant association between the Mallampati score and the use of video laryngoscopy for intubation. Moderate interobserver reliability of the Mallampati exam may be a confounding factor in the lack of a significant relationship between the Mallampati exam and both the assessment of whether a patient may be difficult to intubate and the method chosen to facilitate intubation in this study.


Introduction
Accurate prediction of a difficult airway remains a challenge for anyone entrusted with securing an artificial airway. Many preoperative tests have been studied and validated to assess patients for difficulty, such as thyromental distance, degree of mouth opening, neck range of motion (ROM), the ability to protrude the lower jaw beyond the upper incisors (prognath), and other characteristics. In the 1980s, Mallampati first described a correlation between the visibility of oropharyngeal structures and difficult laryngoscopy [1]. Samsoon and Young then modified this "Mallampati" score into the four Mallampati classes that are still widely used in preoperative airway assessment [2]. Lewis et al. demonstrated that phonation affected the Mallampati classification and that the exam performed with phonation had the greatest predictive value [3]. Khan et al. further showed that the highest specificity for detecting difficult laryngoscopy and intubation was in the upright position with phonation, and the highest positive predictive value was in the supine position with phonation [4].
An ideal exam to predict a difficult airway would have high sensitivity, high specificity, and high interobserver reliability. As a single predictor of a difficult airway, the Mallampati score has shown a specificity of 0.84, a positive predictive value of 0.10, and a sensitivity of 0.64 [5]. Various combinations of exams have been used to try to improve its predictive value, such as combining it with thyromental distance [6] or adding craniocervical extension to create the Extended Mallampati Score (EMS) [7]. The interobserver reliability of the Mallampati score is also low compared with other examinations. In a study by Eberhart et al., the κ coefficient was 0.6 for the Mallampati exam and 0.8 for the upper lip bite test [8]. The upper lip bite test also has higher predictive value (positive likelihood ratio: 14) than several other tests, including the Mallampati score (positive likelihood ratio: 4.1) [9]. The κ coefficient for the Mallampati exam has been reported to be as low as 0.3 [10]. Thus, the Mallampati exam has failed to show good sensitivity or interobserver reliability when predicting difficult airways, even when combined with other tests.
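The accuracy figures above are linked by standard definitions. The positive likelihood ratio is sensitivity / (1 − specificity), and the positive predictive value depends on the prevalence of difficult airways through Bayes' rule. The minimal Python sketch below uses the sensitivity (0.64) and specificity (0.84) quoted above. The prevalence values are illustrative assumptions and are not taken from [5].

```python
"""Illustrative link between sensitivity, specificity, LR+ and PPV.

The prevalence values in the example are assumptions for illustration only.
"""


def positive_likelihood_ratio(sensitivity: float, specificity: float) -> float:
    """LR+ = P(test positive | difficult) / P(test positive | not difficult)."""
    return sensitivity / (1.0 - specificity)


def positive_predictive_value(sensitivity: float, specificity: float,
                              prevalence: float) -> float:
    """P(difficult airway | positive test), computed with Bayes' rule."""
    true_positive = sensitivity * prevalence
    false_positive = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive / (true_positive + false_positive)


if __name__ == "__main__":
    sens, spec = 0.64, 0.84  # single-test Mallampati values quoted in the text
    print(f"LR+ = {positive_likelihood_ratio(sens, spec):.2f}")
    for prevalence in (0.01, 0.03, 0.10):  # assumed prevalences, not study data
        ppv = positive_predictive_value(sens, spec, prevalence)
        print(f"prevalence {prevalence:.0%}: PPV = {ppv:.2f}")
```

With these inputs, LR+ is about 4.0, numerically close to the 4.1 cited from [9]. A PPV near 0.10 corresponds to an assumed prevalence of roughly 3%. This shows how a test with moderate specificity still produces mostly false positives when the outcome is rare.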
Airway examination also involves assessing facial hair, neck mobility, and the patient's ability to sublux the jaw. Cervical spine limitation has been shown to increase the likelihood of a difficult intubation [7]. This information could be consistently reported in the anesthesia preoperative evaluation if photographs of the airway examination were incorporated into the patient's medical record. Despite the limited sensitivity and specificity of routine preoperative airway tests, many providers still perform these tests, suggesting that they may still guide and influence airway planning. The hypothesis of this study was that digital documentation of the airway exam would improve interobserver reliability and provide more consistent information for airway providers. This hypothesis rested on the assumption that variability arises from differences in patient instructions, patient positioning, lighting, and the way the evaluation is performed.

Materials and Methods
After Institutional Review Board approval was obtained, 250 patients presenting to the University of Florida (UF Health) Shands Presurgical Center from March to April 2015 were approached and gave written informed consent to participate in the study. Since 2015, all patients presenting to the center for preoperative evaluation before surgery have undergone photographic examination of their airway as routine practice. A sample of N = 250 was expected to estimate the Kappa statistic (agreement) with a 95% confidence interval of ±0.14. Participants 18 years or older were enrolled on days when a designated study team member was present in the clinic. There were no other exclusion criteria.
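The ±0.14 target is a precision-based sample size statement. The assumed observed agreement (p_o) and chance agreement (p_e) behind it are not reported. The sketch below therefore only shows one common large-sample approximation for the standard error of Cohen's kappa, SE ≈ sqrt(p_o(1 − p_o) / (n(1 − p_e)²)), and how the 95% half-width scales with n. The p_o and p_e values are hypothetical, and the authors may have used a different method.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% standard normal quantile


def kappa_ci_half_width(n: int, p_observed: float, p_chance: float,
                        z: float = Z_95) -> float:
    """Approximate CI half-width for Cohen's kappa.

    Uses the simple large-sample standard error
    SE = sqrt(p_o * (1 - p_o) / (n * (1 - p_e)**2)).
    """
    se = math.sqrt(p_observed * (1.0 - p_observed) / (n * (1.0 - p_chance) ** 2))
    return z * se


def n_for_half_width(width: float, p_observed: float, p_chance: float,
                     z: float = Z_95) -> int:
    """Smallest n whose approximate half-width is at most `width`."""
    n = z ** 2 * p_observed * (1.0 - p_observed) / (width ** 2 * (1.0 - p_chance) ** 2)
    return math.ceil(n)


if __name__ == "__main__":
    # Hypothetical agreement levels (kappa = 0.5); the paper does not report these.
    p_o, p_e = 0.75, 0.50
    print(f"half-width at n = 250: {kappa_ci_half_width(250, p_o, p_e):.3f}")
    print(f"n needed for +/-0.14: {n_for_half_width(0.14, p_o, p_e)}")
```

Varying p_o and p_e shows how strongly the achievable precision depends on the agreement level assumed at the design stage.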
All patients had their airway exam photographed and uploaded into the electronic medical record. The Mallampati exam was photographed with the patient sitting in a chair, using an iPad mini 2 (Apple Inc., Cupertino, CA, USA) held level with and within 6 inches of the mouth. Maximal pain-free neck ROM was photographed from a profile view, as was the patient's ability to prognath. The image acquisition procedure has been described previously [11]. All images were later reviewed by three independent investigators (a certified registered nurse anesthetist with 10 years of experience, a junior faculty member, and a senior faculty member). They recorded the Mallampati score, whether neck ROM was full or limited, and whether the patient was able to bite greater than, less than, or at the vermilion border, so that interrater variability could be evaluated. Each investigator also recorded whether there was a reasonable possibility that the patient would be difficult to intubate.
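Each photograph was scored by three raters, so a multi-rater statistic such as Fleiss' kappa is one way to summarize their agreement in a single number. The paper does not state which kappa variant was used. The sketch below is therefore an illustrative assumption rather than the study's analysis. The counts in the example are invented.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a fixed number of raters per subject.

    counts[i][j] = number of raters who assigned subject i to category j.
    """
    if not counts:
        raise ValueError("need at least one subject")
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every subject needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])
    total_ratings = n_subjects * n_raters
    # Share of all ratings that fall in each category.
    p_j = [sum(row[j] for row in counts) / total_ratings for j in range(n_categories)]
    # Per-subject agreement: fraction of rater pairs that agree on that subject.
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in counts]
    p_bar = sum(p_i) / n_subjects
    p_e = sum(p * p for p in p_j)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when every rating is in one category")
    return (p_bar - p_e) / (1.0 - p_e)


if __name__ == "__main__":
    # Hypothetical: 4 patients, 3 raters, Mallampati classes 1-4 (not study data).
    example = [
        [3, 0, 0, 0],
        [0, 2, 1, 0],
        [0, 1, 2, 0],
        [0, 0, 0, 3],
    ]
    print(f"Fleiss' kappa = {fleiss_kappa(example):.3f}")
```

For the invented table, κ = 5/9 ≈ 0.556. Two unanimous patients combined with two split decisions already pull agreement down into the moderate band.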
A single investigator, not involved in evaluating the pictures, extracted the following data from the written preassessment documented in the electronic medical record: age, gender, body mass index, Mallampati score, mouth opening greater than 3 fingerbreadths, thyromental distance greater than 6 cm, whether the patient had full neck ROM, and the presence of dentures or missing teeth. Three investigators (a certified registered nurse anesthetist with 10 years of experience, a junior faculty member, and a senior faculty member) reviewed the extracted data and recorded whether they predicted a reasonable possibility that the patient would be difficult to intubate. The single investigator also extracted the laryngoscopic view and the method of intubation for each patient from the electronic anesthesia record. Correlations between the airway exam and anesthetic management were considered secondary outcomes.
All analyses were conducted in JMP Pro 13.0 (SAS Institute Inc., Cary, NC, USA). Categorical measures were summarized as percentages and continuous measures as means and standard deviations. Associations between categorical measures were assessed with chi-square analyses, and p < 0.05 was considered statistically significant after correction for multiple comparisons [12]. The Kappa coefficient was used to assess interrater agreement on the photograph-based Mallampati score, neck ROM, and rater prediction of difficulty with intubation. It was also used for the preassessment-based prediction of difficulty with intubation. Kappa was also used to evaluate agreement between the preassessment and the raters' photograph-based values. Kappa values of 0 to 0.20 indicated slight agreement, 0.21 to 0.40 fair agreement, 0.41 to 0.60 moderate agreement, 0.61 to 0.80 substantial agreement, and 0.81 to 1.0 almost perfect or perfect agreement [13].
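The sketch below computes unweighted Cohen's kappa for one pair of raters and maps the result onto the agreement bands listed above. The paper does not say whether Table 5 reports pairwise Cohen's kappa, a multi-rater statistic, or a weighted kappa for the ordinal Mallampati classes. This is therefore an illustration of the statistic, not a reconstruction of the JMP analysis. The example ratings are invented.

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa for two raters scoring the same subjects."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must score the same, non-empty set of subjects")
    n = len(rater_a)
    # Observed agreement: share of subjects given the same category.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal category frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    p_chance = sum(counts_a[c] * counts_b[c] for c in categories) / n ** 2
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_chance) / (1.0 - p_chance)


def landis_koch_label(kappa: float) -> str:
    """Agreement bands from the Methods [13]; values below 0 are labeled 'poor'."""
    if kappa < 0:
        return "poor (below chance)"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


if __name__ == "__main__":
    # Hypothetical Mallampati classes from two raters (not study data).
    rater_1 = [1, 2, 2, 3, 1, 4, 2, 3]
    rater_2 = [1, 2, 3, 3, 2, 4, 2, 2]
    k = cohens_kappa(rater_1, rater_2)
    print(f"kappa = {k:.2f} ({landis_koch_label(k)})")
```

In the invented example, the raters agree on 5 of 8 patients (62.5%). Once chance agreement is removed, kappa is about 0.47, which falls in the moderate band.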

Results
A total of 250 patients were enrolled in the study, none of whom withdrew. Seventeen patients had an incomplete Mallampati exam in the electronic medical record, so their charts were not reviewed for perceived possible difficulty with intubation. One patient did not have a complete set of pictures and was therefore excluded from the photograph review. Of 131 patients, 129 had a Cormack-Lehane assessment of the laryngeal structures under direct laryngoscopy. Eleven of 13 patients had a Cormack-Lehane assessment recorded after video laryngoscopy. Table 1 reports additional characteristics recorded from the electronic medical record. Table 2 compares the raters' photograph-based Mallampati scores, neck ROM, and predictions of difficulty with intubation. Chi-square analyses showed significant differences (p < 0.05) across raters in all measures, indicating that raters looking at the same photographs reached different assessments and predictions about the patients. Tables 3 and 4 report the associations of Mallampati scores with the method of intubation and the laryngoscopic view. Table 3 covers scores from the preassessment, and Table 4 covers scores from each rater's review of the photographs. There were no statistically significant associations (p > 0.05) between Mallampati scores from either source and the method of intubation or the laryngoscopic view observed in the patient. There was also no statistically significant association between Mallampati score and the use of video laryngoscopy for intubation. Interrater agreement was inconsistent and mostly in the fair to moderate range (Table 5, all Kappa < 0.70) for photograph-based Mallampati scores, neck ROM, and rater prediction of difficulty with intubation. Interrater agreement for rater prediction of difficulty with intubation based on preassessment information was also below 0.70 (Table 5).
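The across-rater comparison in Table 2 is a chi-square test of independence on a contingency table with one row per rater and one column per category. The minimal sketch below assumes Pearson's statistic without continuity correction. With three raters, the degrees of freedom are 2(c − 1), which is always even. The p-value then has a closed form, so no statistics library is needed. The counts in the example are invented.

```python
import math
from typing import Sequence


def chi_square_statistic(table: Sequence[Sequence[float]]) -> tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            if expected == 0:
                raise ValueError("drop empty rows/categories before testing")
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof


def chi_square_sf_even_dof(x: float, dof: int) -> float:
    """Upper-tail p-value P(X >= x) for a chi-square variable with even dof.

    Uses the closed form exp(-x/2) * sum_{i < dof/2} (x/2)**i / i!.
    """
    if dof <= 0 or dof % 2:
        raise ValueError("this closed form needs a positive, even dof")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, dof // 2):
        term *= half / i  # builds (x/2)**i / i! incrementally
        total += term
    return math.exp(-half) * total


if __name__ == "__main__":
    # Hypothetical Mallampati class counts (classes 1-4) per rater; not study data.
    counts_by_rater = [
        [90, 100, 45, 14],
        [60, 110, 60, 19],
        [110, 85, 40, 14],
    ]
    stat, dof = chi_square_statistic(counts_by_rater)
    p_value = chi_square_sf_even_dof(stat, dof)
    print(f"chi2 = {stat:.2f}, df = {dof}, p = {p_value:.4f}")
```

For tables with an odd number of degrees of freedom, a general chi-square survival function from a statistics library would be needed instead of the closed form.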
Table 6 shows the agreement between the reviewers' evaluations of the preassessment and of the photographs. Agreement was only slight for Mallampati scores and neck ROM. Similarly, intrarater agreement between each rater's prediction of difficulty with intubation based on the preassessment and that based on the photographs was low. None of the recorded variables fell into the substantial or almost perfect agreement range among all raters (Tables 5 and 6).

Discussion
The incidence of difficult intubation varies in the literature but generally falls between 1% and 10%. Complications of difficult or failed intubation range from mild to severe, including brain damage and death [14,15]. Accurate prediction of difficult intubation could potentially reduce these complications but remains a challenge, even for experienced airway providers. Many simple, easy-to-perform bedside tests exist. However, studies of individual tests and of combinations of tests have failed to demonstrate adequate positive or negative predictive value [4,5,7,8].
Our study found a distribution of Mallampati scores similar to that reported in other studies, with over 50% of patients classified as Class 1 or 2. The Mallampati class did not appear to correlate with or predict the laryngoscopic view or the method of intubation chosen by the provider, consistent with its low sensitivity for predicting intubation difficulty. Poor interrater reliability has been demonstrated for many of the tests used for airway assessment, so it is not surprising that our study showed this as well [5,16,17]. Although our study was novel in using photographs for airway assessment, interrater reliability did not improve. Interrater variability is widely recognized as a major limitation of the Mallampati test, and reducing it might increase the test's value in the preoperative exam.
Phonation during the Mallampati test has been reported to increase its false-negative rate [18,19]. One advantage of assessing the Mallampati score from a photograph is that this variable can be removed, provided the patient was not asked to phonate when the photo was taken. All providers in the study who obtained the airway photographs were instructed not to have the patient phonate during the Mallampati exam, but we did not audit compliance with this instruction.
There was only slight agreement between the assessments made from the photographs and the preoperative airway assessment documented in the electronic medical record. One weakness of the study is that the source of the documented airway assessment is unknown. It may have been made from the photographs alone, by the provider examining the patient in the preoperative clinic, or by the provider consenting the patient on the day of surgery. The lack of agreement could be due to this uncertainty, but it could also support the finding that assessments vary among raters. Other variables that could affect the Mallampati class assessment include patient position and the provider's level of training. The photographs were taken with patients sitting upright, whereas day-of-surgery airway assessments are usually performed with the patient lying supine on a stretcher. Several studies have compared Mallampati class in the supine and upright positions and found that position can affect the result. Two studies, one by Khan et al. and another by Singhal et al., found that the upright position improved the Mallampati class compared with the supine position [4,20].
The level of training may also impact the ability to predict intubation difficulty using preoperative airway assessment tests. A study by Celebi compared difficult airway prediction among anesthesia residents in different years of residency and found significant interobserver variability, especially with the use of the Mallampati test and assessment of mouth opening [21].
Our study has several other weaknesses. We used a small number of raters and a small sample size, and the results might have differed with more raters and patients. The electronic medical record documentation of airway assessment and airway management was sometimes incomplete, which could also have affected our results. It is unknown whether the airway assessment documented on the day of surgery was a de novo evaluation by the anesthesia providers caring for the patient or a repeat evaluation of the photographs in the electronic record. There was still significant variability in difficult airway prediction, regardless of whether the assessment came from a preoperative exam or a photograph.
Cent. Eur. Ann. Clin. Res. 2020, 2(1), 3; 10.35995/ceacr2010003

Conclusions
Is a picture worth a thousand words? Is airway assessment from a photograph superior to a dynamic airway assessment by physical examination? This remains unclear. Poor lighting can influence a photographic result, but it can also affect an airway assessment by physical examination. The fact remains that no single airway assessment test or combination of tests has been shown to be sensitive or specific enough for predicting airway difficulty [22].
The value of the tests themselves to predict a difficult airway may lie in the focus on clinical examination and the thought process generated by the encounter. In a provocative but still relevant editorial titled "Predicting difficult intubation: worthwhile exercise or pointless ritual?", Yentis argues that although attempting to predict intubation difficulty is unlikely to be accurate, it does force the anesthesia provider to examine and assess the airway before formulating an airway management plan [23].
Based on a review of the literature and our findings, accurate prediction of airway difficulty is clearly nuanced and remains a challenge. Evidence that photographs of the airway assessment are superior to bedside evaluation is lacking, since both show low specificity, sensitivity, and interrater agreement. However, photographic documentation of airway assessment tests may allow airway evaluation before the day of surgery.

Conflicts of Interest:
The authors declare no conflict of interest.