An Ensemble of Deep Convolutional Neural Networks is More Accurate and Reliable than Board-certified Ophthalmologists at Detecting Multiple Diseases in Retinal

What:

Paper Presentation

Part of:

Retina III: Deep Retina – Plug Those Leaks and Keep It Flowing

When:

5:05 PM, Friday 16 Jun 2023 (3 minutes)

Where:

Québec City Convention Centre - Room 307 AB

Authors: Jovi Chau-Yee Wong¹, Prashant U. Pandey², Brian G. Ballios¹, Panos G. Christakis¹, Alexander J. Kaplan¹, David J. Mathew¹, Stephan Ong Tone¹, Michael J. Wan¹, Jonathan A. Micieli¹. ¹Department of Ophthalmology and Vision Sciences, University of Toronto; ²School of Biomedical Engineering, University of British Columbia.

Author Disclosures: J.C. Wong: None. P.U. Pandey: None. B.G. Ballios: None. P.G. Christakis: None. A.J. Kaplan: None. D.J. Mathew: None. S. Ong Tone: None. M.J. Wan: None. J.A. Micieli: None.

Abstract Body:

Purpose: To develop an algorithm to classify common retinal pathologies accurately and reliably from fundus photographs and to validate its performance against human experts.

Study Design: We performed a prospective comparative evaluation of a diagnostic technology and compared it against human performance.

Methods: We trained a deep convolutional ensemble (DCE), an ensemble of five convolutional neural networks (CNNs), to classify retinal fundus photographs into four classes. Image data included 43,055 fundus images from 12 public datasets, consisting of samples of diabetic retinopathy (DR), glaucoma, age-related macular degeneration (AMD), and normal eyes. The CNN architecture was based on the InceptionV3 model, and initial weights were pre-trained on the ImageNet dataset. Five trained ensembles were then tested on an ‘unseen’ set of 100 images. Seven board-certified ophthalmologists were asked to classify the same test images. We measured classification performance through accuracy, F1-score, positive predictive value (PPV), sensitivity, and specificity. Reliability was measured through the agreement between the confidence and the accuracy of predictions.
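As an illustration of the quantities named in the Methods, the following is a minimal Python sketch, not the authors' implementation. The abstract does not state how the five CNN outputs are combined; averaging the per-class probabilities (soft voting) is one common choice and is assumed here. The function names `soft_vote`, `accuracy`, and `one_vs_rest_metrics`, and the integer class encoding, are hypothetical.

```python
from typing import Dict, List, Sequence

# Class order is an assumption made for this sketch only.
CLASSES = ["DR", "glaucoma", "AMD", "normal"]


def soft_vote(member_probs: Sequence[Sequence[float]]) -> int:
    """Average per-class probabilities over ensemble members and
    return the index of the class with the highest mean probability."""
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    mean = [sum(p[c] for p in member_probs) / n_members for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: mean[c])


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    """Fraction of images whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def one_vs_rest_metrics(y_true: List[int], y_pred: List[int], cls: int) -> Dict[str, float]:
    """PPV, sensitivity, specificity, and F1 for one class treated as
    'positive' against all other classes."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == cls and p == cls for t, p in pairs)
    fp = sum(t != cls and p == cls for t, p in pairs)
    fn = sum(t == cls and p != cls for t, p in pairs)
    tn = sum(t != cls and p != cls for t, p in pairs)
    ppv = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    # F1 is the harmonic mean of PPV and sensitivity.
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity) if ppv + sensitivity else 0.0
    return {"ppv": ppv, "sensitivity": sensitivity, "specificity": specificity, "f1": f1}
```

Averaging the one-vs-rest values over the four classes would give figures comparable in kind to the "over all classes" means in the Results, although the abstract does not specify the averaging scheme.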

Results: Board-certified ophthalmologists achieved a mean accuracy of 72.7% (SD: 6.0%) over all classes, while the DCE achieved a greater mean accuracy of 79.2% (SD: 2.3%; p = 0.03). The DCE also achieved a greater mean PPV (p = 0.0005), sensitivity (p = 0.03), specificity (p = 0.03), and F1-score (p = 0.02) than ophthalmologists over all classes. In the per-class analysis, the DCE had a statistically significantly higher mean F1-score than the ophthalmologists for DR classification (76.8% vs. 57.5%; p = 0.01), and greater but statistically non-significant mean F1-scores for glaucoma (83.9% vs. 75.7%; p = 0.10), AMD (85.9% vs. 85.2%; p = 0.69), and normal eyes (73.0% vs. 70.5%; p = 0.39). We also found that the DCE had better reliability than the ophthalmologists, with a greater mean agreement between accuracy and confidence of 81.6% vs. 70.3% (p < 0.001).
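The abstract does not name the statistical test behind these p-values. As a rough worked check of the accuracy comparison only, the sketch below computes a Welch (unequal-variance) t statistic from the reported means and standard deviations. The choice of Welch's test and the group sizes (five DCE runs, seven ophthalmologists, read from the Methods) are assumptions, and `welch_t` is a hypothetical helper name.

```python
import math
from typing import Tuple


def welch_t(mean_a: float, sd_a: float, n_a: int,
            mean_b: float, sd_b: float, n_b: int) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    computed from summary statistics of two independent groups."""
    var_a = sd_a ** 2 / n_a  # squared standard error of group A's mean
    var_b = sd_b ** 2 / n_b  # squared standard error of group B's mean
    t = (mean_a - mean_b) / math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1))
    return t, df


# Reported accuracy: DCE 79.2% (SD 2.3%), ophthalmologists 72.7% (SD 6.0%).
# Group sizes 5 and 7 are assumed from the Methods, not stated with the result.
t_stat, df = welch_t(79.2, 2.3, 5, 72.7, 6.0, 7)
print(f"t = {t_stat:.2f}, df = {df:.1f}")  # prints "t = 2.61, df = 8.2"
```

A t value of about 2.61 on roughly 8 degrees of freedom lies between the two-sided 5% and 2% critical values (2.306 and 2.896 for 8 degrees of freedom), which is in line with the reported p = 0.03. That agreement does not establish which test the authors actually used.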

Conclusions: We developed a deep learning model and found that it could more accurately and more reliably classify four categories of fundus images compared to board-certified ophthalmologists. This work provides proof of principle that an AI algorithm is capable of accurate and reliable recognition of multiple retinal diseases using fundus photographs only.

Jovi Chau-Yee Wong

Presenter
