
An Ensemble of Deep Convolutional Neural Networks is More Accurate and Reliable than Board-certified Ophthalmologists at Detecting Multiple Diseases in Retinal Fundus Photographs

What:
Paper Presentation
When:
5:05 PM, Friday, June 16, 2023 (3 minutes)
Where:
Centre des congrès de Québec - Room 307 AB

Author Block: Jovi Chau-Yee Wong1, Prashant U. Pandey2, Brian G. Ballios1, Panos G. Christakis1, Alexander J. Kaplan1, David J. Mathew1, Stephan Ong Tone1, Michael J. Wan1, Jonathan A. Micieli1. 1Department of Ophthalmology and Vision Sciences, University of Toronto; 2School of Biomedical Engineering, University of British Columbia.

Author Disclosure Block: J.C. Wong: None. P.U. Pandey: None. B.G. Ballios: None. P.G. Christakis: None. A.J. Kaplan: None. D.J. Mathew: None. S. Ong Tone: None. M.J. Wan: None. J.A. Micieli: None.

 

Abstract Title: An Ensemble of Deep Convolutional Neural Networks is More Accurate and Reliable than Board-certified Ophthalmologists at Detecting Multiple Diseases in Retinal Fundus Photographs

Abstract Body:

Purpose: To develop an algorithm that classifies common retinal pathologies accurately and reliably from fundus photographs, and to validate its performance against human experts.

Study Design: We performed a prospective comparative evaluation of a diagnostic technology against human performance.

Methods: We trained a deep convolutional ensemble (DCE), an ensemble of five convolutional neural networks (CNNs), to classify retinal fundus photographs into four classes. Image data included 43,055 fundus images from 12 public datasets, consisting of samples of diabetic retinopathy (DR), glaucoma, age-related macular degeneration (AMD), and normal eyes. The CNN architecture was based on the InceptionV3 model, and initial weights were pre-trained on the ImageNet dataset. Five trained ensembles were then tested on an ‘unseen’ set of 100 images. Seven board-certified ophthalmologists were asked to classify these test images. We measured classification performance through accuracy, F1-score, positive predictive value (PPV), sensitivity, and specificity. Reliability was measured through the agreement between the confidence and accuracy of predictions.

Results: Board-certified ophthalmologists achieved a mean accuracy of 72.7% (SD: 6.0%) over all classes, while the DCE achieved a greater mean accuracy of 79.2% (SD: 2.3%, p = 0.03). The DCE also achieved a greater mean PPV (p = 0.0005), sensitivity (p = 0.03), specificity (p = 0.03), and F1-score (p = 0.02) than ophthalmologists over all classes. In the per-class analysis, the DCE had a statistically significant higher mean F1-score for DR classification compared to the ophthalmologists (76.8% vs. 57.5%; p = 0.01), and greater but statistically non-significant mean F1-scores for glaucoma (83.9% vs. 75.7%; p = 0.10), AMD (85.9% vs. 85.2%; p = 0.69), and normal eyes (73.0% vs. 70.5%; p = 0.39). We also found that the DCE had better reliability than the ophthalmologists, with a greater mean agreement between accuracy and confidence of 81.6% vs. 70.3% (p < 0.001).

Conclusions: We developed a deep learning model and found that it could classify four categories of fundus images more accurately and more reliably than board-certified ophthalmologists. This work provides proof of principle that an AI algorithm is capable of accurate and reliable recognition of multiple retinal diseases using fundus photographs only.
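The two core computations described above, soft-voting over an ensemble's class probabilities and measuring agreement between confidence and accuracy, can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are invented, and the confidence-accuracy agreement shown here (a prediction counts as "agreeing" when high confidence coincides with a correct prediction, or low confidence with an incorrect one, at an assumed 0.5 threshold) is one plausible formalization; the paper's exact definition may differ.

```python
import numpy as np

def ensemble_predict(model_probs):
    """Soft voting: average the per-class probabilities produced by
    each model in the ensemble. model_probs has shape
    (n_models, n_images, n_classes)."""
    return np.mean(model_probs, axis=0)

def confidence_accuracy_agreement(probs, labels, threshold=0.5):
    """Fraction of images where confidence and correctness agree:
    confident (max probability >= threshold) and correct, or
    unconfident and incorrect."""
    preds = probs.argmax(axis=1)          # predicted class per image
    confidence = probs.max(axis=1)        # top-class probability
    correct = preds == labels
    confident = confidence >= threshold
    return float(np.mean(confident == correct))

# Toy example: two models, two images, two classes (hypothetical values).
model_probs = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # model 1
    [[0.7, 0.3], [0.4, 0.6]],   # model 2
])
avg = ensemble_predict(model_probs)       # [[0.8, 0.2], [0.3, 0.7]]
labels = np.array([0, 1])
agreement = confidence_accuracy_agreement(avg, labels)
```

In the toy example both ensemble predictions are correct and confident, so the agreement is 1.0; in practice the metric penalizes a model that is confidently wrong or unconfidently right.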

Presenter
University of Toronto
Resident Physician