Comparative Performance of Artificial Intelligence Chatbots in Triage of Ophthalmic Conditions - 5799

When:
4:57 PM, Friday, June 20, 2025 (5 minutes)
Author’s Name(s): Daniel Chow, Raj Pathak, Michael Cho, Sidratul Rahman, Alex Lau, Maya Alik, Sheetal Pundir, Raheem Remtulla, Hady Saheb

Author’s Disclosure Block: Daniel Chow: none; Raj Pathak: none; Michael Cho: none; Sidratul Rahman: none; Alex Lau: none; Maya Alik: none; Sheetal Pundir: none; Raheem Remtulla: none; Hady Saheb: none

Abstract Body
Purpose: Maximizing the efficiency of patient triage is a challenge frequently faced by highvolume ophthalmology departments. Prioritizing patients is based on a number of factors, such as their medical history, current symptoms, and resource availability. Ophthalmologists could better manage patient flow and determine when more resources are needed with the help of accurate and timely triage. Artificial intelligence advances in recent times have yielded sophisticated Large Language Models (LLMs). This research evaluates and compares the potential of different LLMs to enhance ophthalmology triage processes by predicting appropriate triage decisions across various metrics. Study Design: Retrospective analysis of prospectively collected data Methods: This prospective study at the McGill Academic Eye Institute compared the triage capabilities of three leading Large Language Models (LLMs), Claude 3.5 (Anthropic), GPT-4o (OpenAI), and Gemini Advanced (Google) using 50 derived clinical consult vignettes and images. Vignettes included scanned hand-written notes under various lighting conditions and clinical images. LLMs were tested with a standardized prompt aided by a list of abbreviations, optimized through task chaining, few-shot learning, and task decomposition. Inputs were transcribed notes and images, and outputs were consulting subspecialist and appointment timeframes. Two clinical raters established a consensus gold standard. Main outcomes were accuracy of subspecialist selection and appointment urgency determination. Over- and under-triage were assessed by comparing LLM outputs to clinicians' consensus. Results: We evaluated the performance of three leading Large Language Models (LLMs), Claude 3.5 (Anthropic), GPT-4o (OpenAI), and Gemini Advanced (Google) on 50 ophthalmology clinical vignettes using our optimized prompt. With error rates of 8% for choosing a subspecialty and 12% for suggesting a timeline, Claude 3.5 showed the best performance. 
Comparable outcomes were obtained by GPT-4o, which achieved error rates of 14% for both subspecialty and timeline selection. Gemini Advanced performed worse than its rivals, with higher error rates of 38% for subspecialty selection and 24% for timeline. These findings suggest that when it came to correctly assessing ophthalmology clinical vignettes, Claude 3.5 and GPT-4o performed better than Gemini Advanced, with Claude 3.5 having a marginal advantage. Conclusion: In 50 derived consult vignettes and hand-written consults, the large language model Claude 3.5 performed best in ophthalmology triage. It accurately identified subspecialists and appointment timeframes with the lowest error rate. Theoretically, ophthalmologists could optimize patient care and improve current triage procedures by using such a model.
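The error rates above are the fraction of vignettes on which a model's output disagreed with the two raters' consensus gold standard. A minimal sketch of that computation, with entirely hypothetical labels (the abstract does not list the individual vignette outcomes):

```python
def error_rate(predictions, gold):
    """Fraction of cases where the model output disagrees with the clinicians' consensus."""
    if len(predictions) != len(gold):
        raise ValueError("prediction and gold lists must be the same length")
    mismatches = sum(p != g for p, g in zip(predictions, gold))
    return mismatches / len(gold)

# Hypothetical illustration: 50 vignettes with 4 subspecialty mismatches
# reproduces the 8% error rate reported for Claude 3.5.
gold = ["glaucoma"] * 50
preds = ["glaucoma"] * 46 + ["retina"] * 4
print(f"{error_rate(preds, gold):.0%}")  # prints "8%"
```

The same function applies to the timeline outputs; over- and under-triage would additionally require comparing the ordering of urgency categories, which this sketch does not model.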

Daniel Chow

Speaker
