Comparative Performance of Artificial Intelligence Chatbots in Triage of Ophthalmic Conditions - 5799

When:
4:57 PM, Friday 20 Jun 2025 (5 minutes)
Author’s Name(s): Daniel Chow, Raj Pathak, Michael Cho, Sidratul Rahman, Alex Lau, Maya Alik, Sheetal Pundir, Raheem Remtulla, Hady Saheb

Author’s Disclosure Block: Daniel Chow: none; Raj Pathak: none; Michael Cho: none; Sidratul Rahman: none; Alex Lau: none; Maya Alik: none; Sheetal Pundir: none; Raheem Remtulla: none; Hady Saheb: none

Abstract Body
Purpose: Maximizing the efficiency of patient triage is a challenge frequently faced by high-volume ophthalmology departments. Patient prioritization depends on a number of factors, such as medical history, current symptoms, and resource availability. Accurate and timely triage could help ophthalmologists manage patient flow and determine when additional resources are needed. Recent advances in artificial intelligence have yielded sophisticated Large Language Models (LLMs). This research evaluates and compares the potential of different LLMs to enhance ophthalmology triage processes by predicting appropriate triage decisions across various metrics.

Study Design: Retrospective analysis of prospectively collected data.

Methods: This study at the McGill Academic Eye Institute compared the triage capabilities of three leading Large Language Models (LLMs), Claude 3.5 (Anthropic), GPT-4o (OpenAI), and Gemini Advanced (Google), using 50 derived clinical consult vignettes and images. Vignettes included scanned hand-written notes under various lighting conditions and clinical images. LLMs were tested with a standardized prompt aided by a list of abbreviations and optimized through task chaining, few-shot learning, and task decomposition. Inputs were transcribed notes and images; outputs were the consulting subspecialist and the appointment timeframe. Two clinical raters established a consensus gold standard. Main outcomes were accuracy of subspecialist selection and appointment urgency determination. Over- and under-triage were assessed by comparing LLM outputs to the clinicians' consensus.

Results: We evaluated the performance of the three LLMs on 50 ophthalmology clinical vignettes using our optimized prompt. Claude 3.5 showed the best performance, with error rates of 8% for subspecialty selection and 12% for timeline suggestion. GPT-4o obtained comparable outcomes, with error rates of 14% for both subspecialty and timeline selection. Gemini Advanced performed worse than its rivals, with higher error rates of 38% for subspecialty selection and 24% for timeline. These findings suggest that Claude 3.5 and GPT-4o assessed the ophthalmology clinical vignettes more accurately than Gemini Advanced, with Claude 3.5 holding a marginal advantage.

Conclusion: Across 50 derived consult vignettes and hand-written consults, the large language model Claude 3.5 performed best in ophthalmology triage, identifying subspecialists and appointment timeframes with the lowest error rates. In theory, ophthalmologists could use such a model to optimize patient care and improve current triage procedures.
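The Methods describe a standardized prompt aided by an abbreviation list and optimized through few-shot learning and task decomposition. A minimal sketch of how such a prompt could be assembled is shown below; the abbreviations, example vignette, subspecialty labels, and exact prompt wording are all illustrative assumptions, not the study's actual prompt.

```python
# Hypothetical sketch of the prompt-construction strategy named in the
# Methods: a standardized prompt with an abbreviation list, few-shot
# examples, and task decomposition (subspecialist first, then timeframe).
# All vignette text and labels below are illustrative, not from the study.

ABBREVIATIONS = {
    "IOP": "intraocular pressure",
    "RAPD": "relative afferent pupillary defect",
}

FEW_SHOT_EXAMPLES = [
    {
        "vignette": "72M, sudden painless vision loss OD, RAPD present.",
        "subspecialist": "Retina",
        "timeframe": "within 24 hours",
    },
]

def build_triage_prompt(vignette: str) -> str:
    """Assemble a standardized triage prompt for an LLM."""
    abbrev_lines = "\n".join(
        f"{k} = {v}" for k, v in sorted(ABBREVIATIONS.items())
    )
    example_lines = "\n\n".join(
        f"Vignette: {ex['vignette']}\n"
        f"Step 1 - Subspecialist: {ex['subspecialist']}\n"
        f"Step 2 - Timeframe: {ex['timeframe']}"
        for ex in FEW_SHOT_EXAMPLES
    )
    # Task decomposition: the model is asked for the subspecialist first,
    # then the appointment timeframe, mirroring the study's two outputs.
    return (
        "You are triaging ophthalmology consults.\n\n"
        f"Abbreviations:\n{abbrev_lines}\n\n"
        f"Examples:\n{example_lines}\n\n"
        f"Vignette: {vignette}\n"
        "Step 1 - Subspecialist:\n"
        "Step 2 - Timeframe:"
    )
```

The same assembled prompt would be sent identically to each of the three models, so that performance differences reflect the model rather than the prompt.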
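The Methods also assess over- and under-triage by comparing LLM outputs to the clinicians' consensus gold standard. One simple way to operationalize that comparison is to rank appointment timeframes on an ordinal urgency scale; the scale below is a hypothetical example, since the study's actual timeframe categories are not given in the abstract.

```python
# Illustrative sketch of the over-/under-triage comparison described in
# the Methods. The urgency scale is an assumed example; the study's real
# timeframe categories are not specified in the abstract.

URGENCY_RANK = {
    "within 24 hours": 3,
    "within 1 week": 2,
    "within 1 month": 1,
}

def classify_triage(llm_timeframe: str, consensus_timeframe: str) -> str:
    """Compare an LLM's urgency call to the clinicians' consensus."""
    diff = URGENCY_RANK[llm_timeframe] - URGENCY_RANK[consensus_timeframe]
    if diff > 0:
        return "over-triage"   # LLM assigned a more urgent slot than needed
    if diff < 0:
        return "under-triage"  # LLM assigned a less urgent slot than needed
    return "correct"
```

Aggregating these labels over all 50 vignettes would yield per-model error rates of the kind reported in the Results.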
