Can ChatGPT Replace Human Doctors? Not Yet, According to Recent Studies
ChatGPT, an AI chatbot known for its conversational prowess, falls short when it comes to medical diagnosis: a recent study found it gets the diagnosis right less than half of the time. The study, led by Dr. Amrit Kirpalani and his team, found that across 150 complex case studies from Medscape, GPT-3.5, the model that powered ChatGPT at its 2022 launch, produced an accurate diagnosis in only 49% of cases.
Earlier research drew attention when ChatGPT narrowly passed the United States Medical Licensing Exam (USMLE), a milestone in artificial intelligence development. The new study, published in PLOS ONE, however, underscores the potential danger of relying on the chatbot for serious medical cases, as it struggles particularly with scenarios that require nuanced human judgment.
AI: Broader Pattern Recognition Versus Specialized Medical Knowledge
ChatGPT is built on training data drawn from Common Crawl, a vast collection of text from sources such as books and online articles. That breadth allows the AI to generate responses through pattern recognition, which is useful for general educational purposes, but the chatbot is also prone to "hallucination," producing entirely fabricated answers, which limits its reliability in clinical diagnosis.
During the study, the researchers presented ChatGPT with patient histories, physical examination findings, and lab images, mirroring the diagnostic tasks medical trainees typically face. Although ChatGPT could produce clear, well-structured treatment plans, it gave an incorrect diagnosis in 76 of the 150 cases, leaving 74 correct, roughly the 49% accuracy noted above. The researchers attributed this shortfall mainly to its generalized training data, which lacks the depth required to interpret complex clinical cases accurately.
ChatGPT Versus Specialized Medical AI Tools
In contrast to ChatGPT, specialized medical AI systems, such as Google's Articulate Medical Intelligence Explorer (AMIE), have demonstrated superior diagnostic capabilities in research settings. AMIE, trained on targeted medical datasets and case studies, has outperformed human doctors when diagnosing cases sourced from medical journals. This highlights the critical role of domain-specific training data, which allows these specialized AIs to grasp intricate medical details better than a broadly trained language model like ChatGPT.
Potential and Limitations
Despite its shortcomings, ChatGPT may still support medical education by simplifying complex topics for learners. According to co-author Edward Tran, medical students increasingly utilize ChatGPT to organize notes and clarify diagnostic algorithms, albeit under the watchful eye of educators.
Dr. Kirpalani sees potential uses for AI in streamlining administrative tasks and enhancing clinical decision-making, provided these systems undergo substantial data-driven training and rigorous oversight. Even so, as the technology continues to evolve, Kirpalani cautions the general public against using ChatGPT for medical advice because of the risk of misinformation.
In conclusion, ChatGPT holds promise in specific educational contexts and lower-stakes scenarios, but it remains unfit to replace human physicians for complex diagnosis and management. For now and the foreseeable future, tools like ChatGPT should be used to augment rather than replace human medical expertise, especially in scenarios where precise medical judgment is crucial.