AI chatbots like ChatGPT, Gemini, Copilot, Claude, and Perplexity are making their way into healthcare conversations—but how accurate are they when it comes to back pain?
A 2025 study by Rossettini and colleagues evaluated these chatbots' responses against clinical practice guidelines for lumbosacral radicular pain. The result? None of them fully aligned. Perplexity scored highest at 67%, Gemini followed at 63%, Copilot reached 44%, and ChatGPT-3.5, ChatGPT-4o, and Claude each scored only 33%.
That means a substantial share of chatbot advice could be misleading, or even harmful, if taken at face value. The big takeaway: AI has potential, but it's not yet ready to replace clinical reasoning. To truly help patients, these systems need to be trained on the best available evidence and clinical guidelines, not just the open internet.
👩‍⚕️ As clinicians, our job is to guide patients with expertise and evidence—not algorithms.
📚 Reference: Rossettini et al. (2025) Frontiers in Digital Health - https://www.frontiersin.org/journals/digital-health/articles/10.3389/fdgth.2025.1574287/full
👉 Subscribe for more insights on pain science, evidence-based practice, and the future of AI in healthcare.