Researchers analyzed 83 studies from 2018 to 2024 to compare how well genAI models perform in diagnosing medical conditions compared to human doctors. Overall, AI got it right about 52% of the time.

Expert doctors were significantly better than AI (by ~16%). AI did better in some specialties, possibly because they involve visual pattern recognition, which AI is good at.

Question for ChatGPT-4o: Please read the paper "A systematic review and meta-analysis of diagnostic performance comparison between generative AI and physicians" and tell me what it says in easy-to-understand language. Also, list the most surprising, controversial and valuable statements made in the paper.

ChatGPT-4o’s answer:
Here’s a summary of the paper “A systematic review and meta-analysis of diagnostic performance comparison between generative AI and physicians”, along with a list of surprising, controversial, and valuable findings:

🧠 What the paper says in simple terms:

  • Researchers analyzed 83 studies from 2018 to 2024 to compare how well generative AI models (like GPT-4) perform in diagnosing medical conditions compared to human doctors.

  • Overall, AI got it right about 52% of the time (a short sketch after this list shows how a meta-analysis pools many studies into one figure like that).

  • When compared:

    • No major difference was found between AI and non-expert doctors.

    • Expert doctors were significantly better than AI (by ~16%).

  • Some newer AI models, like GPT-4o, Gemini 1.5 Pro, Claude 3, etc., did almost as well as non-expert doctors, though not quite enough to declare a win.

  • AI did better in some specialties (like dermatology and urology), possibly because they involve visual pattern recognition, which AI is good at.

  • However, 76% of the studies carried a high risk of bias, and many models didn't clearly state what data they were trained on, raising questions about fairness and generalizability.

  • The authors suggest that AI could help in education (e.g., for med students or junior doctors) or augment care in low-resource settings, but it's not ready to replace expert physicians.
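
As an illustration of where a single pooled number like 52% comes from, here is a minimal Python sketch of random-effects pooling (the DerSimonian-Laird approach, a common choice, though not necessarily the paper's exact method). The four studies and their counts are invented; the paper itself pools 83 studies, typically on a transformed scale.

```python
import math

# Hypothetical (correct diagnoses, total cases) for four studies;
# the real paper pools 83 of them.
studies = [(45, 80), (30, 70), (55, 90), (20, 50)]

estimates, variances = [], []
for correct, total in studies:
    p = correct / total
    estimates.append(p)                    # per-study accuracy
    variances.append(p * (1 - p) / total)  # binomial variance

# Inverse-variance (fixed-effect) pooled accuracy.
w = [1 / v for v in variances]
fixed = sum(wi * p for wi, p in zip(w, estimates)) / sum(w)

# DerSimonian-Laird between-study variance (tau^2); a random-effects
# model adds it to each study's variance before re-weighting.
q = sum(wi * (p - fixed) ** 2 for wi, p in zip(w, estimates))
c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
tau2 = max(0.0, (q - (len(studies) - 1)) / c)

w_re = [1 / (v + tau2) for v in variances]
pooled = sum(wi * p for wi, p in zip(w_re, estimates)) / sum(w_re)
se = math.sqrt(1 / sum(w_re))
print(f"pooled accuracy: {pooled:.1%} "
      f"(95% CI {pooled - 1.96 * se:.1%} to {pooled + 1.96 * se:.1%})")
```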

😲 Most Surprising Findings:

  1. Generative AI models perform at about the same level as non-expert physicians.
    → That means in less complex cases or early-stage training, AI might be nearly as good as junior doctors.

  2. Some AI models outperformed non-expert doctors (though the difference was not statistically significant).
    → GPT-4, Claude 3, Perplexity, and others sometimes gave better answers than people! The sketch after this list shows how an edge like that can still be within chance.

  3. In specialties like dermatology and urology, AI accuracy was higher.
    → Likely because these areas involve lots of visual data—something AI can be very good at interpreting.
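
To see how point 2 can happen, here is a toy two-proportion z-test in Python. The counts are invented, not taken from the paper; they just show that a visible accuracy gap on a small sample can easily fail to reach significance.

```python
from statsmodels.stats.proportion import proportions_ztest

# Invented counts: AI at 66% vs. non-expert doctors at 58% on 50 cases each.
ai_correct, ai_total = 33, 50
dr_correct, dr_total = 29, 50

z, p = proportions_ztest([ai_correct, dr_correct], [ai_total, dr_total])
gap = ai_correct / ai_total - dr_correct / dr_total
print(f"gap = {gap:.0%}, p = {p:.2f}")  # an 8-point gap, yet p is ~0.4
```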

😬 Most Controversial Points:

  1. Generative AI had a low average accuracy of just 52%.
    → That’s concerning, especially if patients or clinics assume AI is more “intelligent” than it actually is.

  2. Most studies (76%) had a high risk of bias.
    → Many didn't clearly separate training and test data, or lacked external validation, meaning the results might not hold up in real-world use.

  3. Lack of transparency in training data.
    → Most models didn’t disclose what medical data they were trained on, making it hard to judge fairness or limitations.

  4. Publication bias was detected.
    → There's a chance that studies showing poor AI performance didn't get published, skewing the overall results. One standard way to check for this is sketched after this list.

💡 Most Valuable Takeaways:

  1. AI is not ready to replace expert doctors, but it could be a great support tool—especially in medical education or low-resource settings.

  2. Generative AI has the potential to democratize access to diagnostic knowledge, especially where human experts are scarce.

  3. Transparency and rigorous testing are urgently needed to ensure AI tools are safe and reliable.

  4. AI's potential lies in collaboration, not replacement. The future might be “doctor + AI” rather than one or the other.