Hybrid human-AI diagnoses increase accuracy in medicine

Jun 23, 2025

An international research team led by the Max Planck Institute for Human Development has shown for the first time that combining human expertise with artificial intelligence (AI) leads to significantly more accurate medical diagnoses. The study, conducted with partners from the Human Diagnosis Project (San Francisco) and the CNR-ISTC (Rome), underlines the strength of hybrid diagnostic collectives that combine human specialists with AI models such as GPT-4 or Claude 3.

Collective intelligence in medicine | Copyright: MPI for Human Development

Diagnostic errors are a central problem in medicine. AI systems offer support but make different mistakes than humans, for example through “hallucinations” or bias. The researchers analyzed over 2,100 realistic clinical case vignettes and compared more than 40,000 diagnoses made by physicians and five AI models. The results show that AI collectives often outperform individual human diagnosticians, while humans succeed in precisely those cases where AI fails. Hybrid collectives that combine both achieve the highest accuracy, because human and machine errors offset each other in a complementary way.
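To make the idea of a hybrid collective concrete, the sketch below shows one simple way ranked diagnoses from humans and AI models could be pooled: a Borda-style rank vote over each diagnostician's differential. This is a hypothetical illustration, not the aggregation procedure used in the PNAS study; the function name and the example diagnosis lists are invented for the demonstration.

```python
from collections import Counter

def aggregate_diagnoses(ranked_lists, top_k=3):
    """Combine ranked differential diagnoses from several diagnosticians
    (human or AI) into one collective ranking via a simple Borda-style
    vote. Illustrative sketch only, not the study's actual method."""
    scores = Counter()
    for ranking in ranked_lists:
        for position, diagnosis in enumerate(ranking[:top_k]):
            # Higher-ranked diagnoses earn more points.
            scores[diagnosis] += top_k - position
    # Return the collective differential, best-scoring first.
    return [dx for dx, _ in scores.most_common()]

# Hypothetical case: two physicians and one AI model partially disagree.
human_1 = ["pulmonary embolism", "pneumonia", "pericarditis"]
human_2 = ["pneumonia", "pulmonary embolism", "bronchitis"]
ai_model = ["pulmonary embolism", "pericarditis", "pneumonia"]

print(aggregate_diagnoses([human_1, human_2, ai_model]))
# ['pulmonary embolism', 'pneumonia', 'pericarditis', 'bronchitis']
```

The point of such pooling is complementarity: a diagnosis that any single participant ranks too low can still rise to the top of the collective differential when the others rank it highly.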

Even adding a single AI model to a group of human diagnosticians significantly improves the results, especially for complex cases. The researchers, whose study is part of the EU-funded HACID project, see great potential for hybrid collectives, for example in regions with limited access to medical care or in other fields such as climate policy.

The study's limitations include its use of text-based vignettes rather than real patients and its focus on diagnosis rather than treatment. Acceptance, ethical questions, and risks such as bias also need further examination. Nevertheless, the researchers emphasize that hybrid human-AI collectives can increase patient safety and make healthcare fairer.

Original Paper:

Human-AI collectives most accurately diagnose clinical vignettes | PNAS

Read also:

INQUIRED: “Surprisingly, ChatGPT replicates common stereotypes” – MedLabPortal


Editorial office: X-Press Journalistenbüro GbR

Gender note. The personal designations used in this text always refer equally to female, male and diverse persons. Double/triple references and gendered designations are avoided in favor of better readability.