Artificial Intelligence Assistance Significantly Improves Gleason Grading of Prostate Biopsies by Pathologists
Aug 1, 2020
Wouter Bulten
Maschenka Balkenhol
Jean-Joël Awoumou Belinga
Américo Brilhante
Aslı Çakır
Lars Egevad
Martin Eklund
Xavier Farré
Katerina Geronatsiou
Vincent Molinié
Guilherme Pereira
Paromita Roy
Günter Saile
Paulo Salles
Ewout Schaafsma
Joëlle Tschui
Anne-Marie Vos
Brett Delahunt
Hemamali Samaratunga
David J. Grignon
Andrew J. Evans
Daniel M. Berney
Chin-Chen Pan
Glen Kristiansen
James G. Kench
Jon Oxley
Katia R. M. Leite
Jesse K. McKenney
Peter A. Humphrey
Samson W. Fine
Toyonori Tsuzuki
Murali Varma
Ming Zhou
Eva Comperat
David G. Bostwick
Kenneth A. Iczkowski
Cristina Magi-Galluzzi
John R. Srigley
Hiroyuki Takahashi
Theo Van Der Kwast
Hester Van Boven
Robert Vink
Jeroen Van Der Laak
Christina Hulsbergen-Van Der Kaa
Geert Litjens
Abstract
The Gleason score is the most important prognostic marker for prostate cancer patients, but it suffers from significant observer variability. Artificial intelligence (AI) systems based on deep learning can achieve pathologist-level performance at Gleason grading. However, the performance of such systems can degrade in the presence of artifacts, foreign tissue, or other anomalies. Pathologists integrating their expertise with feedback from an AI system could result in a synergy that outperforms both the individual pathologist and the system. Despite the hype around AI assistance, existing literature on this topic within the pathology domain is limited. We investigated the value of AI assistance for grading prostate biopsies. A panel of 14 observers graded 160 biopsies with and without AI assistance. Using AI, the agreement of the panel with an expert reference standard increased significantly (quadratically weighted Cohen’s kappa, 0.799 vs. 0.872; p = 0.019). On an external validation set of 87 cases, the panel showed a significant increase in agreement with a panel of international experts in prostate pathology (quadratically weighted Cohen’s kappa, 0.733 vs. 0.786; p = 0.003). In both experiments, at the group level, AI-assisted pathologists outperformed the unassisted pathologists and the standalone AI system. Our results show the potential of AI systems for Gleason grading, but more importantly, show the benefits of pathologist-AI synergy.
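The agreement metric reported above, quadratically weighted Cohen's kappa, penalizes a disagreement between two graders by the squared distance between their ordinal grades, so a two-step miss costs four times as much as a one-step miss. The following is a minimal sketch of that general metric, not the study's analysis code. The function name, the six-category encoding (0 for benign, 1 to 5 for grade groups), and the toy ratings are assumptions made for illustration; the abstract does not describe how panel-level values were aggregated.

```python
def quadratic_weighted_kappa(rater_a, rater_b, num_categories):
    """Quadratically weighted Cohen's kappa for two lists of integer grades.

    Grades must be integers in range(num_categories). The category
    encoding is an assumption of this sketch, not taken from the paper.
    """
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("rating lists must be non-empty and of equal length")
    k = num_categories
    if k < 2:
        raise ValueError("at least two categories are required")

    # Confusion matrix: observed[i][j] counts cases graded i by A and j by B.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1

    # Marginal grade counts for each rater.
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            # Quadratic disagreement weight: 0 on the diagonal, 1 at the extremes.
            weight = (i - j) ** 2 / (k - 1) ** 2
            # Count expected under chance, i.e. independent raters.
            expected = hist_a[i] * hist_b[j] / n
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected

    if weighted_expected == 0:
        # Both raters used a single identical category; kappa is undefined.
        raise ValueError("kappa is undefined when no disagreement is expected")
    return 1.0 - weighted_observed / weighted_expected


# Toy, made-up ratings (not study data): a reference standard and one observer.
reference = [0, 1, 2, 3, 4, 5, 2, 1]
observer = [0, 1, 3, 3, 5, 5, 2, 2]
print(quadratic_weighted_kappa(reference, observer, num_categories=6))
```

A value of 1 indicates perfect agreement, 0 indicates agreement no better than chance, and negative values indicate systematic disagreement.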
Publication
Mod Pathol