The Medical Segmentation Decathlon
Jun 1, 2021
Michela Antonelli
Annika Reinke
Spyridon Bakas
Keyvan Farahani
Annette Kopp-Schneider
Bennett A. Landman
Geert Litjens
Bjoern Menze
Olaf Ronneberger
Ronald M. Summers
Bram Van Ginneken
Michel Bilello
Patrick Bilic
Patrick F. Christ
Richard K. G. Do
Marc J. Gollub
Stephan H. Heckers
William R. Jarnagin
Maureen K. McHugo
Sandy Napel
Jennifer S. Goli Pernicka
Kawal Rhode
Catalina Tobon-Gomez
Eugene Vorontsov
Henkjan Huisman
James A. Meakin
Sebastien Ourselin
Manuel Wiesenfarth
Pablo Arbelaez
Byeonguk Bae
Sihong Chen
Laura Daza
Jianjiang Feng
Baochun He
Fabian Isensee
Yuanfeng Ji
Fucang Jia
Namkug Kim
Ildoo Kim
Dorit Merhof
Akshay Pai
Beomhee Park
Mathias Perslev
Ramin Rezaiifar
Oliver Rippel
Ignacio Sarasua
Wei Shen
Jaemin Son
Christian Wachinger
Liansheng Wang
Yan Wang
Yingda Xia
Daguang Xu
Zhanwei Xu
Yefeng Zheng
Amber L. Simpson
Lena Maier-Hein
M. Jorge Cardoso
Abstract
International challenges have become the de facto standard for the comparative assessment of image analysis algorithms on a specific task. Segmentation is so far the most widely investigated medical image processing task, but segmentation challenges have typically been organized in isolation, so algorithm development was driven by the need to tackle a single, specific clinical problem. We hypothesized that a method capable of performing well on multiple tasks would generalize well to a previously unseen task and potentially outperform a custom-designed solution. To investigate this hypothesis, we organized the Medical Segmentation Decathlon (MSD), a biomedical image analysis challenge in which algorithms compete across a multitude of tasks and modalities. The underlying data set was designed to explore the axes of difficulty typically encountered when dealing with medical images, such as small data sets, unbalanced labels, multi-site data and small objects. The MSD challenge confirmed that algorithms with consistently good performance on one set of tasks preserved their good average performance on a different set of previously unseen tasks. Moreover, by monitoring the MSD winner for two years, we found that this algorithm continued to generalize well to a wide range of other clinical problems, further confirming our hypothesis. Three main conclusions can be drawn from this study: (1) state-of-the-art image segmentation algorithms are mature, accurate, and generalize well when retrained on unseen tasks; (2) consistent algorithmic performance across multiple tasks is a strong surrogate of algorithmic generalizability; (3) the training of accurate AI segmentation models is now commoditized to non-AI experts.
Type: Publication
arXiv:2106.05735