Evaluation and Comparison of Current Fetal Ultrasound Image Segmentation Methods for Biometric Measurements: A Grand Challenge
AUTHORS: Rueda S, Fathima S, Knight CL, Yaqub M, Papageorghiou AT, Rahmatullah B, Foi A, Maggioni M, Pepe A, Tohka J, Stebbing R, Mcmanigle J, Ciurte A, Bresson, X, Bach Cuadra M, Sun C, Ponomarev G, Gelfand M, Kazanov M, Wang C, Chen H, Peng C, Hung C, Noble A
IEEE Transactions on Medical Imaging, 33(4): 797 - 813, August 2013
This paper presents the evaluation results of the methods submitted to Challenge US: Biometric Measurements from Fetal Ultrasound Images, a segmentation challenge held at the IEEE International Symposium on Biomedical Imaging 2012. The challenge was set to compare and evaluate current fetal ultrasound image segmentation methods. It consisted of automatically segmenting fetal anatomical structures to measure standard obstetric biometric parameters, from 2D fetal ultrasound images taken on fetuses at different gestational ages (21 weeks, 28 weeks, and 33 weeks) and with varying image quality to reflect data encountered in real clinical environments. Four independent sub-challenges were proposed, according to the objects of interest measured in clinical practice: abdomen, head, femur, and whole fetus. Five teams participated in the head sub-challenge and two teams in the femur sub-challenge, including one team who tackled both. Nobody attempted the abdomen and whole fetus sub-challenges. The challenge goals were two-fold and the participants were asked to submit the segmentation results as well as the measurements derived from the segmented objects. Extensive quantitative (region-based, distance-based, and Bland-Altman measurements) and qualitative evaluation was performed to compare the results from a representative selection of current methods submitted to the challenge. Several experts (three for the head sub-challenge and two for the femur sub-challenge), with different degrees of expertise, manually delineated the objects of interest to define the ground truth used within the evaluation framework. For the head sub-challenge, several groups produced results that could be potentially used in clinical settings, with comparable performance to manual delineations. The femur sub-challenge had inferior performance to the head sub-challenge due to the fact that it is a harder segmentation problem and that the techniques presented relied more on the femur’s appearance.