Neurofibromatosis Tumor Segmentation on Whole-body MRI (WBMRI-NF) Challenge

Evaluation & Ranking

Model performance will be evaluated on a closed test dataset in terms of segmentation masks and tumor volume. Both detection and segmentation metrics will be used to assess a model:

Detection metrics: Balanced accuracy, F1-score, and area under the curve (AUC) will be used to assess detection performance. A tumor counts as detected if the predicted segmentation mask covers more than 50% of the reference tumor mask.
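The 50% coverage rule can be illustrated with a minimal sketch. It assumes each reference tumor is available as its own binary mask, that "covers" means the fraction of that tumor's voxels overlapped by the prediction, and that the comparison is strict. The function names are hypothetical and this is not the official evaluation code.

```python
import numpy as np

def tumor_coverage(pred_mask, tumor_mask):
    """Fraction of one reference tumor's voxels that the prediction overlaps."""
    pred = np.asarray(pred_mask, dtype=bool)
    tumor = np.asarray(tumor_mask, dtype=bool)
    tumor_voxels = tumor.sum()
    if tumor_voxels == 0:
        raise ValueError("tumor_mask contains no tumor voxels")
    return float(np.logical_and(pred, tumor).sum() / tumor_voxels)

def is_detected(pred_mask, tumor_mask, threshold=0.5):
    """Assumed reading of the rule: coverage must be strictly above the threshold."""
    return tumor_coverage(pred_mask, tumor_mask) > threshold

# Toy 1-D example: a 4-voxel tumor.
tumor = np.array([0, 1, 1, 1, 1, 0])
pred_good = np.array([0, 1, 1, 1, 0, 0])  # covers 3 of 4 voxels
pred_half = np.array([0, 1, 1, 0, 0, 0])  # covers exactly 2 of 4 voxels

hit_good = is_detected(pred_good, tumor)  # True: 75% coverage
hit_half = is_detected(pred_half, tumor)  # False: 50% is not "more than 50%"
```

Per-tumor hits and misses obtained this way would then feed the balanced accuracy, F1-score, and AUC computations.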

Segmentation metrics: Dice Similarity Coefficient (DSC) and Normalized Surface Dice (NSD) will be used to assess segmentation performance.
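As a point of reference, DSC for a predicted mask P and a reference mask R is 2|P ∩ R| / (|P| + |R|). The sketch below computes it with NumPy; the function name and the choice to return 1.0 when both masks are empty are assumptions, not rules stated by the challenge. NSD is not sketched because it also depends on a surface tolerance that is not given here.

```python
import numpy as np

def dice_coefficient(pred_mask, ref_mask):
    """Dice Similarity Coefficient between two binary masks of equal shape."""
    pred = np.asarray(pred_mask, dtype=bool)
    ref = np.asarray(ref_mask, dtype=bool)
    if pred.shape != ref.shape:
        raise ValueError("masks must have the same shape")
    denominator = pred.sum() + ref.sum()
    if denominator == 0:
        # Assumption: two empty masks are treated as a perfect match.
        return 1.0
    intersection = np.logical_and(pred, ref).sum()
    return float(2.0 * intersection / denominator)

# Toy example: one overlapping voxel, two foreground voxels in each mask.
pred = np.array([1, 1, 0, 0])
ref = np.array([0, 1, 1, 0])
dsc = dice_coefficient(pred, ref)  # 2 * 1 / (2 + 2) = 0.5
```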

Only the last run of the submitted Docker container is officially counted to rank challenge results.

Overall Ranking: 1) Teams are ranked by the average of their detection and segmentation performance ranks. 2) Bootstrapping and Bayes factor calculation will be used to determine ties. 3) A Wilcoxon signed-rank test will be used to confirm the final ranking.

Detection performance will be ranked in terms of the detection metrics. Segmentation performance will be ranked in terms of the segmentation metrics.
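One possible reading of the rank-averaging step is sketched below: each team is ranked separately on every metric (rank 1 is best), and the ranks are averaged with equal weight. The equal weighting, the tie handling, the function names, and the scores are all illustrative assumptions; the bootstrapping, Bayes factor, and Wilcoxon steps are not covered.

```python
import numpy as np

def rank_descending(scores):
    """Rank scores so that the highest gets rank 1; ties share the mean rank."""
    scores = np.asarray(scores, dtype=float)
    ranks = np.empty(len(scores))
    for i, s in enumerate(scores):
        better = np.sum(scores > s)   # teams strictly ahead
        equal = np.sum(scores == s)   # teams tied, including this one
        ranks[i] = better + (equal + 1) / 2
    return ranks

def average_rank(metric_table):
    """Mean rank per team over all metrics (assumed equal weighting)."""
    per_metric = np.array([rank_descending(v) for v in metric_table.values()])
    return per_metric.mean(axis=0)

# Hypothetical scores for three teams on one metric of each kind.
teams = ["A", "B", "C"]
metric_table = {
    "DSC": [0.80, 0.75, 0.70],  # ranks 1, 2, 3
    "F1":  [0.60, 0.90, 0.70],  # ranks 3, 1, 2
}
mean_ranks = average_rank(metric_table)      # 2.0, 1.5, 2.5
best_team = teams[int(np.argmin(mean_ranks))]  # "B"
```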