TY  - JOUR
AU  - Jonathan Ganz
AU  - Christian Marzahl
AU  - Jonas Ammeling
AU  - Emely Rosbach
AU  - Barbara Richter
AU  - Chloé Puget
AU  - Daniela Denk
AU  - Elena Demeter
AU  - Flaviu Tăbăran
AU  - Gabriel Wasinger
AU  - Karoline Lipnik
AU  - Marco Tecilla
AU  - Matthew Valentine
AU  - Michael Dark
AU  - Niklas Abele
AU  - Pompei Bolfa
AU  - Ramona Erber
AU  - Robert Klopfleisch
AU  - Sophie Merz
AU  - Taryn Donovan
AU  - Samir Jabari
AU  - Christof Bertram
AU  - Katharina Breininger
AU  - Marc Aubreville
AB  - The count of mitotic figures (MFs) observed in hematoxylin and eosin (H&E)-stained slides is an important prognostic marker, as it is a measure for tumor cell proliferation. However, the identification of MFs has a known low inter-rater agreement. In a computer-aided setting, deep learning algorithms can help to mitigate this, but they require large amounts of annotated data for training and validation. Furthermore, label noise introduced during the annotation process may impede the algorithms’ performance. Unlike H&E, where identification of MFs is based mainly on morphological features, the mitosis-specific antibody phospho-histone H3 (PHH3) specifically highlights MFs. Counting MFs on slides stained against PHH3 leads to higher agreement among raters and has therefore recently been used as a ground truth for the annotation of MFs in H&E. However, as PHH3 facilitates the recognition of cells indistinguishable from H&E staining alone, the use of this ground truth could potentially introduce an interpretation shift and even label noise into the H&E-related dataset, impacting model performance. This study analyzes the impact of PHH3-assisted MF annotation on inter-rater reliability and object level agreement through an extensive multi-rater experiment. Subsequently, MF detectors, including a novel dual-stain detector, were evaluated on the resulting datasets to investigate the influence of PHH3-assisted labeling on the models’ performance. We found that the annotators’ object-level agreement significantly increased when using PHH3-assisted labeling (F1: 0.53 to 0.74). However, this enhancement in label consistency did not translate to improved performance for H&E-based detectors, neither during the training phase nor the evaluation phase. Conversely, the dual-stain detector was able to benefit from the higher consistency. This reveals an information mismatch between the H&E and PHH3-stained images as the cause of this effect, which renders PHH3-assisted annotations not well-aligned for use with H&E-based detectors. Based on our findings, we propose an improved PHH3-assisted labeling procedure.
BT  - Scientific Reports
DO  - 10.1038/s41598-024-77244-6
M1  - 1
PY  - 2024
EP  - 26273
T2  - Scientific Reports
TI  - Information mismatch in PHH3-assisted mitosis annotation leads to interpretation shifts in H&E slide analysis
UR  - https://www.nature.com/articles/s41598-024-77244-6
VL  - 14
SN  - 2045-2322
ER  - 