Data Descriptor

Dataset of Registered Hematoxylin–Eosin and Ki67 Histopathological Image Pairs Complemented by a Registration Algorithm

by Dominika Petríková 1,*, Ivan Cimrák 1, Katarína Tobiášová 2 and Lukáš Plank 2

1 Cell-in-Fluid Biomedical Modelling & Computations Group, Faculty of Management Science and Informatics, University of Žilina, 01026 Žilina, Slovakia
2 Department of Pathological Anatomy, University Hospital Martin, Jessenius Faculty of Medicine, Comenius University, 036 01 Martin, Slovakia
* Author to whom correspondence should be addressed.
Data 2024, 9(8), 100; https://doi.org/10.3390/data9080100
Submission received: 29 May 2024 / Revised: 12 July 2024 / Accepted: 2 August 2024 / Published: 7 August 2024

Abstract:
In this work, we describe a dataset suitable for analyzing the extent to which hematoxylin–eosin (HE)-stained tissue contains information about the expression of Ki67 in immunohistochemistry staining. The dataset provides images of corresponding pairs of HE and Ki67 stainings and is complemented by algorithms for computing the Ki67 index. We introduce a dataset of high-resolution histological images of testicular seminoma tissue. The dataset comprises digitized histology slides from 77 conventional testicular seminoma patients, obtained via surgical resection. For each patient, two physically adjacent tissue sections are stained: one with hematoxylin and eosin, and one with Ki67 immunohistochemistry staining. This results in a total of 154 high-resolution images. The images are provided in PNG format, facilitating ease of use for image analysis compared to the original scanner output formats. Each image contains enough tissue to generate thousands of non-overlapping 224 × 224 pixel patches, giving the potential to generate more than 50,000 patch pairs, each consisting of an HE patch and a corresponding Ki67 patch depicting a very similar part of the tissue. Finally, we present the results of applying a ResNet neural network for the classification of HE patches into categories according to their Ki67 label.
Dataset License: CC-BY-4.0

1. Summary

Deep learning has revolutionized digital pathology, offering robust tools for the automated analysis of histopathological images. By leveraging convolutional neural networks, researchers can achieve high accuracy in tasks such as tumor detection, tissue segmentation, and cellular classification. These models excel in recognizing complex patterns within whole slide images (WSIs), significantly reducing the manual workload of pathologists and enhancing diagnostic consistency. Advanced architectures, such as ResNet [1], U-Net, and Transformer-based models, have been particularly effective in improving the feature extraction and interpretation of histological features [2,3,4,5]. Additionally, deep learning techniques have been applied to predict molecular phenotypes and patient outcomes from morphological data, bridging the gap between histology and genomics [6]. Despite these advancements, challenges remain, including the need for large annotated datasets, computational resource demands, and ensuring model generalizability across diverse populations and staining variations [3]. Ongoing research is focused on addressing these issues, as well as integrating deep learning with other artificial intelligence techniques to further enhance the capabilities of digital pathology.
Hematoxylin–eosin (HE) staining, as shown in Figure 1a, is widely used as a universal basic tissue stain. It is the first step in evaluating various cancer types and is extensively used for primary diagnosis due to its simplicity and cost-effectiveness. HE staining provides basic morphological information, such as the shape of cells and tissues. In clinical practice, immunohistochemical (IHC) staining, illustrated in Figure 1b, is frequently employed to obtain the protein expression status for diagnosis confirmation and subtyping. IHC staining visualizes the expression of various proteins (e.g., Ki67, estrogen receptor) on the cell membrane or nucleus. It is often necessary to perform several different IHC stains to conduct a differential diagnosis and determine attributes such as histogenesis, molecular subtype, or proliferation rate.
Despite being a standard procedure, IHC staining has several limitations. It is expensive and highly dependent on tissue handling protocols, as the results are expressed through stain intensity, presence/absence of a stain, localization of staining, or the percentage of cells showing detectable stain intensity. Additionally, the interpretation of IHC results is visual and relies on the subjective assessment of pathologists, leading to inter-observer variability. Recent studies have shown a correlation between HE- and IHC-stained slides from the same region [7,8,9]. Consequently, it should be possible to model the relationship between the morphological information in HE slides and IHC information, predicting the expression of specific proteins directly from HE-stained slides without the additional IHC staining process [10]. This approach could prove to be time- and cost-efficient, or it could provide a second opinion in assessing IHC staining.
In this work, we provide a dataset of high-resolution (35,000 × 35,000 pixels on average) histological images suitable for the application of various machine learning (ML) methods, especially convolutional neural networks. The dataset consists of histology samples from 77 conventional testicular seminoma patients, obtained from the surgical resection of the patients’ tumors. The series offers pairs of images of HE-stained and Ki67 IHC-stained tissue showing adjacent sections of the same tissue, for a total of 154 images. The high resolution of the images makes it possible to generate tens of thousands of patches of different sizes, which can form a large dataset. Although the sections are not identical at the cellular level, their spatial adjacency ensures that the tissue slides exhibit the same properties and characteristics. Another advantage of the provided dataset is that all images are already converted to PNG format, which is much easier to work with for image analysis than the original scanner output formats. In addition to the image pairs themselves, Supplementary Data such as age, tumor stage, rete testis invasion, and lymphocytic infiltration are also recorded for each patient.

1.1. Classification of Histological Images

In recent years, deep neural networks have been increasingly utilized for various medical image analysis tasks [2]. This trend extends to the histological domain, where they are employed for tasks such as classifying tumor tissues and evaluating biomarkers to inform treatment planning.
In [11], the researchers introduced HE-HER2Net, an enhanced Xception network, by incorporating global average pooling, batch normalization, dropout, and dense layers with the Swish activation function. This network was designed to classify HE images into four categories of human epidermal growth factor receptor 2 (HER2) positivity, ranging from 0 to 3+. Beyond the standard model evaluation, the researchers compared HE-HER2Net to other existing architectures, reporting that it outperformed all in accuracy, precision, recall, and AUC score. The authors of [9] advanced the classification of breast cancer molecular status (estrogen receptor, progesterone receptor, HER2) directly from HE-stained histopathology images using deep learning techniques. They developed an innovative approach that combines neural networks with neural style transfer techniques to generate tissue “fingerprints”. These fingerprints are unique, high-dimensional representations of tissue images that maintain crucial morphological features despite variations in staining styles. The authors demonstrated that their method significantly improves the accuracy of predicting the ER, PR, and HER2 status of breast cancer tissues compared to traditional methods. Ref. [8] demonstrates that machine learning can accurately determine the molecular marker status from cellular morphology alone. The scholars developed a multiple-instance learning-based deep neural network to identify estrogen receptor status from HE-stained WSI. In [12], the researchers proposed a three-step method for classifying HER2 status in breast cancer tissues. Initially, they utilized a pre-trained UNet-based nucleus detector [13] to generate patches. Next, they trained a CNN to detect tumor nuclei and subsequently classify them as HER2-positive or HER2-negative.

1.2. Prediction of Ki67 Expression from HE Images

The dataset presented here is part of research aimed at predicting Ki67 expression directly from HE images, eliminating the need for additional IHC staining. Ki67 is a nuclear protein present in cancer cells and detectable only in actively proliferating cells [14]. This protein is absent in cells during their resting phase, indicating they are not growing. Consequently, elevated levels of Ki67 serve as an indicator of rapid cancer cell growth and division, making it a good marker of proliferation (rapid increase in the number of cells) [15].
In [10], the authors addressed the problem of determining the number of Ki67-positive cells from HE images for the treatment of several cancer types. The ResNet model was trained to differentiate between negative and positive cells in homogeneous regions, effectively classifying tissues as having either 0% or 100% positivity. In contrast, we aim to train the model to classify tissues into various positivity ratio intervals. This approach involves analyzing patches that encompass heterogeneous tissue regions containing both positive and negative cells. In seminomas, the Ki67 index typically exceeds 50%, although values below 20% have also been observed [16]. Notably, a high proliferation index in seminomas does not show a clear correlation with the clinical stage or the presence of distant metastases [17]. However, a specific study identified a significant inverse relationship between Ki67 expression exceeding 50% and rete testis invasion [18]. To facilitate machine learning on 77 pairs of HE- and Ki67-stained slides containing testicular seminoma samples, we established three categories for Ki67 expression: below 20%, 20–50%, and above 50%. The method employed for obtaining Ki67 annotations for HE patches is described in more detail in [19]. Applying clustering, the Ki67 scans were recolored into three dominant colors: brown, blue, and white. The Ki67 positivity ratio was then estimated from the number of pixels belonging to each of these colors. In [20], we employed the presented dataset to train a ResNet18 model on both binary and multiclass classification tasks. To evaluate model performance, we divided the dataset into a training set and a validation set comprising 10% of the extracted patches, allowing us to monitor training progress and validate the model on previously unseen data originating from tissues familiar to the model. The model achieved good performance in classifying HE patches into Ki67 index categories on both the binary and multiclass tasks, with accuracies of 0.775 and 0.789, respectively.
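The color-based labeling step from [19] can be illustrated with a short sketch. This is a minimal illustration rather than the exact published pipeline: the reference RGB values for brown (DAB), blue (hematoxylin), and white (background) are assumptions chosen for demonstration, pixels are clustered with k-means, each cluster is assigned to its nearest reference color, and the positivity ratio is estimated as the share of brown among stained pixels.

```python
import numpy as np
from sklearn.cluster import KMeans
from PIL import Image

# Reference colors are illustrative assumptions: DAB brown, hematoxylin blue, background white.
REFERENCE_COLORS = {
    "brown": np.array([120, 80, 50]),
    "blue": np.array([70, 90, 160]),
    "white": np.array([240, 240, 240]),
}

def estimate_ki67_ratio(patch_path: str) -> float:
    """Estimate the Ki67 positivity ratio of a patch from its dominant pixel colors."""
    pixels = np.asarray(Image.open(patch_path).convert("RGB"), dtype=np.float32).reshape(-1, 3)

    # Cluster the pixels into three dominant colors.
    kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(pixels)

    counts = {"brown": 0, "blue": 0, "white": 0}
    for label, center in enumerate(kmeans.cluster_centers_):
        # Assign each cluster to the nearest reference color.
        name = min(REFERENCE_COLORS, key=lambda c: np.linalg.norm(center - REFERENCE_COLORS[c]))
        counts[name] += int(np.sum(kmeans.labels_ == label))

    stained = counts["brown"] + counts["blue"]
    return counts["brown"] / stained if stained else 0.0
```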

2. Data Description

The dataset contains 77 pairs of stained sections of testicular seminoma. Each pair is produced from neighboring sections of the tissue to ensure maximum similarity. The dataset comprises 154 PNG files: 77 files with the prefix “id_HE” contain images of HE staining, and 77 files with the prefix “id_Ki67” contain images of Ki67 staining. Both files in each pair (id_HE_string.png and id_Ki67_string.png) have the same resolution, averaging 35,000 × 35,000 pixels, and are registered. Image names include an additional random string for uniqueness. The data are distributed as 39 zip archives, each containing two image pairs (four PNG files, both HE and Ki67), except for the last archive (39.zip), which contains a single pair.
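After unzipping the archives, the HE and Ki67 files can be paired by the id that precedes the stain marker in the filename. The sketch below assumes the naming pattern described above (id_HE_string.png / id_Ki67_string.png); adjust the split logic if the actual ids are formatted differently.

```python
from pathlib import Path

def pair_images(folder: str) -> dict[str, dict[str, Path]]:
    """Group HE and Ki67 PNG files by the id that precedes '_HE_' / '_Ki67_' in the filename."""
    pairs: dict[str, dict[str, Path]] = {}
    for png in Path(folder).glob("*.png"):
        for stain in ("HE", "Ki67"):
            marker = f"_{stain}_"
            if marker in png.name:
                sample_id = png.name.split(marker)[0]
                pairs.setdefault(sample_id, {})[stain] = png
    return pairs

if __name__ == "__main__":
    for sample_id, files in pair_images("dataset").items():
        print(sample_id, files.get("HE"), files.get("Ki67"))
```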

2.1. Additional Data

Additional data include six columns for every pair of samples and are publicly available as Additional_data.xlsx (see the Supplementary Materials section).
Testicular tumors included in the dataset were radical orchiectomy resection specimens from male patients aged 27 to 61 years, with a median of 39.5 years and a multimodal distribution (modes 29 and 46). The pTNM stage was assessed by a pathologist specialized in urogenital pathology according to valid guidelines. Tumors limited to the testis and epididymis without lymphovascular invasion were evaluated as pT1. The pT2 category included tumors defined the same way as pT1 but with lymphovascular invasion, or tumors extending through the tunica albuginea with involvement of the tunica vaginalis. Tumors infiltrating the spermatic cord, regardless of lymphovascular invasion, were placed into category pT3. Tumors extending into the scrotum were classified as pT4 [21]. Orchiectomy specimens in our cohort did not include lymph node biopsies; therefore, the N and M stages were marked as “NxMx” in every sample.
Rete testis invasion (column “infiltration of rete testis”) was evaluated independently of the tumor stage as an adverse prognostic factor that may account for higher rates of recurrence and distant metastasis even in early-stage disease [22,23]. Lymphocytic infiltration (column “intensity of lymphocyte inflammatory reaction”) is a characteristic histological feature of seminoma and was classified according to density into three categories: strong, moderate, and weak. No association was discovered between the density of the inflammatory reaction and tumor stage or rete testis invasion.
In the column “Ki67 proliferation index (eyeballing method)”, we report the proliferation activity for samples evaluated by pathologists within areas of the highest density of positive staining (so-called hot spots). The column “laterality” refers to the laterality of the testis.

3. Methods

3.1. Image Acquisition

Seventy-seven testicular seminoma samples were sectioned into parallel formalin-fixed paraffin-embedded sections with a thickness of 3–4 micrometers. Hematoxylin–eosin (HE) staining was conducted using the Tissue-Tek Prisma® Plus Automated Slide Stainer (Sakura Finetek Japan Co., Ltd., Tokyo, Japan). The deparaffinized sections were stained with Weigert hematoxylin, followed by washing and differentiation with low-pH alcohol, additional washing, eosin staining, dehydration, clearing with carboxylol and xylene, and coverslipping with the Tissue-Tek Film® Automated Coverslipper (Sakura Finetek Japan Co., Ltd.). The immunohistochemical analysis employed the monoclonal mouse antibody clone MIB-1 (FLEX, Dako) on the automated PTLink platform (Dako Denmark A/S). Visualization utilized EnVision FLEX/HRP (Dako), DAB (EnVision FLEX, Dako), and contrast hematoxylin staining. Whole slide images of HE- and Ki67-stained sections from the same cases were sequentially ordered, anonymized, and scanned using the 3D Histech PANNORAMIC® 250 Flash III 3.0.3 in BrightField Default mode at 20× magnification. HE and Ki67 staining were performed on adjacent tissue sections to ensure tissue similarity.

3.2. Data Preprocessing

Scanned whole slide images (WSIs) were stored in MRXS format, with each file approximately 1 GB in size. An MRXS file contains images of multiple specimen samples on a single digitized virtual slide, captured at multiple levels with varying resolutions. Because only a limited set of operations can be performed on MRXS scans via Python libraries, we converted the MRXS files to PNG format for image analysis using the OpenSlide library in Python. The scans included images at 8 levels, with lower levels containing higher-resolution images. To manage the substantial memory demands of the top-resolution images (approximately 6 GB per image at level 0), we opted to process the images at level 1, the second-highest resolution. This level retains sufficient detail without compromising information integrity, thereby alleviating memory-related challenges. The HE scans included two tissue sections; we therefore extracted super patches containing a single tissue section from the original scans, selecting the tissue that was more complete or more similar in shape to the Ki67 tissue. The same procedure was applied to the Ki67 scans, significantly reducing the size of the resulting PNG images. Images were extracted from the WSIs based on exported annotations created in SlideViewer.
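A minimal sketch of this conversion step with OpenSlide is shown below. The region coordinates are placeholders standing in for the exported SlideViewer annotations; in OpenSlide, read_region takes the top-left corner in level-0 coordinates and the size at the requested level.

```python
import openslide  # pip install openslide-python (requires the OpenSlide C library)

def export_region_to_png(mrxs_path: str, out_path: str,
                         x0: int, y0: int, width: int, height: int,
                         level: int = 1) -> None:
    """Export a tissue region from an MRXS scan to PNG at the given pyramid level.

    x0, y0 are level-0 coordinates of the region's top-left corner (e.g. taken from
    an exported SlideViewer annotation); width and height are given at `level`.
    """
    slide = openslide.OpenSlide(mrxs_path)
    region = slide.read_region((x0, y0), level, (width, height))  # RGBA PIL image
    region.convert("RGB").save(out_path)  # drop the alpha channel and write the PNG
    slide.close()
```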

3.3. Tissue Registration

To create pairs of patches, it is essential that patches taken from the same location in the HE- and Ki67-stained images correspond to tissue from the same region of the slide; in other words, if the images are placed on top of each other, the tissues overlap. Due to rotation and displacement differences in the converted PNG images, an alignment of the two images was required. Since both sections were scanned at the same resolution, the registration does not require scaling; we therefore consider an affine transformation consisting of translation and rotation, which requires determining three transformation parameters.
It is important to note that the tissue pairs are not identical, which prevents cell-level matching. Although the HE and IHC sections were adjacent and sequential, the same cells were not present in both sections. The matching was therefore based on the similarity of tissue regions, such as their shape. For predicting proliferation with neural networks, the lack of cell-to-cell correspondence may seem like a problem. However, since we do not want to teach the model to distinguish individual positive and negative cells, only to recognize patches belonging to a certain degree of proliferation, the HE and Ki67 patches can be used even if they do not contain identical cells or the same number of cells, under the assumption that the tissue structure remains relatively preserved in a given area.
To validate this assumption, patches measuring 224 × 224 and 512 × 512 pixels were generated, the Ki67-positivity ratio was quantified in accordance with [19], and all patches from the image were visualized in a heatmap, colored based on the degree of Ki67 positivity, as shown in Figure 2. Due to the low Ki67 positivity of tissue patches, negative values were assigned to patches without tissue to distinctly differentiate the tissue-containing areas from the background.
The objective was to confirm that, although the tissues may not be identical, the Ki67 expression in a given region is influenced by the existing tissue structure (aggressive or non-aggressive tumor) which persists in three-dimensional space. Therefore, two adjacent sections will retain this structure and its properties (such as the degree of proliferation) despite discrepancies in individual cells. This implies the presence of regions with uniform Ki67 positivity from which patches can be derived. Examination of the heatmaps supports this assumption, demonstrating homogeneous regions with multiple patches exhibiting the same or similar Ki67 expression, indicating that these values are not randomly distributed across the tissue patches.
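The heatmap in Figure 2 can be reproduced in spirit with the following sketch. It tiles a Ki67 PNG into non-overlapping patches, uses a crude color threshold (an assumption, not the clustering of [19]) as a positivity proxy, marks background patches with -1, and renders the grid with matplotlib; the filename is hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image

Image.MAX_IMAGE_PIXELS = None  # the dataset PNGs exceed PIL's default decompression-bomb limit

def patch_heatmap(ki67_png: str, patch: int = 224) -> np.ndarray:
    """Per-patch Ki67 positivity ratios on a grid; background patches are set to -1."""
    img = np.asarray(Image.open(ki67_png).convert("RGB"))
    rows, cols = img.shape[0] // patch, img.shape[1] // patch
    heat = np.full((rows, cols), -1.0)
    for r in range(rows):
        for c in range(cols):
            tile = img[r * patch:(r + 1) * patch, c * patch:(c + 1) * patch].astype(int)
            background = np.all(tile > 220, axis=-1)   # near-white pixels
            if background.mean() > 0.9:                # mostly empty glass: keep -1
                continue
            # Crude color proxies: brown-ish (DAB, positive) vs. blue-ish (hematoxylin, negative).
            brown = (tile[..., 0] > tile[..., 2] + 20) & ~background
            blue = (tile[..., 2] > tile[..., 0] + 20) & ~background
            stained = brown.sum() + blue.sum()
            heat[r, c] = brown.sum() / stained if stained else 0.0
    return heat

heat = patch_heatmap("01_Ki67_example.png")  # hypothetical filename
plt.imshow(heat, cmap="viridis", vmin=-1, vmax=1)
plt.colorbar(label="Ki67 positivity ratio (-1 = background)")
plt.show()
```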
In [24], we introduced a semi-automated registration approach based on keypoints and optimization methods. For each HE and Ki67 scan pair, we manually defined pairs of keypoints and used an optimization technique to determine the best transformation parameters between them. Keypoint definition was conducted using SlideViewer, where small square annotations were created for individual keypoints. These annotations were then exported to XML files via SlideMaster. SlideViewer enabled simultaneous viewing and annotation of multiple scans side by side, which expedited the process of annotating keypoints on both HE and Ki67 scans. Corresponding keypoints were given identical annotation names for straightforward identification during subsequent steps. Five square keypoints were annotated for each slide. The XML annotation files recorded the coordinates of each annotation’s top-left corner relative to the top-left corner of the scan, along with the annotation’s width and height. All dimensions were specified for the original scan size (layer 0) and were scaled by dividing by $2^{layer}$, corresponding to the layer from which the PNG image was later extracted. Initially, keypoints were defined as squares rather than points. In the next phase, we determined the centers of these squares and recalculated their coordinates relative to the large bounding box annotation (super patch annotation). These adjusted coordinates in the scaled image slices served as input for the optimization algorithm.
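The coordinate bookkeeping described above reduces to a few arithmetic steps. The sketch below is an illustration under the stated conventions (all annotation coordinates at level 0, scaling by 2^layer, offset by the super-patch origin); the parameter names are ours, not the exported XML field names.

```python
def annotation_to_keypoint(x: float, y: float, width: float, height: float,
                           super_x: float, super_y: float, layer: int = 1) -> tuple[float, float]:
    """Center of a square keypoint annotation, expressed in the super-patch image at `layer`.

    (x, y) is the annotation's top-left corner and (super_x, super_y) the top-left corner
    of the super-patch annotation, both in level-0 scan coordinates.
    """
    scale = 2 ** layer                          # level-0 pixels per pixel at `layer`
    cx = (x + width / 2 - super_x) / scale      # center relative to the super patch, then scaled
    cy = (y + height / 2 - super_y) / scale
    return cx, cy
```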
The transformation between the two sets of keypoints was defined as rotation and translation. To simplify the subsequent image transformations, we performed rotations around the center of the image. The rotation matrix is typically defined for rotation around the origin point $[0, 0]$. Therefore, within the transformation function, we first translated all points by the vector $(-\frac{width}{2}, -\frac{height}{2})$, applied the rotation, and then translated them back to their original positions. After the rotation, we applied a tissue shift via translation. By expanding the space from $2 \times 2$ to $3 \times 3$, the entire transformation can be expressed using matrices as follows:

$$
\begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} =
\begin{pmatrix} 1 & 0 & \frac{width}{2} \\ 0 & 1 & \frac{height}{2} \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} \cos\alpha & -\sin\alpha & 0 \\ \sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 & -\frac{width}{2} \\ 0 & 1 & -\frac{height}{2} \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 & p_x \\ 0 & 1 & p_y \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} x \\ y \\ 1 \end{pmatrix},
\tag{1}
$$

where $x, y$ are the input keypoint coordinates, $x', y'$ are the transformed output coordinates, $\alpha$ is the rotation angle in radians, and $p_x, p_y$ are the coordinates of the final translation.
To identify the optimal transformation parameters that minimized the distance between the original HE keypoints and the transformed Ki67 keypoints, we employed an optimization method. The objective function $L$ could be expressed as the sum of the Euclidean distances between the original HE keypoints $(x_{HE}, y_{HE})$ and the Ki67 keypoints $(x'_{Ki67}, y'_{Ki67})$ transformed via the optimized parameters in (1), as follows:

$$
L(\alpha, p_x, p_y) = \sum \sqrt{(x_{HE} - x'_{Ki67})^2 + (y_{HE} - y'_{Ki67})^2},
\tag{2}
$$

where the sum runs over all keypoint pairs.
We utilized the scipy.optimize library in Python, specifically the minimize method, which allows for the selection of various solvers (algorithms) depending on whether the optimization problem includes constraints or bounds. Since our problem had neither, we opted for the default solver. The parameter for this method is a function that computes the objective function $L$ for the optimized parameters $\alpha$, $p_x$, and $p_y$.
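The optimization can be sketched as follows. The transformation matrix mirrors Equation (1), the objective is the keypoint distance sum from Equation (2), and scipy.optimize.minimize with its default solver searches for α, p_x, and p_y; the keypoint coordinates and image size below are placeholders, not values from the dataset.

```python
import numpy as np
from scipy.optimize import minimize

def transform_matrix(alpha: float, px: float, py: float,
                     width: int, height: int) -> np.ndarray:
    """Compose the transformation of Equation (1): shift, then rotation about the image center."""
    cx, cy = width / 2, height / 2
    to_center = np.array([[1, 0, -cx], [0, 1, -cy], [0, 0, 1]], dtype=float)
    back = np.array([[1, 0, cx], [0, 1, cy], [0, 0, 1]], dtype=float)
    rot = np.array([[np.cos(alpha), -np.sin(alpha), 0],
                    [np.sin(alpha), np.cos(alpha), 0],
                    [0, 0, 1]], dtype=float)
    shift = np.array([[1, 0, px], [0, 1, py], [0, 0, 1]], dtype=float)
    return back @ rot @ to_center @ shift

def objective(params: np.ndarray, he_pts: np.ndarray, ki67_pts: np.ndarray,
              width: int, height: int) -> float:
    """Sum of Euclidean distances between HE keypoints and transformed Ki67 keypoints."""
    alpha, px, py = params
    M = transform_matrix(alpha, px, py, width, height)
    ones = np.ones((len(ki67_pts), 1))
    transformed = (M @ np.hstack([ki67_pts, ones]).T).T[:, :2]
    return float(np.linalg.norm(he_pts - transformed, axis=1).sum())

# Placeholder keypoints (five per slide in the dataset) and a placeholder image size.
he_keypoints = np.array([[100, 120], [400, 90], [250, 300], [500, 420], [150, 480]], float)
ki67_keypoints = np.array([[110, 135], [410, 100], [262, 310], [508, 435], [160, 495]], float)
result = minimize(objective, x0=[0.0, 0.0, 0.0],
                  args=(he_keypoints, ki67_keypoints, 600, 600))
alpha_opt, px_opt, py_opt = result.x
```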
After obtaining the transformation parameters and preprocessing both scans, we compared their dimensions and added white pixels in both directions so that they were the same size. This ensured that the upper left corner of the original image remained in the same position in the new image, preserving the coordinates of keypoints. Here is a summary of the procedure:
1. Rotate the Ki67 image around its center with the “expand” option enabled, ensuring the resulting image is large enough to contain the entire rotated IHC image, with additional white pixels as padding;
2. Create a white image of the same dimensions as the rotated Ki67 image;
3. Calculate the translation vector v for the HE image relative to the white image, ensuring that, when placed with its top-left corner at the origin and then shifted, it is centered;
4. Adjust the translation vector v by subtracting the shift parameters obtained from the optimization;
5. Copy each pixel of the HE image to the corresponding coordinates in the white image, adjusted by the translation vector v.
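A Pillow sketch of these five steps is given below. The rotation angle and shift come from the optimization of the previous subsection; the sign convention applied to the shift is our assumption, and the outputs are the padded HE image and the rotated Ki67 image of equal size.

```python
from PIL import Image
import math

Image.MAX_IMAGE_PIXELS = None  # the dataset images exceed PIL's default pixel limit

def register_pair(he_path: str, ki67_path: str, alpha_rad: float,
                  px: float, py: float) -> tuple[Image.Image, Image.Image]:
    """Rotate the Ki67 image and place the HE image onto a matching white canvas."""
    he = Image.open(he_path).convert("RGB")
    ki67 = Image.open(ki67_path).convert("RGB")

    # 1. Rotate Ki67 around its center; expand=True enlarges the canvas so nothing is cropped,
    #    padding with white pixels.
    ki67_rot = ki67.rotate(math.degrees(alpha_rad), expand=True, fillcolor="white")

    # 2. Create a white image with the rotated Ki67 dimensions.
    canvas = Image.new("RGB", ki67_rot.size, "white")

    # 3. Translation vector v that centers the HE image on the white canvas ...
    # 4. ... adjusted by the optimized shift (sign convention assumed here).
    vx = (ki67_rot.width - he.width) // 2 - int(round(px))
    vy = (ki67_rot.height - he.height) // 2 - int(round(py))

    # 5. Copy the HE pixels to the adjusted coordinates in the white image.
    canvas.paste(he, (vx, vy))
    return canvas, ki67_rot
```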
An example of successful registration is shown in Figure 3, where keypoints are highlighted as blue rectangles in the HE images and orange rectangles in the IHC images.

Validation with Convolutional Network Model

The correctness of the registration was validated by assessing the accuracy of the classification model in predicting Ki67 expression directly from HE images, as further described in [20]. We compared the accuracy of two models: the first trained on the dataset as provided, and the second on a randomly generated dataset. A model trained on a randomly generated dataset should exhibit lower validation accuracy, because the annotations in the validation data are assigned randomly and the neural network cannot model this randomness. From the results presented in [20], it is clear that the first model achieved high accuracy, while the second model’s accuracy was equivalent to random guessing. The significantly lower accuracy of the second model provides sufficient evidence to confirm the correctness of our registration procedure.

4. Dataset Limitations and Considerations

While the dataset was carefully created, several limitations were identified that should be considered in further data processing and analysis. Firstly, the sample size is limited to 77 patients. Although the morphological spectrum of germ cell tumors is broad, the histomorphology of “conventional seminoma” is uniform across different individuals, representing a diagnostically specific morphological entity. Therefore, while a series of 77 tumors may be sufficiently representative for many research tasks, application-specific data requirements should be considered. Additionally, the dataset exclusively comprises samples of male tissue, because seminoma, by definition, includes only germ cell tumors arising in testicular tissue.
Nevertheless, despite these limitations, the dataset significantly enhances opportunities for machine learning applications in digital pathology.

5. Conclusions

This dataset is intended for training AI models to predict the Ki67 index or even to generate Ki67 staining. Since each pair of HE and Ki67 images contains two physically different (although neighboring) tissue sections, there is no one-to-one correspondence at the cellular level. However, patches created at the same locations in the images of a pair share similar quantitative characteristics, such as the number of cells, the number of Ki67-positive cells, and the average cell size. In [20,24], we elaborate on the usage of the dataset by computing the Ki67 index of HE patches evaluated from the corresponding Ki67 patch.

Supplementary Materials

The following are available online at https://www.mdpi.com/article/10.3390/data9080100/s1. Supplementary data include a single spreadsheet file Additional_data.xlsx; see Section 2.1 for details.

Author Contributions

Conceptualization, I.C. and L.P.; methodology, D.P. and I.C.; software, D.P.; validation, D.P.; formal analysis, D.P. and I.C.; investigation, D.P. and K.T.; resources, K.T. and L.P.; data curation, K.T., D.P., and I.C.; writing—original draft preparation, D.P.; writing—review and editing, I.C., K.T. and L.P.; visualization, D.P.; supervision, I.C. and L.P.; project administration, I.C. and L.P.; funding acquisition, I.C. and L.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Ministry of Education, Science, Research and Sport of the Slovak Republic under the contract No. VEGA 1/0369/22.

Institutional Review Board Statement

Ethical review and approval were waived for this study due to the retrospective analysis of the fully anonymized data used in the study. The consent to use biological material for diagnostic and research purposes was included during admittance to the healthcare facility.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The original data presented in the study are openly available at https://doi.org/10.5281/zenodo.11218961.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AI	Artificial intelligence
GB	Gigabyte
HE	Hematoxylin–eosin
HER2	Human epidermal growth factor receptor 2
HSV	Hue, saturation, value
IHC	Immunohistochemical
Ki67	Proliferation biomarker
MRXS	MIRAX multi-file format (proprietary metadata and indexes)
PNG	Portable Network Graphics
WSI	Whole slide image
XML	eXtensible Markup Language

References

  1. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  2. Litjens, G.; Kooi, T.; Bejnordi, B.E.; Setio, A.A.A.; Ciompi, F.; Ghafoorian, M.; van der Laak, J.A.; van Ginneken, B.; Sánchez, C.I. A survey on deep learning in medical image analysis. Med. Image Anal. 2017, 42, 60–88. [Google Scholar] [CrossRef] [PubMed]
  3. Komura, D.; Ishikawa, S. Machine Learning Methods for Histopathological Image Analysis. Comput. Struct. Biotechnol. J. 2018, 16, 34–42. [Google Scholar] [CrossRef] [PubMed]
  4. Cao, C.; Liu, F.; Tan, H.; Song, D.; Shu, W.; Li, W.; Zhou, Y.; Bo, X.; Xie, Z. Deep Learning and Its Applications in Biomedicine. Genom. Proteom. Bioinform. 2018, 16, 17–32. [Google Scholar] [CrossRef] [PubMed]
  5. Niazi, M.K.K.; Parwani, A.V.; Gurcan, M.N. Digital pathology and artificial intelligence. Lancet Oncol. 2019, 20, e253–e261. [Google Scholar] [CrossRef] [PubMed]
  6. Feng, X.; Shu, W.; Li, M.; Li, J.; Xu, J.; He, M. Pathogenomics for accurate diagnosis, treatment, prognosis of oncology: A cutting edge overview. J. Transl. Med. 2024, 22, 131. [Google Scholar] [CrossRef] [PubMed]
  7. Seegerer, P.; Binder, A.; Saitenmacher, R.; Bockmayr, M.; Alber, M.; Jurmeister, P.; Klauschen, F.; Müller, K.R. Interpretable Deep Neural Network to Predict Estrogen Receptor Status from Haematoxylin-Eosin Images; Springer International Publishing: Berlin/Heidelberg, Germany, 2020; pp. 16–37. [Google Scholar] [CrossRef]
  8. Naik, N.; Madani, A.; Esteva, A.; Keskar, N.S.; Press, M.F.; Ruderman, D.; Agus, D.B.; Socher, R. Deep learning-enabled breast cancer hormonal receptor status determination from base-level H&E stains. Nat. Commun. 2020, 11, 5727. [Google Scholar] [CrossRef] [PubMed]
  9. Rawat, R.R.; Ortega, I.; Roy, P.; Sha, F.; Shibata, D.; Ruderman, D.; Agus, D.B. Deep learned tissue “fingerprints” classify breast cancers by ER/PR/Her2 status from H&E images. Sci. Rep. 2020, 10, 7275. [Google Scholar] [CrossRef]
  10. Liu, Y.; Li, X.; Zheng, A.; Zhu, X.; Liu, S.; Hu, M.; Luo, Q.; Liao, H.; Liu, M.; He, Y.; et al. Predict Ki-67 Positive Cells in H&E-Stained Images Using Deep Learning Independently From IHC-Stained Images. Front. Mol. Biosci. 2020, 7, 183. [Google Scholar] [CrossRef]
  11. Shovon, M.S.H.; Islam, M.J.; Nabil, M.N.A.K.; Molla, M.M.; Jony, A.I.; Mridha, M.F. Strategies for enhancing the multi-stage classification performances of HER2 breast cancer from hematoxylin and eosin images. Diagnostics 2022, 12, 2825. [Google Scholar] [CrossRef] [PubMed]
  12. Anand, D.; Kurian, N.C.; Dhage, S.; Kumar, N.; Rane, S.; Gann, P.H.; Sethi, A. Deep learning to estimate human epidermal growth factor receptor 2 status from hematoxylin and eosin-stained breast tissue images. J. Pathol. Inform. 2020, 11, 19. [Google Scholar] [CrossRef] [PubMed]
  13. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2015; Volume 9351. [Google Scholar]
  14. Gerdes, J.; Schwab, U.; Lemke, H.; Stein, H. Production of a mouse monoclonal antibody reactive with a human nuclear antigen associated with cell proliferation. Int. J. Cancer 1983, 31, 13–20. [Google Scholar] [CrossRef] [PubMed]
  15. Kos, Z.; Dabbs, D.J. Biomarker assessment and molecular testing for prognostication in breast cancer. Histopathology 2016, 68, 70–85. [Google Scholar] [CrossRef] [PubMed]
  16. Rabes, H.M.; Schmeller, N.; Hartmann, A.; Rattenhuber, U.; Carl, P.; Staehler, G. Analysis of proliferative compartments in human tumors. II. Seminoma. Cancer 1985, 55, 1758–1769. [Google Scholar] [CrossRef] [PubMed]
  17. Gallegos, I.; Valdevenito, J.P.; Miranda, R.; Fernandez, C. Immunohistochemistry expression of P53, Ki67, CD30, and CD117 and presence of clinical metastasis at diagnosis of testicular seminoma. Appl. Immunohistochem. Mol. Morphol. 2011, 19, 147–152. [Google Scholar] [CrossRef] [PubMed]
  18. Lourenço, B.C.; Guimarães-Teixeira, C.; Flores, B.C.T.; Miranda-Gonçalves, V.; Guimarães, R.; Cantante, M.; Lopes, P.; Braga, I.; Maurício, J.; Jerónimo, C.; et al. Ki67 and LSD1 expression in testicular germ cell tumors is not associated with patient outcome: Investigation using a digital pathology algorithm. Life 2022, 12, 264. [Google Scholar] [CrossRef] [PubMed]
  19. Petríková, D.; Cimrák, I.; Tobiášová, K.; Plank, L. Semi-Automated Workflow for Computer-Generated Scoring of Ki67 Positive Cells from HE Stained Slides. In Proceedings of the BIOINFORMATICS, Barcelona, Spain, 22–24 September 2023; pp. 292–300. [Google Scholar]
  20. Petríková, D.; Cimrák, I. Semi-automated Ki67 Index Label Estimation For HE Images Classification. In Proceedings of the 2024 35th Conference of Open Innovations Association (FRUCT), Tampere, Finland, 24–26 April 2024; Volume 35, pp. 812–817. [Google Scholar]
  21. Brierley, J.D.; Gospodarowicz, M.K.; Wittekind, C. (Eds.) TNM Classification of Malignant Tumours, 8th ed.; John Wiley & Sons: Nashville, TN, USA, 2016. [Google Scholar]
  22. Boormans, J.L.; Mayor de Castro, J.; Marconi, L.; Yuan, Y.; Laguna Pes, M.P.; Bokemeyer, C.; Nicolai, N.; Algaba, F.; Oldenburg, J.; Albers, P. Testicular tumour size and rete testis invasion as prognostic factors for the risk of relapse of clinical stage I seminoma testis patients under surveillance: A systematic review by the testicular cancer guidelines panel. Eur. Urol. 2018, 73, 394–405. [Google Scholar] [CrossRef] [PubMed]
  23. Trevino, K.E.; Esmaeili-Shandiz, A.; Saeed, O.; Xu, H.; Ulbright, T.M.; Idrees, M.T. Pathological risk factors for higher clinical stage in testicular seminomas. Histopathology 2018, 73, 741–747. [Google Scholar] [CrossRef] [PubMed]
  24. Petríková, D.; Cimrák, I.; Tobiášová, K.; Plank, L. Ki67 expression classification from HE images with semi-automated computer-generated annotations. In Proceedings of the BIOINFORMATICS, Kunming, China, 19–21 July 2024. [Google Scholar]
Figure 1. Different tissue staining types: (a) HE tissue, (b) Ki67 IHC tissue.
Figure 2. Heatmaps of patch ratios with patch sizes 224 × 224 (left) and 512 × 512 (right).
Figure 3. Example of successful semi-automated registration for two pairs. (a) HE tissue on the left, (b) Ki67 tissue in the middle, (c) overlay of transformed HE and Ki67 tissue on the right.
