An Approach for Systems-Level Understanding of Prostate Cancer from High-Throughput Data Integration to Pathway Modeling and Simulation

Mobashir, Mohammad; Turunen, S. Pauliina; Izhari, Mohammad Asrar; Ashankyty, Ibraheem Mohammed; Helleday, Thomas; Lehti, Kaisa

doi:10.3390/cells11244121

Open AccessArticle

An Approach for Systems-Level Understanding of Prostate Cancer from High-Throughput Data Integration to Pathway Modeling and Simulation

¹

Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Solnavägen 9, Solna 17165, Sweden

²

Faculty of Applied Medical Sciences, University of Al-Baha, Al-Baha 65528, Saudi Arabia

³

Department of Medical Laboratory Technology, Faculty of Applied Medical Sciences, King Abdulaziz University, Jeddah 22233, Saudi Arabia

⁴

SciLifeLab, Department of Oncology and Pathology, Karolinska Institutet, P.O. Box 1031, 17121 Stockholm, Sweden

^*

Author to whom correspondence should be addressed.

Cells 2022, 11(24), 4121; https://doi.org/10.3390/cells11244121

Submission received: 20 November 2022 / Revised: 14 December 2022 / Accepted: 16 December 2022 / Published: 19 December 2022

(This article belongs to the Collection Emerging Cancer Target Genes)

Download

Browse Figures

Versions Notes

Abstract

:

To understand complex diseases, high-throughput data are generated at large and multiple levels. However, extracting meaningful information from large datasets for comprehensive understanding of cell phenotypes and disease pathophysiology remains a major challenge. Despite tremendous advances in understanding molecular mechanisms of cancer and its progression, current knowledge appears discrete and fragmented. In order to render this wealth of data more integrated and thus informative, we have developed a GECIP toolbox to investigate the crosstalk and the responsible genes’/proteins’ connectivity of enriched pathways from gene expression data. To implement this toolbox, we used mainly gene expression datasets of prostate cancer, and the three datasets were GSE17951, GSE8218, and GSE1431. The raw samples were processed for normalization, prediction of differentially expressed genes, and the prediction of enriched pathways for the differentially expressed genes. The enriched pathways have been processed for crosstalk degree calculations for which number connections per gene, the frequency of genes in the pathways, sharing frequency, and the connectivity have been used. For network prediction, protein–protein interaction network database FunCoup2.0 was used, and cytoscape software was used for the network visualization. In our results, we found that there were enriched pathways 27, 45, and 22 for GSE17951, GSE8218, and GSE1431, respectively, and 11 pathways in common between all of them. From the crosstalk results, we observe that focal adhesion and PI3K pathways, both experimentally proven central for cellular output upon perturbation of numerous individual/distinct signaling pathways, displayed highest crosstalk degree. Moreover, we also observe that there were more critical pathways which appear to be highly significant, and these pathways are HIF1a, hippo, AMPK, and Ras. In terms of the pathways’ components, GSK3B, YWHAE, HIF1A, ATP1A3, and PRKCA are shared between the aforementioned pathways and have higher connectivity with the pathways and the other pathway components. Finally, we conclude that the focal adhesion and PI3K pathways are the most critical pathways, and since for many other pathways, high-rank enrichment did not translate to high crosstalk degree, the global impact of one pathway on others appears distinct from enrichment.

Keywords:

systems-level understanding; high-throughput data; crosstalk; mathematical modeling and simulation; docking; gene expression; enriched pathways

1. Introduction

High-throughput data ((epi-)genomic, transcription, and proteomic) are frequently generated with the goal to understand the genotype–phenotype relationship in complex diseases [1,2,3]. The initial goal in the analysis of these high-throughput data remains the identification of a list of important targets (genes/proteins). Once the target genes/proteins are identified then the major challenge remains at functional annotation and pathway-level understanding [1,2,3,4,5,6]. In this direction, multiple successful approaches have been developed. Among the most frequently used tools for functional annotation for a given gene list are DAVID [7], EnrichNet [8], and PANTHER [9]. So far, in most high-throughput studies, the major focus has been on functional classification and annotation of genes or the analysis of enriched pathways. In cancer, mutations and copy number alterations coupled with epigenetic aberrations and altered gene expression are commonly focused points of study towards understanding the cause, heterogeneity, and drug resistance or defining other phenomena related to the aggressiveness and progression of cancers [10,11]. Accordingly, compared to normal cells, drastic changes occur in cancer cells at multiple levels, including molecular functions and cellular processes [12,13,14,15,16,17,18,19,20,21,22,23,24]. In different cancer types, the pattern of aberrations behind the disrupted tissue and cell functions varies substantially [25,26]. High-throughput identification of the different gene expression patterns and genomic changes are frequently used to uncover the biological pathways involved. From there, it is vital to understand which are the functionally most significant individual changes and pathway-level alterations [1,2,3,27]. To achieve such goals, DAVID, EnrichNet, and PANTHER are helpful. As a result, incorporating multi-omic data in a meaningful way to provide a more comprehensive analysis of a biological area of interest is becoming more common [28,29,30,31,32,33,34,35]. A critical and promising challenge which remains unanswered is how the significantly affected or enriched pathways in a particular disease type or even for the same disease in an individual patient collectively coordinate or crosstalk with each other [31,32,33,36,37,38,39,40].

To address this, we have here developed a GECIP toolbox, a computational method that first processes the list of genes or proteins of interest and performs pathway enrichment and network analysis similar to other methods [30]. Most importantly, and unlike the previously developed approaches [4,5,6,41,42,43,44,45,46,47,48], GECIP calculates the predictive degree of crosstalk (

C T_{d e g r e e}

) between the pathways as the final output. This toolbox is different from the existing previous similar work in the sense that it includes the list of potential genes or targets and proceeds for pathway enrichment analysis, network-level understanding of the genes or the proteins and enriched pathways, and finally, associates the different parameters (list of genes/proteins, connectivity, sharing frequency of genes/pathways with genes/proteins/pathways) to calculate the possible interactions between the enriched pathways.

In this work, the GECIP toolbox was applied for the differentially expressed genes (DEGs) in prostate cancer versus normal tissue. Since the gene expression alterations in different cancer patients and experimental conditions may vary, we have collected three separate human microarray gene expression datasets for prostate cancer (gene expression omnibus; https://www.ncbi.nlm.nih.gov/geo/ (accessed on 1 November 2022)). The steps in GECIP proceed from preparation of a gene list (DEGs), formulation of the network of DEGs, prediction and analysis of enriched KEGG pathways [1,2,3,7,49], and finally, calculation of the

C T_{d e g r e e}

between the pathways (Figure 1).

2. Materials and Methods

Data collection and processing: In this work, we have collected prostate cancer gene expression datasets of normal and cancerous tissue from Gene Expression Omnibus (GEO) [50]. Three sets of datasets for prostate cancer studies have been collected. These gene expression profiling datasets were Affymetrix Human Genome U133 Plus 2.0 Array, Affymetrix Human Genome U133A, and Affymetrix Human Genome U133B, respectively. For generating the DEG network, FunCoup2.0 [51] has been used for all the networks throughout the work, and cytoscape [52] has been used for network visualization [53]. For most of our coding and calculations, MATLAB has been used. FunCoup predicts four different classes of functional coupling or associations such as protein complexes, protein–protein physical interactions, metabolic, and signaling pathways [1,2,3,4,5,6,51].

Analysis of crosstalk between the pathways: To investigate the crosstalk between the enriched pathways, we have divided the approach into three major steps. In the first step we have processed the raw data for normalization applying RMA [7,54]. A standard statistical t-test (mattest MATLAB function) has been applied for identifying significant changes between two groups of data (normal or control and cancer or target) for differential gene expression analysis [8,15,16,17,18,19,22,23,24,53,55,56,57]. To assess the differences in gene expression between two experimental conditions or phenotypes, mattest (https://www.mathworks.com/help/bioinfo/ref/mattest.html, (accessed on 1 November 2022)) uses a two-sample t-test. Every gene is subjected to a standard two-tailed, two-sample unpaired t-test for differential expression, which results in a p-value for every gene. For FDR calculation, the details could be seen in the link mentioned here (https://se.mathworks.com/help/bioinfo/ref/mafdr.html, (accessed on 1 November 2022)). For fold change calculation, we have used the p-values after corrections, and the threshold was 0.05. The same we have also updated in the main text. In the second step, we have prepared the list of DEGs and the total gene list and have also collected the list of KEGG pathways. Once we have these three lists in normal text file format, a Fischer exact test [9,58,59] is performed as follows:

f (k; N, m, n) = \frac{(\begin{matrix} m \\ k \end{matrix}) (\frac{N - m}{n - k})}{(\frac{N}{n})}

(1)

where m is the total number of genes genome-widely annotated with each pathway, N is the total number of genome-wide genes, k is the number of genes annotated with an individual pathway in the gene list, n is the total number of genes in the gene list, and f-value is the value of probability that this random event could happen under hypergeometric distribution [10,11,55,60,61]. To get the probability of more extreme cases, we sum up all the probability as follows:

p = \sum_{k}^{n} f (k; N, m, n)

(2)

For all the pathways listed in the KEGG database (from five classes, as shown in Figure 1), we continue this step. After calculating p-values, multiple hypothesis testing has been applied for false discovery rate (FDR) correction. From this step, we get the list of enriched pathways. We have considered only those pathways which have FDR < 0.05 [12,62].

Finally, in the third step, after calculating the enriched KEGG pathways, we proceed to calculate the crosstalk between the enriched pathways. For crosstalk analysis between all the enriched pathways, an adjacency matrix (

A_{i j}

) is generated for the network, shown in Figure 3c. This

A_{i j}

stands for the connectivity, which means the specific node connecting or not [63,64,65,66,67,68]. Afterward, the degree of crosstalk is calculated by using a mathematical equation (Equation (3)).

C T_{d e g r e e} = \sum_{n o d e_{i}}^{n o d e_{n}} A_{i j} * n o d e_{c o n n} * n o d e_{s h a r i n g} * \frac{C o n n_{t o t a l}}{(n o d e_{s} (c o n n) + n o d e_{t} (c o n n))}

(3)

Here,

C T_{d e g r e e}

is crosstalk degree between two pathways,

n o d e_{c o n n}

is the number of nodes associated with the selected node, and

n o d e_{s h a r i n g}

means how many pathways share the selected nodes.

C o n n_{t o t a l}

represents total connectivity within the DEGs’ network,

n o d e_{s} (c o n n)

stands for total number of links for source node, and

n o d e_{t} (c o n n)

stands for total number of links for target node. Thickness of the connecting links between the pathways has been generated based on the

C T_{d e g r e e}

, which is based on the overall

C T_{d e g r e e}

. The generalized workflow of this work has been presented in Figure 1 (Analysis of crosstalk between the pathways).

Mathematical modeling and simulation of selected pathway components: After crosstalk calculations, we have selected a few components which are well-known to play roles in prostate cancer (Figure 6), prepared a mathematical model by using mass-action kinetics and ordinary differential equations (ODEs) [47,69,70,71,72,73,74], and by using an evolutionary algorithm (EA) [68,75], optimized the parameters. The modeling approach and EA optimization approach were adapted from our previous studies [66,67,68]. The summary of the previous approach could be presented here. The interaction matrix between all these signaling molecules, including complexes, is represented as +1 (production/generation), −1 (degradation/dissociation), and 0 (no interaction). The entries of Aij are chosen once for a network under the constraints that the total amount of each protein is conserved and the SN generated does have a stable inactive state. The fitness of a signaling network was tested by calculating its response to six different signals. For every signal, the dynamics of the pre-selected output node is tested whether it exceeds a threshold level of 1/10, defined to be the relative fraction of the dual phosphorylated protein to the total concentration of the protein. If the output node crosses the threshold level at any point during the dynamics, the network gains a fitness contribution. We have applied an evolutionary algorithm to evolve the networks. Before starting evolution, we create a set of diverse networks (N = 200) with the same randomly generated interaction matrix for three proteins with three states (un-, mono-, and dual-phosphorylated) and different randomly selected kinetic parameters. More details of the methodology related to mathematical modeling and evolution of the signaling networks could be seen in our previous studies [66,67,68].

We have established a signaling cascade that operates similarly to the MAPK signaling cascade. The target level is represented by target proteins, which are those proteins that transmit information to the nucleus in the form of the output response. This signaling cascade is divided into several different levels of signaling, such as the receptor level, intracellular signaling level, and the target level. Then, at various signaling levels, we have incorporated various loop types as well as cross-interactions (crosstalk) between two signaling cascades. In the next phase, we have a mass-action kinetic model which utilizes ODEs for all the molecules. The model equation (Equation (4)) used to express the temporal change in the concentration of the signaling molecules was mentioned in detail in our previous work [62,63,64].

\frac{d x_{i}}{d t} = \sum^{} P r o d u c t i o n_{r a t e} - \sum^{} C o n s u m p t i o n_{r a t e}

(4)

We estimated the fitness of all the cascades after calculating the kinetics of all the molecules. Six distinct input signals (

n_{s} = 6

) were used for all calculations; 0.0001, 0.001, 0.01, 0.1, 1, and 10 were the six different input signals (different in strength), and for each input signal (

n

), the kinetics of individual TP is tested as to whether it exceeds the threshold level (threshold level,

T L = \frac{1}{10} t h

of the initial concentration of TP), defined to be the relative fraction of double-phosphorylated form of TP. If this double-phosphorylated form of TP surpasses the threshold level at any time point in a cascade, with or without FBL and FFL, the cascade is given a value of 1.0, known as fitness factor 1 (

F_{f a c t o r 1}

), else

F_{f a c t o r 2} = 0

. In case of crosstalk of two cascades, in a similar way we check the kinetics of the double-phosphorylated form of TP of both the cascades; if the kinetics of cascade 1 crosses the threshold at any time point, then we assign fitness factor (

F_{f a c t o r 1}

) a value of 0.5, and if the kinetics of cascade 2 crosses the threshold at any time point, then we assign fitness factor (

F_{f a c t o r 2}

) a value of 0.5 for cascade 2. After evaluating the kinetics for each cascade for all six input signals (different in strength), we calculate the fitness (

F

) (as shown in Equation (5)) by taking the mean of the fitness factors (

F_{f a c t o r 1} a n d F_{f a c t o r 2}

), which can be represented as [66,67,68]:

F = \frac{\sum_{n = 1}^{n_{s}} F_{f a c t o r 1} (n) + \sum_{n = 1}^{n_{s}} F_{f a c t o r 2} (n)}{n_{s}}

(5)

We have created a set of cascades (total number of cascades 200) with randomly generated kinetic parameters (

k_{p a r}

) between 0.001 and 0.1. EA [68,76] has been applied to evolve the signaling cascades. For each cascade, we have

F

. After calculating

F

of all the cascades, we select the best 50 cascades (successful cascades) based on higher

F

values. In order to improve the response kinetics, these successful cascades are allowed to adapt new

k_{p a r}

. Four copies of all these 50 cascades with updated

k_{p a r}

are created to keep the total number of cascades equal in each iteration. All these processes are repeated for 200 iterations. Each iteration is called a generation [66,67,68].

Docking profiling: SwissDock has been applied for docking purpose, and the input files were prepared as per SwissDock guidelines [23,77,78]. The ligand structures were collected from PubChem and converted to mol2 format, and the 3D protein structures of the proteins for the selected genes have been predicted by using SWISS-MODEL. After preparing the input files, SwissDock was executed for the default parameters [22,56,57,79,80,81,82].

Experimental validation: We have collected human prostate biopsies for both normal and cancer condition and performed immunohistochemistry (IHC) for human prostate biopsies (normal and cancer). The primary antibodies were p44/42 MAPK (Erk1/2) (L34F12) Mouse mAb #4696 and Phospho-p44/42 MAPK (Erk1/2) (Thr202/Tyr204) (D13.14.4E) XP^® Rabbit mAb #4370, both in a ratio of 1:500 (bought from cell signalling). The secondary antibodies were goat anti-mouse immunoglobulin G (IgG)–Alexa Fluor 488 (1:500; Molecular Probes) and goat anti-rabbit IgG–Alexa Fluor 555 (1:500; Molecular Probes).

Details of the staining procedure: These sections were deparaffinized and rehydrated before antigen retrieval with R-Buffer A (Electron Microscopy Sciences) in a pressure cooker. After blocking with 2% bovine serum albumin, the sections were incubated with different primary antibodies (see above) at 4 °C overnight. Extensive rinsing was performed once; the sections were incubated with the secondary antibodies goat anti-mouse immunoglobulin G (IgG)–Alexa Fluor 488 (1:500; Molecular Probes) and goat anti-rabbit IgG–Alexa Fluor 555 (1:500; Molecular Probes) for 1 h at room temperature. DNA was counterstained with TO-PRO-3 iodide (Molecular Probes), and slides were mounted with ProLong Gold (Molecular Probes).

Fluorescence images were obtained with a Zeiss LSM 780 inverted confocal microscope, using a Plan-Apochromat 40×/NA 1.2 objective. Through-focus maximum projection images were acquired from optical sections 0.5 μm apart and with a section thickness of 1.0 μm.

All the MATLAB codes which have been used for entire calculation are available on request, and the pseudocode is presented below (Figure 2).

3. Results

3.1. Gene Expression Profiling, Enriched Pathways, and Crosstalk Score Calculations

In the first step, the raw microarray gene expression data were processed in order to identify the DEGs. We processed all the raw data through normalization with the robust multi-array average (RMA) method. After normalization, a two-sample t-test statistical analysis approach was applied to identify the differentially expressed genes (DEGs) in the three datasets [25,26,55]. In our analysis, we have analyzed the DEGs for all the selected datasets (Figure 3a). With the DEG lists, we next calculated the enriched KEGG pathways and ranked them based on p-values (p < 0.05; Supplementary Table S1). For pathway enrichment analysis, we have included metabolism, genetic information processing, environmental information processing, cellular processes, and organismal systems, but excluded human diseases and drug development from the altogether seven KEGG database pathway sections (Figure 3b). This exclusion was done to avoid duplication, since the human disease and drug development pathway components are also represented in the five included pathway sections. To calculate

C T_{d e g r e e}

, network of DEGs, genes shared between the pathways, total number of links within the DEG network, and number of links for every gene were included as input for the formulated mathematical equation (Figure 1). To acquire these parameters, we used the FunCoup2.0 network (homo sapiens) database [1,2,3,12,27,51]. There, the networks of functional coupling between genes are reconstructed by integrating heterogeneous data (such as mRNA co-expression, phylogenetic profile similarity, protein–protein interaction, protein co-expression, shared transcription factor binding, and domain association data) [10,11,51]. For all the DEGs, we generated the network by FunCoup2.0, and in order to calculate the

C T_{d e g r e e}

, the required parameters were extracted from this network and the KEGG database.

We found 27, 45, and 22 enriched pathways in Grasso, GSE8218, and GSE1431, respectively (Figure 3c). Grasso includes a gene expression dataset of prostate cancer samples (Agilent Human Genome 44K), GSE8218 includes gene expression prostate cancer samples (Affymetrix U133A array), and GSE1431 includes gene expression prostate cancer samples (Affymetrix U95Av2). Eleven pathways were common among these three datasets, and most of these pathways are known to be altered with different cancers, including prostate cancer (Figure 3d). Of note, most of the critical pathways with a known association to cancer [4,5,6,8,12,41,42,43,44] were represented in the highly enriched pathway list in different types of cancers [9,83,84]. Among the highly enriched signaling pathways were PI3K-Akt, focal adhesion, regulation of actin cytoskeleton, Ras, Hippo, MAPK, and Rap1 signaling pathways. Furthermore, we have also compared the enriched pathways with four more pathway databases (NCI, Reactome, Wiki, BioCarta), where the majority of our pathways were common (Supplementary Figure S1).

Since a large number of enriched pathways were evident in all three individual datasets, to unravel more comprehensively the crosstalk between the enriched pathways, all the DEGs of the three datasets were combined into one list, and we further calculated the

C T_{d e g r e e}

for all the enriched pathways (Figure 4a–c).

3.2. Pathway–Pathway Interactions (Crosstalk) Analysis for Prostate Cancer Gene Expression Data

Many of the enriched pathways were shared between the three datasets, and the pathways with highest

C T_{d e g r e e}

were common (focal adhesion) in the distinct datasets (Figure 4 and Supplementary Table S1). Therefore, analysis of

C T_{d e g r e e}

for all the enriched pathways can be expected to provide concrete information about the significance and/or putative targetability of the pathways and its components for a specific disease. In the combined analysis of the DEG list, focal adhesion, PI3K-Akt, MAPK, cAMP, and RAP1 signaling displayed the highest

C T_{d e g r e e}

(Figure 5). For DEGs of the individual dataset, in all the three datasets, focal adhesion was among the top-ranked, highlighting the central role of cell adhesion and the cytoskeleton in coordinating other prostate cancer signaling pathways (Figure 5a–c and Supplementary Table S1). Common to all our analyses, the enriched pathways which were ranked on top did not always correspond to high

C T_{d e g r e e}

. In other words, the top-ranked enriched pathway(s) were not always on the top in

C T_{d e g r e e}

ranking. Figure 6 shows the networks of enriched pathways, where the numbers in the nodes indicate the ranking of the enriched pathways (Supplementary Table S1). After analyzing the three datasets separately, we have analyzed the connectivities and the genes (Figure 5b), and the crosstalk between the commonly enriched pathways (Figure 5b,c). Based on these analyses, the putative individual contributors of promoting crosstalk were GSK3B, YWHAE, ATP1A3, GNB2, PRKCA, DLG3, MYC, NCKAP1, MAPT, ACTN2, MAPKAPK2, RELA, and RHEB genes. Likewise, in degree distribution, we observe that a significant number of these DEGs have very high connectivity, which leads to the conclusion that the genes with very high connectivities shared between the pathways are altered most dominantly in prostate cancer, which leads to drastic change in pathway–pathway interactions (crosstalk) and acts as a potential source of this crosstalk, as shown in Figure 5b.

The enriched pathways and

C T_{d e g r e e}

varied to some extent in the three different prostate cancer datasets (Figure 4a). Thus, we further analyzed the common enriched pathways and crosstalk network (pathway–pathway interaction network). Most importantly, among the common enriched pathway list were the well-established pathways known to be altered in different cancer types such as Hippo, regulation of actin cytoskeleton, PI3K-Akt, focal adhesion, and MAPK pathways (Figure 4). In this crosstalk network, we can clearly see the impact of one pathway on another (thickness of the edges stands for

C T_{d e g r e e}

) (Figure 5b) with the corresponding

C T_{d e g r e e}

(Figure 5c). Furthermore, Figure 5d presented the detailed view of the genes shared between different pathways and the size of red color circles are based on the connectivity. To understand the crosstalk network at the molecular level, the crosstalk network and the network of DEGs belonging to these pathways were visualized (Figure 5b and Figure 6).

3.3. Mathematical Modeling and Simulation, Docking, and Experimental Validation of Pathway Components Obtained from Gene Expression and Network-Based Crosstalk Calculation

After crosstalk calculations, as shown in Figure 5, we have selected a few components which are well-known to play a role in prostate cancer (Figure 7a), prepared a mathematical model by using mass-action kinetics and ordinary differential equations (ODEs), and, by using an evolutionary algorithm (EA), optimized the parameters. Based on mathematical modeling and simulation results, we observe that ppErk (double-phosphorylated or fully activated Erk) are transiently activated in normal prostate signaling, while in the case of prostate cancer, the signaling of ppErk is persistently activated irrespective of the input signal strength (Figure 7b). For further analysis, we have performed immunohistochemistry (IHC) for human prostate biopsies (normal and cancer), and we observe that in normal prostate biopsy samples, there is less expression of ppErk, while in the case of cancer, it is dominantly expressed (Figure 7c,d). We have prepared the model of the well-known genes/proteins for ERK signaling in a precise way to show how we could move from gene expression calculation to pathway enrichment analysis and crosstalk, and finally, we could extend it for the modeling and the simulation of the selective pathway, the selective components, and the verification of selective molecules. As far as the simulation values are concerned, the initial concentration of the signaling molecules was considered as 10 ul, and the kinetic parameters were generated randomly (of very weak strength <0.01). During the evolutionary period, the rate of the reactions was allowed to be any value in the range of 0.01–100. In summary, we could conclude that after gene expression and pathway analysis, we proceeded to the crosstalk analysis, then we moved to the analysis of dynamics of selective target signaling molecules, and finally performed experimental validation. In experimental validation, we selected the MAPK pathway which is dominantly enriched, and in crosstalk analysis, it also appeared to play a pivotal role in prostate cancer. Thus, for the same, we carried out the mathematical modeling and simulation and investigated the expression of the Erk protein in human prostate biopsy samples. Here we clearly observed that the mathematical modeling outcome (Figure 7a,b) and the experimental outcome (Figure 7c,d) for Erk (active and inactive) are closely similar to each other.

In addition to extending our entire approach, we have integrated our code with AutoDock vina software and also have our own code, which predicts putative binding sites on the protein for the given ligand or drug. Therefore, for more applications in terms of the clinical perspective or personalized therapy, this toolbox can be applied. In this study, three herbal drugs have been selected which are known for their application in herbal cancer therapy (apigenin, quercetin, and resveratrol), targeting the top three proteins which appear to be highly dominant based on our crosstalk analysis (Figure 6); we observe that the delta G values are very close to each other for apigenin and quercetin, which means a higher possibility to have a binding site for the three selected proteins GSK3B, HIF1A, and YWHAE. These values are comparatively lower than that of resveratrol (Figure 7e).

4. Discussion

Most of the cellular network architectures have been reported to fall within a scale-free topology [25,26,85]. In this study, the networks were mapped out for the DEGs from the FunCoup network database [51], and the mapped network gives the association of one gene with another as a protein–protein interaction score. After mapping the networks, we consider that one protein is connected with another protein when the protein–protein interaction score was present, i.e., otherwise 0 (no interaction). Thus, the new topology of the network will be with 0 or 1 instead of protein–protein interaction score, which will be a smooth input file for our next step. To find out the design principle of the network and the important network motifs [34,39,86], it is of key importance to understand the network topology, and thus we have analyzed degree distribution and network motifs [34,87]. Importantly, the degree distributions for all the DEGs’ networks, irrespective of the datasets, followed power-law connectivity distribution, which means that a few hubs dominate the overall connectivity of the network (SI1). In network motif analysis, we further found that the same network motif was present in all the three datasets (Supplementary Information Table S1) [88]. Based on our analysis results, we conclude that the top crosstalk pathway, the focal adhesion pathway highly altered in all the three datasets, can dominantly control cell fate decision and functions in prostate cancer by coordinated signals for the changes in cell morphology, as well as growth, migration, and survival dynamics. This was followed by PI3K-AKT as the second top crosstalk pathway in GSE8218 and Grasso datasets. GSK3B, YWHAE, ATP1A3, GNB2, PRKCA, DLG3, MYC, NCKAP1, MAPT, ACTN2, MAPKAPK2, RELA, and RHEB were identified as candidate genes to promote above shown crosstalk.

With the advancement in high-throughput technologies for gene expression, sequencing, and proteomics experiments, the primary goal has been to find the significant gene (differentially expressed or mutated) or protein list and infer the pathways for these lists. For differential gene expression analysis, there are a number of tools available such as DESeq2, DEseq, baySeq2, EBseq, edgeR, etc. [89]. Since we used a microarray dataset for the application of our developed tool for crosstalk analysis and the main goal was to understand the crosstalk, we simply applied the most common approach for differential gene expression. After normalization of raw .cel files, the approach remains similar to DESeq2. The most common approach for inferring the pathways is pathway enrichment analysis by, e.g., DAVID, EnrichNet, and PANTHER analysis tools. From these tools, we get a large number of pathways with significant p-values, but this does not clarify how these enriched pathways are associated or crosstalk with each other. By using GECIP, we add yet another step to such pathway-level understanding. The aim of this work is methodology-focused, and off course, there are few steps for which the methodology may resemble many different softwares, such as DEGs’ calculation, motif prediction, docking, and the prediction of enriched pathways, while the main goal was crosstalk (CT-score) prediction and is uniquely presented here. The prediction of crosstalk is completely different from any other work. We agree that elation will be helpful, which we have updated.

Based on our analysis, we consider GECIP particularly useful to translate the high-throughput transcriptomic, genetic, and proteomic data to pathway–pathway-interaction-network-level understanding. As a result, few pathways or representative pathway components with most or least overall crosstalk or connectivity can be selected as targets for more precise studies based on the study questions and goals. Instead of focusing on a large number of pathways, the study focus can thus be narrowed to the selected pathways. Based on the overall analysis results, we conclude that the DEGs and enriched pathways can vary from one dataset to another within the same cancer type, whereas there are also common frequently altered pathways. Notably, all the highly enriched pathways will not have comparable

C T_{d e g r e e}

; rather, the pathways ranked low may have high

C T_{d e g r e e}

, which means the major affected crosstalk pathways can also be less enriched (Supplementary Table S1). Out of a large list of enriched pathways, there are few highly interconnected pathways which appeared as the central point of transcriptional alterations. Mathematical simulation of selected pathway components and experimental validation also confirms the accuracy of our newly developed approach.

Finally, we conclude that our approach appears unique and novel, which reveals the systems-level understanding and application, validation, and integration of high-throughput data and the possible scope to predict putative drug targets. In summary, we could say that it helps understanding from large biological data integration to phenomics to final drug targets.

Supplementary Materials

The following are available online at https://www.mdpi.com/article/10.3390/cells11244121/s1, Figure S1. DEGs network analysis. (a) Prostate cancer DEGs network, (b) degree distribution, and network motif (subgraph). Subgraph (5th) with positive z-score means dominant (highly functional). Table S1: Datasets of all the selected cancer types with accession ID, number of samples, and the platforms for gene expression profiling. References [90,91,92,93,94,95] are cited in Supplementary Materials.

Author Contributions

Conceptualization, M.M., S.P.T., M.A.I., I.M.A., T.H. and K.L.; methodology, M.M., T.H. and K.L.; software, M.M.; validation, M.M., S.P.T., M.A.I., I.M.A., T.H. and K.L.; formal analysis, M.M.; investigation, M.M., T.H. and K.L.; resources, K.L.; data curation, M.M.; writing—original draft preparation, M.M., S.P.T., M.A.I., I.M.A., T.H. and K.L.; writing—review and editing, M.M., T.H. and K.L.; visualization, M.M., T.H. and K.L.; supervision, K.L.; project administration, K.L.; funding acquisition, K.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work has been supported by funding Swedish Research Council, Swedish Cancer Foundation, and Strategic Research Program in Cancer (Cancer Research KI).

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki, and approved by the Institutional Review Board (or Ethics Committee) of Karolinska Institute, Sweden.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

We have used publicly available datasets and the proper references were supplied in the main text and thus could be downloaded from there.

Acknowledgments

We are grateful to Swedish Research Council, Swedish Cancer Foundation, and Strategic Research Program in Cancer (StratCan-KICancer). The authors thank the members of Lehti’s lab for discussion, valuable comments, and feedback.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

References

Emilsson, V.; Thorleifsson, G.; Zhang, B.; Leonardson, A.S.; Zink, F.; Zhu, J.; Carlson, S.; Helgason, A.; Walters, G.B.; Gunnarsdottir, S.; et al. Genetics of gene expression and its effect on disease. Nature 2008, 452, 423–428. [Google Scholar] [CrossRef] [PubMed]
Gonzalez-Perez, A.; Mustonen, V.; Reva, B.; Ritchie, G.R.S.; Creixell, P.; Karchin, R.; Vazquez, M.; Fink, J.L.; Kassahn, K.S.; Pearson, J.V.; et al. Com-putational approaches to identify functional genetic variants in cancer genomes. Nat. Meth. 2013, 10, 723–729. [Google Scholar]
Van’T Veer, L.J.; Dai, H.; Van De Vijver, M.J.; He, Y.D.; Hart, A.A.M.; Mao, M.; Peterse, H.L.; Van Der Kooy, K.; Marton, M.J.; Witteveen, A.T.; et al. Gene expression profiling predicts clinical outcome of breast cancer. Nature 2002, 415, 530–536. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Werner, H.M.J.; Mills, G.B.; Ram, P.T. Cancer Systems Biology: A peek into the future of patient care? Nat. Rev. Clin. Oncol. 2014, 11, 167–176. [Google Scholar] [CrossRef] [Green Version]
Vogelstein, B.; Kinzler, K.W. The Path to Cancer—Three Strikes and You’re Out. N. Engl. J. Med. 2015, 373, 1895–1898. [Google Scholar] [CrossRef]
Vogelstein, B.; Kinzler, K.W. Cancer genes and the pathways they control. Nat. Med. 2004, 10, 789–799. [Google Scholar] [CrossRef]
Huang, D.W.; Sherman, B.T.; Lempicki, R.A. Systematic and integrative analysis of large gene lists using DAVID bioinfor-matics resources. Nat. Protoc. 2008, 4, 44–57. [Google Scholar] [CrossRef]
Glaab, E.; Baudot, A.; Krasnogor, N.; Schneider, R.; Valencia, A. EnrichNet: Network-based gene set enrichment analysis. Bioinformatics 2012, 28, i451–i457. [Google Scholar] [CrossRef] [Green Version]
Mi, H.; Poudel, S.; Muruganujan, A.; Casagrande, J.T.; Thomas, P.D. PANTHER version 10: Expanded protein families and functions, and analysis tools. Nucleic Acids Res. 2015, 44, D336–D342. [Google Scholar] [CrossRef] [Green Version]
Croce, C.M. Oncogenes and Cancer. N. Engl. J. Med. 2008, 358, 502–511. [Google Scholar] [CrossRef] [Green Version]
Bishop, J. Molecular themes in oncogenesis. Cell 1991, 64, 235–248. [Google Scholar] [CrossRef] [PubMed]
Hanahan, D.; Weinberg, R.A. Hallmarks of cancer: The next generation. Cell 2011, 144, 646–674. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Almowallad, S.; Alqahtani, L.S.; Mobashir, A.M. NF-kB in Signaling Patterns and Its Temporal Dynamics Encode/Decode Human Diseases. Life 2022, 12, 2012. [Google Scholar] [CrossRef]
Mobashir, M. The Understanding of the Potential Linkage between COVID-19, Type-2 Diabetes, and Cancer(s) Could Help in Better Drug Targets and Therapeutics. Comb. Chem. High Throughput Screen. 2022, 25, 2370–2371. [Google Scholar] [CrossRef] [PubMed]
Helmi, N.; Alammari, D.; Mobashir, M. Role of Potential COVID-19 Immune System Associated Genes and the Potential Pathwayslinkage with Type-2 Diabetes. Comb. Chem. High Throughput Screen. 2021, 24, 1–11. [Google Scholar]
Khouja, H.I.; Ashankyty, I.M.; Bajrai, L.H.; Kumar, P.K.P.; Kamal, M.A.; Firoz, A.; Mobashir, M. Multi-staged gene expres-sion profiling reveals potential genesand the critical pathways in kidneycancer. Sci. Rep. 2022, 12, 7240. [Google Scholar] [CrossRef]
Huwait, E.; Mobashir, M. Potential and Therapeutic Roles of Diosmin in Human Diseases. Biomedicines 2022, 10, 1076. [Google Scholar] [CrossRef]
El-Kafrawy, S.A.; El-Daly, M.M.; Bajrai, L.H.; Alandijany, T.A.; Faizo, A.A.; Mobashir, M.; Ahmed, S.S.; Ahmed, S.; Alam, S.; Jeet, R.; et al. Genomic profiling and network-level un-derstanding uncover the potential genes and the pathways in hepatocellular carcinoma. Front. Genet. 2022, 13, 3304. [Google Scholar] [CrossRef]
Eldakhakhny, B.M.; Sadoun, A.H.; Choudhry, H.; Mobashir, M. In-Silico Study of Immune System Associated Genes in Case of Type-2 Diabetes With Insulin Action and Resistance, and/or Obesity. Front. Endocrinol. 2021, 12, 641888. [Google Scholar] [CrossRef]
Al-Amin, R.A.; Johansson, L.; Abdurakhmanov, E.; Landegren, N.; Löf, L.; Arngården, L.; Blokzijl, A.; Svensson, R.; Hammond, M.; Lönn, P.; et al. Monitoring drug–target interactions through target engagement-mediated amplification on arrays and in situ. Nucleic Acids Res. 2022. [Google Scholar] [CrossRef]
Krishnamoorthy, P.K.P.; Subasree, S.; Arthi, U.; Mobashir, M.; Gowda, C.; Revanasiddappa, P.D. T-cell Epitope-based Vac-cine Design for Nipah Virus by Reverse Vaccinology Approach. Comb. Chem. High Throughput Screen. 2020, 23, 788–796. [Google Scholar] [CrossRef] [PubMed]
Kamal, M.A.; Warsi, M.K.; Alnajeebi, A.; Ali, H.A.; Helmi, N.; Izhari, M.A.; Mustafa, S.; Mobashir, M. Gene expression pro-filing and clinical relevance unravel the role hypoxia and immune signaling genes and pathways in breast cancer: Role of hy-poxia and immune signaling genes in breast cancer. Jimsa 2020, 1, 2–10. [Google Scholar] [CrossRef]
Krishnamoorthy, P.K.P.; Kamal, M.A.; Warsi, M.K.; Alnajeebi, A.M.; Ali, H.A.; Helmi, N.; Izhari, M.A.; Mustafa, S.; Firoz, A.; Mobashir, M. In-silico study reveals immunological signaling pathways, their genes, and potential herbal drug targets in ovarian cancer. Inform. Med. Unlocked 2020, 20, 100422. [Google Scholar] [CrossRef]
Kumar, J.; Umar, T.; Kausar, T.; Mobashir, M.; Nayeem, S.M.; Hoda, N. Identification of lead BAY60-7550 analogues as poten-tial inhibitors that utilize the hydrophobic groove in PDE2A: A molecular dynamics simulation study. J. Mol. Model. 2017, 23, 7. [Google Scholar] [CrossRef] [PubMed]
Greenman, C.; Stephens, P.; Smith, R.; Dalgliesh, G.L.; Hunter, C.; Bignell, G.; Davies, H.; Teague, J.; Butler, A.; Stevens, C.; et al. Patterns of somatic mutation in human cancer genomes. Nature 2007, 446, 153–158. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Ross, D.T.; Scherf, U.; Eisen, M.B.; Perou, C.M.; Rees, C.; Spellman, P.; Iyer, V.; Jeffrey, S.S.; van de Rijn, M.; Waltham, M.; et al. Systematic varia-tion in gene expression patterns in human cancer cell lines. Nat. Genet. 2000, 24, 227–235. [Google Scholar] [CrossRef] [Green Version]
Copeland, N.G.; Jenkins, N.A. Deciphering the genetic landscape of cancer—From genes to pathways. Trends Genet. 2009, 25, 455–462. [Google Scholar] [CrossRef]
Pinu, F.R.; Beale, D.J.; Paten, A.M.; Kouremenos, K.; Swarup, S.; Schirra, H.J.; Wishart, D. Systems Biology and Multi-Omics Integration: Viewpoints from the Metabolomics Research Community. Metabolites 2019, 9, 76. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Manzoni, C.; Kia, D.A.; Vandrovcova, J.; Hardy, J.; Wood, N.W.; Lewis, P.A.; Ferrari, R. Genome, transcriptome and prote-ome: The rise of omics data and their integration in biomedical sciences. Brief. Bioinform. 2018, 19, 286–302. [Google Scholar] [CrossRef] [Green Version]
Subramanian, I.; Verma, S.; Kumar, S.; Jere, A.; Anamika, K. Multi-omics Data Integration, Interpretation, and Its Application. Bioinform. Biol. Insights 2020, 14, 1177932219899051. [Google Scholar] [CrossRef] [Green Version]
Ritchie, M.D.; Holzinger, E.R.; Li, R.; Pendergrass, S.; Kim, D. Methods of integrating data to uncover genotype–phenotype interactions. Nat. Rev. Genet. 2015, 16, 85–97. [Google Scholar] [CrossRef] [PubMed]
Bernardes, J.P.; Mishra, N.; Tran, F.; Bahmer, T.; Best, L.; Blase, J.I.; Bordoni, D.; Franzenburg, J.; Geisen, U.; Josephs-Spaulding, J.; et al. Longitudinal Multi-omics Analyses Identify Responses of Megakaryocytes, Erythroid Cells, and Plasmablasts as Hallmarks of Severe COVID-19. Immunity 2020, 53, 1296–1314. [Google Scholar] [CrossRef] [PubMed]
Zhang, W.; Liu, Y.; Sun, N.; Wang, D.; Boyd-Kirkup, J.; Dou, X.; Han, J.-D.J. Integrating Genomic, Epigenomic, and Tran-scriptomic Features Reveals Modular Signatures Underlying Poor Prognosis in Ovarian Cancer. Cell Rep. 2013, 4, 542–553. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Mitra, K.; Carvunis, A.-R.; Ramesh, S.K.; Ideker, T. Integrative approaches forfinding modular structure inbiological net-works. Nat. Rev. Genet. 2013, 14, 719–732. [Google Scholar] [CrossRef] [PubMed]
Li, Y.; Vongsangnak, W.; Chen, L.; Shen, B. Integrative analysis reveals disease-associatedgenes and biomarkers for prostate cancerprogression. BMC Med. Genom. 2014, 7, S3. [Google Scholar] [CrossRef] [Green Version]
Zeng, X.; Zong, W.; Lin, C.-W.; Fang, Z.; Ma, T.; Lewis, D.; Enwright, J.; Tseng, G. Comparative Pathway Integrator: A Framework of Meta-Analytic Integration of Multiple Transcriptomic Studies for Consensual and Differential Pathway Analysis. Genes 2020, 11, 696. [Google Scholar] [CrossRef]
Ovaska, K.; Laakso, M.; Haapa-Paananen, S.; Louhimo, R.; Chen, P.; Aittomäki, V.; Valo, E.; Núñez-Fontarnau, J.; Rantanen, V.; Karinen, S.; et al. Large-scale data integration framework provides a comprehensive view on glioblastoma multiforme. Genome Med. 2010, 2, 65. [Google Scholar] [CrossRef] [Green Version]
Mes, S.W.; Beest, T.D.; Poli, T.; Rossi, S.; Scheckenbach, K.; van Wieringen, W.N.; Brink, A.; Bertani, N.; Lanfranco, D.; Silini, E.M.; et al. Prognostic modeling of oral cancer by gene profiles and clinicopathological co-variables. Oncotarget 2017, 8, 59312–59323. [Google Scholar] [CrossRef] [Green Version]
Creixell, P.; Schoof, E.M.; Erler, J.T.; Linding, R. Navigating cancer network attractors for tumor-specific therapy. Nat. Biotechnol. 2012, 30, 842–848. [Google Scholar] [CrossRef] [Green Version]
Gucciardo, E.; Mobashir, M.; Lehti, K. Proactive for invasion: Reuse of matrix metalloproteinase for structural memory. J. Cell Biol. 2016, 213, 11–13. [Google Scholar] [CrossRef] [Green Version]
Guruharsha, K.G.; Kankel, M.W.; Artavanis-Tsakonas, S. The Notch signalling system:recent insights into the complexityof a conserved pathway. Nat. Rev. Genet. 2012, 13, 654–666. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Dotto, G.P. Crosstalk of Notch with p53 and p63 in cancer growth control. Nat. Rev. Cancer 2009, 9, 587–595. [Google Scholar] [CrossRef] [PubMed]
Li, Y.; Agarwal, P.; Rajagopalan, D. A global pathway crosstalk network. Bioinformatics 2008, 24, 1442–1447. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Ranganathan, P.; Weaver, K.L.; Capobianco, A.J. Notch signalling in solid tumours:a little bit of everything but notall the time. Nat. Rev. Cancer 2011, 11, 338–351. [Google Scholar] [CrossRef]
Akdel, M.; Pires, D.E.V.; Pardo, E.P.; Jänes, J.; Zalevsky, A.O.; Mészáros, B.; Bryant, P.; Good, L.L.; Laskowski, R.A.; Pozzati, G.; et al. A structural biology community assessment of AlphaFold2 applications. Nat. Struct. Mol. Biol. 2022, 29, 1056–1067. [Google Scholar] [CrossRef]
Azeloglu, E.U.; Hardy, S.V.; Eungdamrong, N.J.; Chen, Y.; Jayaraman, G.; Chuang, P.Y.; Fang, W.; Xiong, H.; Neves, S.R.; Jain, M.R.; et al. Interconnected Network Motifs Control Podocyte Morphology and Kidney Function. Sci. Signal. 2014, 7, ra12. [Google Scholar] [CrossRef] [Green Version]
Rangamani, P.; Lipshtat, A.; Azeloglu, E.U.; Calizo, R.C.; Hu, M.; Ghassemi, S.; Hone, J.; Scarlata, S.; Neves, S.R.; Iyengar, R. Decoding Information in Cell Shape. Cell 2013, 154, 1356–1369. [Google Scholar] [CrossRef] [Green Version]
Parikshak, N.N.; Gandal, M.J.; Geschwind, D.H. Systems biology and gene networksin neurodevelopmental andneuro-degenerative disorders. Nat. Rev. Genet. 2015, 16, 441–458. [Google Scholar] [CrossRef]
Kanehisa, M.; Goto, S.; Sato, Y.; Furumichi, M.; Tanabe, M. KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Res. 2011, 40, D109–D114. [Google Scholar] [CrossRef] [Green Version]
Davis, S.; Meltzer, P.S. GEOquery: A bridge between the Gene Expression Omnibus (GEO) and BioConductor. Bioinformatics 2007, 23, 1846–1847. [Google Scholar] [CrossRef] [Green Version]
Alexeyenko, A.; Sonnhammer, E.L.L. Global networks of functional coupling in eukaryotes from comprehensive data inte-gration. Genome Res. 2009, 19, 1107–1116. [Google Scholar] [CrossRef] [Green Version]
Shannon, P.; Markiel, A.; Ozier, O.; Baliga, N.S.; Wang, J.T.; Ramage, D.; Amin, N.; Schwikowski, B.; Ideker, T. Cytoscape: A software environment for integrated models of Biomolecular Interaction Networks. Genome Res. 2003, 13, 2498–2504. [Google Scholar] [CrossRef] [PubMed]
Warsi, M.K.; Kamal, M.A.; Baeshen, M.N.; Izhari, M.A.; Mobashir, A.F.A.M. Comparative Study of Gene Expression Pro-filing Unravels Functions associated with Pathogenesis of Dengue Infection. Curr. Pharm. Des. 2020, 26, 5293–5299. [Google Scholar] [CrossRef] [PubMed]
Quackenbush, J. Microarray data normalization and transformation. Nat. Genet. 2002, 32, 496–501. [Google Scholar] [CrossRef] [PubMed]
Pomeroy, S.L.; Tamayo, P.; Gaasenbeek, M.; Sturla, L.M.; Angelo, M.; McLaughlin, M.E.; Kim, J.Y.H.; Goumnerova, L.C.; Black, P.M.; Lau, C.; et al. Prediction of central nervous system em-bryonal tumour outcome based on gene expression. Nature 2002, 415, 436–442. [Google Scholar] [CrossRef]
Ahmed, S.; Mobashir, M.; Al-Keridis, L.A.; Alshammari, N.; Adnan, M.; Abid, M.; Hassan, I. A Network-Guided Approach to Discover Phytochemical-Based Anticancer Therapy: Targeting MARK4 for Hepatocellular Carcinoma. Front. Oncol. 2022, 12, 3379. [Google Scholar] [CrossRef]
Anwer, S.T.; Mobashir, M.; Fantoukh, O.I.; Khan, B.; Imtiyaz, K.; Naqvi, I.H.; Rizvi, M.M.A. Synthesis of Silver Nano Parti-cles Using Myricetin and the In-Vitro Assessment of Anti-Colorectal Cancer Activity: In-Silico Integration. IJMS 2022, 23, 11024. [Google Scholar] [CrossRef]
Fisher, R.A. A New Test for 2 × 2 Tables. Nature 1945, 156, 388. [Google Scholar] [CrossRef]
MEHTA, C.R.; PATEL, N.R.; TSIATIS, A.A. Exact Significance Testing to Establish Treatment Equivalence with Ordered Cat-egorical-Data. Biometrics 1984, 40, 819–825. [Google Scholar] [CrossRef]
Dudoit, S.; Shaffer, J.P.; Boldrick, J.C. Multiple Hypothesis Testing in Microarray Experiments. Stat. Sci. 2003, 18, 71–103. [Google Scholar] [CrossRef]
Storey, J.D.; Tibshirani, R. Statistical significance for genomewide studies. Proc. Natl. Acad. Sci. USA 2003, 100, 9440–9445. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Subramanian, A.; Tamayo, P.; Mootha, V.K.; Mukherjee, S.; Ebert, B.L.; Gillette, M.A.; Paulovich, A.; Pomeroy, S.L.; Golub, T.R.; Lander, E.S.; et al. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide ex-pression profiles. Proc. Natl. Acad. Sci. USA 2005, 102, 15545. [Google Scholar] [CrossRef] [PubMed]
Chen, H.; Zhang, Z. A Semi-Supervised Method for Drug-Target Interaction Prediction with Consistency in Networks. PLoS ONE 2013, 8, e62975. [Google Scholar] [CrossRef] [Green Version]
West, J.; Beck, S.; Wang, X.; Teschendorff, A.E. An integrative network algorithm identifies age-associated differential meth-ylation interactome hotspots targeting stem-cell differentiation pathways. Sci. Rep. 2013, 3, 1630. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Atilgan, C.; Okan, O.B.; Atilgan, A.R. Network-Based Models as Tools Hinting at Nonevident Protein Functionality. Annu. Rev. Biophys. 2012, 41, 205–225. [Google Scholar] [CrossRef] [PubMed]
Mobashir, M. Mathematical Modeling and Evolution of Signal Transduction Pathways and Networks. Ph.D. Thesis, Universität, Diss., Magdeburg, Germany, 2013. [Google Scholar]
Mobashir, M.; Madhusudhan, T.; Isermann, B.; Beyer, T.; Schraven, B. Negative Interactions and Feedback Regulations Are Required for Transient Cellular Response. Sci. Rep. 2014, 4, 3718. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Mobashir, M.; Schraven, B.; Beyer, T. Simulated Evolution of Signal Transduction Networks. PLoS ONE 2012, 7, e50905. [Google Scholar] [CrossRef]
Bhalla and Ravi Iyengar, U.S. Emergent Properties of Networks of Biological Signaling Pathways. Science 1999, 283, 381–387. [Google Scholar] [CrossRef] [Green Version]
Ma’Ayan, A.; Blitzer, R.D.; Iyengar, R. Toward Predictive Models of Mammalian Cells. Annu. Rev. Biophys. Biomol. Struct. 2005, 34, 319–349. [Google Scholar] [CrossRef] [Green Version]
Cui, Q.; Ma, Y.; Jaramillo, M.; Bari, H.; Awan, A.; Yang, S.; Zhang, S.; Liu, L.; Lu, M.; O’Connor-McCourt, M.; et al. A map of human cancer signaling. Mol. Syst. Biol. 2007, 3, 152. [Google Scholar] [CrossRef]
Kholodenko, B.N. Cell-signalling dynamics in time and space. Nat. Rev. Mol. Cell Biol. 2006, 7, 165–176. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Kholodenko, B.N.; Demin, O.V.; Markevich, N.I.; Kiyatkin, A.; Moehren, G.; Hoek, J.B. Signal processing at the Ras circuit: What shapes Ras activation patterns? Syst. Biol. 2004, 1, 104–113. [Google Scholar]
Kholodenko, B.N.; Kiyatkin, A.; Bruggeman, F.J.; Sontag, E.; Westerhoff, H.V.; Hoek, J.B. Untangling the wires: A strategy to trace functional interactions in signaling and gene networks. Proc. Natl. Acad. Sci. USA 2002, 99, 12841–12846. [Google Scholar] [CrossRef] [PubMed]
François, P.; Hakim, V. Design of genetic networks with specified functions by evolution in silico. Proc. Natl. Acad. Sci. USA 2004, 101, 580–585. [Google Scholar] [CrossRef] [Green Version]
Kaneko, K. Evolution of Robustness to Noise and Mutation in Gene Expression Dynamics. PLoS ONE 2007, 2, e434. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Grosdidier, A.; Zoete, V.; Michielin, O. SwissDock, a protein-small molecule docking web service based on EADock DSS. Nucleic Acids Res. 2011, 39, W270–W277. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Mustafa, S.; Mobashir, M. LC–MS and docking profiling reveals potential difference between the pure and crude fucoidan metabolites. Int. J. Biol. Macromol. 2019, 143, 11–29. [Google Scholar] [CrossRef]
Thiele, I.; Swainston, N.; Fleming, R.M.; Hoppe, A.; Sahoo, S.; Aurich, M.K.; Haraldsdottir, H.; Mo, M.L.; Rolfsson, O.; Stobbe, M.D.; et al. A community-driven global reconstruction of human metabolism. Nat. Biotechnol. 2013, 31, 419–425. [Google Scholar] [CrossRef]
Tao, W. CancerHSP: Anticancer herbsdatabase of systems pharmacology. Sci. Rep. 2015, 5, 11481. [Google Scholar] [CrossRef] [Green Version]
Hurle, M.R.; Yang, L.; Xie, Q.; Rajpal, D.K.; Sanseau, P.; Agarwal, P. Computational Drug Repositioning: From Data to Ther-apeutics. Clin. Pharmacol. Ther. 2013, 93, 335–341. [Google Scholar] [CrossRef]
Nigsch, F.; Macaluso, N.M.; Mitchell, J.B.; Zmuidinavicius, D. Computational toxicology: An overview of the sources of data and of modelling methods. Expert Opin. Drug Metab. Toxicol. 2009, 5, 1–14. [Google Scholar] [CrossRef] [PubMed]
Koboldt, D.C.; Fulton, R.S.; McLellan, M.D.; Schmidt, H.; Kalicki-Veizer, J.; McMichael, J.F.; Fulton, L.L.; Dooling, D.J.; Ding, L.; Mardis, E.R.; et al. Comprehensive molecular portraits of human breast tumours. Nature 2012, 490, 61–70. [Google Scholar]
Bell, D.; Berchuck, A.; Birrer, M.; Chien, J.; Cramer, D.W.; Dao, F.; Dhir, R.; DiSaia, P.; Gabra, H.; Glenn, P. Integrated genomic analyses of ovarian carcinoma. Nature 2011, 474, 609–615. [Google Scholar]
Barabasi, A.-L.; Oltvai, Z.N. Network biology: Understanding the cell’s functional organization. Nat. Rev. Genet. 2004, 5, 101–113. [Google Scholar] [CrossRef]
Hu, J.X.; Thomas, C.E.; Brunak, S. Network biology concepts in complex disease comorbidities. Nat. Rev. Genet. 2016, 17, 615–629. [Google Scholar] [CrossRef] [PubMed]
Vidal, M.; Cusick, M.E.; Barabási, A.-L. Interactome Networks and Human Disease. Cell 2011, 144, 986–998. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Milo, R.; Shen-Orr, S.; Itzkovitz, S.; Kashtan, N.; Chklovskii, D.; Alon, U. Network Motifs: Simple Building Blocks of Complex Networks. Science 2002, 298, 824–827. [Google Scholar] [CrossRef] [Green Version]
Froussios, K.; Schurch, N.J.; Mackinnon, K.; Gierliński, M.; Duc, C.; Simpson, G.G.; Barton, G.J. How well do RNA-Seq dif-ferential gene expression tools perform in a complex eukaryote? A case study in Arabidopsis thaliana. Bioinformatics 2019, 35, 3372–3377. [Google Scholar] [CrossRef] [Green Version]
Rhodes, D.R.; Yu, J.; Shanker, K.; Deshpande, N. ONCOMINE: A Cancer Microarray Database and Integrated Data-Mining Platform—ScienceDirect. Neoplasia 2004, 6, 1–6. [Google Scholar] [CrossRef] [Green Version]
Yu, Y.P.; Landsittel, D.; Jing, L.; Nelson, J.; Ren, B.; Liu, L.; McDonald, C.; Thomas, R.; Dhir, R.; Finkelstein, S.; et al. Gene Expression Alterations in Prostate Cancer Predicting Tumor Aggression and Preceding Development of Malignancy. J. Clin. Oncol. 2004, 22, 2790–2799. [Google Scholar] [CrossRef]
Stuart, R.O.; Wachsman, W.; Berry, C.C.; Wang-Rodriguez, J.; Wasserman, L.; Klacansky, I.; Masys, D.; Arden, K.; Goodison, S.; McClelland, M.; et al. In silico dissection of cell-type-associated patterns of gene expression in prostate cancer. Proc. Natl. Acad. Sci. USA 2004, 101, 615–620. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Jia, Z.; Wang, Y.; Sawyers, A.; Yao, H.; Rahmatpanah, F.; Xia, X.-Q.; Xu, Q.; Pio, R.; Turan, T.; Koziol, J.A.; et al. Diagnosis of Prostate Cancer Using Differentially Expressed Genes in Stroma. Cancer Res 2011, 71, 2476–2487. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Wang, Y.; Xia, X.-Q.; Jia, Z.; Sawyers, A.; Yao, H.; Wang-Rodriquez, J.; Mercola, D.; McClelland, M. In silico Estimates of Tissue Components in Surgical Samples Based on Expression Profiling Data. Cancer Res 2010, 70, 6448–6455. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Grasso, C.S.; Wu, Y.-M.; Robinson, D.R.; Cao, X.; Dhanasekaran, S.M.; Khan, A.P.; Quist, M.J.; Jing, X.; Lonigro, R.J.; Brenner, J.C.; et al. The mutational landscape of lethal castration-resistant prostate cancer. Nature 2012, 487, 239–243. [Google Scholar] [CrossRef] [PubMed]

Figure 1. GECIP (Gene set, Enrichment, and Crosstalk between the Inferred Pathways) workflow for pathway enrichment analysis and crosstalk degree calculation for the enriched pathways.

Figure 2. Pseudocode for carry out the entire work presented in this study.

Figure 3. GECIP pathway analysis: (a) total number of DEGs for three different datasets of prostate cancer, (b) pathways selected from different KEGG pathway classes, (c) total number of enriched KEGG pathways for three different datasets of prostate cancer, (d) p-values (the heat map has been plotted by converting p-values into −10 ∗ log10 of p-values (red: lowest p-value and gray: higher p-values and less than 0.05)) of 11 common enriched pathways.

Figure 4.

C T_{d e g r e e}

analysis for the enriched pathways: Degree of crosstalk between the pathways for (a) GSE8218, (b) GSE1431, and (c) Grasso dataset. Different thickness of the edges represents the degree of crosstalk between the pathways (higher the higher the

C T_{d e g r e e}

, and lower thickness means lower

C T_{d e g r e e}

). Different edges colors are for clarity only.

Figure 4.

C T_{d e g r e e}

analysis for the enriched pathways: Degree of crosstalk between the pathways for (a) GSE8218, (b) GSE1431, and (c) Grasso dataset. Different thickness of the edges represents the degree of crosstalk between the pathways (higher the higher the

C T_{d e g r e e}

, and lower thickness means lower

C T_{d e g r e e}

). Different edges colors are for clarity only.

Figure 5. Crosstalk between the common enriched pathways: (a) genes belonging (in yellow color) to the crosstalk (pathway–pathway interaction) network and total connectivities for each gene (bigger circle means higher connectivities), (b) heatmap to represent the corresponding crosstalk degree between the pathways, (c) pathway–pathway interaction network for common enriched pathways among all the three datasets and the pathway components (gene–gene associations), and (d) genes shared by the 11 pathways present in pathway–pathway interaction network and their degree of association within the crosstalk network for the corresponding genes.

Figure 6. Crosstalk between the common enriched pathways: Pathway–pathway interactions together with the corresponding genes. The hexagonal nodes with number 1–11 represent the pathways, and the genes with more than one color mean that they are shared by more than one pathway. The higher the size of circular nodes (genes), the higher the degree of connectivity.

Figure 7. Systems-level application, validation, and integration of high-throughput data: (a) A sketch for AKT signaling pathway components in case of prostate cancer, (b) kinetics of ppERK calculated by using mass-action kinetics and ODE approach in case of normal and prostate cancer signaling, (c) the quantification for immunohistochemistry data, (d) ppERK expression in normal human and cancer prostate biopsies, and (e) delta G representing the herbal drugs’ interactions with GSK3B, HIF1A, and YWHAE, which appear highly dominant based on high-throughput databased crosstalk result.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Mobashir, M.; Turunen, S.P.; Izhari, M.A.; Ashankyty, I.M.; Helleday, T.; Lehti, K. An Approach for Systems-Level Understanding of Prostate Cancer from High-Throughput Data Integration to Pathway Modeling and Simulation. Cells 2022, 11, 4121. https://doi.org/10.3390/cells11244121

AMA Style

Mobashir M, Turunen SP, Izhari MA, Ashankyty IM, Helleday T, Lehti K. An Approach for Systems-Level Understanding of Prostate Cancer from High-Throughput Data Integration to Pathway Modeling and Simulation. Cells. 2022; 11(24):4121. https://doi.org/10.3390/cells11244121

Chicago/Turabian Style

Mobashir, Mohammad, S. Pauliina Turunen, Mohammad Asrar Izhari, Ibraheem Mohammed Ashankyty, Thomas Helleday, and Kaisa Lehti. 2022. "An Approach for Systems-Level Understanding of Prostate Cancer from High-Throughput Data Integration to Pathway Modeling and Simulation" Cells 11, no. 24: 4121. https://doi.org/10.3390/cells11244121

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

An Approach for Systems-Level Understanding of Prostate Cancer from High-Throughput Data Integration to Pathway Modeling and Simulation

Abstract

1. Introduction

2. Materials and Methods

3. Results

3.1. Gene Expression Profiling, Enriched Pathways, and Crosstalk Score Calculations

3.2. Pathway–Pathway Interactions (Crosstalk) Analysis for Prostate Cancer Gene Expression Data

3.3. Mathematical Modeling and Simulation, Docking, and Experimental Validation of Pathway Components Obtained from Gene Expression and Network-Based Crosstalk Calculation

4. Discussion

Supplementary Materials

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Acknowledgments

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI