Article

Assessment of Drugs Toxicity and Associated Biomarker Genes Using Hierarchical Clustering

by Mohammad Nazmol Hasan 1, Masuma Binte Malek 2, Anjuman Ara Begum 3, Moizur Rahman 4 and Md. Nurul Haque Mollah 3,*

1 Department of Statistics, Bangabandhu Sheikh Mujibur Rahman Agricultural University, Gazipur 1706, Bangladesh
2 Department of Statistics, Bangladesh Bank, Dhaka 1000, Bangladesh
3 Bioinformatics Lab., Department of Statistics, University of Rajshahi, Rajshahi 6205, Bangladesh
4 Animal Husbandry and Veterinary Science, University of Rajshahi, Rajshahi 6205, Bangladesh
* Author to whom correspondence should be addressed.
Medicina 2019, 55(8), 451; https://doi.org/10.3390/medicina55080451
Submission received: 30 June 2019 / Revised: 4 August 2019 / Accepted: 6 August 2019 / Published: 8 August 2019

Abstract
Background and objectives: Assessment of drug toxicity and associated biomarker genes is one of the most important tasks in the pre-clinical phase of the drug development pipeline as well as in toxicogenomic studies. There are few statistical methods for assessing the toxicity of doses of drugs (DDs) and their associated biomarker genes, and these methods are time-consuming because they compute the model parameters with EM (expectation-maximization) based iterative approaches. To overcome this problem, this paper proposes an alternative approach based on hierarchical clustering (HC) for the same purpose. Methods and materials: There are several types of HC approaches, and their performance depends on the similarity/distance measure used. Therefore, we explored suitable combinations of distance measures and HC methods on Japanese Toxicogenomics Project (TGP) datasets for better clustering/co-clustering of DDs and genes and for detecting toxic DDs and their associated biomarker genes. Results: Ward's HC method combined with each of the Euclidean, Manhattan, and Minkowski distance measures produced better clustering/co-clustering results. For example, for the glutathione metabolism pathway (GMP) dataset (single and multiple time point datasets combined), our proposed co-clustering algorithm based on the Euclidean: Ward combination identified LOC100359539/Rrm2, Gpx6, RGD1562107, Gstm4, Gstm3, G6pd, Gsta5, Gclc, Mgst2, Gsr, Gpx2, Gclm, Gstp1, LOC100912604/Srm, Gstm4, Odc1, Gsr, Gss as the biomarker genes and Acetaminophen_Middle, Acetaminophen_High, Methapyrilene_High, Nitrofurazone_High, Nitrofurazone_Middle, Isoniazid_Middle, Isoniazid_High as their regulatory (associated) DDs.
Similarly, for the peroxisome proliferator-activated receptor signaling pathway (PPAR-SP) dataset, Cpt1a, Cyp8b1, Cyp4a3, Ehhadh, Plin5, Plin2, Fabp3, Me1, Fabp5, LOC100910385, Cpt2, Acaa1a, Cyp4a1, LOC100365047, Cpt1a, LOC100365047, Angptl4, Aqp7, Cpt1c, Cpt1b, Me1 are the biomarker genes and Aspirin_Low, Aspirin_Middle, Aspirin_High, Benzbromarone_Middle, Benzbromarone_High, Clofibrate_Middle, Clofibrate_High, WY14643_Low, WY14643_High, WY14643_Middle, Gemfibrozil_Middle, Gemfibrozil_High are their regulatory DDs. Conclusions: Overall, the method proposed in this article co-clusters genes and DDs and detects biomarker genes and their regulatory DDs simultaneously, in less time than the EM-based methods mentioned above. The results produced by the proposed method have been validated against the available literature and by functional annotation.

1. Introduction

Assessment of groups of similarly toxic doses of drugs (DDs) and their regulatory biomarker genes is one of the most important objectives of toxicity investigation in the pre-clinical phase of the drug development process as well as in toxicogenomic studies. Biomarker genes are genes that are differentially expressed in the treatment group of animals compared with the control group; such a set of genes can also distinguish toxic DDs from non-toxic DDs. Biomarker genes and their regulatory DDs can be assessed by toxicogenomics, a field that emerged from toxicology. Toxicology is the science that studies the adverse effects of chemicals and environmental exposures on living organisms [1]. Its prime objective is the empirical and contextual characterization of the adverse effects of chemicals/drugs on tissues, cells, and the intracellular molecular systems of organisms. The rapid accumulation of omics (genomics, transcriptomics, proteomics, metabolomics) data, together with the development of sophisticated statistical tools and gene and protein annotation techniques, has broadened the use of gene expression analysis to understand the toxicity mechanisms of drugs, chemical compounds, and environmental stressors in biological systems. These technologies led to the new field of "toxicogenomics", which grew out of toxicology and studies the response of the whole genome to DDs or environmental stressors [2,3,4,5,6,7]. The adverse effects of toxicants cause pathological changes in certain organs that can be detected as changes in gene expression, protein synthesis, and metabolism. Among these, gene expression (mRNA abundance) is the most sensitive measure of such changes. Thus, toxicogenomics, which enables comprehensive analysis of the gene expression changes caused by an external stimulus in a specific organ, is considered one of the most powerful strategies [8,9].
However, toxicogenomic experiments produce very large volumes of gene expression data. Analyzing such data is complex and sometimes produces non-robust results for knowledge discovery about biomarker genes and toxicants. Pathway- or molecular network-based analysis of gene expression data increases predictive power and produces more stable biomarkers [10,11,12,13,14].
In addition, toxicogenomic data analysis and knowledge discovery about biomarkers and the toxicity of DDs and environmental stressors are often hampered for two reasons: (1) improper selection of statistical/computational tools, and (2) traditional ways of interpreting the output of these tools that do not address the objectives of the study. For example, the t-test and Mann–Whitney U test [12,15] and ANOVA [16,17] have been used to detect toxicogenomic biomarker genes. However, none of these methods can identify groups of similarly toxic DDs and their associated biomarker genes, which is one of the important objectives of toxicity investigation of drug candidates in the pre-clinical phase of the drug development pipeline. This limitation of the above-mentioned methods can be overcome with hidden variable models [14,18,19], which detect toxic DDs and their regulatory biomarker genes by co-clustering DDs and genes. Nevertheless, since hidden variable models are EM (expectation-maximization) [20] based iterative methods, they require comparatively more time to compute the model parameters. To overcome this problem, in this paper we propose an alternative algorithm based on hierarchical clustering (HC) for co-clustering DDs and genes and for discovering toxic DDs and their associated biomarker genes. Cluster analysis is the process of assigning data to different groups (clusters) according to their similarity, and it provides an intuitive way to interpret complex data such as microarray, transcriptomic, and epigenomic data. There are several types of HC methods (ward, single, complete, average, mcquitty, median, centroid), and their performance depends on the similarity/distance measure used (euclidean, maximum, manhattan, canberra, minkowski).
Not every combination of distance and HC method performs equally well in grouping objects for all types of datasets, and some combinations perform very poorly in specific fields of study. To our knowledge, no suitable combination of distance and HC method has yet been suggested in the literature for clustering/co-clustering toxicogenomic data. Hence, in this paper, we explore suitable combinations of distance measures and HC methods based on known Japanese Toxicogenomics Project (TGP) datasets for better clustering/co-clustering of DDs and genes and for detecting toxic DDs and their associated biomarker genes.

2. Methods and Materials

2.1. Data Processing

To investigate drug toxicity, mRNA abundance in the liver of Rattus norvegicus is measured after administering multiple dose levels at multiple time points. In this well-designed experiment, gene expression is measured in the treatment group samples, where the treatments are the experimental conditions (DD and time combinations), and in control samples collected concurrently with the treatment group samples. The fold change gene expression (FCGE) $Y_{pqtr}$ for the $p$th ($p = 1, 2, \ldots, P$) drug, $q$th ($q = 1, 2, 3$) dose level, $t$th ($t = 1, 2, \ldots, T$) time point, and $r$th ($r = 1, 2, 3$) animal sample can be computed from the gene expression of the treatment and control groups of samples using the equation:
$$Y_{pqtr} = \log_2\left(\frac{x_{pqtr}}{x'_{pqtr}}\right) = \log_2(x_{pqtr}) - \log_2(x'_{pqtr}). \tag{1}$$
For a single time point, this equation can be written as
$$Y_{pqr} = \log_2\left(\frac{x_{pqr}}{x'_{pqr}}\right) = \log_2(x_{pqr}) - \log_2(x'_{pqr}). \tag{2}$$
In Equation (1), $x_{pqtr}$ is the expression of a gene in the treatment group of animals and $x'_{pqtr}$ is the expression of that gene in the control group of animals when expression is measured at multiple time points. Similarly, in Equation (2), $x_{pqr}$ and $x'_{pqr}$ are the expression of a gene in the treatment and control groups of animals, respectively, when expression is measured at a single time point. The FCGE values of a gene averaged over the animal samples are $\bar{Y}_{pqt\cdot} = \frac{1}{3}\sum_{r=1}^{3} Y_{pqtr}$ and $\bar{Y}_{pq\cdot} = \frac{1}{3}\sum_{r=1}^{3} Y_{pqr}$ for multiple and single time points, respectively. These average FCGE values measure the effect of the DDs on the genes: they are positive for upregulated genes and negative for downregulated genes. Datasets of these average FCGE values are the input to our analysis.
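Equations (1) and (2) and the replicate average can be sketched in a few lines of NumPy. This is an illustration, not the Toxygates pipeline: it assumes strictly positive, linear-scale expression intensities, a hypothetical array layout in which the last axis indexes the animal replicates $r$, and an element-wise pairing of treated and control values. The function names `fold_change` and `average_fcge` are hypothetical.

```python
import numpy as np


def fold_change(treated, control):
    """Per-replicate FCGE, Y = log2(x / x') = log2(x) - log2(x').

    treated, control: arrays of positive, linear-scale expression values with
    the same shape; pairing treated and control element-wise is an
    assumption of this sketch.
    """
    treated = np.asarray(treated, dtype=float)
    control = np.asarray(control, dtype=float)
    if np.any(treated <= 0) or np.any(control <= 0):
        raise ValueError("log2 requires strictly positive expression values")
    return np.log2(treated) - np.log2(control)


def average_fcge(treated, control):
    """Average FCGE over the animal replicates r (assumed to be the last axis)."""
    return fold_change(treated, control).mean(axis=-1)
```

For two genes (rows) with three replicates each, treated values `[[8, 8, 8], [1, 1, 1]]` against controls `[[2, 2, 2], [4, 4, 4]]` give average FCGE values of 2 (upregulated) and -2 (downregulated).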

2.2. Hierarchical Clustering (HC) Algorithms and Distance Measures

Clustering can be performed with various methods, depending on the data. Each approach has its peculiarities, and what constitutes a correct or accurate clustering is not easily defined. Hierarchical clustering can use various linkage (clustering) methods and distance measures. The distance measure determines how the distance between two observations is calculated, while the linkage method determines the distance between clusters once observations have been merged. Commonly used distance methods are shown in Table 1. In the analysis of biological data, the most commonly used clustering methods are of two types: hierarchical and non-hierarchical (also known as partitioning). Agglomerative hierarchical clustering builds clusters by repeatedly merging the two objects (or clusters) separated by the shortest distance; after the two closest objects are merged, the distance matrix is updated, and the process is repeated until all objects are joined. In this article we consider five distance methods (euclidean, maximum, manhattan, canberra, minkowski) and seven HC methods (single, complete, average, ward, mcquitty, median, and centroid), and we compare all 35 combinations to select the most suitable ones for clustering genes or DDs of toxicogenomic data. The HC algorithms are described as follows:

2.2.1. Single Linkage

The single linkage HC algorithm clusters objects (genes or doses of chemical compounds) of toxicogenomic data based on the smallest distance between members of two clusters. At the start, the smallest entry of the distance matrix $D = \{d_{G_i G_{i'}}\}$ is found, and the corresponding genes are merged to form the cluster $(G_i G_{i'})$. In the next step, the distance between the cluster $(G_i G_{i'})$ and another gene $G_{i''}$ is computed by
$$d_{(G_i G_{i'})G_{i''}} = \min\{d_{G_i G_{i''}},\, d_{G_{i'} G_{i''}}\},$$
and the pair with the smallest updated distance, for example $(G_i G_{i'})$ and $G_{i''}$, is merged to form the cluster $(G_i G_{i'} G_{i''})$. This process continues until all genes merge into a single cluster.
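The single linkage procedure can be sketched as a naive agglomerative loop. This is an illustrative implementation (the function name `single_linkage` is hypothetical), far slower than library routines; it recomputes each between-cluster distance as the minimum member-to-member distance, which gives the same result as applying the min update rule above after every merge.

```python
import numpy as np


def single_linkage(D):
    """Naive agglomerative clustering with single linkage (illustration only).

    D: symmetric (n x n) distance matrix between genes (or DDs).
    Returns the merges in order as (cluster_a, cluster_b, height), where each
    cluster is a frozenset of row indices.
    """
    D = np.asarray(D, dtype=float)
    clusters = [frozenset([i]) for i in range(len(D))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: the closest pair of members defines the
                # distance between two clusters (the min update rule).
                d = min(D[i, j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], float(d)))
        merged = clusters[a] | clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges
```

For four genes placed at positions 0, 1, 3 and 7 on a line, the merge heights are 1, 2 and 4.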

2.2.2. Complete Linkage

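In the complete linkage HC algorithm, the distance between two clusters is the largest distance between a member of one cluster and a member of the other, and at each step the two clusters with the smallest such distance are merged. The general agglomerative algorithm starts by finding the minimum entry of $D = \{d_{G_i G_{i'}}\}$ and merging the corresponding genes, for example $G_i$ and $G_{i'}$, into the cluster $(G_i G_{i'})$. In the next step, the distance between the cluster $(G_i G_{i'})$ and another gene $G_{i''}$ is computed as the maximum
$$d_{(G_i G_{i'})G_{i''}} = \max\{d_{G_i G_{i''}},\, d_{G_{i'} G_{i''}}\},$$
and the pair of clusters with the smallest updated distance, for example $(G_i G_{i'})$ and $G_{i''}$, is merged into the cluster $(G_i G_{i'} G_{i''})$.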
This process continues until all genes merge into a single cluster.
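Complete linkage can also be implemented by updating the distance matrix in place after each merge, as described above. The sketch below (hypothetical function `complete_linkage_heights`, NumPy only, for illustration) stores the merged cluster in row $a$ and applies $d_{(ab)c} = \max\{d_{ac}, d_{bc}\}$ before dropping row $b$.

```python
import numpy as np


def complete_linkage_heights(D):
    """Agglomerative complete-linkage clustering via in-place matrix updates.

    D: symmetric (n x n) distance matrix. Returns the merge heights in order.
    """
    D = np.array(D, dtype=float)
    np.fill_diagonal(D, np.inf)          # never merge a cluster with itself
    active = list(range(len(D)))
    heights = []
    while len(active) > 1:
        sub = D[np.ix_(active, active)]
        ia, ib = np.unravel_index(np.argmin(sub), sub.shape)
        a, b = active[ia], active[ib]
        heights.append(float(sub[ia, ib]))
        # Complete-linkage update: the merged cluster (kept in row a) takes
        # the larger of the two old distances to every other cluster.
        D[a, :] = np.maximum(D[a, :], D[b, :])
        D[:, a] = D[a, :]
        D[a, a] = np.inf
        active.remove(b)
    return heights
```

For genes at positions 0, 1, 3 and 7 on a line, the complete-linkage merge heights are 1, 3 and 7, whereas single linkage on the same points gives 1, 2 and 4.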

2.2.3. Average Linkage

Average linkage defines the distance between two clusters as the average distance over all pairs of items in which one member of the pair belongs to each cluster. We begin by searching the distance matrix $D = \{d_{G_i G_{i'}}\}$ for the nearest genes; for example, $G_i$ and $G_{i'}$ are merged into the cluster $(G_i G_{i'})$. In the subsequent step, the distance between $(G_i G_{i'})$ and the cluster $G_{i''}$ is obtained by
$$d_{(G_i G_{i'})G_{i''}} = \frac{\sum_{k}\sum_{l} d_{kl}}{N_{(G_i G_{i'})}\, N_{G_{i''}}},$$
where $d_{kl}$ is the distance between gene $k$ in cluster $(G_i G_{i'})$ and gene $l$ in cluster $G_{i''}$, and $N_{(G_i G_{i'})}$ and $N_{G_{i''}}$ are the numbers of genes in clusters $(G_i G_{i'})$ and $G_{i''}$, respectively.
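The average linkage formula translates directly into code. The sketch below (hypothetical helpers `average_linkage` and `average_linkage_update`) computes the mean over all member pairs and also shows the equivalent size-weighted update, which lets an implementation reuse the old cluster distances instead of revisiting every pair after a merge.

```python
import numpy as np


def average_linkage(D, cluster_a, cluster_b):
    """Average (UPGMA) linkage: mean of all member-to-member distances.

    D is a symmetric (n x n) distance matrix; cluster_a and cluster_b are
    lists of row indices (genes or DDs).
    """
    D = np.asarray(D, dtype=float)
    total = sum(D[k, l] for k in cluster_a for l in cluster_b)
    return float(total) / (len(cluster_a) * len(cluster_b))


def average_linkage_update(d_ac, n_a, d_bc, n_b):
    """Distance from the merged cluster (a U b) to a cluster c.

    Weighting the old distances by cluster size gives the same value as
    averaging over all member pairs.
    """
    return (n_a * d_ac + n_b * d_bc) / (n_a + n_b)
```

With genes at positions 0, 1, 3 and 7, the average linkage distance between clusters {0, 1} and {2, 3} is (3 + 7 + 2 + 6) / 4 = 4.5, and the size-weighted update from the single genes 0 and 1 gives the same value.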

2.2.4. Centroid

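The centroid method computes the mean vector (centroid) of each cluster and takes the distance between two clusters to be the distance between their centroids. Initially, each gene is its own cluster; the distance between two clusters $G_i$ and $G_{i'}$ is then calculated as
$$D = d\{\overline{F(G_i, C_{\cdot})},\, \overline{F(G_{i'}, C_{\cdot})}\},$$
where $\overline{F(G_i, C_{\cdot})}$ denotes the mean FCGE profile, across DDs, of the genes in cluster $G_i$.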
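The centroid distance can be sketched as follows, assuming the gene-by-DD FCGE matrix $F$ has one row per gene and using the Euclidean distance for $d$ (an assumption; any distance could be substituted). The helper names `centroid` and `centroid_distance` are hypothetical.

```python
import numpy as np


def centroid(F, cluster):
    """Mean FCGE profile (centroid) of the genes (rows of F) in a cluster."""
    return np.asarray(F, dtype=float)[list(cluster)].mean(axis=0)


def centroid_distance(F, cluster_a, cluster_b):
    """Euclidean distance between the centroids of two gene clusters."""
    return float(np.linalg.norm(centroid(F, cluster_a) - centroid(F, cluster_b)))
```

For three genes with profiles (0, 0), (2, 0) and (4, 4) over two DDs, the cluster {0, 1} has centroid (1, 0), and its distance to gene 2 is 5.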

2.2.5. Median

The median HC method seeks the median of each of the clusters and measures the distance between the two median points. The distance between the medians of two clusters $G_i$ and $G_{i'}$ is
$$D = d\{F(G_i, C_{Med}),\, F(G_{i'}, C_{Med})\}.$$
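In widely used implementations of the median method (Gower's weighted pair-group centroid method, WPGMC, as in R's `hclust` with method "median" and SciPy's `linkage` with method="median"), the representative point of a newly merged cluster is the unweighted midpoint of the two merged clusters' representative points, rather than a coordinate-wise median of the members. Whether the authors' software follows this convention is an assumption. The sketch below (hypothetical helpers) contrasts this rule with the size-weighted update of the centroid method.

```python
import numpy as np


def merged_centroid(ca, na, cb, nb):
    """Centroid method: size-weighted mean of the two cluster centres."""
    return (na * np.asarray(ca, dtype=float) + nb * np.asarray(cb, dtype=float)) / (na + nb)


def merged_median_point(ca, cb):
    """Median method (WPGMC): unweighted midpoint of the two centres, so a
    small cluster pulls the new centre as much as a large one does."""
    return (np.asarray(ca, dtype=float) + np.asarray(cb, dtype=float)) / 2.0
```

Merging a cluster of three genes centred at (0, 0) with a single gene at (4, 0) moves the centroid to (1, 0) but the median point to (2, 0).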

2.2.6. Ward’s Algorithm

Ward’s HC algorithm clusters objects by minimizing the ‘loss of information’ from joining two groups. The algorithm uses the error sum of squares (ESS) to measure the loss of information. For a given cluster $r$, let $ESS_r$ be the sum of squared deviations of every item in the cluster from the cluster mean (centroid). If there are $R$ clusters, the total ESS is $ESS = ESS_1 + ESS_2 + \cdots + ESS_R$. At each step of the analysis, the union of every possible pair of clusters is considered, and the two clusters whose combination results in the smallest increase in ESS (minimum loss of information) are joined. Initially, each cluster consists of a single item, so if there are $N$ items, $ESS_r = 0$ for $r = 1, 2, \ldots, N$, and hence $ESS = 0$.
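One step of Ward's criterion can be sketched by computing the ESS increase of each candidate merge. The sketch (hypothetical helper names) also checks the standard closed form $\Delta ESS = \frac{n_A n_B}{n_A + n_B}\lVert \bar{x}_A - \bar{x}_B \rVert^2$, which is why Ward's method only needs cluster sizes and centroids, not every item, at each step.

```python
import numpy as np


def ess(X):
    """Error sum of squares: squared deviations of the items from their centroid."""
    X = np.asarray(X, dtype=float)
    return float(((X - X.mean(axis=0)) ** 2).sum())


def ward_increase(A, B):
    """Increase in total ESS caused by merging clusters A and B (rows are items)."""
    return ess(np.vstack([A, B])) - ess(A) - ess(B)


def ward_increase_closed_form(A, B):
    """Equivalent closed form: n_A * n_B / (n_A + n_B) * ||mean_A - mean_B||^2."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    diff = A.mean(axis=0) - B.mean(axis=0)
    return len(A) * len(B) / (len(A) + len(B)) * float(diff @ diff)


def best_ward_merge(clusters):
    """Indices (a, b) of the pair of clusters whose union increases ESS the least."""
    pairs = [(a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))]
    return min(pairs, key=lambda ab: ward_increase(clusters[ab[0]], clusters[ab[1]]))
```

For cluster A = {(0, 0), (2, 0)} and cluster B = {(4, 4)}, both functions give an ESS increase of 50/3.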

2.2.7. Distance Measures for HC

A distance measure quantifies the distance or dissimilarity between $m$-dimensional objects (items) of a dataset. For example, consider an $n \times m$ gene-by-DD toxicogenomic data matrix with genes $G = (G_1, G_2, \ldots, G_n)$ and DDs $C = (C_1, C_2, \ldots, C_m)$. For convenience, we denote the $(i, j)$th entry of the data matrix by $F(G_i, C_j)$; this entry is the average FCGE value $\bar{Y}_{pq\cdot}$ or $\bar{Y}_{pqt\cdot}$ for single or multiple time points, respectively. The distance measures used for HC in this study (euclidean, maximum, manhattan, canberra, minkowski) are listed in Table 1.
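For reference, the five distances compared in this study can be computed with their standard definitions, as in the sketch below (hypothetical function `distance`). The exact forms in Table 1 are assumed to match these. Two conventions are assumptions of this sketch: canberra terms with a zero denominator are skipped (other implementations may handle them differently), and minkowski uses $p = 2$ by default, as R's `dist` and SciPy's `pdist` do, in which case it coincides with the euclidean distance. That coincidence may be one reason euclidean and minkowski behave alike in combination with a given HC method.

```python
import numpy as np


def distance(x, y, method="euclidean", p=2):
    """Distance between two FCGE profiles x and y (e.g., two genes across DDs)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    diff = np.abs(x - y)
    if method == "euclidean":
        return float(np.sqrt((diff ** 2).sum()))
    if method == "maximum":      # Chebyshev distance
        return float(diff.max())
    if method == "manhattan":    # city-block distance
        return float(diff.sum())
    if method == "canberra":
        denom = np.abs(x) + np.abs(y)
        keep = denom > 0          # 0/0 terms are skipped in this sketch
        return float((diff[keep] / denom[keep]).sum())
    if method == "minkowski":    # p = 2 gives euclidean, p = 1 gives manhattan
        return float((diff ** p).sum() ** (1.0 / p))
    raise ValueError(f"unknown method: {method}")
```

For x = (1, -2, 0) and y = (4, 2, 0), the euclidean, maximum, manhattan and canberra distances are 5, 4, 7 and 1.6, respectively.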

2.3. Selection of the Suitable Combination of Distance and HC Method

Hierarchical clustering methods group objects based on a distance matrix obtained from the original data matrix, and there are different methods for obtaining this distance matrix. We investigated the suitability of each combination of distance and HC methods for clustering genes or DDs using the DD clustering error rate (ER) on known pathway-based real datasets. The ER measures the percentage of misclustered DDs relative to the known DD groups and is calculated as:
$$ER = \frac{\text{Misclustered DDs}}{\text{Total DDs}} \times 100.$$
The combination of HC algorithm and distance method that produces the lowest clustering ER is considered the most suitable for grouping genes or DDs.
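The ER criterion and the comparison of all 35 distance/HC combinations can be sketched with SciPy. This is an illustrative reimplementation, not the authors' code. The method names used in the paper match those of R's `hclust` and `dist` functions; mapping them to SciPy ("maximum" to "chebyshev", "manhattan" to "cityblock", "mcquitty" to "weighted", i.e. WPGMA) is our assumption. Further assumptions: the dendrogram is cut into k = 2 clusters with `fcluster`, and cluster ids are matched to the known groups by the label permutation with the fewest mismatches. SciPy documents that the centroid, median, and ward methods are correctly defined only for Euclidean distances; they still run on other precomputed distances, so those combinations should be read with that caveat. Results from this sketch need not reproduce Table 2 exactly. The names `error_rate` and `screen_combinations` are hypothetical.

```python
import itertools

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# R-style names used in the paper mapped to SciPy names (our assumption).
DISTANCES = {
    "euclidean": "euclidean",
    "maximum": "chebyshev",
    "manhattan": "cityblock",
    "canberra": "canberra",
    "minkowski": "minkowski",  # SciPy's default p = 2
}
METHODS = {
    "single": "single",
    "complete": "complete",
    "average": "average",
    "mcquitty": "weighted",  # McQuitty's method (WPGMA)
    "median": "median",
    "centroid": "centroid",
    "ward": "ward",
}


def error_rate(true_labels, cluster_labels):
    """ER (%) = misclustered DDs / total DDs x 100.

    Cluster ids are arbitrary, so each cluster is mapped to a known group with
    the one-to-one assignment that gives the fewest misclustered DDs.
    """
    true_labels = np.asarray(true_labels)
    cluster_labels = np.asarray(cluster_labels)
    groups = np.unique(true_labels)
    clusters = np.unique(cluster_labels)
    fewest = len(true_labels)
    for perm in itertools.permutations(groups, len(clusters)):
        mapping = dict(zip(clusters.tolist(), perm))
        predicted = np.array([mapping[c] for c in cluster_labels.tolist()])
        fewest = min(fewest, int(np.sum(predicted != true_labels)))
    return 100.0 * fewest / len(true_labels)


def screen_combinations(X, true_labels, k=2):
    """ER for all 5 x 7 distance/HC combinations.

    X has one row per DD (for example, the transposed gene-by-DD FCGE matrix).
    """
    results = {}
    for dist_name, metric in DISTANCES.items():
        y = pdist(X, metric=metric)  # condensed distance matrix between DDs
        for method_name, method in METHODS.items():
            Z = linkage(y, method=method)
            labels = fcluster(Z, t=k, criterion="maxclust")
            results[(dist_name, method_name)] = error_rate(true_labels, labels)
    return results
```

On a small synthetic matrix with two well-separated groups of DDs, `screen_combinations` returns a dictionary of 35 ERs, with 0% for euclidean: ward.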

2.4. Co-Clustering between Genes and DDs and Detection of Toxic DDs and Associated Biomarker Genes Using HC

In toxicity studies, subsets of DDs regulate the expression profiles of subsets of genes. Genes in a biological pathway perform specific functions, and toxic DDs alter the expression pattern of a subset of biomarker genes in that pathway [19,21]. These biomarker genes and the toxic DDs can be explored from the biomarker co-clusters. For this purpose, the distance and HC method combinations that produce the lowest ER are used to cluster the genes and DDs of the toxicogenomic data. Our proposed algorithm forms co-clusters of genes and DDs in the following steps.
Step 1:
Fix the number of gene clusters and the number of DD clusters by inspecting the dendrograms produced by HC, according to the researcher's interest.
Step 2:
Take the absolute values of the FCGE values within the intersection areas of all pairs of gene and DD clusters. Since upregulated and downregulated genes have positive and negative FCGE values, respectively, this gives them equal weight in the average calculations.
Step 3:
Compute the average absolute FCGE value for the intersection area of each pair of gene and DD clusters.
Step 4:
Rank the average FCGE values computed in Step 3, together with the corresponding pairs of gene and DD clusters.
Step 5:
Renumber the gene and DD clusters according to the ranked average FCGE values from Step 4. For example, the gene and DD clusters whose intersection has the largest average FCGE value are both assigned cluster number 1, and together the genes and DDs in these clusters form co-cluster 1. Similarly, the gene and DD clusters whose intersection has the second largest average FCGE value are both assigned cluster number 2 and form co-cluster 2, and so on.
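Steps 2 to 5 can be sketched as follows, taking the gene and DD cluster labels from Step 1 as given (for example, from cutting the dendrograms). This is our reading of the steps, not the authors' code: every (gene cluster, DD cluster) block is ranked by its mean absolute FCGE, and the $k$-th ranked block becomes co-cluster $k$, so one DD cluster may appear in several co-clusters. The function names and the default threshold of one (taken from the biomarker criterion described in the text) are hypothetical choices of this sketch.

```python
import numpy as np


def rank_coclusters(F, gene_labels, dd_labels):
    """Mean |FCGE| of every (gene cluster, DD cluster) block, ranked.

    F: gene-by-DD FCGE matrix; gene_labels / dd_labels: cluster label per
    row / column. Returns (co_cluster_number, gene_cluster, dd_cluster,
    mean_abs_fcge) tuples, largest mean first.
    """
    A = np.abs(np.asarray(F, dtype=float))          # Step 2: absolute values
    gene_labels = np.asarray(gene_labels)
    dd_labels = np.asarray(dd_labels)
    blocks = []
    for g in np.unique(gene_labels):
        for d in np.unique(dd_labels):
            block = A[np.ix_(gene_labels == g, dd_labels == d)]
            blocks.append((int(g), int(d), float(block.mean())))  # Step 3
    blocks.sort(key=lambda b: b[2], reverse=True)                  # Step 4
    # Step 5: the k-th ranked block becomes co-cluster k.
    return [(k + 1, g, d, m) for k, (g, d, m) in enumerate(blocks)]


def biomarker_coclusters(ranked, threshold=1.0):
    """Co-clusters whose average absolute FCGE exceeds the threshold."""
    return [row for row in ranked if row[3] > threshold]
```

In a toy matrix where genes 1-2 are strongly upregulated by DDs 1-2 (FCGE 3) and genes 3-4 are downregulated by DDs 3-4 (FCGE -2), the two diagonal blocks become co-clusters 1 and 2 and are the only biomarker co-clusters.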
Owing to the characteristics of toxicogenomic data, a DD cluster can form co-clusters with one or more gene clusters, because a DD cluster might upregulate one set of genes and simultaneously downregulate another. Researchers consider a gene differentially expressed (a biomarker) if its absolute FCGE value is greater than 1.5; in that case, the expression intensity of the gene in the treatment group differs from its expression in the control group by a factor of about 2.8 ($2^{1.5} \approx 2.83$). When the expression of a gene in the treatment group is twice its expression in the control group, the FCGE value of that gene is 1. Therefore, we term the co-clusters whose average absolute FCGE value is greater than one biomarker co-clusters, and the genes and DDs in these co-clusters biomarker genes and their regulatory DDs.
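The fold-change thresholds above follow from inverting the log2 ratio in Equation (2) (the same arithmetic applies to Equation (1)):

$$
Y_{pqr} = \log_2\frac{x_{pqr}}{x'_{pqr}} \;\Longleftrightarrow\; \frac{x_{pqr}}{x'_{pqr}} = 2^{Y_{pqr}}, \qquad 2^{1} = 2, \qquad 2^{1.5} = 2\sqrt{2} \approx 2.83, \qquad 2^{-1} = \tfrac{1}{2}.
$$

Thus, for an individual gene, an absolute FCGE value above one corresponds to more than a twofold change in either direction; for a downregulated gene, the treated expression is below half of the control level.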

2.5. Real TGP Datasets to Investigate Clustering Performance

The Japanese Toxicogenomics Project (TGP) [22] collected gene expression data under well-planned experimental conditions. There were two main types of experiments: in vivo and in vitro. In the in vivo experiment, each drug was tested under combinations of four time points (3 h, 6 h, 9 h, 24 h), three dose levels (low, middle, high), and two organs (liver and kidney). These treatment conditions were applied to Rattus norvegicus, and gene expression data were collected from the target organ. Concurrent control animals were included for each treatment group of animals. The FCGE data can be computed from the gene expression data of the treatment and control samples produced by this experiment using Equations (1) and (2). Toxygates, a user-friendly interactive data analysis platform and database [15], provides the FCGE data of the TGP experiment. The toxic effects of drugs are more clearly visible at the 24 h time point than at the 3 h, 6 h, and 9 h time points [15]. Therefore, in this paper, we consider pathway-level FCGE data from the Rattus norvegicus in vivo liver single and multiple dose experiments at the 24 h time point. We downloaded the glutathione metabolism pathway (GMP) and peroxisome proliferator-activated receptor signaling pathway (PPAR-SP) datasets for selected drugs and dose levels whose toxicity mechanisms are known [15,23] from Toxygates (http://toxygates.nibiohn.go.jp/toxygates/#columns). Additionally, to investigate the performance of the selected distance and HC methods for clustering toxicogenomic data, datasets for these pathways at multiple time points and dose levels are also analyzed in this article.

3. Results

3.1. Selection of Suitable Combination of Distance and HC Methods

As mentioned earlier, in toxicogenomic data, subsets of DDs regulate the expression patterns of the respective subsets of genes. Therefore, clustering/co-clustering of genes and DDs is an important issue in toxicogenomic studies. HC is a popular and widely used clustering algorithm that can be combined with various distance measures and clustering methods to cluster the genes or DDs of toxicogenomic data. However, no suitable combination of distance and HC methods has yet been suggested for toxicogenomic data. To address this, we used two known datasets, GMP and PPAR-SP, at the 24 h time point [14,15,23], because the toxic effects of DDs are more clearly visible at this time point than at the 3 h, 6 h, or 9 h time points [15]. In the GMP dataset, acetaminophen, methapyrilene, and nitrofurazone are considered glutathione-depleting drugs, and erythromycin, hexachlorobenzene, isoniazid, gentamicin, glibenclamide, penicillamine, and perhexiline are considered non-glutathione-depleting drugs [15]. In the PPAR-SP dataset, WY-14643, clofibrate, gemfibrozil, benzbromarone, and aspirin are considered drugs that influence PPAR-regulated genes [23], and cisplatin, diltiazem, methapyrilene, phenobarbital, and triazolam are randomly selected drugs. A detailed description of the datasets is given in Section 2.5. To compare the 35 combinations of distance and HC clustering methods, we calculate the ER for both datasets in two ways. In the first way, we place the glutathione-depleting drugs (GMP) or the PPAR-regulated gene influencing drugs (PPAR-SP) in one cluster and the other drugs in another cluster.
In that case, the FCGE values of each drug are merged into a single value by averaging over the dose levels (low, middle, and high). In the second way, we place the high and middle doses of the glutathione-depleting drugs (GMP) or of the PPAR-regulated gene influencing drugs (PPAR-SP) in one cluster and all other DDs in another cluster. Thus, each pathway yields two datasets. The ER of each of these datasets for the 35 combinations of distance and HC clustering methods is displayed in Table 2. This table shows that the combinations euclidean: ward, manhattan: ward, and minkowski: ward produce smaller and more stable ERs across all datasets.
Therefore, we suggest these combinations of distance and HC methods for clustering DDs or genes of toxicogenomic data.

3.2. Detection of Biomarker Genes and Their Regulatory DDs from the Co-Clusters

An important objective of toxicogenomic studies is to explore subsets of DDs that have a similar mechanism of action on a subset of genes. This can be done by applying the algorithm proposed in Section 2.4 to the results obtained from a suitable combination of distance and HC methods. As shown in Section 3.1, the most suitable combinations of distance and HC methods are euclidean: ward, manhattan: ward, and minkowski: ward. As an example, in this article we show the analysis of the GMP and PPAR-SP datasets at 24 h and at multiple (3 h, 6 h, 9 h, and 24 h) time points using the HC (ward) method combined with the Euclidean distance.
The dendrograms of DDs and genes based on the Euclidean distance and the ward HC method for the GMP and PPAR-SP datasets at 24 h and at multiple time points are depicted in Figure 1 and Figure S1 (Supplementary File), respectively. The ranked clusters/co-clusters (ranked by the average FCGE value within the co-clusters) for the GMP and PPAR-SP datasets are given in Table 3 and Table 4, respectively. In these tables, the gene and DD cluster numbers within parentheses are the new cluster numbers assigned by the co-clustering algorithm described in Section 2.4. For example, in the first row of Table 3, the original HC cluster number of both the gene cluster and the DD cluster is 3. Since the mean FCGE of the intersection of these gene and DD clusters is larger than that of any other gene-DD cluster intersection, we assign both clusters the new number 1. Figure 2 shows an image of the co-clusters in which genes and DDs are arranged according to the ranked average FCGE values within the co-clusters (Table 3 and Table 4). The biomarker co-clusters (consisting of biomarker genes and their regulatory DDs), which have the largest average FCGE values, are given with their newly assigned cluster numbers in Table 5 and Table 6 for the GMP and PPAR-SP datasets, respectively. The results of the proposed method for the GMP and PPAR-SP datasets are validated by the literature [14,15,23] and by functional annotation with the DAVID database [24]. The functional annotation results for the biomarker genes are given in Table 7, Table 8, Table 9 and Table 10. Detailed gene and DD clustering results are given in the Supplementary File (Tables S1–S4).

4. Discussion

The important objectives of toxicity investigation in the pre-clinical phase of the drug development process and in toxicogenomic studies are to identify subsets of DDs that have similar mechanisms of action on the respective subsets of genes and to assess toxic DDs and their regulatory toxicogenomic biomarker genes. To meet these objectives, different authors have used a number of statistical tools. For example, the t-test and Mann–Whitney U test [12,15] and ANOVA [16,17] were used to explore biomarker genes. Nonetheless, these methods cannot satisfy the objectives mentioned above. Although there are a few statistical methods [14,18,19] for assessing the toxicity of doses of drugs (DDs) and their associated biomarker genes, these methods require more time to compute the model parameters because they use EM (expectation-maximization) [20] based iterative approaches. To overcome this problem, in this paper we have proposed an alternative approach based on hierarchical clustering (HC) for the same purpose. However, the performance of the several types of HC approaches depends on the similarity/distance measure used. Therefore, we explored suitable combinations of distance measures and HC methods based on Japanese Toxicogenomics Project (TGP) datasets for better clustering/co-clustering of DDs and genes and for detecting toxic DDs and their associated biomarker genes.
We investigated the performance of 35 combinations of distance (euclidean, maximum, manhattan, canberra, minkowski) and HC (ward, single, complete, average, mcquitty, median, centroid) methods on the known real glutathione metabolism pathway (GMP) and PPAR signaling pathway (PPAR-SP) datasets [15,23] using the ER. The combinations euclidean: ward, manhattan: ward, and minkowski: ward produce lower and more stable ERs for these datasets. Therefore, we propose Ward's HC method in combination with the euclidean, manhattan, or minkowski distance for clustering/co-clustering the genes and DDs of toxicogenomic data. For example, we analyzed the GMP and PPAR-SP datasets for single and multiple time points using the proposed co-clustering algorithm described in Section 2.4 with the euclidean: ward combination. For the glutathione metabolism pathway (GMP) dataset, LOC100359539/Rrm2, Gpx6, RGD1562107, Gstm4, Gstm3, G6pd, Gsta5, Gclc, Mgst2, Gsr, Gpx2, Gclm, Gstp1, LOC100912604/Srm, Gstm4, Odc1, Gsr, Gss are the biomarker genes explored from the biomarker co-clusters (single and multiple time point datasets combined), and Acetaminophen_Middle, Acetaminophen_High, Methapyrilene_High, Nitrofurazone_High, Nitrofurazone_Middle, Isoniazid_Middle, Isoniazid_High are their regulatory (associated) DDs. Similarly, for the PPAR signaling pathway (PPAR-SP) dataset, Cpt1a, Cyp8b1, Cyp4a3, Ehhadh, Plin5, Plin2, Fabp3, Me1, Fabp5, LOC100910385, Cpt2, Acaa1a, Cyp4a1, LOC100365047, Cpt1a, LOC100365047, Angptl4, Aqp7, Cpt1c, Cpt1b, Me1 are the biomarker genes and Aspirin_Low, Aspirin_Middle, Aspirin_High, Benzbromarone_Middle, Benzbromarone_High, Clofibrate_Middle, Clofibrate_High, WY14643_Low, WY14643_High, WY14643_Middle, Gemfibrozil_Middle, Gemfibrozil_High are their regulatory DDs. These results are validated by the available literature [14,15,23] and functional annotation.

5. Conclusions

Overall, the study has shown that the proposed method has the following advantages over existing biomarker gene detection and co-clustering methods:
  • It detects the biomarker genes and their regulatory (associated) DDs simultaneously.
  • It saves time, since it requires less computation time than EM-based iterative co-clustering methods.
  • Its results agree with the literature and with database (functional annotation) results.

Supplementary Materials

The following are available online at https://www.mdpi.com/1648-9144/55/8/451/s1. Figure S1: Gene clustering of glutathione metabolism (GMP) and PPAR signaling pathway (PPAR-SP) datasets based on Euclidean distance method in combination with ward HC method. Table S1: Gene and DDs clusters as well as co-clusters generated by the proposed co-clustering algorithm based on the combination of distance (Euclidean) and HC (ward) methods for glutathione metabolism pathway datasets at 24 h time point. Table S2: Gene and DDs clusters as well as co-clusters generated by the proposed co-clustering algorithm based on the combination of distance (Euclidean) and HC (ward) methods for glutathione metabolism pathway datasets at 3 h, 6 h, 9 h, and 24 h time points. Table S3: Gene and DDs clusters as well as co-clusters generated by the proposed co-clustering algorithm based on the combination of distance (Euclidean) and HC (ward) methods for PPAR signaling pathway dataset at 24 h time point. Table S4: Gene and DDs clusters as well as co-clusters generated by the proposed co-clustering algorithm based on the combination of distance (Euclidean) and HC (ward) methods for PPAR signaling pathway dataset at 3 h, 6 h, 9 h, and 24 h time points.

Author Contributions

Conceptualization M.N.H. and M.N.H.M.; methodology M.N.H. and M.B.M.; software M.N.H.; validation M.N.H.M., M.B.M., A.A.B. and M.R.; formal analysis M.N.H.; investigation M.N.H.M.; resources M.N.H.M.; data curation M.N.H.; writing—original draft preparation M.N.H. and M.B.M.; writing—review and editing M.N.H.M., M.B.M., A.A.B. and M.R.; supervision M.N.H.M.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare that they have no conflict of interest.

References

  1. Waters, M.D.; Fostel, J.M. Toxicogenomics and systems toxicology: Aims and prospects. Nat. Rev. Genet. 2004, 5, 936–948.
  2. Aardema, M.J.; MacGregor, J.T. Toxicology and genetic toxicology in the new era of “toxicogenomics”: Impact of “-omics” technologies. Mutat. Res. 2002, 499, 13–25.
  3. Afshari, C.A. Perspective: Microarray Technology, Seeing More Than Spots. Endocrinology 2002, 143, 1983–1989.
  4. Ulrich, R.; Friend, S.H. Toxicogenomics and drug discovery: Will new technologies help us produce better drugs? Nat. Rev. Drug Discov. 2001, 1, 84–88.
  5. Zacharewski, T.R.; Fielden, M.R. Challenges and Limitations of Gene Expression Profiling in Mechanistic and Predictive Toxicology. Toxicol. Sci. 2001, 60, 6–10.
  6. Olden, K.; Guthrie, J. Genomics: Implications for toxicology. Mutat. Res. 2001, 473, 3–10.
  7. Knall, C.M.; Davis, J.W.; Paules, R.S.; Boggs, S.E.; Afshari, C.A.; Burchiel, S.W. Analysis of Genetic and Epigenetic Mechanisms of Toxicity: Potential Roles of Toxicogenomics and Proteomics in Toxicology. Toxicol. Sci. 2001, 59, 193–195.
  8. Uehara, T.; Hirode, M.; Ono, A.; Kiyosawa, N.; Omura, K.; Shimizu, T.; Mizukawa, Y.; Miyagishima, T.; Nagao, T.; Urushidani, T. A toxicogenomics approach for early assessment of potential non-genotoxic hepatocarcinogenicity of chemicals in rats. Toxicology 2008, 250, 15–26.
  9. Igarashi, Y.; Nakatsu, N.; Yamashita, T.; Ono, A.; Ohno, Y.; Urushidani, T.; Yamada, H. Open TG-GATEs: A large-scale toxicogenomics database. Nucleic Acids Res. 2015, 43, D921–D927.
  10. Yildirimman, R.; Brolén, G.; Vilardell, M.; Eriksson, G.; Synnergren, J.; Gmuender, H.; Kamburov, A.; Ingelman-Sundberg, M.; Castell, J.; Lahoz, A.; et al. Human Embryonic Stem Cell Derived Hepatocyte-Like Cells as a Tool for In Vitro Hazard Assessment of Chemical Carcinogenicity. Toxicol. Sci. 2011, 124, 278–290.
  11. Hofree, M.; Shen, J.P.; Carter, H.; Gross, A.; Ideker, T. Network-based stratification of tumor mutations. Nat. Methods 2013, 10, 1108–1115.
  12. Hardt, C.; Beber, M.; Rasche, A.; Kamburov, A.; Hebels, D.; Kleinjans, J.; Herwig, R. ToxDB: Pathway-level interpretation of drug-treatment data. Database 2016, 2016, 1–6.
  13. Kim, S. Identifying dynamic pathway interactions based on clinical information. Comput. Biol. Chem. 2017, 68, 260–265.
  14. Hasan, M.N.; Rana, M.M.; Begum, A.A.; Rahman, M.R.R.; Mollah, M.N.H. Robust Co-clustering to Discover Toxicogenomic Biomarkers and Their Regulatory Doses of Chemical Compounds Using Logistic Probabilistic Hidden Variable Model. Front. Genet. 2018, 9, 516.
  15. Nyström-Persson, J.; Igarashi, Y.; Ito, M.; Morita, M.; Nakatsu, N.; Yamada, H.; Mizuguchi, K. Toxygates: Interactive toxicity analysis on a hybrid microarray and linked data platform. Bioinformatics 2013, 29, 3080–3086.
  16. Hasan, M.N.; Akond, Z.; Alam, M.J.; Begum, A.A.; Rahman, M.; Mollah, M.N.H. Toxic Dose prediction of Chemical Compounds to Biomarkers using an ANOVA based Gene Expression Analysis. Bioinformation 2018, 14, 369–377.
  17. Otava, M.; Shkedy, Z.; Kasim, A. Prediction of gene expression in human using rat in vivo gene expression in Japanese Toxicogenomics Project. Syst. Biomed. 2014, 2, 8–15.
  18. Zhu, S.; Okuno, Y.; Tsujimoto, G.; Mamitsuka, H. A probabilistic model for mining implicit ‘chemical compound-gene’ relations from literature. Bioinformatics 2005, 21 (Suppl. 2), 245–251.
  19. Chung, M.-H.; Wang, Y.; Tang, H.; Zou, W.; Basinger, J.; Xu, X.; Tong, W. Asymmetric author-topic model for knowledge discovering of big data in toxicogenomics. Front. Pharmacol. 2015, 6, 1–7.
  20. Dempster, A.P.; Laird, N.M.; Rubin, D.B. Maximum Likelihood from Incomplete Data via the EM Algorithm. J. R. Stat. Soc. Ser. B 1977, 39, 1–22.
  21. Afshari, C.A.; Hamadeh, H.K.; Bushel, P.R. The evolution of bioinformatics in toxicology: Advancing toxicogenomics. Toxicol. Sci. 2011, 120, S225–S237.
  22. Uehara, T.; Ono, A.; Maruyama, T.; Kato, I.; Yamada, H.; Ohno, Y.; Urushidani, T. The Japanese toxicogenomics project: Application of toxicogenomics. Mol. Nutr. Food Res. 2010, 54, 218–227.
  23. Kiyosawa, N.; Shiwaku, K.; Hirode, M.; Omura, K.; Uehara, T.; Shimizu, T.; Mizukawa, Y.; Miyagishima, T.; Ono, A.; Nagao, T.; et al. Utilization of a one-dimensional score for surveying chemical-induced changes in expression levels of multiple biomarker gene sets using a large-scale toxicogenomics database. J. Toxicol. Sci. 2006, 31, 433–448.
  24. Huang da, W.; Sherman, B.T.; Lempicki, R.A. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat. Protoc. 2009, 4, 44–57.
Figure 1. Clustering of doses of drugs (DDs) in the GMP and PPAR-SP datasets, based on the Euclidean distance combined with Ward's HC method. (A) DDs clustering of the GMP dataset at the 24 h time point. (B) DDs clustering of the GMP dataset at multiple (3 h, 6 h, 9 h, and 24 h) time points. (C) DDs clustering of the PPAR-SP dataset at the 24 h time point. (D) DDs clustering of the PPAR-SP dataset at multiple (3 h, 6 h, 9 h, and 24 h) time points.
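Figure 1 uses Euclidean distances combined with Ward's method. The following plain-Python sketch illustrates Ward agglomeration. It is not the authors' implementation. It assumes each DD is represented as a vector of gene-expression values. At each step it merges the pair of clusters whose union least increases the total within-cluster sum of squares. This is the classical Ward criterion, which R's `hclust(method = "ward.D2")` applies to Euclidean distances; the section does not say which Ward variant produced the figure. The names `ward_cost` and `ward_clusters` and the toy profiles are hypothetical.

```python
from itertools import combinations
from math import dist  # Euclidean distance between two points (Python 3.8+)


def centroid(points):
    """Mean vector of a list of equal-length numeric vectors."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def ward_cost(a, b):
    """Increase in total within-cluster sum of squares if clusters a and b are merged."""
    na, nb = len(a), len(b)
    return na * nb / (na + nb) * dist(centroid(a), centroid(b)) ** 2


def ward_clusters(data, k):
    """Naive Ward agglomeration (small inputs only): merge the cheapest pair until k clusters remain.

    Returns a sorted list of clusters, each a sorted list of row indices into `data`.
    """
    clusters = [[i] for i in range(len(data))]
    while len(clusters) > k:
        # Pick the pair whose merge increases the within-cluster sum of squares the least.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: ward_cost([data[r] for r in clusters[p[0]]],
                                    [data[r] for r in clusters[p[1]]]),
        )
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for t, c in enumerate(clusters) if t not in (i, j)] + [merged]
    return sorted(clusters)


# Hypothetical toy data: 4 DDs, each profiled on 2 genes.
profiles = [[0.1, 0.2], [0.0, 0.3], [2.5, 2.4], [2.7, 2.6]]
print(ward_clusters(profiles, 2))  # [[0, 1], [2, 3]]
```

Cutting the agglomeration at k clusters corresponds to cutting the dendrogram at the height that leaves k branches.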
Figure 2. Structural view of the co-clusters of the GMP and PPAR-SP datasets retrieved by our proposed HC-based co-clustering algorithm. (A) GMP dataset at the 24 h time point. (B) GMP dataset at multiple time points. (C) PPAR-SP dataset at the 24 h time point. (D) PPAR-SP dataset at multiple time points.
Table 1. Important distance measures used in hierarchical clustering.
| Distance measure | Mathematical form |
|---|---|
| Euclidean | $d(G_i, G_{i'}) = \left( \sum_{j=1}^{m} \left( F(G_i, C_j) - F(G_{i'}, C_j) \right)^2 \right)^{1/2}$ |
| Minkowski | $d(G_i, G_{i'}) = \left( \sum_{j=1}^{m} \left\lvert F(G_i, C_j) - F(G_{i'}, C_j) \right\rvert^{v} \right)^{1/v}$ |
| Manhattan | $d(G_i, G_{i'}) = \sum_{j=1}^{m} \left\lvert F(G_i, C_j) - F(G_{i'}, C_j) \right\rvert$ |
| Canberra | $d(G_i, G_{i'}) = \sum_{j=1}^{m} \frac{\left\lvert F(G_i, C_j) - F(G_{i'}, C_j) \right\rvert}{F(G_i, C_j) + F(G_{i'}, C_j)}$ |
| Maximum | $d(G_i, G_{i'}) = \max_{j} \left\lvert F(G_i, C_j) - F(G_{i'}, C_j) \right\rvert$ |
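Each measure in Table 1 is a function of two expression profiles, x = (F(G_i, C_1), …, F(G_i, C_m)) and y = (F(G_{i'}, C_1), …, F(G_{i'}, C_m)). The plain-Python sketch below writes them out directly. It is illustrative only, and the function names are hypothetical. The Canberra form follows Table 1 literally, so it assumes every denominator is nonzero (for example, strictly positive values).

Setting v = 1 in the Minkowski distance gives the Manhattan distance, and v = 2 gives the Euclidean distance. The Minkowski rows of Table 2 coincide with the Euclidean rows, which would be expected if v = 2 was used (the default `p = 2` of R's `dist()`). The order actually used is not stated here.

```python
def euclidean(x, y):
    """Square root of the summed squared differences."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5


def manhattan(x, y):
    """Sum of absolute differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def minkowski(x, y, v=2):
    """Order-v Minkowski distance; v=1 is Manhattan, v=2 is Euclidean (default assumed here)."""
    return sum(abs(a - b) ** v for a, b in zip(x, y)) ** (1 / v)


def canberra(x, y):
    """Canberra distance as written in Table 1; assumes every a + b is nonzero."""
    return sum(abs(a - b) / (a + b) for a, b in zip(x, y))


def maximum(x, y):
    """Largest absolute coordinate difference (Chebyshev distance)."""
    return max(abs(a - b) for a, b in zip(x, y))


# Hypothetical profiles of two genes under three DDs.
x, y = [1.0, 2.0, 3.0], [4.0, 6.0, 3.0]
print(euclidean(x, y), manhattan(x, y), maximum(x, y))  # 5.0 7.0 4.0
```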
Table 2. Percent error rate (ER) for the 35 combinations of distance and HC methods, calculated from the glutathione metabolism and PPAR signaling pathway datasets.
| Sl | Combination of distance and HC methods | Drug clustering ER (%), GMP data | Drug clustering ER (%), PPAR-SP data | DDs clustering ER (%), GMP data | DDs clustering ER (%), PPAR-SP data |
|---|---|---|---|---|---|
| 1 | euclidean:ward | 10 | 0 | 6.666666667 | 20 |
| 2 | euclidean:single | 10 | 40 | 16.66666667 | 36.66666667 |
| 3 | euclidean:complete | 10 | 30 | 26.66666667 | 20 |
| 4 | euclidean:average | 10 | 40 | 26.66666667 | 20 |
| 5 | euclidean:mcquitty | 40 | 40 | 26.66666667 | 13.33333333 |
| 6 | euclidean:median | 40 | 40 | 3.333333333 | 26.66666667 |
| 7 | euclidean:centroid | 40 | 40 | 16.66666667 | 30 |
| 8 | maximum:ward | 10 | 0 | 16.66666667 | 10 |
| 9 | maximum:single | 10 | 40 | 16.66666667 | 36.66666667 |
| 10 | maximum:complete | 20 | 0 | 16.66666667 | 26.66666667 |
| 11 | maximum:average | 10 | 40 | 26.66666667 | 36.66666667 |
| 12 | maximum:mcquitty | 10 | 40 | 26.66666667 | 36.66666667 |
| 13 | maximum:median | 40 | 40 | 26.66666667 | 36.66666667 |
| 14 | maximum:centroid | 40 | 40 | 16.66666667 | 30 |
| 15 | manhattan:ward | 10 | 0 | 6.666666667 | 20 |
| 16 | manhattan:single | 40 | 40 | 16.66666667 | 36.66666667 |
| 17 | manhattan:complete | 10 | 30 | 3.333333333 | 20 |
| 18 | manhattan:average | 10 | 40 | 26.66666667 | 20 |
| 19 | manhattan:mcquitty | 10 | 40 | 3.333333333 | 20 |
| 20 | manhattan:median | 40 | 40 | 26.66666667 | 36.66666667 |
| 21 | manhattan:centroid | 40 | 40 | 16.66666667 | 30 |
| 22 | canberra:ward | 50 | 10 | 30 | 20 |
| 23 | canberra:single | 50 | 10 | 23.33333333 | 36.66666667 |
| 24 | canberra:complete | 50 | 10 | 30 | 20 |
| 25 | canberra:average | 50 | 10 | 30 | 23.33333333 |
| 26 | canberra:mcquitty | 50 | 40 | 30 | 23.33333333 |
| 27 | canberra:median | 50 | 40 | 40 | 36.66666667 |
| 28 | canberra:centroid | 50 | 40 | 33.33333333 | 36.66666667 |
| 29 | minkowski:ward | 10 | 0 | 6.666666667 | 20 |
| 30 | minkowski:single | 10 | 40 | 16.66666667 | 36.66666667 |
| 31 | minkowski:complete | 10 | 30 | 26.66666667 | 20 |
| 32 | minkowski:average | 10 | 40 | 26.66666667 | 20 |
| 33 | minkowski:mcquitty | 40 | 40 | 26.66666667 | 13.33333333 |
| 34 | minkowski:median | 40 | 40 | 3.333333333 | 26.66666667 |
| 35 | minkowski:centroid | 40 | 40 | 16.66666667 | 30 |
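This section does not restate how ER was computed. One common convention, assumed here, is to match each cluster to one known group so that agreement is maximal, then report the percentage of items left misassigned. Under that assumption, the values in Table 2 are consistent with 10 drugs (ER in steps of 10%) and 30 DDs (ER in steps of 100/30 ≈ 3.33%). For example, 6.666666667% corresponds to 2 of 30 DDs misassigned.

The sketch below finds the matching by brute force over permutations. This is adequate for a handful of clusters; with many clusters, an assignment solver such as SciPy's `linear_sum_assignment` would be the usual choice. The function name and the toy labels are hypothetical.

```python
from itertools import permutations


def percent_error_rate(true_labels, cluster_labels):
    """Percent of items misassigned under the best one-to-one cluster-to-group matching.

    Assumes the number of clusters equals the number of known groups.
    """
    groups = sorted(set(true_labels))
    clusters = sorted(set(cluster_labels))
    if len(groups) != len(clusters):
        raise ValueError("expected as many clusters as known groups")
    best_correct = 0
    for perm in permutations(groups):
        mapping = dict(zip(clusters, perm))  # cluster id -> known group
        correct = sum(mapping[c] == t for c, t in zip(cluster_labels, true_labels))
        best_correct = max(best_correct, correct)
    n = len(true_labels)
    return 100 * (n - best_correct) / n


# Hypothetical example: 30 DDs in 3 known groups, 2 of them placed in the wrong cluster.
truth = ["A"] * 10 + ["B"] * 10 + ["C"] * 10
found = [2] * 10 + [0] * 10 + [1] * 10
found[0], found[15] = 0, 1
print(percent_error_rate(truth, found))  # 6.666666666666667
```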
Table 3. Ranked co-cluster means of doses of drugs and genes for the glutathione metabolism pathway datasets, obtained with the combination (Euclidean: ward) of distance and hierarchical clustering methods.
Euclidean: ward, Dataset: glutathione metabolism pathway at 24 h time point

| Gene and compound co-cluster | Co-cluster mean |
|---|---|
| Gene-Cluster-3(1): Compound-Cluster-3(1) | 2.5550390 |
| Gene-Cluster-2(2): Compound-Cluster-2(2) | 1.6619841 |
| Gene-Cluster-3(1): Compound-Cluster-2(2) | 0.8249199 |
| Gene-Cluster-3(1): Compound-Cluster-1(3) | 0.8129127 |
| Gene-Cluster-2(2): Compound-Cluster-3(1) | 0.5994644 |
| Gene-Cluster-1(3): Compound-Cluster-3(1) | 0.5991663 |
| Gene-Cluster-1(3): Compound-Cluster-2(2) | 0.4653372 |
| Gene-Cluster-2(2): Compound-Cluster-1(3) | 0.3402437 |
| Gene-Cluster-1(3): Compound-Cluster-1(3) | 0.2481545 |

Euclidean: ward, Dataset: glutathione metabolism pathway at 3 h, 6 h, 9 h, and 24 h time points

| Gene and compound co-cluster | Co-cluster mean |
|---|---|
| Gene-Cluster-3(1): Compound-Cluster-2(1) | 1.2954907 |
| Gene-Cluster-1(2): Compound-Cluster-1(2) | 0.6118177 |
| Gene-Cluster-2(3): Compound-Cluster-1(2) | 0.5850958 |
| Gene-Cluster-3(1): Compound-Cluster-1(2) | 0.5157947 |
| Gene-Cluster-3(1): Compound-Cluster-3(3) | 0.3513179 |
| Gene-Cluster-1(2): Compound-Cluster-2(1) | 0.3360666 |
| Gene-Cluster-2(3): Compound-Cluster-2(1) | 0.3285539 |
| Gene-Cluster-1(1): Compound-Cluster-3(3) | 0.2478899 |
| Gene-Cluster-2(3): Compound-Cluster-3(3) | 0.2424664 |
Table 4. Ranked co-cluster means of doses of drugs and genes for the PPAR signaling pathway datasets, obtained with the combination (Euclidean: ward) of distance and hierarchical clustering methods.
Euclidean: ward, Dataset: PPAR signaling pathway at 24 h time point

| Gene and compound co-cluster | Co-cluster mean |
|---|---|
| Gene-Cluster-1(1): Compound-Cluster-1(1) | 1.5972416 |
| Gene-Cluster-3(2): Compound-Cluster-2(2) | 0.6596625 |
| Gene-Cluster-3(2): Compound-Cluster-1(1) | 0.6522308 |
| Gene-Cluster-1(1): Compound-Cluster-2(2) | 0.4973316 |
| Gene-Cluster-2(3): Compound-Cluster-1(1) | 0.3994878 |
| Gene-Cluster-2(3): Compound-Cluster-2(2) | 0.2378871 |

Euclidean: ward, Dataset: PPAR signaling pathway at 3 h, 6 h, 9 h, and 24 h time points

| Gene and compound co-cluster | Co-cluster mean |
|---|---|
| Gene-Cluster-3(1): Compound-Cluster-2(1) | 1.5863836 |
| Gene-Cluster-1(2): Compound-Cluster-2(1) | 0.5842037 |
| Gene-Cluster-1(2): Compound-Cluster-1(2) | 0.4385611 |
| Gene-Cluster-3(1): Compound-Cluster-1(2) | 0.4025768 |
| Gene-Cluster-2(3): Compound-Cluster-2(1) | 0.2569643 |
| Gene-Cluster-2(3): Compound-Cluster-1(2) | 0.1757952 |
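Tables 3 and 4 rank the mean value of each block formed by crossing one gene cluster with one DD cluster. The sketch below shows one way to compute and rank these means. It assumes a genes × DDs matrix holding the values used for clustering, plus cluster labels from two separate HC runs (one on genes, one on DDs). This section does not restate any data transformation the authors applied, and the function name and toy numbers are hypothetical.

For the 24 h datasets, the co-clusters reported in Tables 5 and 6 correspond to the highest-ranked blocks in Tables 3 and 4.

```python
from itertools import product
from statistics import mean


def ranked_cocluster_means(matrix, gene_labels, dd_labels):
    """Mean of every gene-cluster x DD-cluster block of `matrix`, sorted from high to low.

    matrix[g][d] is the value for gene g under dose-of-drug d;
    gene_labels[g] and dd_labels[d] are cluster ids from separate HC runs.
    """
    results = []
    for gc, dc in product(sorted(set(gene_labels)), sorted(set(dd_labels))):
        # Collect all entries whose gene is in cluster gc and whose DD is in cluster dc.
        block = [matrix[g][d]
                 for g, gl in enumerate(gene_labels) if gl == gc
                 for d, dl in enumerate(dd_labels) if dl == dc]
        results.append(((gc, dc), mean(block)))
    return sorted(results, key=lambda item: item[1], reverse=True)


# Hypothetical data: 3 genes x 4 DDs, with genes split into 2 clusters and DDs into 2 clusters.
values = [[3.0, 2.8, 0.1, 0.2],
          [2.6, 3.2, 0.3, 0.1],
          [0.2, 0.1, 0.3, 0.2]]
ranking = ranked_cocluster_means(values, [1, 1, 2], [1, 1, 2, 2])
print([pair for pair, _ in ranking])  # [(1, 1), (2, 2), (1, 2), (2, 1)]
```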
Table 5. Biomarker co-clusters, consisting of biomarker genes and their regulatory doses of drugs, identified by the combination (Euclidean: ward) of distance and hierarchical clustering methods for the glutathione metabolism pathway datasets.
Euclidean: ward, Dataset: glutathione metabolism pathway at 24 h time point

| Biomarker genes | Regulatory doses of drugs |
|---|---|
| Gene-cluster-3: LOC100359539/Rrm2, LOC100359539/Rrm2, Gpx6, RGD1562107 | DCCs-cluster-3: isoniazid_Middle, isoniazid_High |
| Gene-cluster-2: Gclc, Gstm4, Gstm3, G6pd, Gsta5, Gclc, Mgst2, Gsr, Gpx2, Gclm, Gstp1 | DCCs-cluster-2: acetaminophen_Middle, acetaminophen_High, methapyrilene_High, nitrofurazone_High |

Euclidean: ward, Dataset: glutathione metabolism pathway at 3 h, 6 h, 9 h, and 24 h time points

| Biomarker genes | Regulatory doses of drugs |
|---|---|
| Gene-cluster-2: LOC100912604/Srm, Gclc, Gstm4, Gstm3, G6pd, Gsta5, Gclc, Odc1, Mgst2, Gsr, Gss, Gpx2, Gclm, Gstp1 | DCCs-cluster-3: acetaminophen_High_24.hr, acetaminophen_Middle_24.hr, methapyrilene_High_6.hr, methapyrilene_High_24.hr, methapyrilene_High_9.hr, nitrofurazone_High_24.hr, nitrofurazone_High_6.hr, nitrofurazone_Middle_6.hr, nitrofurazone_High_9.hr, nitrofurazone_Middle_9.hr, |
Table 6. Biomarker co-clusters, consisting of biomarker genes and their regulatory doses of drugs, identified by the combination (Euclidean: ward) of distance and hierarchical clustering methods for the PPAR signaling pathway datasets.
Euclidean: ward, Dataset: PPAR signaling pathway at 24 h time point

| Biomarker genes | Regulatory doses of drugs |
|---|---|
| Gene-cluster-1: Cpt1a, Cyp8b1, Cyp4a3, Ehhadh, Plin5, Fabp3, Me1, Fabp5, LOC100910385, Cpt2, Acaa1a, Cyp4a1, LOC100365047, Cpt1a, LOC100365047, Angptl4, Aqp7, Cpt1c, Cpt1b, Me1 | DCCs-cluster-1: aspirin_Low, aspirin_High, aspirin_Middle, benzbromarone_High, clofibrate_High, WY14643_Low, WY14643_High, WY14643_Middle |

Euclidean: ward, Dataset: PPAR signaling pathway at 3 h, 6 h, 9 h, and 24 h time points

| Biomarker genes | Regulatory doses of drugs |
|---|---|
| Gene-cluster-3: Cyp4a3, Ehhadh, Plin2, Plin5, Me1, LOC100910385, Cpt2, Acaa1a, Cyp4a1, Angptl4, Cpt1b | DCCs-cluster-2: aspirin_Low_9.hr, aspirin_Low_24.hr, aspirin_High_9.hr, aspirin_High_24.hr, aspirin_Middle_24.hr, benzbromarone_Middle_6.hr, benzbromarone_High_9.hr, benzbromarone_High_3.hr, benzbromarone_Middle_9.hr, benzbromarone_High_24.hr, benzbromarone_High_6.hr, benzbromarone_Middle_3.hr, clofibrate_Middle_6.hr, clofibrate_High_24.hr, clofibrate_Middle_9.hr, clofibrate_High_6.hr, clofibrate_High_9.hr, gemfibrozil_High_24.hr, gemfibrozil_Middle_24.hr, gemfibrozil_High_9.hr, WY.14643_High_6.hr, WY.14643_Middle_6.hr, WY.14643_Middle_24.hr, WY.14643_Low_3.hr, WY.14643_Low_24.hr, WY.14643_Middle_9.hr, WY.14643_Low_6.hr, WY.14643_High_9.hr, WY.14643_Middle_3.hr, WY.14643_High_3.hr, WY.14643_Low_9.hr, WY.14643_High_24.hr |
Table 7. KEGG pathway functional annotation of the biomarker genes in co-cluster-1 identified by the distance and HC method combination Euclidean: ward (dataset: glutathione metabolism pathway at the 24 h time point).
| Term | Count | % | p-Value | FDR | Genes |
|---|---|---|---|---|---|
| rno00480: Glutathione metabolism | 2 | 66.66 | 7.48E−3 | 2.04E−38 | RGD1562107, Gpx6 |
Table 8. KEGG pathway functional annotation of the biomarker genes in co-cluster-2 identified by the distance and HC method combination Euclidean: ward (dataset: glutathione metabolism pathway at the 24 h time point).
| Term | Count | % | p-Value | Genes |
|---|---|---|---|---|
| rno00480: Glutathione metabolism | 10 | 100 | 3.85E−20 | Mgst2, Gpx2, G6pd, Gclm, Gsr, Gsta5, Gclc, Gclc, Gstp1, Gstm3, Gstm4 |
| rno00980: Metabolism of xenobiotics by cytochrome P450 | 5 | 50.0 | 7.43E−7 | Mgst2, Gsta5, Gstp1, Gstm3, Gstm4 |
| rno00982: Drug metabolism - cytochrome P450 | 5 | 50.0 | 7.87E−7 | Mgst2, Gsta5, Gstp1, Gstm3, Gstm4 |
| rno05204: Chemical carcinogenesis | 5 | 50.0 | 2.14E−6 | Mgst2, Gsta5, Gstp1, Gstm3, Gstm4 |
| rno04918: Thyroid hormone synthesis | 2 | 20.0 | 0.076 | Gpx2, Gsr |
Table 9. KEGG pathway functional annotation of the biomarker genes in co-cluster-1 identified by the distance and HC method combination Euclidean: ward (dataset: PPAR signaling pathway at the 24 h time point).
| Term | Count | % | p-Value | Genes |
|---|---|---|---|---|
| rno03320: PPAR signaling pathway | 13 | 76.47 | 4.88E−24 | Cpt1b, Aqp7, Cpt1c, Cpt1a, Cyp4a3, Cpt1a, Cpt2, Cyp8b1, Fabp3, Ehhadh, Acaa1a, Cyp4a1, Angptl4, Fabp5 |
| rno00071: Fatty acid degradation | 8 | 47.06 | 3.16E−13 | Cpt1b, Cpt2, Ehhadh, Acaa1a, Cpt1c, Cpt1a, Cyp4a3, Cpt1a, Cyp4a1 |
| rno01212: Fatty acid metabolism | 6 | 35.29 | 1.67E−8 | Cpt1b, Cpt2, Ehhadh, Acaa1a, Cpt1c, Cpt1a, Cpt1a |
| rno04920: Adipocytokine signaling pathway | 3 | 17.65 | 0.0067 | Cpt1b, Cpt1c, Cpt1a, Cpt1a |
| rno04922: Glucagon signaling pathway | 3 | 17.65 | 0.0117 | Cpt1b, Cpt1c, Cpt1a, Cpt1a |
| rno04152: AMPK signaling pathway | 3 | 17.65 | 0.0187 | Cpt1b, Cpt1c, Cpt1a, Cpt1a |
| rno01100: Metabolic pathways | 6 | 35.29 | 0.0500 | Me1, Me1, Cyp8b1, Ehhadh, Acaa1a, Cyp4a3, Cyp4a1 |
| rno00280: Valine, leucine and isoleucine degradation | 2 | 11.76 | 0.0885 | Ehhadh, Acaa1a |
Table 10. KEGG pathway functional annotation of the biomarker genes in co-cluster-1 identified by the distance and HC method combination Euclidean: ward (dataset: PPAR signaling pathway at the 3 h, 6 h, 9 h, and 24 h time points).
| Term | Count | % | p-Value | Genes |
|---|---|---|---|---|
| rno03320: PPAR signaling pathway | 7 | 63.63 | 5.49E−12 | Cpt1b, Cpt2, Ehhadh, Acaa1a, Cyp4a3, Angptl4, Cyp4a1 |
| rno00071: Fatty acid degradation | 6 | 54.54 | 1.37E−10 | Cpt1b, Cpt2, Ehhadh, Acaa1a, Cyp4a3, Cyp4a1 |
| rno01212: Fatty acid metabolism | 4 | 36.36 | 1.09E−5 | Cpt1b, Cpt2, Ehhadh, Acaa1a |
| rno01100: Metabolic pathways | 5 | 45.45 | 0.0172 | Me1, Ehhadh, Acaa1a, Cyp4a3, Cyp4a1 |
| rno00280: Valine, leucine and isoleucine degradation | 2 | 18.18 | 0.0486 | Ehhadh, Acaa1a |
| rno00590: Arachidonic acid metabolism | 2 | 18.18 | 0.0709 | Cyp4a3, Cyp4a1 |
| rno00830: Retinol metabolism | 2 | 18.18 | 0.0726 | Cyp4a3, Cyp4a1 |
| rno04146: Peroxisome | 2 | 18.18 | 0.0743 | Ehhadh, Acaa1a |
| rno04750: Inflammatory mediator regulation of TRP channels | 2 | 18.18 | 0.0994 | Cyp4a3, Cyp4a1 |
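In Tables 7 to 10 the Count and % columns are related by % = 100 × Count / (number of genes in the submitted list). For instance, 13/17 ≈ 76.47% and 7/11 ≈ 63.64% (listed as 63.63), which suggests lists of 17 and 11 genes for Tables 9 and 10.

If the annotation was run in DAVID (ref. 24), which is an assumption, the p-values would be EASE scores. The EASE score is a conservative variant of the one-tailed Fisher exact test in which one gene is removed from the list hits before testing.

The sketch below computes both quantities from hypothetical inputs. The background and term sizes used by the authors are not given here, so it does not attempt to reproduce the table values, and the function names are hypothetical.

```python
from math import comb


def term_percent(count, list_size):
    """Share of the submitted gene list annotated to a term, in percent."""
    return 100 * count / list_size


def fisher_upper_tail(k, n, K, N):
    """One-tailed Fisher exact p-value P(X >= k) for over-representation.

    X is hypergeometric: n list genes drawn from N background genes,
    K of which are annotated to the term.
    """
    k = max(k, 0)
    hits = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, min(n, K) + 1))
    return hits / comb(N, n)


def ease_score(k, n, K, N):
    """EASE-style conservative p-value: one list hit is removed before the Fisher test."""
    return fisher_upper_tail(k - 1, n, K, N)


print(round(term_percent(13, 17), 2))  # 76.47
```

Because one hit is discarded, the EASE score is always at least as large as the plain Fisher p-value. This penalizes terms supported by very few genes, such as the two-gene rows above.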

Share and Cite

MDPI and ACS Style

Hasan, M.N.; Malek, M.B.; Begum, A.A.; Rahman, M.; Mollah, M.N.H. Assessment of Drugs Toxicity and Associated Biomarker Genes Using Hierarchical Clustering. Medicina 2019, 55, 451. https://doi.org/10.3390/medicina55080451

AMA Style

Hasan MN, Malek MB, Begum AA, Rahman M, Mollah MNH. Assessment of Drugs Toxicity and Associated Biomarker Genes Using Hierarchical Clustering. Medicina. 2019; 55(8):451. https://doi.org/10.3390/medicina55080451

Chicago/Turabian Style

Hasan, Mohammad Nazmol, Masuma Binte Malek, Anjuman Ara Begum, Moizur Rahman, and Md. Nurul Haque Mollah. 2019. "Assessment of Drugs Toxicity and Associated Biomarker Genes Using Hierarchical Clustering" Medicina 55, no. 8: 451. https://doi.org/10.3390/medicina55080451
