Article

Dual Convolutional Neural Network Based Method for Predicting Disease-Related miRNAs

1 School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
2 School of Information Science and Technology, Heilongjiang University, Harbin 150080, China
3 School of Mathematical Science, Heilongjiang University, Harbin 150080, China
* Author to whom correspondence should be addressed.
Int. J. Mol. Sci. 2018, 19(12), 3732; https://doi.org/10.3390/ijms19123732
Submission received: 29 October 2018 / Revised: 15 November 2018 / Accepted: 19 November 2018 / Published: 23 November 2018
(This article belongs to the Special Issue Special Protein or RNA Molecules Computational Identification 2018)

Abstract

Identification of disease-related microRNAs (disease miRNAs) is helpful for understanding and exploring the etiology and pathogenesis of diseases. Most recent methods predict disease miRNAs by integrating the similarities and associations of miRNAs and diseases. However, these methods fail to learn the deep features of the miRNA similarities, the disease similarities, and the miRNA–disease associations. We propose a dual convolutional neural network-based method for predicting candidate disease miRNAs, which we refer to as CNNDMP. CNNDMP not only exploits the similarities and associations of miRNAs and diseases, but also captures the topological structures of the miRNA and disease networks. An embedding layer is constructed by incorporating biological premises about miRNA–disease associations. A new framework based on the dual convolutional neural network is presented for extracting deep feature representations of the associations. The left part of the framework integrates the original similarities and associations of miRNAs and diseases. New miRNA and disease similarities that encode the network topology are obtained by random walks on the miRNA and disease networks, and their deep features are learned by the right part of the framework. In cross-validation, CNNDMP outperforms several state-of-the-art methods. Case studies on breast cancer, colorectal cancer and lung cancer further demonstrate CNNDMP’s ability to discover potential disease miRNAs.


1. Introduction

miRNAs are non-coding, single-stranded RNA molecules of about 22 nucleotides that are encoded by endogenous genes. miRNAs exert their biological functions primarily by regulating the expression of target genes (mRNAs): they usually bind to specific sequences in the 3′ untranslated region (3′ UTR) of their target mRNAs and inhibit translation [1,2,3,4,5]. With advances in molecular biology and biotechnology, researchers have found that abnormal miRNA expression is closely related to various human diseases [6,7,8]. Therefore, predicting potential disease-associated miRNAs is of great significance for understanding disease etiology and pathogenesis.
In recent years, several computational methods have been proposed for predicting disease-associated miRNAs; in general, they fall into two main categories. Because miRNAs implement their biological functions by regulating the expression of their target mRNAs [9], methods in the first category predict potential miRNA–disease associations from target genes. Jiang et al. [10] estimated the functional similarities of miRNAs from the number of target genes they share, measured disease similarities from disease phenotypes, and combined these with the known miRNA–disease associations to predict potential ones. However, the number of experimentally validated target genes is too small to support reliable prediction. Li et al. [11] used the target prediction tools TargetScan [12], miRanda [13], and PITA [14] to predict the target genes that a given miRNA might regulate, and then predicted disease-related miRNAs by measuring the functional consistency between the predicted target genes and known disease-related genes. Because the false-positive rate of computationally predicted targets is very high, it is difficult for this method to achieve high prediction accuracy. Methods in the second category rely on the biological observation that miRNAs with similar functions are usually associated with similar diseases, and vice versa [15,16,17,18]. Xuan et al. [19] and Xiao et al. [20] proposed methods based on non-negative matrix factorization that exploit the similarities and associations of miRNAs and diseases. Liu et al. [21] and Liao et al. [22] predicted miRNA–disease associations via random walks on networks built from multiple data sources. Zeng et al. [23] proposed a disease miRNA prediction algorithm based on the structural perturbation method. Chen et al. [24] and Zhang et al. [25] proposed path-based methods for predicting disease-associated miRNAs. Ding et al. [26] integrated known miRNA–disease associations with experimentally validated miRNA–target associations and proposed a prediction method based on a disease–miRNA–target heterogeneous network. Because these methods rely on traditional computational models [27,28,29], they have difficulty extracting deep feature representations from multiple kinds of data.
Relatively few miRNA–disease associations are known, so the association data are sparse; the disease similarities are also sparse. Since convolutional neural networks (CNNs) are suitable for dealing with this kind of sparse data [30], we propose a CNN-based prediction method. The topological structures of the miRNA and disease networks are also important for miRNA–disease association prediction. We therefore construct a dual CNN-based prediction model that learns deep feature representations from the sparse data and captures the topological information of the miRNA and disease networks.

2. Results and Discussion

2.1. Performance Evaluation Metrics

Most diseases in the HMDD database are associated with only a few miRNAs, which is insufficient for evaluating prediction performance. We therefore performed five-fold cross-validation on the 15 diseases that are each associated with more than 90 miRNAs to compare CNNDMP with several state-of-the-art methods. The known miRNA–disease associations are regarded as positive samples and randomly divided into five equal parts; the unobserved associations are regarded as negative samples. A number of negative samples equal to the number of positive samples is randomly selected from all the negative ones, and these are also divided into five equal parts. In each fold, four parts of the positive samples and four parts of the negative samples are used as training data, and the remaining positive and negative parts are used as test data to evaluate prediction performance.
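The sampling scheme above can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the authors' implementation: `balanced_five_fold` is a hypothetical helper, pairs are plain `(miRNA, disease)` tuples, and a seeded shuffle followed by strided slicing stands in for "randomly divide into five equal parts".

```python
import random

def balanced_five_fold(positives, negatives, n_folds=5, seed=0):
    """Hypothetical helper: yield (train, test) lists of ((miRNA, disease), label).

    positives: known associations (label 1).
    negatives: unobserved pairs (label 0); as many as there are positives
    are sampled at random, then both sets are split into n_folds parts.
    """
    rng = random.Random(seed)
    pos = list(positives)
    rng.shuffle(pos)
    neg = rng.sample(list(negatives), len(pos))  # balanced negative sampling
    # Strided slices of shuffled lists give near-equal random parts.
    pos_folds = [pos[k::n_folds] for k in range(n_folds)]
    neg_folds = [neg[k::n_folds] for k in range(n_folds)]
    for k in range(n_folds):
        test = [(p, 1) for p in pos_folds[k]] + [(n, 0) for n in neg_folds[k]]
        train = [(p, 1) for f in range(n_folds) if f != k for p in pos_folds[f]]
        train += [(n, 0) for f in range(n_folds) if f != k for n in neg_folds[f]]
        yield train, test
```

Each known association appears in the test set of exactly one fold, and the four remaining parts of each class form that fold's training set.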
The CNN prediction model produces an association score for each test sample, and the samples are ranked by score in descending order. Given a threshold δ, a known (positive) miRNA–disease pair whose prediction score is higher than δ is a correctly identified positive sample, and a negative sample whose score is lower than δ is a correctly identified negative sample. By varying the threshold, we calculate the corresponding true positive rate (TPR), false positive rate (FPR), precision and recall, defined as follows:
$$\mathrm{TPR} = \frac{TP}{TP + FN}, \qquad \mathrm{FPR} = \frac{FP}{FP + TN}$$
$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}$$
where TP and TN are the numbers of correctly identified positive and negative samples, FP is the number of negative samples misidentified as positive, and FN is the number of positive samples misidentified as negative. Each value of the threshold δ yields one pair of TPR and FPR values and one pair of precision and recall values, from which the receiver operating characteristic (ROC) curve and the precision–recall (PR) curve are drawn. The areas under the ROC curve (ROC-AUC) and under the PR curve (PR-AUC) are used to evaluate overall prediction performance.
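A minimal Python sketch of the threshold sweep is given below. `roc_pr_points` and `trapezoid_auc` are hypothetical helper names; we assume each distinct score is used as a threshold δ (tied scores share one threshold), that both classes are present, and that the areas are computed with the trapezoidal rule starting from a PR point at (recall 0, precision 1). The trapezoidal PR-AUC is one common convention, not necessarily the one used for the reported results.

```python
def roc_pr_points(scores, labels):
    """Hypothetical helper: sweep delta over the distinct scores, highest first.

    Returns ROC points (FPR, TPR) and PR points (Recall, Precision).
    labels: 1 for positive, 0 for negative; both classes must be present.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    P = sum(labels)
    N = len(labels) - P
    tp = fp = 0
    roc = [(0.0, 0.0)]
    pr = [(0.0, 1.0)]  # assumed starting point at recall 0
    for rank, i in enumerate(order):
        if labels[i] == 1:
            tp += 1
        else:
            fp += 1
        # Tied scores share one threshold: emit a point after the last tie only.
        if rank + 1 < len(order) and scores[order[rank + 1]] == scores[i]:
            continue
        fn, tn = P - tp, N - fp
        roc.append((fp / (fp + tn), tp / (tp + fn)))  # (FPR, TPR)
        pr.append((tp / (tp + fn), tp / (tp + fp)))   # (Recall, Precision)
    return roc, pr

def trapezoid_auc(points):
    """Trapezoidal area under (x, y) points given in sweep order."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

For example, scores `[0.9, 0.8, 0.7, 0.6]` with labels `[1, 0, 1, 0]` give a ROC-AUC of 0.75: three of the four positive–negative pairs are ranked correctly.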
Biologists usually select top-ranked miRNA candidates from the prediction results for further experimental validation of their associations with a disease. We therefore calculate the average recall over the 15 diseases within the top k candidates, for k = 30, 60, 90, 120, 150, 180, 210 and 240. The recall measures how many of the positive samples appear among the top k candidates of each method; a larger recall means that more positive samples are successfully identified.
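The top-k recall can be sketched as follows, assuming (our reading) that recall at k is the fraction of a disease's positive test samples ranked within its top k candidates, averaged over diseases. `recall_at_k` and `mean_recall_at_k` are hypothetical names, and each disease must have at least one positive sample.

```python
def recall_at_k(scores, labels, k):
    """Hypothetical helper: fraction of positives ranked within the top k."""
    ranked = sorted(zip(scores, labels), key=lambda sl: sl[0], reverse=True)
    hits = sum(label for _, label in ranked[:k])
    return hits / sum(labels)  # assumes at least one positive sample

def mean_recall_at_k(per_disease, ks=range(30, 241, 30)):
    """Average recall over diseases; per_disease is a list of (scores, labels)."""
    return {k: sum(recall_at_k(s, l, k) for s, l in per_disease) / len(per_disease)
            for k in ks}
```

The default `ks` reproduces the cut-offs 30, 60, ..., 240 used in the evaluation.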

2.2. Comparison with Other Methods

CNNDMP is compared with GSTRW [22], DMPred [19], PBMDA [24] and Liu’s Method [21], which are state-of-the-art prediction methods for miRNA–disease associations. The parameters of each method need to be tuned to achieve the best prediction performance. In our method, $w_f$, $w_p$ and $d$ are set to 3, 2 and 11, respectively. Each convolutional layer contains 20 convolution filters, so $n_{conv}$ is set to 20. The restart probability $\beta$ of the random walk is 0.8, and the weighting parameter $\lambda$ is set to 0.9; the performance of CNNDMP as $\lambda$ varies from 0.1 to 0.9 is listed in Table 1. For the other methods, we use the parameters reported in the corresponding papers ($\gamma = \theta = 0.2$, $\alpha = \beta = 0.8$, $\lambda = \eta = 0.2$, $w = 0.6$ for GSTRW; $L = 3$, $\alpha = 2.26$ for PBMDA; $\lambda_M = 1/70$, $\lambda_D = 1/10$, $\theta = 1/20$ for DMPred; $\lambda = 0.8$, $\delta = 0.9$, $\eta = 0.1$, $\gamma = 0.5$ for Liu’s Method).
The average ROC-AUC values of the five methods (CNNDMP, GSTRW, DMPred, PBMDA and Liu’s Method) over the 15 diseases are 0.956, 0.802, 0.917, 0.844 and 0.865, respectively (Table 2, Figure 1). CNNDMP achieved the best prediction performance: its average ROC-AUC of 0.956 is higher than those of the other four methods by 15.4, 3.9, 11.2 and 9.1 percentage points, respectively. The miRNA–disease association scores of GSTRW depend on how the miRNA and disease similarities are calculated, which may explain why GSTRW performs worst among all the methods. PBMDA and Liu’s Method perform similarly, as both exploit network topology information. DMPred utilizes miRNA- and disease-related information and achieves competitive prediction performance. CNNDMP integrates the original features of miRNAs and diseases with the network topology and combines them with the representation learning capability of CNNs, achieving the best prediction performance.
There are far more unobserved miRNA–disease associations than known ones, so the classes are severely imbalanced. Under such imbalance, the PR curve reflects the differences between methods better than the ROC curve does. Figure 2 shows the PR curves of CNNDMP, GSTRW, DMPred, PBMDA and Liu’s Method for the 15 diseases; their average PR-AUCs are 0.538, 0.177, 0.392, 0.324 and 0.334, respectively. The PR-AUC of CNNDMP is 36.1, 14.6, 21.4 and 20.4 percentage points higher than those of the other four methods. As shown in Table 3, CNNDMP yields the best average PR-AUC and achieves the best performance for 14 of the 15 diseases.
For the top k miRNA candidates, a higher recall means that more positive samples are successfully identified. Figure 3 shows the average recall over the 15 diseases for the top k candidates. The recall of CNNDMP for the top 30, 60, 90, 120, 150, 180, 210 and 240 candidates is 0.629, 0.878, 0.966, 0.990, 0.998, 0.999, 1.0 and 1.0, respectively. Together, the results in Figures 1–3 and Tables 2 and 3 show that our method is effective in discovering potential disease miRNAs.
In addition, to verify that the ROC-AUCs and PR-AUCs of CNNDMP are significantly higher than those of the other methods, we performed paired t-tests. All p-values are below 0.05, indicating that CNNDMP performs significantly better than the other methods (Table 4).

2.3. Comparison between the Individual Networks and the Integrated Network

To verify that the integrated network outperforms the individual networks, we evaluated the prediction performance of the left and right networks within CNNDMP separately. The left network achieves a ROC-AUC of 0.916 and a PR-AUC of 0.509, and the right network achieves a ROC-AUC of 0.905 and a PR-AUC of 0.494. Compared with the left and right networks, the integrated network increases the ROC-AUC by 4.0 and 5.1 percentage points and the PR-AUC by 2.9 and 4.4 percentage points, respectively.

2.4. Case Studies on Breast Cancer, Colorectal Cancer and Lung Cancer

To further demonstrate CNNDMP’s ability to discover potential disease-associated miRNAs, we used three independent databases, dbDEMC [31], miRCancer [32] and PhenomiR [33], together with the relevant literature to verify the candidates for breast cancer, colorectal cancer and lung cancer. We take breast cancer as an example and present its case analysis in detail.
The top 50 miRNA candidates related to breast cancer are listed in Table 5. dbDEMC is a database of differentially expressed miRNAs in human cancers that contains 2224 differentially expressed miRNAs in 36 cancer types. Forty-three of the 50 candidates are included in this database, which confirms their differential expression in breast cancer. PhenomiR is also a database of differentially expressed miRNAs in human cancers, and miRCancer is a miRNA–cancer association database that collects 6323 miRNA–cancer associations from 4875 papers covering 184 cancers. Two candidates are included in PhenomiR, two in miRCancer, and five are confirmed by the relevant literature.
The top 50 colorectal cancer-related candidates are given in Supplementary Table S1. dbDEMC and miRCancer include 48 candidates and one candidate, respectively, whose abnormal expression has been identified in colorectal cancer. A candidate marked ‘Unconfirmed’ is not currently supported by the databases or the relevant literature.
For lung cancer, the top 50 candidates are listed in Supplementary Table S2. Forty candidates are included in dbDEMC and three in miRCancer as having abnormal expression in lung cancer. One candidate is supported by PhenomiR as abnormally regulated in lung cancer, and four candidates are reported in the relevant literature as differentially expressed in lung cancer. Three candidates marked ‘Unconfirmed’ are not currently supported by the databases or the relevant literature. The case studies on the three diseases confirm CNNDMP’s strong ability to discover potential disease miRNAs.

2.5. Predicting Novel Disease-Related miRNAs

In cross-validation, CNNDMP achieved the best ROC curves, PR curves and top-k recall among the five methods, and the case studies further confirm its ability to discover miRNA–disease associations. We therefore applied CNNDMP to all 326 diseases, using all the positive samples and the corresponding negative samples as training data. The top 100 miRNA candidates for each disease are given in Supplementary Table S3.

3. Materials and Methods

3.1. Dataset

The miRNA–disease associations used in this study are derived from the Human MicroRNA Disease Database (HMDD) [39], which has collected thousands of reliable miRNA–disease association pairs. After integrating different miRNA records and unifying the miRNA and disease names, we retained 5088 miRNA–disease associations involving 490 miRNAs and 326 diseases. Disease terms are available from the National Library of Medicine (http://www.ncbi.nlm.nih.gov/mesh). The phenotypic and semantic similarities of diseases are obtained from a published study [18].

3.2. Construction of a miRNA–Disease Heterogeneous Network

miRNA similarity measurement. Based on the biological observation that miRNAs with similar functions tend to be associated with similar diseases, the similarity of two miRNAs is estimated from the similarities of their associated diseases. For example, miRNA $m_a$ is associated with diseases $d_1$, $d_3$, $d_5$, $d_6$ and $d_7$, whereas miRNA $m_b$ is associated with diseases $d_2$, $d_3$, $d_4$ and $d_6$. Following Wang et al. [40], the similarity between the disease sets $S_a = \{d_1, d_3, d_5, d_6, d_7\}$ and $S_b = \{d_2, d_3, d_4, d_6\}$ is taken as the similarity of $m_a$ and $m_b$, denoted $M(m_a, m_b)$. It is computed in three steps. First, the similarities between $d_1$ and each disease in $S_b$ are calculated, and the maximum is taken as the similarity between $d_1$ and $S_b$; the similarities between $d_3$, $d_5$, $d_6$, $d_7$ and $S_b$ are obtained in the same way. Second, the similarity between each disease in $S_b$ and the set $S_a$ is calculated likewise. Finally, all these similarities are summed and divided by the total number of diseases in $S_a$ and $S_b$. We use the matrix $M \in \mathbb{R}^{N_m \times N_m}$ to represent the miRNA similarities, where $N_m$ is the number of miRNAs; the values lie between 0 and 1.
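In formula form, the three steps amount to $M(m_a, m_b) = \frac{\sum_{d \in S_a} \mathrm{sim}(d, S_b) + \sum_{d \in S_b} \mathrm{sim}(d, S_a)}{|S_a| + |S_b|}$ with $\mathrm{sim}(d, S) = \max_{d' \in S} D(d, d')$. The NumPy sketch below assumes that the disease similarities come from the matrix $D$ with ones on its diagonal and that the disease sets are read from the association matrix $A$; `mirna_similarity` is a hypothetical name, not code from the original study.

```python
import numpy as np

def mirna_similarity(A, D, a, b):
    """Hypothetical helper: functional similarity of miRNAs a and b.

    A: (N_m, N_d) 0/1 association matrix; D: (N_d, N_d) disease similarity,
    assumed to have ones on the diagonal.
    """
    Sa = np.flatnonzero(A[a])  # indices of diseases associated with m_a
    Sb = np.flatnonzero(A[b])  # indices of diseases associated with m_b
    if Sa.size == 0 or Sb.size == 0:
        return 0.0
    block = D[np.ix_(Sa, Sb)]         # D(d, d') for d in S_a, d' in S_b
    a_to_b = block.max(axis=1).sum()  # step 1: each d in S_a against S_b
    b_to_a = block.max(axis=0).sum()  # step 2: each d in S_b against S_a
    return float((a_to_b + b_to_a) / (Sa.size + Sb.size))  # step 3
```

With $D$ set to the identity (only identical diseases count as similar), the example pair $m_a$, $m_b$ scores 4/9: the shared diseases $d_3$ and $d_6$ each contribute 1 in both directions, out of 5 + 4 diseases.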
Disease similarity measurement. Disease similarity measures how similar two diseases are in terms of semantics and phenotype. The terms related to a disease are represented by a directed acyclic graph (DAG); the more terms the DAGs of two diseases share, the more similar the two diseases are. Likewise, two diseases that share more phenotypes tend to be more similar. We therefore quantify the similarity of two diseases from both their semantics and their phenotypes. Xuan et al. have integrated these two kinds of information to calculate disease similarities, so we obtain the disease similarities from the published studies [19,41]. We use the matrix $D \in \mathbb{R}^{N_d \times N_d}$ to represent the disease similarities, whose values range from 0 to 1, where $N_d$ is the number of diseases.
miRNA–disease associations. If miRNA $m_i$ is associated with disease $d_j$, then $A_{ij} = 1$; otherwise, when their association has not been observed, $A_{ij} = 0$. We use $A \in \mathbb{R}^{N_m \times N_d}$ to represent the associations between miRNAs and diseases.
Using the similarities of miRNAs and diseases together with the known miRNA–disease associations, we construct a heterogeneous network containing two kinds of nodes (miRNAs and diseases); the network and its matrix representation are shown in Figure 4.

3.3. Prediction Model Based on Dual CNN

We construct a prediction model based on a dual CNN, composed of a left part and a right part. The left part learns from the original feature information of miRNAs and diseases, and its CNN layers capture complex, implicit and nonlinear miRNA–disease features. The right part incorporates the topology of the miRNA and disease networks and learns a deep representation of it through CNN layers. Finally, we integrate the outputs of the left and right parts to obtain the final prediction scores for disease-associated miRNAs.

3.3.1. Embedding Layer

Embedding in the left part by integrating the original feature information of miRNAs and diseases. Functionally similar miRNAs are usually involved in similar diseases, and vice versa. We therefore integrate the miRNA and disease similarities with the miRNA–disease associations to construct the embedding in the left part. We take miRNA $m_1$ and disease $d_2$ in Figure 5 as an example to explain the integration process. The first row of $M$ represents the similarities between $m_1$ and all the miRNAs, and the second row of $A^T$ represents the associations between $d_2$ and all the miRNAs. miRNA $m_1$ is similar to $m_2$ and $m_4$, and disease $d_2$ has known associations with $m_2$, $m_4$ and $m_5$; thus, $m_1$ and $d_2$ are likely to be associated. Similarly, we integrate the first row of $A$ with the second row of $D$: miRNA $m_1$ is associated with $d_1$, $d_3$ and $d_6$, and disease $d_2$ is similar to $d_1$ and $d_3$, so $m_1$ and $d_2$ are again likely to be associated. The integration result is represented by the feature matrix $X \in \mathbb{R}^{2 \times (N_m + N_d)}$.
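The following NumPy sketch shows one way to assemble $X$ for a pair $(m_i, d_j)$. The row layout is our assumption, chosen so that each row has the stated length $N_m + N_d$: row 1 places the $M$ row of $m_i$ next to the $A$ row of $m_i$, and row 2 places the $A^T$ row of $d_j$ (the $j$th column of $A$) next to the $D$ row of $d_j$, so that miRNA-indexed entries and disease-indexed entries line up column by column. `left_embedding` is a hypothetical name.

```python
import numpy as np

def left_embedding(M, D, A, i, j):
    """Hypothetical helper: 2 x (N_m + N_d) embedding X for pair (m_i, d_j).

    Assumed layout:
    row 1 = [similarities of m_i to all miRNAs | associations of m_i to all diseases]
    row 2 = [associations of d_j to all miRNAs | similarities of d_j to all diseases]
    """
    row1 = np.concatenate([M[i], A[i]])     # M row i, then A row i
    row2 = np.concatenate([A[:, j], D[j]])  # A^T row j (= column j of A), then D row j
    return np.vstack([row1, row2])          # shape (2, N_m + N_d)
```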
Embedding in the right part by integrating the network topology. We first obtain network topology information by random walks on the miRNA and disease networks, respectively. In a random walk with restart, the walker starts from a node at time 0 and moves randomly through the miRNA (or disease) network. The more similar a neighbor is to the walker's current node, the more likely the walker is to move to it. After the walk converges, a higher probability of reaching a node indicates that the node is more similar to the starting node. We denote the converged vector by $p$; it represents the similarities between the starting node and all the nodes.
We take the miRNA network as an example to illustrate the computation in detail. First, the original miRNA similarity matrix $M$ is row-normalized to obtain the transition probability matrix $W$. The network topology-based miRNA similarities are then obtained from the random walk with restart iteration
$$p^{(t+1)} = (1 - \beta)\, W^{T} p^{(t)} + \beta\, p^{(0)}$$
For example, when the walk starts from node $m_1$, the first element of $p^{(0)}$ is set to 1 and all other elements are set to 0. The parameter $\beta \in (0, 1)$ is the probability that the walker returns to the starting node $m_1$ and restarts. $W^{T}$ is the transpose of $W$, and $p^{(t)}$ and $p^{(t+1)}$ are the probabilities that the walker is at each miRNA node at times $t$ and $t + 1$. The iteration is considered converged when the $L_1$ norm of $p^{(t+1)} - p^{(t)}$ is less than $10^{-6}$; the converged vector $p_{m_1}$ is then used as part of the embedding in the right part. Similarly, a random walk on the disease network starting from node $d_2$ yields the vector $p_{d_2}$, which is also used as part of the right embedding.
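The iteration above translates directly into NumPy. This is a minimal sketch, not the authors' code: `random_walk_with_restart` is a hypothetical name, rows of the similarity matrix that sum to zero are assumed to stay all-zero after normalization, and a `max_iter` cap is our addition as a safeguard.

```python
import numpy as np

def random_walk_with_restart(S, start, beta=0.8, tol=1e-6, max_iter=1000):
    """Hypothetical helper: iterate p(t+1) = (1 - beta) W^T p(t) + beta p(0).

    S: (n, n) similarity matrix (M or D); W is S row-normalized.
    start: index of the starting node (e.g. 0 for m_1).
    """
    row_sums = S.sum(axis=1, keepdims=True)
    # Row-normalize; rows summing to zero are left as zeros (our assumption).
    W = np.divide(S, row_sums, out=np.zeros_like(S, dtype=float), where=row_sums > 0)
    p0 = np.zeros(S.shape[0])
    p0[start] = 1.0  # the walker starts at the chosen node
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - beta) * W.T @ p + beta * p0
        if np.abs(p_next - p).sum() < tol:  # L1-norm convergence test
            return p_next
        p = p_next
    return p
```

Because every row of $W$ sums to 1, $W^{T}$ preserves the total probability and the converged $p$ still sums to 1. With $\beta = 0.8$, at least 0.8 of that mass stays on the starting node.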
We integrate the network topology-based similarity information and the association information of miRNA $m_1$ and disease $d_2$ to form the embedding shown in Figure 6. The integration result is represented by the feature matrix $Y \in \mathbb{R}^{2 \times (N_m + N_d)}$.

3.3.2. Convolutional Module on the Left

We treat the embedding $X \in \mathbb{R}^{2 \times (N_m + N_d)}$ as the input of the CNN module to learn a representation of the original features (Figure 7). For the convolutional layer, we set the length and width of a convolution filter to $w_f$ and $d$, so the $n_{conv}$ convolution filters can be represented as $W_{conv} \in \mathbb{R}^{w_f \times d \times n_{conv}}$. Applying $W_{conv}$ to $X$ gives the feature output $Z_1 \in \mathbb{R}^{2 \times (N_m + N_d - w_f + 1) \times n_{conv}}$:
$$X_{conv,i} = \left(X_{i1}, X_{i2}, \ldots, X_{iw_f}\right), \quad X_{conv,i} \in \mathbb{R}^{w_f \times d}$$
$$Z_1(2, i, j) = g\left(X_{conv,i} \ast W_{conv}(:, :, j) + b_{conv}(j)\right), \quad i \in [1, N_m + N_d - w_f + 1], \; j \in [1, n_{conv}]$$
where $Z_1(2, i, j)$ is the convolution result when the $j$th convolution filter slides to the $i$th position of $X$, $g$ is a nonlinear activation function (ReLU), $b_{conv}$ is a bias vector, and $X_{i1}$ is the first column vector in the sliding window when the filter is at the $i$th position of $X$. In the pooling layer, max pooling is applied to $Z_1$ to obtain $Q_1 \in \mathbb{R}^{2 \times \frac{1}{2}(N_m + N_d) \times n_{conv}}$:
$$Q_1(2, p, j) = \max\left(Z_1(2, r, j), \ldots, Z_1(2, r + w_p - 1, j)\right)$$
where $Q_1(2, p, j)$ is the pooled value at the $p$th position for the $j$th convolution filter, $r$ is the first position of the $p$th pooling window, and $w_p$ is the length of the pooling window. $Q_1$ is the input of the second convolutional layer, whose pooling layer outputs $Q_2 \in \mathbb{R}^{2 \times \frac{1}{4}(N_m + N_d) \times 2n_{conv}}$. Similarly, with $Q_2$ as the input of the third convolutional layer, we obtain $Q_3 \in \mathbb{R}^{2 \times \frac{1}{8}(N_m + N_d) \times 3n_{conv}}$. Finally, $Q_3$ is flattened into a column vector $q \in \mathbb{R}^{v \times 1}$, with $v = 2 \times \frac{1}{8}(N_m + N_d) \times 3n_{conv}$, and the association prediction score of $m_1$ and $d_2$, $score_1 \in \mathbb{R}^{2 \times 1}$, is obtained through the fully connected layer:
$$score_1 = H \times q$$
where $H \in \mathbb{R}^{2 \times v}$ is the weight matrix between the fully connected layer and the output layer.
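The NumPy sketch below traces one branch through three convolution and pooling stages with $n_{conv}$, $2n_{conv}$ and $3n_{conv}$ filters followed by the fully connected layer $H$. It is an illustration under our own assumptions, not the authors' network: the filter width $d$ is not modeled (each filter spans $w_f$ consecutive columns and all input channels), the two rows of the embedding are processed independently with shared filters (our reading of $Z_1$ keeping two rows), convolutions are "valid" as in the stated size of $Z_1$, pooling windows do not overlap, and the weights are random and untrained. All function names are hypothetical.

```python
import numpy as np

def conv_relu(h, W, b):
    """Valid convolution along the column axis of each row, then ReLU.

    h: (rows, L, c_in); W: (w_f, c_in, c_out); b: (c_out,)
    returns (rows, L - w_f + 1, c_out)
    """
    w_f, c_in, c_out = W.shape
    rows, L, _ = h.shape
    n = L - w_f + 1
    # windows[:, i] holds the w_f consecutive columns starting at position i
    windows = np.stack([h[:, i:i + w_f, :] for i in range(n)], axis=1)
    z = windows.reshape(rows, n, w_f * c_in) @ W.reshape(w_f * c_in, c_out) + b
    return np.maximum(z, 0.0)

def max_pool(z, w_p):
    """Non-overlapping max pooling with window w_p; a trailing remainder is dropped."""
    rows, n, c = z.shape
    m = n // w_p
    return z[:, :m * w_p, :].reshape(rows, m, w_p, c).max(axis=2)

def init_params(length, w_f=3, w_p=2, n_conv=20, seed=0):
    """Hypothetical random (untrained) weights for the three stages and H."""
    rng = np.random.default_rng(seed)
    conv, c_in = [], 1
    for c_out in (n_conv, 2 * n_conv, 3 * n_conv):
        conv.append((rng.normal(0.0, 0.1, (w_f, c_in, c_out)), np.zeros(c_out)))
        length = (length - w_f + 1) // w_p  # length after this conv + pool stage
        c_in = c_out
    H = rng.normal(0.0, 0.01, (2, 2 * length * c_in))  # shape (2, v)
    return {"conv": conv, "H": H, "w_p": w_p}

def branch_forward(X, params):
    """X: (2, N_m + N_d) embedding -> 2-element score vector H @ q."""
    h = X[:, :, None]  # one input channel per column entry
    for W, b in params["conv"]:
        h = max_pool(conv_relu(h, W, b), params["w_p"])
    q = h.reshape(-1)  # flatten Q_3 into a vector of length v
    return params["H"] @ q
```

With the dataset sizes of Section 3.1 ($N_m + N_d = 490 + 326 = 816$), $w_f = 3$, $w_p = 2$ and $n_{conv} = 20$, the pooled lengths are 407, 202 and 100, so $v = 2 \times 100 \times 60 = 12{,}000$. These lengths are slightly below the $\frac{1}{2}$, $\frac{1}{4}$ and $\frac{1}{8}$ fractions of 816 (408, 204 and 102), because each valid convolution drops $w_f - 1$ columns. The same functions applied to $Y$ with a separate set of parameters would play the role of the right branch.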

3.3.3. Convolutional Module on the Right

The embedding $Y \in \mathbb{R}^{2 \times (N_m + N_d)}$ of the right part is used as input to learn a representation of the network topology (Figure 7). The convolution and pooling processes of the right part are similar to those of the left part. The convolution output $Z_2$ and the max-pooling output $U_1$ are defined as follows:
$$Y_{conv,i} = \left(Y_{i1}, Y_{i2}, \ldots, Y_{iw_f}\right), \quad Y_{conv,i} \in \mathbb{R}^{w_f \times d}$$
$$Z_2(2, i, j) = g\left(Y_{conv,i} \ast W_{conv}(:, :, j) + b_{conv}(j)\right)$$
$$U_1(2, p, j) = \max\left(Z_2(2, r, j), \ldots, Z_2(2, r + w_p - 1, j)\right)$$
where $Z_2$ is the feature output of the convolution operation and $Y_{i1}$ is the first column vector in the sliding window when the filter is at the $i$th position of $Y$. $U_1$ is obtained by max pooling over $Z_2$. $U_1$ is the input of the second convolutional layer, whose pooling layer outputs $U_2 \in \mathbb{R}^{2 \times \frac{1}{4}(N_m + N_d) \times 2n_{conv}}$. Similarly, with $U_2$ as the input of the third convolutional layer, we obtain $U_3 \in \mathbb{R}^{2 \times \frac{1}{8}(N_m + N_d) \times 3n_{conv}}$. Finally, $U_3$ is flattened into the column vector $p \in \mathbb{R}^{v \times 1}$, with $v = 2 \times \frac{1}{8}(N_m + N_d) \times 3n_{conv}$, and the association score between $m_1$ and $d_2$, $score_2 \in \mathbb{R}^{2 \times 1}$, is obtained through the fully connected layer:
$$score_2 = K \times p$$
where $K \in \mathbb{R}^{2 \times v}$ is the weight matrix between the fully connected layer and the output layer.

3.3.4. Combined Strategy

The association scores $score_1$ and $score_2$ are obtained from different perspectives on miRNA–disease information. To take full advantage of the predictions of both parts, we integrate the two scores into the final association score between a miRNA and a disease:
$$score = \lambda \times score_1 + (1 - \lambda) \times score_2$$
where the parameter $\lambda \in (0, 1)$ adjusts the relative importance of $score_1$ and $score_2$. The loss functions of the left and right CNNs, $loss_1$ and $loss_2$, are defined as
$$loss_1 = -\sum_{i=1}^{T} \left[ y_{label} \times \log a + (1 - y_{label}) \times \log(1 - a) \right]$$
$$a = \frac{e^{score_1(2)}}{\sum_{j=1}^{2} e^{score_1(j)}}$$
$$loss_2 = -\sum_{i=1}^{T} \left[ y_{label} \times \log b + (1 - y_{label}) \times \log(1 - b) \right]$$
$$b = \frac{e^{score_2(2)}}{\sum_{j=1}^{2} e^{score_2(j)}}$$
where $y_{label}$ indicates the actual association between a miRNA and a disease: $y_{label}$ is 1 when the miRNA is associated with the disease and 0 otherwise. $score_1(1)$ and $score_1(2)$ are the scores for classifying the miRNA–disease pair as a negative and a positive sample, respectively, and likewise for $score_2$. $a$ and $b$ are the corresponding positive-class probabilities obtained by the softmax function, and $T$ is the number of training samples.
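The score combination, softmax and cross-entropy above can be sketched in NumPy as follows. The function names are hypothetical; the paper's one-based index $score(2)$ (positive class) becomes index 1 in Python, and the max-shift and clipping are standard numerical safeguards that we added, not details taken from the paper.

```python
import numpy as np

def combined_score(score1, score2, lam=0.9):
    """score = lam * score_1 + (1 - lam) * score_2 (lam = 0.9 as in Section 2.2)."""
    return lam * score1 + (1 - lam) * score2

def positive_prob(score):
    """Softmax probability of the positive class for 2-element score vectors.

    The paper's score(2) is index 1 here because Python counts from 0.
    """
    z = score - score.max(axis=-1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e[..., 1] / e.sum(axis=-1)

def cross_entropy_loss(prob_pos, y_label, eps=1e-12):
    """-sum_i [y log a + (1 - y) log(1 - a)] over the T training samples."""
    a = np.clip(prob_pos, eps, 1 - eps)  # avoid log(0)
    return float(-np.sum(y_label * np.log(a) + (1 - y_label) * np.log(1 - a)))
```

For example, scores $(0, \ln 3)$ give a positive-class probability of $3/(1+3) = 0.75$. Since $loss_1$ and $loss_2$ are defined separately, each branch would presumably be trained on its own score, with `combined_score` applied when ranking candidates; that reading of the training procedure is our assumption.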

4. Conclusions

We developed CNNDMP, a novel method based on a dual convolutional neural network, for prioritizing potential disease miRNAs. CNNDMP’s embedding layer is constructed by incorporating the biological premise about miRNA–disease associations, and it captures both the original similarities and associations of miRNAs and diseases and the topological structure of the miRNA and disease networks. The dual convolutional neural network framework learns deep features from the original similarities and associations of miRNAs and diseases as well as from the new topology-based miRNA and disease similarities. The results of cross-validation on 15 common diseases confirm CNNDMP’s superior performance, and the case studies on three diseases further show its strong ability to discover candidate disease miRNAs.

Supplementary Materials

The following are available online at https://www.mdpi.com/1422-0067/19/12/3732/s1.

Author Contributions

P.X. and T.Z. conceived the prediction method, and P.X. wrote the paper. Y.D. and Y.L. developed the computer programs. Y.G. and T.Z. analyzed the results and revised the paper.

Funding

The work was supported by the Natural Science Foundation of Heilongjiang Province (F2015013, F2017024), the Fundamental Research Foundation of Universities in Heilongjiang Province for Technology Innovation (KJCX201805), the Fundamental Research Foundation of Universities in Heilongjiang Province for Youth Innovation Team (RCYJTD201805), the Postdoctoral Science Foundation of Heilongjiang Province, and the Young Innovative Talent Research Foundation of Harbin Science and Technology Bureau (2017RAQXJ094, 2015RAQXJ004, 2016RQQXJ135).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Meister, G.; Tuschl, T. Mechanisms of gene silencing by double-stranded RNA. Nature 2004, 431, 343. [Google Scholar] [CrossRef] [PubMed]
  2. Bartel, D.P. MicroRNAs: Genomics, biogenesis, mechanism, and function. Cell 2004, 116, 281–297. [Google Scholar] [CrossRef]
  3. Ambros, V. microRNAs: Tiny regulators with great potential. Cell 2001, 107, 823–826. [Google Scholar] [CrossRef]
  4. Ambros, V. The functions of animal microRNAs. Nature 2004, 431, 350. [Google Scholar] [CrossRef] [PubMed]
  5. Xu, Y.; Guo, M.; Liu, X.; Wang, C.; Liu, Y.; Liu, G. Identify bilayer modules via pseudo-3D clustering: Applications to miRNA-gene bilayer networks. Nucleic Acids Res. 2016, 44, e152. [Google Scholar] [CrossRef] [PubMed]
  6. Calin, G.A.; Croce, C.M. MicroRNA-cancer connection: The beginning of a new tale. Cancer Res. 2006, 66, 7390–7394. [Google Scholar] [CrossRef] [PubMed]
  7. Meola, N.; Gennarino, V.A.; Banfi, S. MicroRNAs and genetic diseases. Pathogenetics 2009, 2, 7. [Google Scholar] [CrossRef] [PubMed]
  8. Sayed, D.; Abdellatif, M. MicroRNAs in development and disease. Physiol. Rev. 2011, 91, 827–887. [Google Scholar] [CrossRef] [PubMed]
  9. Pasquinelli, A.E. MicroRNAs and their targets: Recognition, regulation and an emerging reciprocal relationship. Nat. Rev. Genet. 2012, 13, 271. [Google Scholar] [CrossRef] [PubMed]
  10. Jiang, Q.; Hao, Y.; Wang, G.; Juan, L.; Zhang, T.; Teng, M.; Liu, Y.; Wang, Y. Prioritization of disease microRNAs through a human phenome-microRNAome network. BMC Syst. Boil. 2010, 4, S2. [Google Scholar] [CrossRef] [PubMed]
  11. Li, X.; Wang, Q.; Zheng, Y.; Lv, S.; Ning, S.; Sun, J.; Huang, T.; Zheng, Q.; Ren, H.; Xu, J.; et al. Prioritizing human cancer microRNAs based on genes’ functional consistency between microRNA and cancer. Nucleic Acids Res. 2011, 39, e153.
  12. Lewis, B.P.; Shih, I.H.; Jones-Rhoades, M.W.; Bartel, D.P.; Burge, C.B. Prediction of mammalian microRNA targets. Cell 2003, 115, 787–798.
  13. John, B.; Enright, A.J.; Aravin, A.; Tuschl, T.; Sander, C.; Marks, D.S. Human microRNA targets. PLoS Biol. 2004, 2, e363.
  14. Kertesz, M.; Iovino, N.; Unnerstall, U.; Gaul, U.; Segal, E. The role of site accessibility in microRNA target recognition. Nat. Genet. 2007, 39, 1278.
  15. Chen, X.; Yan, C.C.; Zhang, X.; You, Z.H.; Deng, L.; Liu, Y.; Zhang, Y.; Dai, Q. WBSMDA: Within and between score for miRNA-disease association prediction. Sci. Rep. 2016, 6, 21106.
  16. Pasquier, C.; Gardès, J. Prediction of miRNA-disease associations with a vector space model. Sci. Rep. 2016, 6, 27036.
  17. Li, J.Q.; Rong, Z.H.; Chen, X.; Yan, G.Y.; You, Z.H. MCMDA: Matrix completion for miRNA-disease association prediction. Oncotarget 2017, 8, 21187.
  18. Lan, W.; Wang, J.; Li, M.; Liu, J.; Wu, F.X.; Pan, Y. Predicting microRNA-disease associations based on improved microRNA and disease similarities. IEEE/ACM Trans. Comput. Biol. Bioinform. 2016.
  19. Zhong, Y.; Xuan, P.; Wang, X.; Zhang, T.; Li, J.; Liu, Y.; Zhang, W. A non-negative matrix factorization based method for predicting disease-associated miRNAs in miRNA-disease bilayer network. Bioinformatics 2017, 34, 267–277.
  20. Xiao, Q.; Luo, J.; Liang, C.; Cai, J.; Ding, P. A graph regularized non-negative matrix factorization method for identifying microRNA-disease associations. Bioinformatics 2017, 34, 239–248.
  21. Liu, Y.; Zeng, X.; He, Z.; Zou, Q. Inferring microRNA-disease associations by random walk on a heterogeneous network with multiple data sources. IEEE/ACM Trans. Comput. Biol. Bioinform. 2017, 14, 905–915.
  22. Chen, M.; Liao, B.; Li, Z. Global Similarity Method Based on a Two-tier Random Walk for the Prediction of microRNA–Disease Association. Sci. Rep. 2018, 8, 6481.
  23. Zeng, X.; Liu, L.; Lü, L.; Zou, Q.; Valencia, A. Prediction of potential disease-associated microRNAs using structural perturbation method. Bioinformatics 2018, 1, 8.
  24. You, Z.H.; Huang, Z.A.; Zhu, Z.; Yan, G.Y.; Li, Z.W.; Wen, Z.; Chen, X. PBMDA: A novel and effective path-based computational model for miRNA-disease association prediction. PLoS Comput. Biol. 2017, 13, e1005455.
  25. Zhang, X.; Zou, Q.; Rodriguez-Paton, A. Meta-path methods for prioritizing candidate disease miRNAs. IEEE/ACM Trans. Comput. Biol. Bioinform. 2017.
  26. Ding, L.; Wang, M.; Sun, D.; Li, A. A novel method for identifying potential disease-related miRNAs via a disease–miRNA–target heterogeneous network. Mol. BioSyst. 2017, 13, 2328–2337.
  27. Zeng, X.; Zhang, X.; Zou, Q. Integrative approaches for predicting microRNA function and prioritizing disease-related microRNA using biological interaction networks. Brief. Bioinform. 2015, 17, 193–203.
  28. Zou, Q.; Li, J.; Song, L.; Zeng, X.; Wang, G. Similarity computation strategies in the microRNA-disease network: A survey. Brief. Funct. Genom. 2015, 15, 55–64.
  29. Zou, Q.; Chen, L.; Huang, T.; Zhang, Z.; Xu, Y. Machine learning and graph analytics in computational biomedicine. Artif. Intell. Med. 2017, 83, 1.
  30. Xu, Y.; Wang, Y.; Luo, J.; Zhao, W.; Zhou, X. Deep learning of the splicing (epi)genetic code reveals a novel candidate mechanism linking histone modifications to ESC fate decision. Nucleic Acids Res. 2017, 45, 12100–12112.
  31. Yang, Z.; Ren, F.; Liu, C.; He, S.; Sun, G.; Gao, Q.; Yao, L.; Zhang, Y.; Miao, R.; Cao, Y.; et al. dbDEMC: A database of differentially expressed miRNAs in human cancers. BMC Genom. 2010, 11, S5.
  32. Xie, B.; Ding, Q.; Han, H.; Wu, D. miRCancer: A microRNA–cancer association database constructed by text mining on literature. Bioinformatics 2013, 29, 638–644.
  33. Ruepp, A.; Kowarsch, A.; Schmidl, D.; Buggenthin, F.; Brauner, B.; Dunger, I. PhenomiR: A knowledgebase for microRNA expression in diseases and biological processes. Genome Biol. 2010, 11, R6.
  34. Wang, X.; Jiang, D.R.; Xu, C.C.; Zhu, G.L.; Wu, Z.S.; Wu, Q. Differential expression profile analysis of miRNAs with HER-2 overexpression and intervention in breast cancer cells. Int. J. Clin. Exp. Pathol. 2017, 10, 5039–5062.
  35. Maltseva, D.V.; Galatenko, V.V.; Samatov, T.R.; Zhikrivetskaya, S.O.; Khaustova, N.A.; Nechaev, I.N.; Shkurnikov, M.U.; Lebedev, A.E.; Mityakina, I.A.; Kaprin, A.D.; et al. miRNome of inflammatory breast cancer. BMC Res. Notes 2014, 7, 871.
  36. Hu, J.Y.; Yi, W.; Zhang, M.Y.; Xu, R.; Zeng, L.S.; Long, X.R.; Zhou, X.; Zheng, X.; Kang, Y.; Wang, H.Y. MicroRNA-711 is a prognostic factor for poor overall survival and has an oncogenic role in breast cancer. Oncol. Lett. 2016, 11, 2155–2163.
  37. Xu, J.; Zhou, X.; Wong, C.W. Genome-wide identification of estrogen receptor alpha regulated miRNAs using transcription factor binding data. Bioinform.-Trends Methodol. 2011.
  38. Sun, Y.; Su, B.; Zhang, P.; Xie, H.; Zheng, H.; Xu, Y.; Du, Q.; Zeng, H.; Zhou, X.; Chen, C.; et al. Expression of miR-150 and miR-3940-5p is reduced in non-small cell lung carcinoma and correlates with clinicopathological features. Oncol. Rep. 2013, 29, 704–712.
  39. Li, Y.; Qiu, C.; Tu, J.; Geng, B.; Yang, J.; Jiang, T.; Cui, Q. HMDD v2.0: A database for experimentally supported human microRNA and disease associations. Nucleic Acids Res. 2013, 42, D1070.
  40. Wang, D.; Wang, J.; Lu, M.; Song, F.; Cui, Q. Inferring the human microRNA functional similarity and functional network based on microRNA-associated diseases. Bioinformatics 2010, 26, 1644–1650.
  41. Xuan, P.; Han, K.; Guo, Y.; Li, J.; Li, X.; Zhong, Y.; Zhang, Z.; Ding, J. Prediction of potential disease-associated microRNAs based on random walk. Bioinformatics 2015, 31, 1805–1815.
Figure 1. Receiver operating characteristic (ROC) curves of CNNDMP and the other four methods. AUC = area under the curve.
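Figure 1 compares the methods by the area under the ROC curve. As a minimal sketch, and not the authors' evaluation code, the ROC-AUC can be computed without tracing the curve by using its equivalence to the Mann–Whitney statistic: it is the fraction of (positive, negative) pairs in which the positive receives the higher score, with ties counted as half. The function name and the toy scores and labels are hypothetical; in practice a library routine such as scikit-learn's `roc_auc_score` returns the same quantity.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via the Mann-Whitney formulation.

    Hypothetical helper, not the paper's implementation. Returns the
    probability that a randomly chosen positive is scored higher than a
    randomly chosen negative; ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:  # O(|pos| * |neg|): fine for a sketch, slow for large test sets
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up scores: 1 = held-out known association, 0 = unobserved pair.
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
labels = [1, 0, 1, 1, 0, 0]
print(round(roc_auc(scores, labels), 4))  # 7 of the 9 positive-negative pairs are ordered correctly
```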
Figure 2. Precision–recall (PR) curve of CNNDMP and the other four methods.
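The PR curve in Figure 2 is usually more informative than the ROC curve when, as here, known associations are a small minority of all miRNA–disease pairs. This section does not state how the PR-AUC was integrated, so the sketch assumes average precision, one common step-wise estimator (also available as scikit-learn's `average_precision_score`); a trapezoidal estimate may differ slightly. The function name and toy data are hypothetical.

```python
from typing import Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Mean of precision@k taken at the rank k of each positive.

    Hypothetical helper, not the paper's implementation: a step-wise
    estimate of the area under the precision-recall curve. Tied scores
    keep their input order because Python's sort is stable.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive sample")
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits = 0
    total = 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / k  # precision at the rank of this positive
    return total / n_pos


scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
labels = [1, 0, 1, 1, 0, 0]
print(round(average_precision(scores, labels), 4))  # (1/1 + 2/3 + 3/4) / 3
```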
Figure 3. Recall values of top k candidates of CNNDMP and the other four methods.
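Figure 3 reports recall among the top k candidates. Assuming recall here means the fraction of held-out known associations that are ranked within the top k (the exact protocol is not given in this section), it can be computed as below; the helper name and data are hypothetical.

```python
from typing import Sequence


def recall_at_k(scores: Sequence[float], labels: Sequence[int], k: int) -> float:
    """Fraction of all positives found among the k highest-scored candidates.

    Hypothetical helper, not the paper's implementation.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive sample")
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    return sum(y for _, y in ranked[:k]) / n_pos


scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
labels = [1, 0, 1, 1, 0, 0]
for k in (1, 3, 4):
    print(k, recall_at_k(scores, labels, k))
```

Recall at k can only grow with k, so comparing methods at several small values of k emphasizes how well each one places true associations at the very top of its ranking.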
Figure 4. Construction of the miRNA–disease heterogeneous network and its matrix representation. (a) The miRNA similarity network and its matrix representation M; two miRNAs are connected when their similarity is greater than 0. The network is weighted: each node represents a miRNA, and the weight of each edge is the similarity between the two miRNAs it connects, so the network captures both the topology and the similarity values. (b) The disease similarity network and its matrix representation D. (c) The miRNA–disease association network, constructed from the known associations between miRNAs and diseases, and its matrix representation A. An associated miRNA and disease are connected by a dotted line. (d) The miRNA–disease heterogeneous network, which integrates the miRNA similarities, the disease similarities and the miRNA–disease associations.
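The heterogeneous network of Figure 4(d) can be held as one block adjacency matrix, H = [[M, A], [Aᵀ, D]], with miRNAs indexing the first n_m rows and diseases the remaining n_d. The sketch assumes that A has miRNAs as rows and diseases as columns and that a zero similarity means no edge; the matrix orientation, the function name and the toy values are illustrative assumptions, not taken from the paper.

```python
from typing import List

Matrix = List[List[float]]


def heterogeneous_adjacency(M: Matrix, D: Matrix, A: Matrix) -> Matrix:
    """Block matrix H = [[M, A], [A^T, D]] over n_m miRNAs followed by n_d diseases.

    Hypothetical helper. M: n_m x n_m miRNA similarities (0 = no edge);
    D: n_d x n_d disease similarities; A: n_m x n_d known associations.
    """
    n_m, n_d = len(M), len(D)
    if len(A) != n_m or any(len(row) != n_d for row in A):
        raise ValueError("A must be n_m x n_d")
    H: Matrix = [list(M[i]) + list(A[i]) for i in range(n_m)]  # miRNA rows
    H += [[A[i][j] for i in range(n_m)] + list(D[j]) for j in range(n_d)]  # disease rows
    return H


# Made-up toy network: 2 miRNAs, 3 diseases.
M = [[1.0, 0.4], [0.4, 1.0]]
D = [[1.0, 0.2, 0.0], [0.2, 1.0, 0.5], [0.0, 0.5, 1.0]]
A = [[1, 0, 0], [0, 1, 1]]
for row in heterogeneous_adjacency(M, D, A):
    print(row)
```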
Figure 5. Integration of the original miRNA and disease features to construct the embedding layer in the left part of the framework.
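Figure 5 builds the left-branch input from the original similarities and associations, but this section does not give the exact layout. One plausible construction, assumed here, stacks two rows of equal length n_m + n_d for a pair (miRNA i, disease j): miRNA i's similarity and association profile, and disease j's association and similarity profile. These are exactly rows i and n_m + j of the block matrix H = [[M, A], [Aᵀ, D]] of Figure 4. The function `pair_embedding` and the toy matrices are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def pair_embedding(M: Matrix, D: Matrix, A: Matrix, i: int, j: int) -> Matrix:
    """Hypothetical 2 x (n_m + n_d) input for the pair (miRNA i, disease j).

    Row 0: miRNA i's similarities to every miRNA, then its associations with every disease.
    Row 1: disease j's associations with every miRNA, then its similarities to every disease.
    """
    n_m = len(M)
    mirna_row = list(M[i]) + list(A[i])
    disease_row = [A[k][j] for k in range(n_m)] + list(D[j])
    return [mirna_row, disease_row]


M = [[1.0, 0.4], [0.4, 1.0]]
D = [[1.0, 0.2, 0.0], [0.2, 1.0, 0.5], [0.0, 0.5, 1.0]]
A = [[1, 0, 0], [0, 1, 1]]
print(pair_embedding(M, D, A, 1, 2))
```

Aligning both rows on the same column order (all miRNAs, then all diseases) lets a kernel spanning both rows compare corresponding entries. When such an embedding is used in cross-validation, the entry A[i][j] of a held-out pair would normally be set to 0 first so that the label does not leak into the input.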
Figure 6. Integration of the miRNA and disease network topological features to construct the embedding layer in the right part of the framework.
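The abstract states that topology-aware similarities are obtained by random walks on the miRNA and disease networks, but this section does not specify the walk. The sketch assumes random walk with restart (RWR) on a column-normalized similarity matrix, with a hypothetical restart probability of 0.7: the steady-state visiting probabilities of a walk that restarts at node s form row s of the new similarity matrix. In the toy network, m0 and m2 share no direct edge yet obtain a nonzero similarity through m1, which is the topological information the right branch is meant to capture. All names and values are illustrative.

```python
from typing import List

Matrix = List[List[float]]


def column_normalize(W: Matrix) -> Matrix:
    """Turn a non-negative weight matrix into a column-stochastic transition matrix."""
    n = len(W)
    col_sums = [sum(W[r][c] for r in range(n)) for c in range(n)]
    return [[W[r][c] / col_sums[c] if col_sums[c] > 0 else 0.0 for c in range(n)]
            for r in range(n)]


def random_walk_with_restart(W: Matrix, start: int, restart: float = 0.7,
                             tol: float = 1e-10, max_iter: int = 1000) -> List[float]:
    """Iterate p <- (1 - restart) * P p + restart * p0 until the L1 change is below tol.

    Hypothetical helper: RWR is an assumed variant, not confirmed by the paper.
    """
    P = column_normalize(W)
    n = len(W)
    p0 = [1.0 if k == start else 0.0 for k in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        nxt = [(1 - restart) * sum(P[r][c] * p[c] for c in range(n)) + restart * p0[r]
               for r in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


def topological_similarity(W: Matrix, restart: float = 0.7) -> Matrix:
    """Row s holds the steady-state visiting probabilities of a walk restarting at node s."""
    return [random_walk_with_restart(W, s, restart) for s in range(len(W))]


# Made-up miRNA similarity network: a path m0 - m1 - m2.
W = [[0.0, 0.8, 0.0], [0.8, 0.0, 0.5], [0.0, 0.5, 0.0]]
for row in topological_similarity(W):
    print([round(v, 3) for v in row])
```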
Figure 7. miRNA–disease association prediction framework based on dual CNN.
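Figure 7 shows the dual CNN, with one branch reading the embedding of Figure 5 and the other the embedding of Figure 6. The layer configuration is not given in this section, so the following forward pass is only a schematic under stated assumptions: each branch applies one convolution layer, a ReLU and global max pooling, and the concatenated features pass through a single logistic unit. All function names, kernels and weights are made up and untrained; an actual model would be built and trained with a deep learning framework.

```python
import math
from typing import List

Matrix = List[List[float]]


def conv2d_valid(x: Matrix, kernel: Matrix, bias: float = 0.0) -> Matrix:
    """2D cross-correlation (what CNN layers compute), no padding, stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(x) - kh + 1, len(x[0]) - kw + 1
    return [[sum(kernel[a][b] * x[r + a][c + b] for a in range(kh) for b in range(kw)) + bias
             for c in range(out_w)]
            for r in range(out_h)]


def branch_features(embedding: Matrix, kernels: List[Matrix]) -> List[float]:
    """One feature per kernel: convolution, ReLU, then global max pooling (assumed design)."""
    feats = []
    for k in kernels:
        fmap = conv2d_valid(embedding, k)
        feats.append(max(max(0.0, v) for row in fmap for v in row))
    return feats


def dual_cnn_score(left_emb: Matrix, right_emb: Matrix,
                   left_kernels: List[Matrix], right_kernels: List[Matrix],
                   weights: List[float], bias: float) -> float:
    """Concatenate both branches' features and map them to a score in (0, 1)."""
    feats = branch_features(left_emb, left_kernels) + branch_features(right_emb, right_kernels)
    z = sum(w * f for w, f in zip(weights, feats)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Made-up 2 x 4 embeddings and untrained weights, purely to show the data flow.
left = [[1, 0, 1, 0], [0, 1, 1, 0]]           # stands in for the original-feature embedding
right = [[0.5, 0.5, 0, 1], [1, 0, 0.5, 0.5]]  # stands in for the random-walk embedding
kernels = [[[1, 0], [0, 1]], [[0, 1], [1, 0]]]
print(round(dual_cnn_score(left, right, kernels, kernels, [0.5, 0.5, -1.0, 0.2], -1.0), 4))
```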
Table 1. ROC-AUCs and PR-AUCs at different values of λ.
Parameter λ  0.1     0.2     0.3     0.4     0.4     0.5     0.7     0.8     0.9
ROC-AUC      0.890   0.918   0.934   0.939   0.946   0.950   0.952   0.954   0.956
PR-AUC       0.340   0.401   0.442   0.462   0.491   0.503   0.513   0.521   0.538
Table 2. Prediction results of CNNDMP and the other four methods for 15 diseases in terms of ROC-AUCs.
Disease name              CNNDMP   GSTRW    DMPred   PBMDA    Liu’s method
Breast neoplasm           0.987    0.822    0.938    0.852    0.863
Hepatocellular carcinoma  0.986    0.779    0.900    0.803    0.845
Renal cell carcinoma      0.950    0.816    0.903    0.813    0.832
Squamous cell carcinoma   0.936    0.817    0.908    0.881    0.890
Colorectal neoplasm       0.910    0.737    0.842    0.826    0.857
Glioblastoma              0.926    0.814    0.904    0.803    0.842
Heart failure             0.972    0.817    0.987    0.791    0.828
Acute myeloid leukemia    0.961    0.788    0.890    0.844    0.874
Lung neoplasm             0.962    0.791    0.948    0.905    0.920
Melanoma                  0.978    0.789    0.913    0.836    0.860
Ovarian neoplasm          0.958    0.830    0.929    0.889    0.897
Pancreatic neoplasm       0.945    0.838    0.916    0.891    0.904
Prostatic neoplasm        0.964    0.822    0.951    0.843    0.855
Stomach neoplasm          0.954    0.762    0.908    0.821    0.836
Urinary bladder neoplasm  0.956    0.816    0.919    0.854    0.865
Average ROC-AUC           0.956    0.802    0.917    0.844    0.865
Table 3. Prediction results of CNNDMP and the other four methods for 15 diseases in terms of PR-AUCs.
Disease name              CNNDMP   GSTRW    DMPred   PBMDA    Liu’s method
Breast neoplasm           0.894    0.322    0.699    0.574    0.573
Hepatocellular carcinoma  0.893    0.279    0.501    0.454    0.498
Renal cell carcinoma      0.365    0.150    0.293    0.181    0.186
Squamous cell carcinoma   0.287    0.109    0.213    0.211    0.208
Colorectal neoplasm       0.367    0.141    0.186    0.367    0.371
Glioblastoma              0.330    0.151    0.219    0.217    0.243
Heart failure             0.602    0.191    0.700    0.168    0.189
Acute myeloid leukemia    0.368    0.140    0.211    0.191    0.236
Lung neoplasm             0.636    0.147    0.511    0.537    0.503
Melanoma                  0.657    0.171    0.389    0.363    0.397
Ovarian neoplasm          0.490    0.169    0.404    0.361    0.361
Pancreatic neoplasm       0.555    0.137    0.329    0.364    0.354
Prostatic neoplasm        0.568    0.166    0.463    0.282    0.264
Stomach neoplasm          0.608    0.220    0.446    0.344    0.346
Urinary bladder neoplasm  0.470    0.163    0.315    0.252    0.280
Average PR-AUC            0.538    0.177    0.392    0.324    0.334
Table 4. Comparison of different methods based on AUCs with a paired t-test. Each entry is the p-value of the test between CNNDMP and the method in that column.
Metric       DMPred            GSTRW             PBMDA             Liu’s method
ROC-AUC      6.44998 × 10⁻⁴    9.60973 × 10⁻¹⁶   2.65553 × 10⁻¹⁰   1.25344 × 10⁻¹⁰
PR-AUC       0.02972           1.75747 × 10⁻⁶    0.00111           0.00151
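Table 4 reports p-values of paired t-tests. To show how such a test pairs the methods, the sketch computes the paired t statistic for CNNDMP versus DMPred from the 15 per-disease ROC-AUCs of Table 2. Whether the authors paired exactly these samples is not stated in this section, so the statistic need not reproduce the p-value in Table 4. Turning t into a p-value requires the Student t distribution with df degrees of freedom, which the Python standard library lacks; SciPy's `scipy.stats.ttest_rel` performs the full test. The helper name is hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Paired t statistic and degrees of freedom for matched samples a and b.

    Hypothetical helper: t = mean(d) / (sd(d) / sqrt(n)) with d = a - b.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two matched samples of length >= 2")
    diffs = [x - y for x, y in zip(a, b)]
    t = mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Per-disease ROC-AUCs from Table 2, in table order.
cnndmp = [0.987, 0.986, 0.950, 0.936, 0.910, 0.926, 0.972, 0.961,
          0.962, 0.978, 0.958, 0.945, 0.964, 0.954, 0.956]
dmpred = [0.938, 0.900, 0.903, 0.908, 0.842, 0.904, 0.987, 0.890,
          0.948, 0.913, 0.929, 0.916, 0.951, 0.908, 0.919]
t, df = paired_t(cnndmp, dmpred)
print(f"t = {t:.2f}, df = {df}")
```

Pairing by disease removes the variation in difficulty between diseases, so the test asks only whether CNNDMP is consistently better on the same disease rather than better on average.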
Table 5. The top 50 breast cancer-related candidates.
Rank  miRNA name      Evidence          Rank  miRNA name      Evidence
1     hsa-mir-1266    dbDEMC            26    hsa-mir-663     dbDEMC
2     hsa-mir-942     dbDEMC            27    hsa-mir-545     dbDEMC
3     hsa-mir-384     dbDEMC            28    hsa-mir-525     dbDEMC
4     hsa-mir-374b    dbDEMC            29    hsa-mir-520f    dbDEMC
5     hsa-mir-1293    dbDEMC            30    hsa-mir-520g    dbDEMC
6     hsa-mir-3148    Literature [34]   31    hsa-mir-659     dbDEMC
7     hsa-mir-569     Literature [35]   32    hsa-mir-150     miRCancer, PhenomiR
8     hsa-mir-431     dbDEMC            33    hsa-mir-592     dbDEMC
9     hsa-mir-711     Literature [36]   34    hsa-mir-1254    dbDEMC
10    hsa-mir-325     dbDEMC            35    hsa-mir-548c    dbDEMC
11    hsa-mir-1302    Literature [37]   36    hsa-mir-675     miRCancer
12    hsa-mir-33a     dbDEMC            37    hsa-mir-3940    Literature [38]
13    hsa-mir-1246    dbDEMC            38    hsa-mir-1299    dbDEMC
14    hsa-mir-376b    dbDEMC            39    hsa-mir-377     dbDEMC
15    hsa-mir-487a    dbDEMC            40    hsa-mir-519a    dbDEMC
16    hsa-mir-1236    dbDEMC            41    hsa-mir-1180    dbDEMC
17    hsa-mir-548a    dbDEMC            42    hsa-mir-1184    dbDEMC
18    hsa-mir-624     dbDEMC            43    hsa-mir-3151    dbDEMC
19    hsa-mir-633     dbDEMC            44    hsa-mir-627     dbDEMC
20    hsa-mir-1181    dbDEMC            45    hsa-mir-1273a   dbDEMC
21    hsa-mir-382     dbDEMC            46    hsa-mir-1972    dbDEMC
22    hsa-mir-448     dbDEMC            47    hsa-mir-208a    dbDEMC, PhenomiR
23    hsa-mir-583     dbDEMC            48    hsa-mir-668     dbDEMC
24    hsa-mir-518a    dbDEMC            49    hsa-mir-635     dbDEMC
25    hsa-mir-433     dbDEMC            50    hsa-mir-619     dbDEMC

Share and Cite

MDPI and ACS Style

Xuan, P.; Dong, Y.; Guo, Y.; Zhang, T.; Liu, Y. Dual Convolutional Neural Network Based Method for Predicting Disease-Related miRNAs. Int. J. Mol. Sci. 2018, 19, 3732. https://doi.org/10.3390/ijms19123732

AMA Style

Xuan P, Dong Y, Guo Y, Zhang T, Liu Y. Dual Convolutional Neural Network Based Method for Predicting Disease-Related miRNAs. International Journal of Molecular Sciences. 2018; 19(12):3732. https://doi.org/10.3390/ijms19123732

Chicago/Turabian Style

Xuan, Ping, Yihua Dong, Yahong Guo, Tiangang Zhang, and Yong Liu. 2018. "Dual Convolutional Neural Network Based Method for Predicting Disease-Related miRNAs" International Journal of Molecular Sciences 19, no. 12: 3732. https://doi.org/10.3390/ijms19123732

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
