Special Protein or RNA Molecules Computational Identification 2018

A special issue of International Journal of Molecular Sciences (ISSN 1422-0067). This special issue belongs to the section "Molecular Biophysics".

Deadline for manuscript submissions: closed (31 October 2018) | Viewed by 52796

Special Issue Editor

Prof. Quan Zou
School of Computer Science and Technology, Tianjin University, Tianjin, China
Interests: bioinformatics; machine learning; string algorithm

Special Issue Information

Dear colleagues,

The discovery of new molecules remains an important and challenging task. For some special proteins and RNA molecules, detecting new members is difficult, time-consuming, and costly. These special proteins include cytokines, enzymes, cell-penetrating peptides, anticancer peptides, cancerlectins, G protein-coupled receptors, etc. Some noncoding RNAs, such as microRNA, snoRNA, snRNA, circular RNA, and tRNA, also need to be annotated in sequencing data. Researchers often use computer programs to list candidates and then validate those candidates with molecular experiments. The computer program is therefore the key component, since it can reduce the cost of wet-lab experiments. Software with a high false-positive rate leads to high costs in the validation process.

We successfully organized a related Special Issue last year; its Editorial, Int. J. Mol. Sci. 2018, 19, 536, provides a summary. Eighteen papers were published in that Special Issue. I hope more follow-up studies will appear in this new issue. For all new submissions, please state how the work relates to, and differs from, the papers in the 2017 Special Issue.

In addition to proteins, we encourage authors to pay attention to noncoding RNA molecules. The detection of microRNA and other noncoding RNAs is still an open challenge for bioinformatics researchers. Highly accurate predictions could save the cost of Northern blot or RT-PCR experiments. Studies of RNA function and RNA–disease relationships are also of interest and welcome. Some network methods, including random walk and matrix factorization, have been employed in RNA–disease relationship prediction. However, they are not robust: state-of-the-art methods sometimes become invalid when the datasets are updated. I hope to see more novel and robust methods, as well as gold-standard benchmark datasets, in the new Special Issue.

Prof. Quan Zou
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts are available on the Instructions for Authors page. International Journal of Molecular Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. There is an Article Processing Charge (APC) for publication in this open access journal. For details about the APC please see here. Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • bioinformatics
  • machine learning
  • feature selection
  • protein classification
  • PseAAC features
  • anticancer peptides
  • cell-penetrating peptides
  • oncogene
  • DNA/RNA binding proteins
  • MHC binding peptide
  • noncoding RNA
  • microRNA
  • RNA-disease relationship
  • network

Published Papers (11 papers)

Editorial

3 pages, 173 KiB  
Editorial
Special Protein or RNA Molecules Computational Identification
by Ren Qi and Quan Zou
Int. J. Mol. Sci. 2023, 24(14), 11312; https://doi.org/10.3390/ijms241411312 - 11 Jul 2023
Viewed by 625
Abstract
The identification of special protein or RNA molecules via computational methods is of great importance in understanding their biological functions and developing new treatments for diseases [...]
(This article belongs to the Special Issue Special Protein or RNA Molecules Computational Identification 2018)

Research

14 pages, 1536 KiB  
Article
BGFE: A Deep Learning Model for ncRNA-Protein Interaction Predictions Based on Improved Sequence Information
by Zhao-Hui Zhan, Li-Na Jia, Yong Zhou, Li-Ping Li and Hai-Cheng Yi
Int. J. Mol. Sci. 2019, 20(4), 978; https://doi.org/10.3390/ijms20040978 - 23 Feb 2019
Cited by 15 | Viewed by 3802
Abstract
The interactions between ncRNAs and proteins are critical for regulating various cellular processes in organisms, such as gene expression. However, because recent experimental methods for detecting ncRNA–protein interactions are limited by their financial and material costs, it is essential to propose an innovative and practical approach with convincing prediction accuracy. In this study, based on protein sequences from a biological perspective, we put forward an effective deep learning method, named BGFE, to predict ncRNA–protein interactions. Protein sequences are represented by a bi-gram probability feature extraction method applied to the Position Specific Scoring Matrix (PSSM), and ncRNA sequences are represented by k-mer sparse matrices. Furthermore, to extract hidden high-level feature information, a stacked auto-encoder network is employed with a stacked ensemble integration strategy. We evaluate the performance of the proposed method on three datasets using five-fold cross-validation, with the features classified by a random forest classifier. The experimental results clearly demonstrate the effectiveness and prediction accuracy of our approach. In general, the proposed method is helpful for predicting ncRNA–protein interactions and provides useful guidance for future biological research.
(This article belongs to the Special Issue Special Protein or RNA Molecules Computational Identification 2018)
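The k-mer representation of ncRNA sequences mentioned in the abstract above can be illustrated with a minimal sketch. It computes a normalized k-mer frequency vector, which is a common simplification and not necessarily the sparse-matrix encoding used in BGFE. The function name `kmer_frequencies`, the default `k=3`, and the handling of ambiguous symbols are assumptions made for illustration.

```python
from itertools import product


def kmer_frequencies(sequence, k=3, alphabet="ACGU"):
    """Return normalized k-mer frequencies in a fixed lexicographic order.

    Hypothetical helper: a plain frequency vector, not the BGFE encoding.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = sequence.upper().replace("T", "U")  # accept DNA-style input
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with ambiguous symbols such as N
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]


# Example: 2-mer frequencies of a short RNA fragment (16 features).
features = kmer_frequencies("AUGGCUAUG", k=2)
```

A fixed feature order matters here: every sequence maps to a vector of the same length (4^k for an RNA alphabet), so the vectors can be stacked into a matrix for a downstream classifier.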

15 pages, 3297 KiB  
Article
Dual Convolutional Neural Network Based Method for Predicting Disease-Related miRNAs
by Ping Xuan, Yihua Dong, Yahong Guo, Tiangang Zhang and Yong Liu
Int. J. Mol. Sci. 2018, 19(12), 3732; https://doi.org/10.3390/ijms19123732 - 23 Nov 2018
Cited by 29 | Viewed by 3714
Abstract
Identification of disease-related microRNAs (disease miRNAs) is helpful for understanding and exploring the etiology and pathogenesis of diseases. Most recent methods predict disease miRNAs by integrating the similarities and associations of miRNAs and diseases. However, these methods fail to learn the deep features of the miRNA similarities, the disease similarities, and the miRNA–disease associations. We propose a dual convolutional neural network-based method for predicting candidate disease miRNAs and refer to it as CNNDMP. CNNDMP not only exploits the similarities and associations of miRNAs and diseases, but also captures the topological structures of the miRNA and disease networks. An embedding layer is constructed by combining the biological premises about the miRNA–disease associations. A new framework based on the dual convolutional neural network is presented for extracting the deep feature representation of associations. The left part of the framework focuses on integrating the original similarities and associations of miRNAs and diseases. Novel miRNA and disease similarities that contain the topological structures are obtained by random walks on the miRNA and disease networks, and their deep features are learned by the right part of the framework. CNNDMP achieves better prediction performance than several state-of-the-art methods in cross-validation. Case studies on breast cancer, colorectal cancer, and lung cancer further demonstrate CNNDMP’s powerful ability to discover potential disease miRNAs.
(This article belongs to the Special Issue Special Protein or RNA Molecules Computational Identification 2018)
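As a rough illustration of how random walks can turn a network into topology-aware scores, the sketch below implements random walk with restart on a small adjacency matrix. The abstract does not state which random-walk variant CNNDMP uses, so the restart formulation, the function name `random_walk_with_restart`, and the restart probability of 0.3 are all assumptions.

```python
def random_walk_with_restart(adjacency, seed, restart=0.3, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for a walk restarting at `seed`.

    Hypothetical sketch: `adjacency` is a square list of lists of
    non-negative edge weights.
    """
    n = len(adjacency)
    # Column-normalize so that each column is a transition distribution.
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    transition = [
        [adjacency[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]
    start = [0.0] * n
    start[seed] = 1.0
    p = start[:]
    for _ in range(max_iter):
        nxt = [
            (1 - restart) * sum(transition[i][j] * p[j] for j in range(n))
            + restart * start[i]
            for i in range(n)
        ]
        # Stop once the L1 change between iterations is negligible.
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Example: a three-node path graph 0 - 1 - 2, walking from node 0.
path_graph = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
scores = random_walk_with_restart(path_graph, seed=0)
```

Running the walk once per seed node yields one score vector per node; comparing those vectors is one way to derive similarities that reflect network topology.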

16 pages, 1186 KiB  
Article
Prediction of Signal Peptides in Proteins from Malaria Parasites
by Michał Burdukiewicz, Piotr Sobczyk, Jarosław Chilimoniuk, Przemysław Gagat and Paweł Mackiewicz
Int. J. Mol. Sci. 2018, 19(12), 3709; https://doi.org/10.3390/ijms19123709 - 22 Nov 2018
Cited by 9 | Viewed by 13085
Abstract
Signal peptides are N-terminal presequences responsible for targeting proteins to the endomembrane system, and subsequent subcellular or extracellular compartments, and consequently condition their proper function. The significance of signal peptides stimulates the development of new computational methods for their detection. These methods employ learning systems trained on datasets comprising signal peptides from different types of proteins and taxonomic groups. As a result, the accuracy of predictions is high for signal peptides that are well represented in databases, but might be low in other, atypical cases. Such atypical signal peptides are present in proteins found in apicomplexan parasites, causative agents of malaria and toxoplasmosis. Apicomplexan proteins have a unique amino acid composition due to their AT-biased genomes. Therefore, we designed a new, more flexible and universal probabilistic model for recognition of atypical eukaryotic signal peptides. Our approach, called signalHsmm, includes knowledge about the structure of signal peptides and the physicochemical properties of amino acids. It is able to recognize signal peptides from the malaria parasites and related species more accurately than popular programs. Moreover, it is still universal enough to predict other signal peptides on par with the best-performing predictors.
(This article belongs to the Special Issue Special Protein or RNA Molecules Computational Identification 2018)

15 pages, 3529 KiB  
Article
An Efficient Algorithm for Sensitively Detecting Circular RNA from RNA-seq Data
by Xuanping Zhang, Yidan Wang, Zhongmeng Zhao and Jiayin Wang
Int. J. Mol. Sci. 2018, 19(10), 2897; https://doi.org/10.3390/ijms19102897 - 24 Sep 2018
Cited by 12 | Viewed by 3450
Abstract
Circular RNA (circRNA) is an important member of the non-coding RNA family. Numerous computational methods for detecting circRNAs from RNA-seq data have been developed in the past few years, but the algorithms differ dramatically in how they balance the sensitivity and precision of detection and in their filtering strategies. To further improve the sensitivity while maintaining an acceptable precision of circRNA detection, a novel and efficient de novo detection algorithm, CIRCPlus, is proposed in this paper. CIRCPlus accurately locates circRNA candidates by identifying a set of back-spliced junction reads through comparison of the local similar sequence of each pair of spanning junction reads. This strategy thus utilizes the important information provided by unbalanced spanning reads, which facilitates detection especially when the expression levels of circRNA are low. The performance of CIRCPlus was tested and compared with existing de novo methods on real datasets as well as a series of simulation datasets with different configurations. The experimental results demonstrated that the sensitivity of CIRCPlus reached 90% in common simulation settings, while CIRCPlus showed balanced sensitivity and reliability on the real datasets according to an objective assessment criterion based on RNase R-treated samples. The software tool is available for academic use only.
(This article belongs to the Special Issue Special Protein or RNA Molecules Computational Identification 2018)

11 pages, 2364 KiB  
Article
A Hybrid Deep Learning Model for Predicting Protein Hydroxylation Sites
by Haixia Long, Bo Liao, Xingyu Xu and Jialiang Yang
Int. J. Mol. Sci. 2018, 19(9), 2817; https://doi.org/10.3390/ijms19092817 - 18 Sep 2018
Cited by 27 | Viewed by 5179
Abstract
Protein hydroxylation is one type of post-translational modification (PTM) that plays critical roles in human diseases. It is known that protein sequences contain many uncharacterized proline and lysine residues. The question that needs to be answered is which residues can be hydroxylated and which cannot. The answer will not only help in understanding the mechanism of hydroxylation but can also benefit the development of new drugs. In this paper, we proposed a novel approach for predicting hydroxylation using a hybrid deep learning model integrating the convolutional neural network (CNN) and long short-term memory network (LSTM). We employed a pseudo amino acid composition (PseAAC) method to construct valid benchmark datasets based on a sliding window strategy and used the position-specific scoring matrix (PSSM) to represent samples as inputs to the deep learning model. In addition, we compared our method with popular predictors including CNN, iHyd-PseAAC, and iHyd-PseCp. The results of 5-fold cross-validation all demonstrated that our method significantly outperforms the other methods in prediction accuracy.
(This article belongs to the Special Issue Special Protein or RNA Molecules Computational Identification 2018)
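The sliding window strategy named in the abstract above can be sketched as follows: every proline (P) or lysine (K) becomes the center of a fixed-length peptide window, which is then a candidate sample for a hydroxylation predictor. The function name `candidate_windows`, the flank size of 6, and the padding symbol `X` are assumptions; the abstract does not give the window length or padding convention used by the authors.

```python
def candidate_windows(sequence, targets="PK", flank=6, pad="X"):
    """Return (1-based position, residue, window) for each target residue.

    Hypothetical sketch: windows near the sequence ends are padded with
    `pad` so that every window has length 2 * flank + 1.
    """
    seq = sequence.upper()
    padded = pad * flank + seq + pad * flank
    windows = []
    for pos, residue in enumerate(seq):
        if residue in targets:
            # In `padded`, the residue sits at index pos + flank, so the
            # slice below is centered on it.
            windows.append((pos + 1, residue, padded[pos:pos + 2 * flank + 1]))
    return windows


# Example: two candidate sites in a five-residue peptide, with flank=2.
sites = candidate_windows("MKAPL", flank=2)
```

Each window would then be encoded (for instance with PSSM rows, as the abstract describes) before being passed to a model.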

15 pages, 2292 KiB  
Article
IDP–CRF: Intrinsically Disordered Protein/Region Identification Based on Conditional Random Fields
by Yumeng Liu, Xiaolong Wang and Bin Liu
Int. J. Mol. Sci. 2018, 19(9), 2483; https://doi.org/10.3390/ijms19092483 - 22 Aug 2018
Cited by 22 | Viewed by 3637
Abstract
Accurate prediction of intrinsically disordered proteins/regions is one of the most important tasks in bioinformatics, and some computational predictors have been proposed to solve this problem. How to efficiently incorporate the sequence-order effect is critical for constructing an accurate predictor because disordered region distributions show global sequence patterns. In order to capture these sequence patterns, several sequence labelling models have been applied to this field, such as conditional random fields (CRFs). However, these methods suffer from certain disadvantages. In this study, we proposed a new computational predictor called IDP–CRF, which is trained on an updated benchmark dataset based on the MobiDB database and the DisProt database, and incorporates more comprehensive sequence-based features, including PSSMs (position-specific scoring matrices), k-mers, predicted secondary structures, and relative solvent accessibilities. Experimental results on the benchmark dataset and two independent datasets show that IDP–CRF outperforms 25 existing state-of-the-art methods in this field, demonstrating that IDP–CRF is a very useful tool for identifying IDPs/IDRs (intrinsically disordered proteins/regions). We anticipate that IDP–CRF will facilitate the development of protein sequence analysis.
(This article belongs to the Special Issue Special Protein or RNA Molecules Computational Identification 2018)

13 pages, 1342 KiB  
Article
RFAmyloid: A Web Server for Predicting Amyloid Proteins
by Mengting Niu, Yanjuan Li, Chunyu Wang and Ke Han
Int. J. Mol. Sci. 2018, 19(7), 2071; https://doi.org/10.3390/ijms19072071 - 16 Jul 2018
Cited by 46 | Viewed by 6286
Abstract
Amyloid is an insoluble fibrous protein, and its mis-aggregation can lead to diseases such as Alzheimer’s disease and Creutzfeldt–Jakob disease. Therefore, the identification of amyloid is essential for the discovery and understanding of disease. We established a novel predictor called RFAmy, based on random forest, to identify amyloid. It employs the SVMProt 188-D feature extraction method, based on protein composition and physicochemical properties, and the pse-in-one feature extraction method, based on amino acid composition, autocorrelation pseudo acid composition, profile-based features, and predicted structure features. In the ten-fold cross-validation test, RFAmy’s overall accuracy was 89.19% and its F-measure was 0.891. Results were obtained from comparison experiments with other features, classifiers, and existing methods. This shows the effectiveness of RFAmy in predicting amyloid proteins. The RFAmy predictor proposed in this paper can be accessed through the URL http://server.malab.cn/RFAmyloid/.
(This article belongs to the Special Issue Special Protein or RNA Molecules Computational Identification 2018)

12 pages, 1326 KiB  
Article
Identification of Bacteriophage Virion Proteins Using Multinomial Naïve Bayes with g-Gap Feature Tree
by Yanyuan Pan, Hui Gao, Hao Lin, Zhen Liu, Lixia Tang and Songtao Li
Int. J. Mol. Sci. 2018, 19(6), 1779; https://doi.org/10.3390/ijms19061779 - 15 Jun 2018
Cited by 29 | Viewed by 3322
Abstract
Bacteriophages, which are tremendously important to the ecology and evolution of bacteria, play a key role in the development of genetic engineering. Bacteriophage virion proteins are essential materials of the infectious viral particles and are in charge of several biological functions. The correct identification of bacteriophage virion proteins is of great importance for understanding both life at the molecular level and genetic evolution. However, few computational methods are available for identifying bacteriophage virion proteins. In this paper, we proposed a new method to predict bacteriophage virion proteins using a Multinomial Naïve Bayes classification model based on discrete features generated from the g-gap feature tree. The accuracy of the proposed model reaches 98.37% with an MCC of 96.27% in 10-fold cross-validation. This result suggests that the proposed method can be a useful approach for identifying bacteriophage virion proteins from sequence information. For the convenience of experimental scientists, a web server (PhagePred) that implements the proposed predictor is available and can be freely accessed on the Internet.
(This article belongs to the Special Issue Special Protein or RNA Molecules Computational Identification 2018)
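The g-gap features referred to above are commonly built from pairs of residues separated by g intervening positions. The sketch below computes a normalized g-gap dipeptide composition for one protein sequence. It covers only this counting step; the feature tree and the Multinomial Naïve Bayes model are not reproduced, and the function name `g_gap_dipeptide_composition` is an assumption.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def g_gap_dipeptide_composition(sequence, g=1):
    """Return a dict mapping each residue pair to its g-gap frequency.

    Hypothetical sketch: the pair (a, b) is counted whenever residue a is
    followed by residue b with exactly g residues in between.
    """
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    seq = sequence.upper()
    total = 0
    for i in range(len(seq) - g - 1):
        pair = seq[i] + seq[i + g + 1]
        if pair in counts:  # ignore pairs containing non-standard residues
            counts[pair] += 1
            total += 1
    if total == 0:
        return {pair: 0.0 for pair in pairs}
    return {pair: counts[pair] / total for pair in pairs}


# Example: with g=1, "ACDA" yields the pairs A_D and C_A.
composition = g_gap_dipeptide_composition("ACDA", g=1)
```

Setting `g=0` reduces this to the ordinary adjacent dipeptide composition, and raw counts (rather than frequencies) would be the natural input for a multinomial model.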

11 pages, 840 KiB  
Article
SeqSVM: A Sequence-Based Support Vector Machine Method for Identifying Antioxidant Proteins
by Lei Xu, Guangmin Liang, Shuhua Shi and Changrui Liao
Int. J. Mol. Sci. 2018, 19(6), 1773; https://doi.org/10.3390/ijms19061773 - 15 Jun 2018
Cited by 81 | Viewed by 4242
Abstract
Antioxidant proteins can be beneficial in disease prevention, and increasing attention has been paid to their functionality. Therefore, identifying antioxidant proteins is important for the study. In our work, we propose a computational method, called SeqSVM, for predicting antioxidant proteins based on their primary sequence features. Features are removed by the max relevance max distance method to reduce redundancy. Finally, the antioxidant proteins are identified by a support vector machine (SVM). The experimental results demonstrated that our method performs better than existing methods, with an overall accuracy of 89.46%. Although a proposed computational method can attain an encouraging classification result, the experimental results are verified based on biochemical approaches, such as wet biochemistry and molecular biology techniques.
(This article belongs to the Special Issue Special Protein or RNA Molecules Computational Identification 2018)

13 pages, 2668 KiB  
Article
PCLPred: A Bioinformatics Method for Predicting Protein–Protein Interactions by Combining Relevance Vector Machine Model with Low-Rank Matrix Approximation
by Li-Ping Li, Yan-Bin Wang, Zhu-Hong You, Yang Li and Ji-Yong An
Int. J. Mol. Sci. 2018, 19(4), 1029; https://doi.org/10.3390/ijms19041029 - 29 Mar 2018
Cited by 27 | Viewed by 4629
Abstract
Protein–protein interactions (PPIs) are key to protein functions and regulation within the cell cycle, DNA replication, and cellular signaling. Therefore, detecting whether a pair of proteins interact is of great importance for the study of molecular biology. As researchers have become aware of the importance of computational methods in predicting PPIs, many techniques have been developed for performing this task computationally. However, few technologies really meet the needs of their users. In this paper, we develop a novel and efficient sequence-based method for predicting PPIs. The evolutionary features are extracted from the position-specific scoring matrix (PSSM) of each protein. The features are then fed into a robust relevance vector machine (RVM) classifier to distinguish between interacting and non-interacting protein pairs. In order to verify the performance of our method, five-fold cross-validation tests are performed on the Saccharomyces cerevisiae dataset. A high accuracy of 94.56%, with 94.79% sensitivity at 94.36% precision, was obtained. The experimental results illustrated that the proposed approach can extract the most significant features from each protein sequence and can be a promising and meaningful tool for proteomics research.
(This article belongs to the Special Issue Special Protein or RNA Molecules Computational Identification 2018)
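Several abstracts on this page report accuracy, sensitivity, precision, and MCC. For readers comparing those figures, the sketch below shows how the four measures follow from the counts of a binary confusion matrix. The function name `binary_classification_metrics` and the example counts are illustrative assumptions and do not come from any of the listed papers.

```python
import math


def _ratio(numerator, denominator):
    """Divide safely, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_classification_metrics(tp, fp, tn, fn):
    """Compute common measures from confusion-matrix counts (hypothetical helper)."""
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": _ratio(tp, tp + fn),  # also called recall
        "precision": _ratio(tp, tp + fp),
        "mcc": _ratio(tp * tn - fp * fn, mcc_denominator),
    }


# Example with made-up counts for 100 interacting and 100 non-interacting pairs.
metrics = binary_classification_metrics(tp=90, fp=5, tn=95, fn=10)
```

In a k-fold cross-validation setting, these measures are typically computed per fold and then averaged, or computed once over the pooled predictions from all folds.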