Article

Graph Neural Networks with Multiple Feature Extraction Paths for Chemical Property Estimation

Graduate School of Engineering, Tohoku University, Sendai 980-8579, Japan
* Author to whom correspondence should be addressed.
Molecules 2021, 26(11), 3125; https://doi.org/10.3390/molecules26113125
Submission received: 5 March 2021 / Revised: 14 May 2021 / Accepted: 21 May 2021 / Published: 24 May 2021
(This article belongs to the Special Issue Deep Learning for Molecular Structure Modelling)

Abstract

Feature extraction is essential for estimating the chemical properties of molecules with machine learning. Recently, graph neural networks have attracted attention for feature extraction from molecules. However, existing methods focus only on specific structural information, such as node relationships. In this paper, we propose a novel graph convolutional neural network that performs feature extraction while simultaneously considering multiple structures. Specifically, we propose feature extraction paths specialized in node, edge, and three-dimensional structures. Moreover, we propose an attention mechanism to aggregate the features extracted by the paths. The attention aggregation enables us to select useful features dynamically. Experimental results showed that the proposed method outperformed previous methods.

1. Introduction

Each molecule has its own unique chemical properties, and estimating these properties is the first step in drug discovery. Reagent testing is the standard estimation method; however, it requires a long time and costly equipment. Machine learning methods have been widely studied to reduce this time and cost.
Most machine learning methods transform molecules into feature vectors and estimate chemical properties using a neural network. There is a high correlation between molecular structure and chemical properties. For example, molecules with benzene rings have a sweet aroma and flammability, and hydroxy groups (OH groups) make molecules readily soluble in water. Therefore, feature extraction of molecular structures is essential when estimating chemical properties with machine learning. Since chemical properties depend on essential substructures, a flexible feature extraction method is necessary. A common feature extraction method is Molecular Fingerprints [1,2,3,4], which transform a molecular structure into a one-hot vector indicating the presence or absence of specific human-designed structures. However, these specific structures are hard to adapt to the target chemical properties, since experts must redesign them by hand.
Recently, feature extraction using graph convolutional neural networks [5] has been attracting attention as a learnable alternative. As shown in Figure 1, a graph represents the molecule using nodes (atoms) and edges (bonds). Node features are extracted by repeatedly updating each node's features with those of its neighboring nodes; a node's feature propagates to more distant nodes as the number of update rounds increases. Moreover, because the update is performed by a neural network, the feature extraction model can learn the essential substructures of a molecule according to the chemical characteristics of the estimation target. Various models based on graph convolutional neural networks have been developed [6,7]. The Weave model [6] extracts edge features to consider relationships between nodes. The 3DGCN model [7] uses relative coordinates between nodes to extract features of three-dimensional structures.
Graph convolutional neural networks work well in classification problems, such as active/inactive prediction. However, there is room for improvement in regression problems due to their extensive estimation ranges. Furthermore, the relevant substructures can differ across target properties. Thus, it is essential to consider multiple structural features simultaneously.
In this paper, we propose a method for chemical property estimation of molecules using multiple structural features. Specifically, we integrate feature extraction paths that consider nodes, edges, and three-dimensional structures, respectively. For more flexible feature extraction, we utilize an attention mechanism to select useful features dynamically.

2. Related Work

In the estimation of chemical properties by machine learning, the estimator uses feature vectors extracted from molecules. Molecular Fingerprints are one method for extracting such feature vectors [1,2,3]: a one-hot vector represents the presence or absence of human-designed molecular structures. An improved method is Extended-Connectivity Fingerprints (ECFP) [4], which encodes the presence or absence of subgraphs within a given radius as a feature vector. However, these methods consider only pre-designed molecular structures and, thus, cannot extract features according to the chemical properties of the target.
Flexible feature extraction methods have been developed using machine learning. Duvenaud et al. used neural networks to refine the features of ECFP [8]. Recently, the graph convolutional neural network [5] has attracted much attention. Graph convolutional neural networks sequentially update node features using the features of neighboring nodes. Finally, all node features are merged into a single feature vector representing the molecule. Beyond estimating molecular properties, graph convolutional neural networks are used in a wide range of fields, including vision-and-language tasks [9,10,11], human motion and scene recognition [12,13,14], graph similarity estimation [15,16], and class identification [17,18].
Various models exist for the estimation of chemical properties [19,20,21,22,23,24]. Directed graphs are used to reduce computation when updating node features [22,23]. Edge features are extracted in References [6,25]. Relative coordinates between nodes are used to extract features of three-dimensional structures [7]. There are also methods that learn the importance of node features [26,27]. However, the aforementioned methods specialize in a specific molecular structure, such as edges or three-dimensional structures. In this study, we propose to integrate three feature extraction methods [5,6,7] to extract multiple molecular structures simultaneously. Furthermore, we dynamically select features using an attention mechanism to improve estimation performance.

3. Materials and Methods

We propose a graph convolutional neural network that integrates three different feature extraction approaches. Depending on the chemical properties of the estimation target, different features must be extracted. Therefore, we simultaneously extract node, edge, and three-dimensional features, and we use attention to calculate the importance of each feature dynamically. As shown in Figure 2, the proposed method extracts features using multiple paths (node, edge, and three-dimensional features) and aggregates them. First, we extract features through each path. Then, we form a molecular feature by aggregating the path outputs. Extracting features through multiple paths enables the proposed method to consider various structures of the molecule.

3.1. Node Feature Extraction Path

This path extracts node features using relationships between nodes. Let $H_i^t \in \mathbb{R}^{M \times 1}$ denote the $M$-dimensional feature vector of node $i$ at the $t$-th update round. We produce the pair feature $P_{ij} \in \mathbb{R}^{M \times 1}$ between nodes $i$ and $j$ by Equation (1), where $\sigma$ is the rectified linear unit and $\|$ is the concatenation operation. The weight $W_{\text{np}} \in \mathbb{R}^{M \times 2M}$ and the bias $B_{\text{np}} \in \mathbb{R}^{M \times 1}$ are learnable parameters. Subsequently, we update the node feature $H_i$ by Equation (2), where $N(i)$ is the set of neighboring nodes of node $i$, the weight is $W_{\text{n}} \in \mathbb{R}^{M \times M}$, and the bias is $B_{\text{n}} \in \mathbb{R}^{M \times 1}$.
$$P_{ij} = \sigma\big(W_{\text{np}}(H_i^t \,\|\, H_j^t) + B_{\text{np}}\big), \tag{1}$$
$$H_i^{t+1} = \sigma\Big(\sum_{j \in N(i)} W_{\text{n}} P_{ij} + B_{\text{n}}\Big). \tag{2}$$
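As a minimal NumPy sketch of Equations (1)–(2): the parameter shapes follow the text, but the function name, the loop-based implementation, and the parameter initialization are ours for illustration, not the authors' code.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def node_path_update(H, adj, W_np, B_np, W_n, B_n):
    """One update round of the node feature extraction path (Eqs. 1-2).

    H   : (N, M) node features at round t
    adj : (N, N) binary adjacency matrix, adj[i, j] = 1 if j is a neighbor of i
    """
    N, M = H.shape
    H_next = np.zeros_like(H)
    for i in range(N):
        acc = np.zeros(M)
        for j in range(N):
            if adj[i, j]:
                pair = np.concatenate([H[i], H[j]])       # H_i || H_j
                P_ij = relu(W_np @ pair + B_np)           # Eq. (1)
                acc += W_n @ P_ij
        H_next[i] = relu(acc + B_n)                       # Eq. (2)
    return H_next
```

A vectorized implementation would batch the pair features, but the loop form mirrors the per-pair definition in the equations.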

3.2. Edge Feature Extraction Path

The edge feature extraction path takes the bond relationships between nodes into account. Atoms in a molecule can form various bonds, such as single and double bonds, and we incorporate these bond types into the feature extraction to capture the connectivity of the molecular structure.
We use five bond types: single, double, triple, aromatic, and self-loop bonds (a node bonded to itself). Let $N$ be the number of atoms. We represent the bonds using the edge parameter $E \in \mathbb{R}^{N \times N}$, which describes the connectivity types. A naive choice for $E$ is categorical values, such as 1 for a single bond. Inspired by Reference [25], we instead learn parameters to represent the bonds. As shown in Figure 3, we create five adjacency matrices, one per bond type, and learn the edge parameters using a convolutional filter with a kernel size of 1 and five channels.
We obtain the pair feature $P_{ij}$ by Equation (3) using $E_{ij}$, the $(i, j)$-th element of $E$; note that $E_{ij}$ is a scalar. Then, we update the feature $H_i$ by Equation (4). The learnable parameters are the weights $W_{\text{ep}} \in \mathbb{R}^{M \times 2M}$ and $W_{\text{e}} \in \mathbb{R}^{M \times M}$ and the biases $B_{\text{ep}}, B_{\text{e}} \in \mathbb{R}^{M \times 1}$. Multiplying the pair features by $E$ incorporates the molecular bonds into the update.
$$P_{ij} = \sigma\big(E_{ij} W_{\text{ep}}(H_i^t \,\|\, H_j^t) + B_{\text{ep}}\big), \tag{3}$$
$$H_i^{t+1} = \sigma\Big(\sum_{j \in N(i)} W_{\text{e}} P_{ij} + B_{\text{e}}\Big). \tag{4}$$
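A sketch of the edge path, assuming the shapes stated in the text. Since a convolution with kernel size 1 over five channels reduces to a per-entry weighted sum of the five bond-type adjacency matrices, we model it that way here; the function names are ours.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def edge_parameters(bond_adjs, conv_w):
    """Learned scalar edge parameters E (Figure 3).

    bond_adjs : (5, N, N) one adjacency matrix per bond type
                (single, double, triple, aromatic, self-loop).
    conv_w    : (5,) weights of a kernel-size-1, five-channel conv filter.
    """
    # a 1x1 convolution over the channel axis is a weighted channel sum
    return np.tensordot(conv_w, bond_adjs, axes=1)        # (N, N)

def edge_path_update(H, E, W_ep, B_ep, W_e, B_e):
    """One update round of the edge feature extraction path (Eqs. 3-4)."""
    N, M = H.shape
    H_next = np.zeros_like(H)
    for i in range(N):
        acc = np.zeros(M)
        for j in range(N):
            if E[i, j] != 0.0:                            # connected pairs only
                pair = np.concatenate([H[i], H[j]])
                P_ij = relu(E[i, j] * (W_ep @ pair) + B_ep)   # Eq. (3)
                acc += W_e @ P_ij
        H_next[i] = relu(acc + B_e)                       # Eq. (4)
    return H_next
```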

3.3. Three-Dimensional Feature Extraction Path

We incorporate three-dimensional structural information into the feature updates based on Reference [7]. Let $(x_i, y_i, z_i)$ denote the absolute coordinates of node $i$. We calculate the relative $x$-coordinate $R(x)_{ij} = x_i - x_j$ and, likewise, the relative coordinates in $y$ and $z$: $R(y)_{ij} = y_i - y_j$ and $R(z)_{ij} = z_i - z_j$.
We calculate the pair feature $P_{ij}$ using the relative coordinates $R$, as defined in Equation (5). Then, we obtain the intermediate feature $Q_i$ by accumulating the pair features, as in Equation (6). However, $Q_i$ excludes the node's own feature $H_i$ because $R_{ii} = 0$. Therefore, as shown in Equation (7), we propose to concatenate $H_i$ and $Q_i$.
$$P_{ij} = \sigma\Big(\sum_{k \in \{x, y, z\}} R(k)_{ij} W_{\text{tp}}(H_i^t \,\|\, H_j^t) + B_{\text{tp}}\Big), \tag{5}$$
$$Q_i = \sigma\Big(\sum_{j \in N(i)} W_{\text{tq}} P_{ij} + B_{\text{tq}}\Big), \tag{6}$$
$$H_i^{t+1} = \sigma\big(W_{\text{t}}(H_i^t \,\|\, Q_i)\big). \tag{7}$$
Relative coordinates have a drawback: they change when the molecule is rotated. For further improvement, it is promising to use interatomic distances, which are invariant to both translation and rotation.
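The per-axis relative coordinates used by this path can be computed in one broadcasted step; this is a small illustrative helper (the function name is ours).

```python
import numpy as np

def relative_coordinates(coords):
    """Pairwise relative coordinates R(k)_ij = k_i - k_j for k in {x, y, z}.

    coords : (N, 3) absolute atom coordinates.
    Returns (3, N, N) array with R[k, i, j] = coords[i, k] - coords[j, k].
    """
    diff = coords[:, None, :] - coords[None, :, :]   # (N, N, 3) broadcasted
    return np.transpose(diff, (2, 0, 1))             # (3, N, N)
```

Note that the diagonal R[k, i, i] is always zero, which is exactly why Equation (7) reattaches the node's own feature.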

3.4. Feature Aggregation

We propose to extract more useful features by merging the features extracted through the paths. We integrate the three features using attention, which dynamically selects the important features for each node. The integration is defined by Equation (8), where $H^{\text{node}}$, $H^{\text{edge}}$, and $H^{\text{3d}}$ are the features extracted by the paths, and $\alpha_i^{\text{node}}$ is the attention weight for $H_i^{\text{node}}$ at node $i$, defined in Equation (9). We use the softmax function to obtain $\alpha$. Inspired by Reference [26], we calculate the scores $e_i^{\text{node}}$, $e_i^{\text{edge}}$, and $e_i^{\text{3d}}$ by Equation (10), using the initial feature $H_i^{\text{init}}$ of node $i$ and $H^p$, $p \in \{\text{node}, \text{edge}, \text{3d}\}$.
$$H_i = \sigma\Big(W_{\text{agg}} \sum_{p \in \{\text{node}, \text{edge}, \text{3d}\}} \alpha_i^p H_i^p\Big), \tag{8}$$
$$\alpha_i^p = \operatorname{softmax}(e_i^p) = \frac{\exp(e_i^p)}{\sum_{k \in \{\text{node}, \text{edge}, \text{3d}\}} \exp(e_i^k)}, \tag{9}$$
$$e_i^k = W_{\text{att}}\big(\sigma(H_i^{\text{init}} W_{\text{init}}) \,\|\, \sigma(H_i^k W_k)\big). \tag{10}$$
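The attention aggregation of Equations (8)–(10) for a single node can be sketched as follows. Parameter names and the dictionary layout are ours; weights are applied on the left (W @ h), which is only a layout choice relative to the equations.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x):
    z = np.exp(x - np.max(x))
    return z / z.sum()

def attention_aggregate(h_init, paths, W_init, W_paths, W_att, W_agg):
    """Aggregate per-path node features with attention (Eqs. 8-10).

    h_init  : (M,)  initial feature of node i
    paths   : dict {'node', 'edge', '3d'} -> (M,) path outputs for node i
    W_att   : (2M,) maps each concatenated pair to a scalar score
    """
    names = ['node', 'edge', '3d']
    # Eq. (10): score each path output against the initial feature
    e = np.array([
        W_att @ np.concatenate([relu(W_init @ h_init),
                                relu(W_paths[p] @ paths[p])])
        for p in names
    ])
    alpha = softmax(e)                                    # Eq. (9)
    mixed = sum(a * paths[p] for a, p in zip(alpha, names))
    return relu(W_agg @ mixed)                            # Eq. (8)
```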

3.5. Details of the Proposed Model

We illustrate the structure of the proposed model in Figure 4. The model extracts features using the paths and the aggregation, which are composed of graph convolutional layers. Then, we sum the node features along each dimension to produce a molecular feature vector. Finally, we estimate the chemical property by applying a fully connected layer.
We adopted two-stage training: we first trained each path independently, then fixed the paths and trained the aggregation layer and the fully connected layer. We used the mean squared error (MSE) loss for training. Following Reference [7] to determine the initial features, each node starts from a 60-dimensional feature vector. The batch size was set to 16.
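The readout described above (dimension-wise sum followed by a fully connected layer) can be sketched as below; `W_fc` and `b_fc` are hypothetical parameter names for the final layer, not identifiers from the paper.

```python
import numpy as np

def readout_and_predict(H, W_fc, b_fc):
    """Property estimation head (Section 3.5).

    H    : (N, M) final node features after the aggregation layer
    W_fc : (M,)   weights of a single-output fully connected layer
    b_fc : float  bias of that layer
    """
    mol_feat = H.sum(axis=0)          # sum along each feature dimension
    return W_fc @ mol_feat + b_fc     # scalar property estimate
```

For classification tasks (Section 4.6), the same head would be followed by a sigmoid; for the regression datasets it is used directly with the MSE loss.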

3.6. Datasets and Metrics

We mainly used two datasets in the experiments: Freesolv and ESOL. Both were compiled in Reference [28] and are widely used to evaluate chemical property estimation methods. Freesolv is a dataset for estimating the hydration free energy of molecules and contains 1128 molecules. ESOL is a dataset for estimating water solubility and contains 643 molecules. Both Freesolv and ESOL are regression tasks, in which the model directly predicts property values. In addition, we used four more datasets to verify the proposed method; all datasets are summarized in Table 1. QM8 has four excited-state properties, each calculated by three different methods, giving 12 properties in total.
We randomly split each dataset 8:1:1 into training, validation, and test data. We evaluated the proposed method and the comparison methods over 10 trials and report the average of the metrics over the trials. The evaluation metric is the mean absolute error (MAE); smaller is better.

4. Results

4.1. Comparison Methods

As comparison methods, we used the graph convolutional neural network (GCN) [5], the Weave model [6], and the 3DGCN [7]. Broadly, these methods extract node features (GCN), edge features (Weave), and three-dimensional features (3DGCN), respectively. For a fair comparison, we set the number of updating layers to two in both the proposed and comparison methods, and all methods use summation to produce molecular features, in the same manner as the proposed method. The main difference is the number of feature extraction paths: the comparison methods each have a single path, whereas the proposed method has multiple paths to consider node, edge, and three-dimensional structures simultaneously.

4.2. Main Results

We trained the models until they converged: training stopped when the loss had not improved for ten successive epochs, where an improvement is defined as a decrease of at least 0.0001. Figure 5 shows typical training and validation loss curves of the proposed method. The validation loss also converged; thus, there was no overfitting, and the models converged successfully. The proposed model has 143,286 parameters, which is significantly small compared to the 135 M parameters of VGG-16 and the 11.4 M of ResNet-18. Therefore, the numbers of data points in Freesolv and ESOL are sufficient for the proposed method.
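One way to implement the stopping rule described above (stop after ten successive epochs without an improvement of at least 0.0001); the exact bookkeeping in the authors' training loop may differ.

```python
def should_stop(losses, patience=10, min_delta=0.0001):
    """Early-stopping rule from Section 4.2.

    losses : list of per-epoch validation losses, oldest first.
    Stop when the best loss of the last `patience` epochs failed to
    improve on the earlier best by at least `min_delta`.
    """
    if len(losses) <= patience:
        return False
    best_before = min(losses[:-patience])
    recent_best = min(losses[-patience:])
    return best_before - recent_best < min_delta
```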
Table 2 shows the numerical results. GCN was the best among the comparison methods, and the proposed method outperformed all comparison methods on both datasets. The proposed method successfully learned the essential features; thus, the results show the effectiveness of multiple feature extraction for chemical property regression.

4.3. Results on Quantum Mechanics Datasets

We conducted experiments using the QM7 and QM8 datasets, training the models until they converged. Table 3 shows the results: the proposed method outperformed the comparison methods on ten of the 13 tasks and was second-best on the remaining three. Thus, we verified the effectiveness of the proposed method across various tasks.

4.4. Evaluation on Aggregation Approaches

We carried out experiments to examine the effectiveness of the feature aggregation. Besides attention, various aggregation approaches are possible, such as concatenation, summation, and element-wise maximum, defined in Equations (11)–(13).
$$H_i = \sigma\big(W_{\text{concat}}(H_i^{\text{node}} \,\|\, H_i^{\text{edge}} \,\|\, H_i^{\text{3d}})\big), \tag{11}$$
$$H_i = \sigma\big(W_{\text{sum}}(H_i^{\text{node}} + H_i^{\text{edge}} + H_i^{\text{3d}})\big), \tag{12}$$
$$H_i = \sigma\big(W_{\text{max}} \max(H_i^{\text{node}}, H_i^{\text{edge}}, H_i^{\text{3d}})\big). \tag{13}$$
Table 4 shows the results. The attention aggregation was the best on Freesolv, and the concatenation aggregation was the best on ESOL. All aggregations achieved accurate estimation results; thus, the proposed method is capable of aggregating the different features with various approaches.

4.5. Impacts of Feature Extraction Paths

We conducted experiments to clarify the impact of the feature extraction paths on each dataset. We built models with one and two paths. The one-path models each use a single path (node, edge, or three-dimensional features), and the two-path models combine the node and edge paths, the node and three-dimensional paths, and the edge and three-dimensional paths, respectively. We used the attention aggregation in the two-path models.
The results are shown in Table 5. On Freesolv, the two-path models outperformed the one-path models, and the full proposed model was the best with an MAE of 0.639. Likewise, the two-path models were superior to the one-path models on ESOL. Overall, multiple features were significant for chemical property estimation.

4.6. Results in Classification Tasks

We conducted classification experiments on BACE and BBBP, again training the models until they converged. We used four metrics: accuracy, recall, precision, and F-score. Tables 6 and 7 show the results. The two-path (node and edge) model of the proposed method achieved the best results on BACE on all metrics. On BBBP, the proposed method was the best in precision and second-best in the other metrics. The ROC curves in Figure 6 also show the strong performance of the proposed method. These results demonstrate the effectiveness of the proposed method in classification tasks.

4.7. Verification of Edge Parameters

We carried out experiments to verify the effect of the edge parameter E, a learnable parameter in Equation (3). We compared two modes based on the edge-path model. One used edge parameters learned by convolution with kernel sizes of 1, 3, 5, and 7 (Conv-1 to Conv-7 in Table 8). The other adopted fixed categorical edge parameters: 1 for the self node, 2 for single bonds, 3 for double bonds, 4 for triple bonds, and 5 for aromatic bonds. We trained for 50 epochs. The results are shown in Table 8: the models using learned edge parameters improved over the fixed parameters on both datasets. Therefore, the proposed method learned an effective representation of edge types.

4.8. Effect of the Self Node in Three-Dimensional Features

We evaluated the effect of the self node in the three-dimensional feature extraction path. Using the three-dimensional path model, we compared variants with and without the self node, defining the model without the self node by Equation (14) instead of Equation (7).
$$H_i^{\text{3d}} = \sigma(W_{\text{tq}} Q_i). \tag{14}$$
When the self node is omitted, a node's own feature is not considered when aggregating the pair features and updating the node features. The experimental results are shown in Table 9: including the self node yielded clear improvements. Therefore, we confirmed that performance improves by incorporating the self node into the three-dimensional features.

4.9. Attention Visualization

We visualized each node's attention values α_i to confirm that the proposed method determines them dynamically. According to Equation (9), the attention values of each node sum to one across the paths, so we can directly illustrate α with bar charts. The visualization results are shown in Figure 7: various attention values were assigned to each node, showing that the proposed method flexibly determined the attention for each node.

5. Conclusions

In this study, we proposed a method for chemical property estimation of molecules. The proposed method uses multiple paths to extract features focusing on specific structures, namely node relationships, edge relationships, and the three-dimensional structure of a molecule. Furthermore, we proposed to obtain more useful features by aggregating the multiple features while dynamically selecting the essential ones. The experimental results showed that the proposed method outperformed existing methods that focus on only one structure in regression tasks. Therefore, multiple feature extraction can improve the performance of chemical property estimation for molecules.

Author Contributions

Conceptualization, S.I. and T.M.; methodology, S.I.; software, S.I.; validation, S.I. and T.M.; formal analysis, S.I.; investigation, S.I. and T.M.; resources, S.I.; data curation, S.I.; writing—original draft preparation, S.I. and T.M.; writing—review and editing, Y.S. and S.O.; visualization, S.I.; supervision, Y.S. and S.O.; project administration, S.O.; funding acquisition, T.M. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partially supported by JSPS KAKENHI Grant Numbers 19K11848.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets are available at http://moleculenet.ai/datasets-1 (accessed on 14 May 2021).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Carhart, R.E.; Smith, D.H.; Venkataraghavan, R. Atom pairs as molecular features in structure-activity studies: Definition and applications. J. Chem. Inf. Comput. Sci. 1985, 25, 64–73. [Google Scholar] [CrossRef]
  2. Nilakantan, R.; Bauman, N.; Dixon, J.S.; Venkataraghavan, R. Topological torsion: A new molecular descriptor for SAR applications. Comparison with other descriptors. J. Chem. Inf. Comput. Sci. 1987, 27, 82–85. [Google Scholar] [CrossRef]
  3. Gedeck, P.; Rohde, B.; Bartels, C. QSAR - How Good Is It in Practice? Comparison of Descriptor Sets on an Unbiased Cross Section of Corporate Data Sets. J. Chem. Inf. Model. 2006, 46, 1924–1936. [Google Scholar] [CrossRef] [PubMed]
  4. Rogers, D.; Hahn, M. Extended-Connectivity Fingerprints. J. Chem. Inf. Model. 2010, 50, 742–754. [Google Scholar] [CrossRef] [PubMed]
  5. Kipf, T.N.; Welling, M. Semi-Supervised Classification with Graph Convolutional Networks. In Proceedings of the 5th International Conference on Learning Representations, Toulon, France, 24–26 April 2017. [Google Scholar]
  6. Kearnes, S.; McCloskey, K.; Berndl, M.; Pande, V.; Riley, P. Molecular graph convolutions: Moving beyond fingerprints. J. Comput.-Aided Mol. Des. 2016, 30, 595–608. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  7. Cho, H.; Choi, I.S. Enhanced Deep-Learning Prediction of Molecular Properties via Augmentation of Bond Topology. ChemMedChem 2019, 14, 1604–1609. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  8. Duvenaud, D.K.; Maclaurin, D.; Iparraguirre, J.; Bombarell, R.; Hirzel, T.; Aspuru-Guzik, A.; Adams, R.P. Convolutional Networks on Graphs for Learning Molecular Fingerprints. In Proceedings of the Annual Conference on Neural Information Processing Systems 2015, Montreal, QC, Canada, 7–12 December 2015; Volume 28, pp. 2224–2232. [Google Scholar]
  9. Johnson, J.; Gupta, A.; Li, F. Image Generation from Scene Graphs. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 1219–1228. [Google Scholar]
  10. Li, K.; Zhang, Y.; Li, K.; Li, Y.; Fu, Y. Visual Semantic Reasoning for Image-Text Matching. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea, 27–28 October 2019; pp. 4653–4661. [Google Scholar]
  11. Wang, Z.; Liu, X.; Li, H.; Sheng, L.; Yan, J.; Wang, X.; Shao, J. CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea, 27–28 October 2019; pp. 5763–5772. [Google Scholar]
  12. Zhang, X.; Xu, C.; Tian, X.; Tao, D. Graph Edge Convolutional Neural Networks for Skeleton-Based Action Recognition. IEEE Trans. Neural Netw. Learn. Syst. 2020, 31, 3047–3060. [Google Scholar] [CrossRef] [PubMed]
  13. Li, M.; Chen, S.; Chen, X.; Zhang, Y.; Wang, Y.; Tian, Q. Actional-Structural Graph Convolutional Networks for Skeleton-Based Action Recognition. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea, 27–28 October 2019; pp. 3590–3598. [Google Scholar]
  14. Qi, M.; Li, W.; Yang, Z.; Wang, Y.; Luo, J. Attentive Relational Networks for Mapping Images to Scene Graphs. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 3952–3961. [Google Scholar]
  15. Bai, Y.; Ding, H.; Bian, S.; Chen, T.; Sun, Y.; Wang, W. SimGNN: A Neural Network Approach to Fast Graph Similarity Computation. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, Melbourne, Australia, 11–15 February 2019; pp. 384–392. [Google Scholar]
  16. Li, G.; Müller, M.; Thabet, A.; Ghanem, B. DeepGCNs: Can GCNs Go As Deep As CNNs? In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea, 27–28 October 2019; pp. 9266–9275. [Google Scholar]
  17. Kim, J.; Kim, T.; Kim, S.; Yoo, C.D. Edge-Labeling Graph Neural Network for Few-Shot Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 11–20. [Google Scholar]
  18. Li, A.; Luo, T.; Lu, Z.; Xiang, T.; Wang, L. Large-Scale Few-Shot Learning: Knowledge Transfer With Class Hierarchy. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 7205–7213. [Google Scholar]
  19. Pope, P.E.; Kolouri, S.; Rostami, M.; Martin, C.E.; Hoffmann, H. Explainability Methods for Graph Convolutional Neural Networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 10764–10773. [Google Scholar]
  20. Gong, L.; Cheng, Q. Exploiting Edge Features for Graph Neural Networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 9203–9211. [Google Scholar]
  21. Gilmer, J.; Schoenholz, S.S.; Riley, P.F.; Vinyals, O.; Dahl, G.E. Neural Message Passing for Quantum Chemistry. In Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; Volume 70, pp. 1263–1272. [Google Scholar]
  22. Song, Y.; Zheng, S.; Niu, Z.; Fu, Z.h.; Lu, Y.; Yang, Y. Communicative Representation Learning on Attributed Molecular Graphs. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-20, Yokohama, Japan, 7–15 January 2021; pp. 2831–2838. [Google Scholar]
  23. Yang, K.; Swanson, K.; Jin, W.; Coley, C.; Eiden, P.; Gao, H.; Guzman-Perez, A.; Hopper, T.; Kelley, B.; Mathea, M.; et al. Analyzing Learned Molecular Representations for Property Prediction. J. Chem. Inf. Model. 2019, 59, 3370–3388. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  24. Miyazaki, T.; Omachi, S. Structural Data Recognition With Graph Model Boosting. IEEE Access 2018, 6, 63606–63618. [Google Scholar] [CrossRef]
  25. Shang, C.; Liu, Q.; Chen, K.S.; Sun, J.; Lu, J.; Yi, J.; Bi, J. Edge Attention-based Multi-Relational Graph Convolutional Networks. arXiv 2018, arXiv:1802.04944. [Google Scholar]
  26. Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Liò, P.; Bengio, Y. Graph Attention Networks. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
  27. Chen, B.; Barzilay, R.; Jaakkola, T. Path-Augmented Graph Transformer Network. arXiv 2019, arXiv:cs.LG/1905.12712. [Google Scholar]
  28. Wu, Z.; Ramsundar, B.; Feinberg, E.; Gomes, J.; Geniesse, C.; Pappu, A.S.; Leswing, K.; Pande, V. MoleculeNet: A benchmark for molecular machine learning. Chem. Sci. 2018, 9, 513–530. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Figure 1. Graph convolutional neural network. Node features are updated with weights w (node features after arrows in the figure). Then, the node of interest (orange) is updated with the features of its neighbors.
Figure 2. Overview of the proposed method. For simplicity, we illustrated the flow of feature update for a single attention node (green circle). Firstly, we generate pair features (yellow triangles) representing the relationship between the node and its neighbors. We use bond types and relative coordinates to extract edge relationships and three-dimensional structures. Then, we update the node features using the pair features (orange, yellow, and blue circles). Finally, we aggregate each path’s features of the attention node to obtain the node features, which are the output of this layer (purple circles). We repeat the above processes for feature updates.
Figure 3. Edge parameters.
Figure 4. The structure of the model. The initial features are the 60-dimensional features.
Figure 5. Loss curves of the proposed model.
Figure 6. ROC curves.
Figure 7. Visualization results of attention values.
Table 1. Summary of the datasets used in the experiments.
| Dataset | #Mols | Category | Task |
|---|---|---|---|
| Freesolv | 1128 | Physical chemistry | Regression for hydration free energy |
| ESOL | 643 | Physical chemistry | Regression for water solubility |
| QM7 | 7160 | Quantum mechanics | Regression for atomization energy |
| QM8 | 21,786 | Quantum mechanics | Regression for excited-state properties |
| BACE | 1513 | Biophysics | Classification for inhibitors of β-secretase 1 |
| BBBP | 2039 | Physiology | Classification for blood-brain barrier penetration |
Table 2. Averages of MAE over 10 trials (Bold and underline are the best and the second-best, respectively).
| | GCN | Weave | 3DGCN | Proposed |
|---|---|---|---|---|
| Freesolv | 0.764 | 0.817 | 0.743 | **0.717** |
| ESOL | 0.503 | 0.665 | 0.531 | **0.498** |
Table 3. Averages of MAE over 10 trials on quantum mechanics datasets (Bold and underline are the best and the second-best, respectively).
| | GCN | Weave | 3DGCN | Proposed |
|---|---|---|---|---|
| QM7 | 10.75 | 12.22 | 12.89 | **9.13** |
| QM8 (E1-CC2) | 0.00846 | **0.00608** | 0.00651 | 0.00611 |
| QM8 (E2-CC2) | 0.0099 | 0.0080 | 0.0081 | **0.0077** |
| QM8 (f1-CC2) | 0.0180 | 0.0161 | 0.0148 | **0.0141** |
| QM8 (f2-CC2) | 0.0346 | 0.0327 | 0.0324 | **0.0303** |
| QM8 (E1-PBE0) | 0.0082 | **0.0064** | 0.0071 | 0.0066 |
| QM8 (E2-PBE0) | 0.00945 | **0.00705** | 0.00822 | 0.00712 |
| QM8 (f1-PBE0) | 0.0154 | 0.0120 | 0.0124 | **0.0114** |
| QM8 (f2-PBE0) | 0.0291 | 0.0259 | 0.0261 | **0.0247** |
| QM8 (E1-CAM) | 0.0074 | 0.0061 | 0.0065 | **0.0058** |
| QM8 (E2-CAM) | 0.0084 | 0.0065 | 0.0071 | **0.0063** |
| QM8 (f1-CAM) | 0.0166 | 0.0132 | 0.0127 | **0.0123** |
| QM8 (f2-CAM) | 0.0308 | 0.0268 | 0.0275 | **0.0259** |
Table 4. Averages of MAE for aggregation approaches.
| | Concat | Sum | Max | Attention |
|---|---|---|---|---|
| Freesolv | 0.666 | 0.663 | 0.703 | **0.639** |
| ESOL | **0.472** | 0.478 | 0.488 | 0.484 |
Table 5. Average of MAE for path combinations.
| | Node | Edge | 3D | Node & Edge | Node & 3D | Edge & 3D | All (Proposed) |
|---|---|---|---|---|---|---|---|
| Freesolv | 0.710 | 0.702 | 0.864 | 0.640 | 0.676 | 0.685 | **0.639** |
| ESOL | 0.483 | 0.498 | 0.538 | 0.477 | **0.476** | 0.482 | 0.484 |
Table 6. Classification results on BACE (Bold and underline are the best and the second-best, respectively).
| | GCN | Weave | 3DGCN | Proposed | Proposed (Node & Edge) |
|---|---|---|---|---|---|
| Accuracy | 0.807 | 0.751 | 0.774 | 0.799 | **0.811** |
| Recall | 0.779 | 0.714 | 0.726 | **0.782** | **0.782** |
| Precision | 0.777 | 0.739 | 0.749 | 0.763 | **0.783** |
| F-score | 0.778 | 0.726 | 0.737 | 0.773 | **0.782** |
Table 7. Classification results on BBBP (Bold and underline are the best and the second-best, respectively).
| | GCN | Weave | 3DGCN | Proposed | Proposed (Edge) |
|---|---|---|---|---|---|
| Accuracy | **0.886** | 0.871 | 0.873 | 0.874 | 0.884 |
| Recall | **0.942** | 0.915 | 0.923 | 0.922 | 0.928 |
| Precision | 0.912 | 0.917 | 0.912 | 0.915 | **0.922** |
| F-score | **0.927** | 0.916 | 0.918 | 0.918 | 0.925 |
Table 8. Averages of MAE using fixed and learned edge parameters.
| | Fixed | Conv-1 | Conv-3 | Conv-5 | Conv-7 |
|---|---|---|---|---|---|
| Freesolv | 1.181 | **0.799** | 0.872 | 0.957 | 0.927 |
| ESOL | 0.706 | **0.525** | 0.598 | 0.585 | 0.654 |
Table 9. Averages of MAE with and without self node features.
| | w/o Self | w/ Self |
|---|---|---|
| Freesolv | 1.525 | **0.864** |
| ESOL | 0.726 | **0.538** |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Ishida, S.; Miyazaki, T.; Sugaya, Y.; Omachi, S. Graph Neural Networks with Multiple Feature Extraction Paths for Chemical Property Estimation. Molecules 2021, 26, 3125. https://doi.org/10.3390/molecules26113125
