Abstract
For a graph $G$, a degree-based graphical index takes the general form $I_f(G) = \sum_{xy \in E(G)} f(d_x, d_y)$, where $f$ is a symmetric map and $d_x$ is the degree of the vertex $x$. For $\alpha \in \mathbb{R}$, if $f(d_x, d_y) = (d_x d_y)^{\alpha}$ (resp. $f(d_x, d_y) = (d_x + d_y)^{\alpha}$), the index is called the general product-connectivity index $R_\alpha$ (resp. general sum-connectivity index $SCI_\alpha$). In this paper, by formulating an optimization problem, we determine the value(s) of $\alpha$ for which the linear/multiple correlation coefficient of $R_\alpha$ and $SCI_\alpha$ with physicochemical properties of benzenoid hydrocarbons is the strongest. This, in turn, fills some research gaps left by similar studies in this area.
Keywords:
chemical graph theory; graph; benzenoid hydrocarbon; boiling point; enthalpy of formation; general sum-connectivity index; general product-connectivity index; optimization
MSC:
05C92; 05C90; 05C09
1. Introduction
Structure–property modeling employs molecular descriptors [1] to generate regression models correlating the physicochemical, biological, or thermodynamic properties of chemical compounds. Degree-based graphical indices are a class of graph-theoretic molecular descriptors that have gained popularity for efficiently correlating the physicochemical properties of benzenoid hydrocarbons (BHs). In 1975, Randić introduced the connectivity index, commonly referred to as the Randić index (cf. [2]). Over the years, this index has emerged as the predominant molecular descriptor in Quantitative Structure–Property Relationship (QSPR) and Quantitative Structure–Activity Relationship (QSAR) studies (cf. [2]). Its mathematical properties have been extensively examined, as outlined in two monographs [2,3]. Moreover, various modifications and alternative formulations of this index have been proposed in the literature (cf. [4,5]). In this paper, we also explore a closely related variant of the connectivity index, namely the sum-connectivity index [6]. For some recent progress on the structure–property modeling of the physicochemical properties of nanostructures and bio-molecular networks, we refer to [7,8,9,10].
In order to test the quality of a certain class of molecular graphical descriptors, it is customary to conduct comparative testing by selecting suitable test molecules and their particular chemical properties. Gutman and Tošović [11] tested the quality of degree-dependent graphical descriptors for correlating the physicochemical properties of isomeric octanes (representatives of alkanes). Malik et al. [12] extended this study of degree-based molecular indices from isomeric octanes to benzenoid hydrocarbons (BHs). Hayat et al. [13] (resp. Hayat et al. [14]) further extended the work from physicochemical properties to the quantum-theoretical (resp. thermodynamic) properties of BHs.
In their study, Gutman and Tošović [11] selected isomeric octanes as test molecules, whereas the other studies [12,13,14] opted for the lower 20–30 BHs as test molecules for their investigation. Moreover, Gutman and Tošović [11] and Malik et al. [12] selected the normal boiling point ($bp$) and the standard enthalpy of formation ($\Delta H_f^{\circ}$) to represent physicochemical characteristics. Van der Waals and intermolecular forms of interactions are represented by $bp$, whereas $\Delta H_f^{\circ}$ represents the thermal characteristics of a compound. On the other hand, the total $\pi$-electronic energy ($E_\pi$) was selected to represent quantum-theoretical characteristics by Hayat et al. [13], and the entropy and heat capacity were selected to represent thermodynamic properties by Hayat et al. [14].
All of the aforementioned quality testing revealed the strong potential of both the general product-connectivity index $R_\alpha$ and the general sum-connectivity index $SCI_\alpha$ to efficiently correlate the physicochemical, thermodynamic, and quantum-theoretical characteristics of benzenoid hydrocarbons. For instance, Malik et al. [12] showed that, among all degree-based descriptors, the top two indices in correlating physicochemical characteristics of BHs belong to these families. Similarly, Hayat et al. [13] showed that indices from these families are the best descriptors in predicting the $E_\pi$ of BHs, whereas Hayat et al. [14] showed that indices from these families are the best two for correlating the thermodynamic properties of BHs. However, the disadvantage of these studies is that they consider both $R_\alpha$ and $SCI_\alpha$ in their comparative testing for only finitely many fixed values of $\alpha$. Since both $R_\alpha$ and $SCI_\alpha$ deliver strong potential in correlating various properties of BHs, it is natural to study these indices for general $\alpha \in \mathbb{R}$. Note that some other nonlinear function $f$, for instance one built from other powers of the degrees, could possibly work even better. However, the current study is restricted to investigating the estimation potential of $R_\alpha$ and $SCI_\alpha$ only.
In summary, current comparative studies considered $R_\alpha$ and $SCI_\alpha$ for a few fixed values of $\alpha$ and showed that both indices, for some of these values of $\alpha$, correlate well with the physicochemical properties as well as the total $\pi$-electron energy ($E_\pi$) of benzenoid hydrocarbons (BHs). The only limitation of these studies is that they considered $R_\alpha$ and $SCI_\alpha$ for these fixed integral values only. So, if $R_\alpha$ and $SCI_\alpha$ deliver good predictors for these fixed integral values, both might deliver even better predictors when general real values of $\alpha$ are considered.
In this paper, we determine the value(s) of $\alpha$ for which both $R_\alpha$ and $SCI_\alpha$ deliver strong predictive potential for the physicochemical properties of BHs. Multiple correlation and regression analyses were also conducted to find the best $\alpha$ for which the strongest multiple correlation is delivered by $R_\alpha$ and by $SCI_\alpha$ with both properties simultaneously. Following Gutman and Tošović [11], the physicochemical properties $bp$ and $\Delta H_f^{\circ}$ were selected as the test properties of BHs. Moreover, 22 lower BHs were selected as the test molecules, since experimental values of $bp$ and $\Delta H_f^{\circ}$ are publicly available for these molecules. A computational method was used to calculate the $R_\alpha$ and $SCI_\alpha$ of these 22 BHs, and then a detailed statistical analysis was conducted to find the suitable values of $\alpha$ for which both $R_\alpha$ and $SCI_\alpha$ deliver strong predictive potential.
2. Mathematical Preliminaries
For a chemical graph $G$, a degree-based graphical index takes the general form $I_f(G) = \sum_{xy \in E(G)} f(d_x, d_y)$, where $f$ is a symmetric map and $d_x$ is the degree of the vertex $x$. The product-connectivity index of G, proposed by Randić in [15] back in 1975, is one of the earliest degree-based graphical indices. Later on, the index was renamed as the Randić index. Mathematically, it takes $f(d_x, d_y) = 1/\sqrt{d_x d_y}$ in the general form above. Thus, the product-connectivity descriptor is defined as:
$$R(G) = \sum_{xy \in E(G)} \frac{1}{\sqrt{d_x d_y}}. \tag{1}$$
The diversity of its applicability in cheminformatics makes the Randić index one of the most-studied graphical descriptors. For instance, its mathematical and chemical properties were extensively examined in [2,16,17,18,19].
Introduced by Zhou and Trinajstić [6], the sum-connectivity index is another degree-related molecular graphical descriptor. For a graph G, it considers $f(d_x, d_y) = 1/\sqrt{d_x + d_y}$ in the general form of a degree-based index. Therefore, the sum-connectivity index of G has the defining structure:
$$SCI(G) = \sum_{xy \in E(G)} \frac{1}{\sqrt{d_x + d_y}}. \tag{2}$$
The reader is referred to [12,13,20,21] for further studies on both the applied and mathematical aspects of the sum-connectivity index.
The successful applicability of the product-connectivity and sum-connectivity indices motivated researchers to consider variants of these descriptors. Perhaps the most well-studied variants are the generalized variants of the product- and sum-connectivity indices. For $\alpha \in \mathbb{R}$, if $f(d_x, d_y) = (d_x d_y)^{\alpha}$ (resp. $f(d_x, d_y) = (d_x + d_y)^{\alpha}$), the index is called the general product-connectivity index $R_\alpha$ (resp. general sum-connectivity index $SCI_\alpha$). The general product-connectivity index was put forward by Bollobás and Erdős [4] in 1998 while generalizing the classical index:
$$R_\alpha(G) = \sum_{xy \in E(G)} (d_x d_y)^{\alpha}, \tag{3}$$
where $\alpha \in \mathbb{R}$. There have been numerous contributions in the chemical and mathematical literature on the general product-connectivity index; see, for example, [2,22,23,24,25].
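Similarly, Zhou and Trinajstić [26] in 2010 proposed the general sum-connectivity index with the following defining structure:
$$SCI_\alpha(G) = \sum_{xy \in E(G)} (d_x + d_y)^{\alpha}, \tag{4}$$
where $\alpha \in \mathbb{R}$. A detailed mathematical treatment is reported in [27,28,29,30]. The application perspective of $SCI_\alpha$ is reported in Gutman and Tošović [11] and Hayat et al. [14]. Obviously, $R_{-1/2}(G) = R(G)$ and $SCI_{-1/2}(G) = SCI(G)$.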
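The two general indices differ only in how the end-vertex degrees of an edge are combined, which a short computation makes concrete. The following is a minimal sketch, assuming the graph is given as a list of undirected edges; the function names and the naphthalene edge list are illustrative choices, not part of the paper's own code.

```python
from collections import Counter


def vertex_degrees(edges):
    """Return a mapping vertex -> degree for an undirected simple graph."""
    degree = Counter()
    for x, y in edges:
        degree[x] += 1
        degree[y] += 1
    return degree


def general_product_connectivity(edges, alpha):
    """General product-connectivity index: sum of (d_x * d_y) ** alpha over edges xy."""
    d = vertex_degrees(edges)
    return sum((d[x] * d[y]) ** alpha for x, y in edges)


def general_sum_connectivity(edges, alpha):
    """General sum-connectivity index: sum of (d_x + d_y) ** alpha over edges xy."""
    d = vertex_degrees(edges)
    return sum((d[x] + d[y]) ** alpha for x, y in edges)


# Carbon skeleton of naphthalene: the 10-cycle 0-1-...-9-0 plus the chord 0-5,
# which splits the cycle into two hexagons (vertex labels are arbitrary).
NAPHTHALENE = [(i, (i + 1) % 10) for i in range(10)] + [(0, 5)]

# alpha = -1/2 recovers the classical Randić and sum-connectivity indices.
randic = general_product_connectivity(NAPHTHALENE, -0.5)
sum_conn = general_sum_connectivity(NAPHTHALENE, -0.5)
```

Setting `alpha = 0` in either function returns the number of edges, which is a convenient sanity check on any edge list.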
In the field of statistics, the correlation coefficient between two random variables X and Y with finite variances is defined to be $\rho_{X,Y} = \mathrm{cov}(X, Y)/(\sigma_X \sigma_Y)$, where cov is the covariance function, and $\sigma_X$ and $\sigma_Y$ represent the standard deviations of the random variables X and Y, respectively. The correlation coefficient measures both the direction and strength of the linear relationship between a predictor X and a response variable Y. For a series of k measurements of these variables, denoted by $x_1, \ldots, x_k$ and $y_1, \ldots, y_k$, the value $\rho_{X,Y}$ is estimated by
$$r = \frac{\sum_{i=1}^{k} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{k} (x_i - \bar{x})^2 \, \sum_{i=1}^{k} (y_i - \bar{y})^2}},$$
where $\bar{x} = \frac{1}{k}\sum_{i=1}^{k} x_i$ and $\bar{y} = \frac{1}{k}\sum_{i=1}^{k} y_i$. Values of $|r|$ closer to 1 indicate a strong linear relationship between X and Y.
The correlation coefficient is strongly linked to the concept of the linear regression of Y against X by assuming a regression line $Y = a + bX + \varepsilon$, where $\varepsilon$ represents random errors, and $a$ and $b$ are coefficients to be estimated. The ordinary least squares method is typically employed, with closed-form solutions of the estimators $\hat{a}$ and $\hat{b}$ for a and b, respectively, being readily available and widely known. In particular, for this simple linear regression model, $\hat{b} = r \, s_Y / s_X$, where $s_X$ and $s_Y$ are the sample standard deviations estimating $\sigma_X$ and $\sigma_Y$, respectively, while $\hat{a} = \bar{y} - \hat{b}\,\bar{x}$. Evidently, the correlation is related to the slope of the regression line.
The standard error of fit and the correlation coefficient are both key goodness-of-fit measures in regression analysis. The standard error of fit is defined as
$$s = \sqrt{\frac{\sum_{i=1}^{k} (y_i - \hat{y}_i)^2}{k - 2}},$$
where $\hat{y}_i = \hat{a} + \hat{b} x_i$ (the regression line's resulting predicted value). This quantifies how much the observed values deviate from the values predicted by the model. Both measures can be computed with standard mathematical or statistical software.
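The quantities in the last three paragraphs can be computed in a few lines. The sketch below assumes the textbook simple-regression setting with $k - 2$ residual degrees of freedom for the standard error of fit; the function name `linear_fit` is a hypothetical helper, not the authors' Octave code.

```python
import math


def linear_fit(x, y):
    """Ordinary least squares fit y ~ a + b*x.

    Returns (a, b, r, s): intercept, slope, Pearson correlation coefficient,
    and standard error of fit with k - 2 degrees of freedom.
    """
    k = len(x)
    if k != len(y) or k < 3:
        raise ValueError("need two equally long samples with at least 3 points")
    x_bar = sum(x) / k
    y_bar = sum(y) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    b = sxy / sxx                      # slope estimate
    a = y_bar - b * x_bar              # intercept estimate
    r = sxy / math.sqrt(sxx * syy)     # sample correlation coefficient
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (k - 2))       # standard error of fit
    return a, b, r, s
```

The slope returned here satisfies $\hat{b} = r\, s_Y / s_X$, which is the link between correlation and regression noted above.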
The linear regression model can be extended to include multiple predictors, e.g., $Y = a + b_1 X_1 + b_2 X_2 + \varepsilon$. Suppose we have two predictors $X_1$ and $X_2$; we may define the multiple correlation measure between these predictors and a single response variable Y as follows:
$$R = \sqrt{\frac{r_{Y X_1}^2 + r_{Y X_2}^2 - 2\, r_{Y X_1} r_{Y X_2} r_{X_1 X_2}}{1 - r_{X_1 X_2}^2}},$$
where $r_{UV}$ denotes the correlation coefficient between the variables U and V.
In the context of multiple linear regression, the quantity $R^2$ is usually referred to as the coefficient of determination. It is interpreted as the proportion of variability in the response variable Y that is accounted for by the predictor variables $X_1$ and $X_2$. The value R, thus, provides a measure of the correlation between the observed values of Y and the values predicted by the multiple linear regression model involving $X_1$ and $X_2$.
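For two predictors, the multiple correlation can be obtained from the three pairwise correlation coefficients alone. The sketch below uses the standard two-predictor identity $R^2 = (r_{y1}^2 + r_{y2}^2 - 2 r_{y1} r_{y2} r_{12})/(1 - r_{12}^2)$; that this is the exact form used in the paper is an assumption, and the function names are illustrative.

```python
import math


def pearson(u, w):
    """Sample Pearson correlation coefficient of two equally long sequences."""
    k = len(u)
    u_bar, w_bar = sum(u) / k, sum(w) / k
    suw = sum((a - u_bar) * (b - w_bar) for a, b in zip(u, w))
    suu = sum((a - u_bar) ** 2 for a in u)
    sww = sum((b - w_bar) ** 2 for b in w)
    return suw / math.sqrt(suu * sww)


def multiple_correlation(y, x1, x2):
    """Multiple correlation R of the response y with the predictors x1 and x2."""
    r_y1, r_y2, r_12 = pearson(y, x1), pearson(y, x2), pearson(x1, x2)
    denominator = 1 - r_12 ** 2
    if denominator < 1e-12:
        raise ValueError("predictors are (numerically) perfectly collinear")
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / denominator
    # Clamp tiny floating-point overshoots outside [0, 1] before the square root.
    return math.sqrt(min(1.0, max(0.0, r_squared)))
```

Because $R$ can only increase when a predictor is added, the value returned is never smaller than the larger of the two individual correlations in absolute value.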
3. Materials and Methods
Every benzenoid hydrocarbon can be inherently depicted through a benzenoid system, defined as a finite, connected plane graph devoid of cut vertices, wherein each internal face is enclosed by a regular hexagon possessing sides of unit length.
The following definitions, as presented in [31], are applicable. Let B be a benzenoid system with v vertices and p hexagons. For any path $u_1 u_2 \cdots u_{\ell+1}$ of length $\ell$ ($\ell \geq 1$) within B, the associated vertex degree sequence is defined as $(d_{u_1}, d_{u_2}, \ldots, d_{u_{\ell+1}})$. A fjord, cove, bay, and fissure then refer to paths with degree sequences (2, 3, 3, 3, 3, 2), (2, 3, 3, 3, 2), (2, 3, 3, 2), and (2, 3, 2), respectively. These paths are traversed along the perimeter of B, as depicted in Figure 1. Fjords, coves, bays, and fissures are all considered different types of inlets. The number of inlets, k, is defined as the total number of fjords, coves, bays, and fissures.
Figure 1.
Fissure, cove, bay, and fjord in a benzenoid system.
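Suppose a benzenoid system B has p hexagons, k inlets, and v vertices. Let $m_{ij}$ denote the number of edges of B that satisfy the conditions $d_a = i$ and $d_b = j$, where $d_a$ and $d_b$, respectively, are the degrees of the ends a and b of an edge. By Lemma 1 in [31], we have
$$m_{22} = v - 2p - k + 2, \qquad m_{23} = 2k, \qquad m_{33} = 3p - k - 3. \tag{7}$$
By (3) and (7), the benzenoid system B has the general product-connectivity index as follows:
$$R_\alpha(B) = 4^{\alpha}(v - 2p - k + 2) + 6^{\alpha}(2k) + 9^{\alpha}(3p - k - 3). \tag{8}$$
Similarly, by (4) and (7), the general sum-connectivity index of B is
$$SCI_\alpha(B) = 4^{\alpha}(v - 2p - k + 2) + 5^{\alpha}(2k) + 6^{\alpha}(3p - k - 3). \tag{9}$$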
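Since every vertex of a benzenoid system has degree 2 or 3, both indices reduce to a weighted count of the three edge types, and the edge counts follow from v, p, and k by Lemma 1 in [31]: $m_{22} = v - 2p - k + 2$, $m_{23} = 2k$, $m_{33} = 3p - k - 3$. The sketch below evaluates the resulting closed forms; the function names are hypothetical, and naphthalene (v = 10, p = 2, k = 2) is used only as a familiar check case.

```python
def edge_partition(v, p, k):
    """Edge counts (m22, m23, m33) of a benzenoid system with v vertices,
    p hexagons and k inlets, following Lemma 1 in [31]."""
    return v - 2 * p - k + 2, 2 * k, 3 * p - k - 3


def r_alpha(v, p, k, alpha):
    """General product-connectivity index of a benzenoid system."""
    m22, m23, m33 = edge_partition(v, p, k)
    # Degree products of the edge types (2,2), (2,3), (3,3) are 4, 6 and 9.
    return m22 * 4 ** alpha + m23 * 6 ** alpha + m33 * 9 ** alpha


def sci_alpha(v, p, k, alpha):
    """General sum-connectivity index of a benzenoid system."""
    m22, m23, m33 = edge_partition(v, p, k)
    # Degree sums of the edge types (2,2), (2,3), (3,3) are 4, 5 and 6.
    return m22 * 4 ** alpha + m23 * 5 ** alpha + m33 * 6 ** alpha


# Naphthalene: 10 vertices, 2 hexagons, 2 inlets (both fissures).
naphthalene_randic = r_alpha(10, 2, 2, -0.5)
naphthalene_sum_conn = sci_alpha(10, 2, 2, -0.5)
```

For naphthalene the edge partition is (6, 4, 1), which accounts for all 11 carbon–carbon bonds of the molecular graph.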
Table 1.
The structural configuration of molecules, along with the experimental data pertaining to the standard enthalpy of formation ($\Delta H_f^{\circ}$) and the normal boiling point ($bp$), is provided for a set of 22 lower benzenoid hydrocarbons.
Table 1 provides information on the molecular structure, normal boiling point ($bp$), and standard enthalpy of formation ($\Delta H_f^{\circ}$) for the 22 lower benzenoid hydrocarbons, a class of polycyclic aromatic hydrocarbons (PAHs). Additionally, Table 2 presents data on the general product-connectivity index $R_\alpha$ and the general sum-connectivity index $SCI_\alpha$ for the 22 lower BHs.
Table 2.
The general product-connectivity index $R_\alpha$ and the general sum-connectivity index $SCI_\alpha$ of the 22 lower benzenoid hydrocarbons.
4. Results and Discussion
Recall that the general product-connectivity index $R_\alpha$ and the general sum-connectivity index $SCI_\alpha$, over a range of values of $\alpha$, exhibit a high degree of accuracy in predicting the boiling point and enthalpy of formation for the lower benzenoid hydrocarbons (BHs).
First, we employed the method described in Section 3 to evaluate the exact analytical expressions for $R_\alpha$ and $SCI_\alpha$ for the 22 lower BHs provided in Table 1. In particular, we utilized the expressions for $R_\alpha$ and $SCI_\alpha$ in (8) and (9), respectively, to compute their exact values. Note that we only needed the number of vertices v, the number of inlets k, and the number of hexagons p of a given hexagonal system to compute its $R_\alpha$ and $SCI_\alpha$ values. The next example explains the methodology in Section 3 to compute the general sum- and product-connectivity indices for a given BH graph.
Example 1.
From the data shown in Table 2, we generated four curves, as illustrated in Figure 2, Figure 3, Figure 4 and Figure 5. For these 22 lower BHs, the correlation coefficient curves between their physicochemical properties ($bp$ in Figure 2 and Figure 3; $\Delta H_f^{\circ}$ in Figure 4 and Figure 5) and the indices ($R_\alpha$ or $SCI_\alpha$) are drawn in the respective figures as solid lines, distinguished by colors.
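A correlation curve of this kind is obtained by sweeping $\alpha$ over a grid and recomputing the correlation coefficient at each grid point. The sketch below is an illustration under loud assumptions: the (v, p, k) triples describe five small benzenoids rather than the paper's 22 molecules, the property values are invented placeholders and NOT the experimental data of Table 1, and all function names are hypothetical.

```python
import math

# (v, p, k) = (vertices, hexagons, inlets) of benzene, naphthalene, anthracene,
# phenanthrene and pyrene.
STRUCTURES = [(6, 1, 0), (10, 2, 2), (14, 3, 4), (14, 3, 3), (16, 4, 4)]

# Placeholder responses invented for illustration only (NOT the Table 1 data).
TOY_PROPERTY = [1.0, 2.1, 3.2, 3.0, 3.9]


def index_value(v, p, k, alpha, kind):
    """General product- (kind='product') or sum- (kind='sum') connectivity
    index of a benzenoid system, computed from its edge partition."""
    m22, m23, m33 = v - 2 * p - k + 2, 2 * k, 3 * p - k - 3
    w23, w33 = (6, 9) if kind == "product" else (5, 6)
    return m22 * 4 ** alpha + m23 * w23 ** alpha + m33 * w33 ** alpha


def pearson(u, w):
    """Sample Pearson correlation coefficient of two equally long sequences."""
    k = len(u)
    u_bar, w_bar = sum(u) / k, sum(w) / k
    suw = sum((a - u_bar) * (b - w_bar) for a, b in zip(u, w))
    suu = sum((a - u_bar) ** 2 for a in u)
    sww = sum((b - w_bar) ** 2 for b in w)
    return suw / math.sqrt(suu * sww)


def correlation_curve(structures, prop, alphas, kind):
    """Return [(alpha, r)], r being the correlation between index and property."""
    curve = []
    for alpha in alphas:
        values = [index_value(v, p, k, alpha, kind) for v, p, k in structures]
        curve.append((alpha, pearson(values, prop)))
    return curve


ALPHAS = [i / 10 for i in range(-30, 31)]  # grid on [-3, 3] with step 0.1
product_curve = correlation_curve(STRUCTURES, TOY_PROPERTY, ALPHAS, "product")
sum_curve = correlation_curve(STRUCTURES, TOY_PROPERTY, ALPHAS, "sum")
best_alpha, best_r = max(product_curve, key=lambda point: point[1])
```

Plotting `product_curve` and `sum_curve` against the grid gives curves of the same type as those in the figures; with real data, the grid maximum is only a coarse estimate that a one-dimensional optimizer can then refine.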
Figure 2.
Correlation coefficient curves between the general indices and the $bp$ of lower benzenoids (far view).
Figure 3.
Correlation coefficient curves between the general indices and the $bp$ of lower benzenoids.
Figure 4.
Correlation coefficient curves between the general indices and the $\Delta H_f^{\circ}$ of lower benzenoids (far view).
Figure 5.
Correlation coefficient curves between the general indices and the $\Delta H_f^{\circ}$ of lower benzenoids.
Comparing the two general indices, the general product-connectivity index is the best measure of the boiling point for BHs when $\alpha$ lies in a certain interval, as shown in Figure 3, while for any other $\alpha$, the general sum-connectivity index is the best. On the other hand, as measures of the enthalpy of formation of benzenoid hydrocarbons, the general product-connectivity index is better when $\alpha$ lies in a certain interval, as can be seen in Figure 5, while for any other $\alpha$, the general sum-connectivity index is better.
There exists a good correlation between $bp$ and $R_\alpha$ when $\alpha$ is in some interval. For example, for $\alpha$ in a suitable interval, $bp$ and $R_\alpha$ have a correlation coefficient greater than 0.996558. Similarly, for $\alpha$ in different intervals, there also exists a good correlation between $bp$ and $SCI_\alpha$, between $\Delta H_f^{\circ}$ and $R_\alpha$, and between $\Delta H_f^{\circ}$ and $SCI_\alpha$, as shown in Figure 6.
Figure 6.
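Intervals of good $\alpha$ for $bp$-$R_\alpha$, $bp$-$SCI_\alpha$, $\Delta H_f^{\circ}$-$R_\alpha$, and $\Delta H_f^{\circ}$-$SCI_\alpha$ for lower BHs.
By Figure 3 and Figure 5, for the 22 lower BHs, there is a value of $\alpha$ at which $R_\alpha$ is the most linearly correlated with $bp$, and another at which it is the most linearly correlated with $\Delta H_f^{\circ}$, among all general product-connectivity indices; likewise, there are values of $\alpha$ at which $SCI_\alpha$ is the most linearly correlated with $bp$ and with $\Delta H_f^{\circ}$, respectively, among all general sum-connectivity indices. The linear correlations (with 95% confidence intervals) between the physicochemical properties ($bp$ and $\Delta H_f^{\circ}$) and each of the aforementioned indices are given below:
Note that s and r are the standard error of fit and the correlation coefficient, respectively. Figure 7 shows scatter plots between the boiling point and the optimal indices $R_\alpha$ and $SCI_\alpha$, and scatter plots between the enthalpy of formation and the optimal indices $R_\alpha$ and $SCI_\alpha$, for the 22 lower benzenoids.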
Figure 7.
Scatter plots of $bp$-$R_\alpha$, $bp$-$SCI_\alpha$, $\Delta H_f^{\circ}$-$R_\alpha$, and $\Delta H_f^{\circ}$-$SCI_\alpha$ for lower benzenoids.
It is obvious from (10)–(13) that the general product-connectivity indices, at their respective optimal values of $\alpha$, are the best for measuring the boiling point and enthalpy of formation among all the examined indices. All the Octave codes have been made publicly accessible; see the Supplementary Information at the end of the paper.
Recall that Gutman and Tošović [11] considered $bp$ and $\Delta H_f^{\circ}$ to be representatives of physicochemical properties. Moreover, they considered isomeric octanes as test molecules. We applied our study to the 18 isomeric octanes, and the preliminary results showed that the value(s) of $\alpha$ yielding a good estimate of $bp$ and $\Delta H_f^{\circ}$ for the 22 lower BHs were not the same as those for isomeric octanes. Thus, the current study and the corresponding intervals/values of $\alpha$ are limited to BHs only. However, we expect a similar behavior for other BHs (different from the 22 lower BHs considered in this study) as well.
5. Simultaneous Predictive Potential of $R_\alpha$ and $SCI_\alpha$
In this section, we are interested in finding the value(s) of $\alpha$ for which the correlation of either $R_\alpha$ or $SCI_\alpha$ with both properties $bp$ and $\Delta H_f^{\circ}$ simultaneously is the strongest. To achieve that, we consider the multiple correlation coefficient of either $R_\alpha$ or $SCI_\alpha$ with both $bp$ and $\Delta H_f^{\circ}$ by treating the two properties as independent variables. Let $y$ be the dependent variable (the index) and $x_1 = bp$, $x_2 = \Delta H_f^{\circ}$ be the two independent variables. Note that multiple correlation measures the relationship between one dependent variable and more than one independent variable. Since there are two representatives of physicochemical properties, i.e., $bp$ and $\Delta H_f^{\circ}$, we employ multiple correlation between one graphical descriptor and the two chosen properties $(x_1, x_2)$. This delivers the predictive potential of a descriptor for the two properties simultaneously, rather than the correlation strength of the descriptor with each property individually.
In the case where the response variable y depends on an unknown parameter $\alpha$, the value of the multiple correlation above also depends on $\alpha$, i.e., $R = R(\alpha)$. A preliminary plot of $R(\alpha)$ over the region considered reveals a unimodal shape with a maximum in this region. A built-in optimizer in the R programming language was employed to find the value of $\alpha$ that maximizes the multiple correlation value $R(\alpha)$. Figure 8 presents the corresponding plot elaborating this calculation.
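The maximization of a unimodal $R(\alpha)$ over an interval can be sketched with a golden-section search. This is a stand-in chosen for transparency, not the built-in R optimizer used in the paper. The five (v, p, k) triples, the two placeholder property vectors (invented, NOT the Table 1 data), the search interval [-3, 3], and all function names are assumptions made for illustration.

```python
import math

# (v, p, k) of benzene, naphthalene, anthracene, phenanthrene and pyrene.
STRUCTURES = [(6, 1, 0), (10, 2, 2), (14, 3, 4), (14, 3, 3), (16, 4, 4)]
# Placeholder property values invented for illustration (NOT experimental data).
TOY_X1 = [1.0, 2.1, 3.2, 3.0, 3.9]
TOY_X2 = [0.8, 1.5, 2.3, 2.0, 2.2]


def product_index(v, p, k, alpha):
    """General product-connectivity index of a benzenoid system."""
    m22, m23, m33 = v - 2 * p - k + 2, 2 * k, 3 * p - k - 3
    return m22 * 4 ** alpha + m23 * 6 ** alpha + m33 * 9 ** alpha


def pearson(u, w):
    """Sample Pearson correlation coefficient."""
    n = len(u)
    u_bar, w_bar = sum(u) / n, sum(w) / n
    suw = sum((a - u_bar) * (b - w_bar) for a, b in zip(u, w))
    suu = sum((a - u_bar) ** 2 for a in u)
    sww = sum((b - w_bar) ** 2 for b in w)
    return suw / math.sqrt(suu * sww)


def multiple_r(alpha):
    """Multiple correlation of the index (response) with the two properties."""
    y = [product_index(v, p, k, alpha) for v, p, k in STRUCTURES]
    r1, r2, r12 = pearson(y, TOY_X1), pearson(y, TOY_X2), pearson(TOY_X1, TOY_X2)
    r_squared = (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)
    return math.sqrt(min(1.0, max(0.0, r_squared)))


def golden_section_max(func, lo, hi, tol=1e-8):
    """Locate a maximizer of a unimodal function on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = func(c), func(d)
    while b - a > tol:
        if fc > fd:  # the maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = func(c)
        else:        # the maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = func(d)
    return (a + b) / 2


alpha_star = golden_section_max(multiple_r, -3.0, 3.0)
r_star = multiple_r(alpha_star)
```

Golden-section search only guarantees the global maximum when the function is unimodal on the interval, which is why a preliminary plot of $R(\alpha)$ is inspected first.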
Figure 8.
Plot delivering the value of $\alpha$ that maximizes the multiple correlation value $R(\alpha)$.
Figure 9 exhibits the matrix plot showing the distribution of the variables as well as the bivariate relationships between them (using the optimal value of $\alpha$).
Figure 9.
Distribution of the variables and the bivariate relationships between them. Moreover, the main diagonal entries present the distribution of data points. The *** denotes the continuance of decimal expression.
Next, we study the multiple correlation between $SCI_\alpha$ and the two chosen physicochemical properties $(x_1, x_2)$. In the case where the response variable y depends on an unknown parameter $\alpha$, the value of R above also depends on $\alpha$, i.e., $R = R(\alpha)$. A preliminary plot of $R(\alpha)$ over the region considered again reveals a unimodal shape with a maximum in this region. This time, too, the built-in R optimizer yielded the value of $\alpha$ that maximizes $R(\alpha)$. Figure 10 presents the corresponding plot elaborating these values.
Figure 10.
Plot delivering the value of $\alpha$ that maximizes the multiple correlation value $R(\alpha)$.
Figure 11 exhibits the matrix plot, showing the distribution of the variables as well as the bivariate relationships between them (using the optimal value of $\alpha$).
Figure 11.
Distribution of the variables and the bivariate relationships between them. Moreover, the main diagonal entries present the distribution of data points. The *** denotes the continuance of decimal expression.
6. Conclusions
We showed in this paper that there is good correlation between the boiling point ($bp$) and the general product-connectivity index ($R_\alpha$), between $bp$ and the general sum-connectivity index ($SCI_\alpha$), between the enthalpy of formation ($\Delta H_f^{\circ}$) and $R_\alpha$, and between $\Delta H_f^{\circ}$ and $SCI_\alpha$ when $\alpha$ is in some intervals (as in Figure 6) for benzenoid hydrocarbons. Moreover, for $\alpha$ in the intervals marked with blue dashed lines in Figure 3 and Figure 5, $R_\alpha$ rather than $SCI_\alpha$ is a better measure of the boiling point and enthalpy of formation, respectively, of G. For an $\alpha$ outside those intervals, $SCI_\alpha$ is a better measure of both the boiling point and enthalpy of formation of G.
7. Future Work
This paper examined the correlation power of the general product-connectivity index $R_\alpha$ and the general sum-connectivity index $SCI_\alpha$ to determine the boiling point ($bp$) of benzenoid hydrocarbons, and their correlation power to determine the enthalpy of formation ($\Delta H_f^{\circ}$) of benzenoid hydrocarbons. We propose further studies to similarly examine other generalized valency-based indices in determining the $bp$ and $\Delta H_f^{\circ}$ of benzenoid hydrocarbons. We also propose the following open problem:
Problem 2.
Can one use a learning technique to learn a nonlinear function of minimal complexity that is a best predictor of ?
Author Contributions
Conceptualization, S.H. and H.J.; Methodology, A.A.; Software, A.K. and H.J.; Validation, S.H. and M.J.F.A.; Formal analysis, A.A. and A.K.; Investigation, A.A. and A.K.; Resources, H.J. and M.J.F.A.; Data curation, H.J. and M.J.F.A.; Writing—original draft, A.A.; Writing—review & editing, S.H., H.J. and M.J.F.A.; Supervision, A.K.; Project administration, S.H.; Funding acquisition, A.K. All authors have read and agreed to the published version of the manuscript.
Funding
S.H. is supported by UBD Faculty Research Grant with Grant Number UBD/RSC H/1.4/FICBF(b)/2022/053 and the National Natural Science Foundation of China (No. 622260-101). A.K. is supported by the Key Laboratory of Philosophy and Social Sciences in Guangdong Province of Maritime Silk Road of Guangzhou University (No. GD22TWCXGC15), the National Natural Science Foundation of China (No. 622260-101) and by the Ministry of Science and Technology of China (No. WGXZ2023054L). M.J.F.A. extends his appreciation to Researcher Supporting Project number (RSPD2024R582), King Saud University, Riyadh, Saudi Arabia.
Data Availability Statement
No new data were created or analyzed in this study. Data sharing is not applicable to this article.
Conflicts of Interest
The authors declare no conflicts of interest.
References
1. Gutman, I.; Furtula, B. Novel Molecular Structure Descriptors - Theory and Applications I; University of Kragujevac: Kragujevac, Serbia, 2010.
2. Trinajstić, N.; Li, X.; Gutman, I. Mathematical aspects of Randić-type molecular structure descriptors. Croat. Chem. Acta 2006, 79, A31–A32.
3. Gutman, I.; Furtula, B. (Eds.) Recent Results in the Theory of Randić Index; University of Kragujevac: Kragujevac, Serbia, 2008.
4. Bollobás, B.; Erdős, P. Graphs of extremal weights. Ars Comb. 1998, 50, 225.
5. Devillers, J.; Balaban, A.T. Topological Indices and Related Descriptors in QSAR and QSPR; Gordon and Breach: Amsterdam, The Netherlands, 1999.
6. Zhou, B.; Trinajstić, N. On a novel connectivity index. J. Math. Chem. 2009, 46, 1252–1270.
7. Ullah, A.; Aurangzeb; Zaman, S. A new perspective on the modeling and topological characterization of H-Naphtalenic nanosheets with applications. J. Mol. Model. 2022, 28, 211.
8. Ullah, A.; Shamsudin; Zaman, S.; Hamraz, A. Zagreb Connection topological descriptors and structural property of the triangular chain structures. Phys. Scr. 2023, 8, 025009.
9. Ullah, A.; Zaman, S.; Hamraz, A.; Muzammal, M. On the construction of some bioconjugate networks and their structural modeling via irregularity topological indices. Eur. Phys. J. E 2023, 46, 72.
10. Ullah, A.; Zaman, S.; Hussain, A.; Jabeen, A.; Belay, M.B. Derivation of mathematical closed form expressions for certain irregular topological indices of 2D nanotubes. Sci. Rep. 2023, 13, 11187.
11. Gutman, I.; Tošović, J. Testing the quality of molecular structure descriptors. Vertex-degree-based topological indices. J. Serb. Chem. Soc. 2013, 78, 805–810.
12. Malik, M.Y.H.; Binyamin, M.A.; Hayat, S. Correlation ability of degree-based topological indices for physicochemical properties of polycyclic aromatic hydrocarbons with applications. Polycycl. Aromat. Compd. 2022, 42, 6267–6281.
13. Hayat, S.; Khan, S.; Khan, A.; Liu, J.-B. Valency-based molecular descriptors for measuring the π-electronic energy of lower polycyclic aromatic hydrocarbons. Polycycl. Aromat. Compd. 2022, 42, 1113–1129.
14. Hayat, S.; Suhaili, N.; Jamil, H. Statistical significance of valency-based topological descriptors for correlating thermodynamic properties of benzenoid hydrocarbons with applications. Comput. Theor. Chem. 2023, 1227, 114259.
15. Randić, M. Characterization of molecular branching. J. Am. Chem. Soc. 1975, 97, 6609–6615.
16. Cruz, R.; Gutman, I.; Rada, J. On benzenoid systems with minimal number of inlets. J. Serb. Chem. Soc. 2013, 78, 1351–1357.
17. Furtula, B.; Gutman, I. Comparing energy and Randić energy. Maced. J. Chem. Chem. Eng. 2013, 32, 117–123.
18. Randić, M. The connectivity index 25 years after. J. Mol. Graph. Model. 2001, 20, 19–35.
19. Todeschini, R.; Consonni, V. Handbook of Molecular Descriptors; John Wiley & Sons: Hoboken, NJ, USA, 2008.
20. Wang, S.; Zhou, B.; Trinajstić, N. On the sum-connectivity index. Filomat 2011, 25, 29–42.
21. Xing, R.; Zhou, B.; Trinajstić, N. Sum-connectivity index of molecular trees. J. Math. Chem. 2010, 48, 583–591.
22. Balister, P.; Bollobás, B.; Gerke, S. The generalized Randić index of trees. J. Graph Theory 2007, 56, 270–286.
23. Li, J.; Li, Y. The asymptotic value of the zeroth-order Randić index and sum-connectivity index for trees. Appl. Math. Comput. 2015, 266, 1027–1030.
24. Li, F.; Ye, Q. Second order Randić index of fluoranthene-type benzenoid systems. Appl. Math. Comput. 2015, 268, 534–546.
25. Li, X.; Zheng, J. Extremal chemical trees with minimum or maximum general Randić index. MATCH Commun. Math. Comput. Chem. 2006, 55, 381–390.
26. Zhou, B.; Trinajstić, N. On general sum-connectivity index. J. Math. Chem. 2010, 47, 210–218.
27. Akhter, S.; Imran, M.; Raza, Z. Bounds for the general sum-connectivity index of composite graphs. J. Ineq. Appl. 2017, 2017, 76.
28. Du, Z.; Zhou, B.; Trinajstić, N. On the general sum-connectivity index of trees. Appl. Math. Lett. 2011, 24, 402–405.
29. Milovanović, I.Ž.; Milovanović, E.I.; Matejić, M. Some inequalities for general sum-connectivity index. MATCH Commun. Math. Comput. Chem. 2018, 79, 477–489.
30. Zaman, S. Cacti with maximal general sum-connectivity index. J. Appl. Math. Comput. 2021, 65, 147–160.
31. Rada, J.; Araujo, O.; Gutman, I. Randić index of benzenoid systems and phenylenes. Croat. Chem. Acta 2001, 74, 225–235.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).