#### 1.1.2. Energy Scale and the Heterogeneity of the Protein Surface

As a next step, we invoked the formula of Waugh and Fedin, after improving it by placing it on a fundamental temperature scale of the right dimension. In this form, the formula can be used for aqueous solutions. At the atomic/molecular level, the equation is

$$E\_{0\text{a}}\text{ [erg]} = ck\_{\text{B}}T \text{ [erg]},\tag{1a}$$

or applied to molar quantities it is

$$E\_{\rm{0m}}\text{ [kJ/mol]} = cRT\,\text{[kJ/mol]}.\tag{1b}$$

In these equations, *c* is a dimensionless number, the value of which was determined by applying Equation (1b) to the melting of bulk ice, using the melting heat of ice (6.01 kJ/mol [13]). The fundamental temperature equivalent to 273.15 K is *T*f = *RT* = 2.272 kJ/mol, so the proportionality constant in Equation (1b) is *c* = 6.01/2.272 ≈ 2.65. Comparing Equation (1a) with the energy of one degree of freedom given by the equipartition theorem (1/2 *k*B*T*), we may deduce that a water molecule has 2*c* ≈ 5.3 degrees of freedom, which seems to be in the right range for a rotating (and not translating) electric water dipole.
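The calibration of *c* can be checked numerically. The following minimal Python sketch assumes the CODATA value of the molar gas constant (which is not stated in the text) and uses illustrative variable names; depending on the value of *R* used, the last digit of *T*f may differ slightly from the one quoted above.

```python
# Sketch: recover the constant c of Equation (1b) from the melting of bulk ice.
# Assumption: R is the CODATA molar gas constant; the text does not state which value was used.
R = 8.314462618e-3   # kJ/(mol K)
T_MELT = 273.15      # melting point of ice in K
H_MELT = 6.01        # melting heat of ice in kJ/mol, as quoted in the text


def fundamental_temperature(t_kelvin):
    """Return the fundamental temperature T_f = R*T in kJ/mol."""
    return R * t_kelvin


t_f = fundamental_temperature(T_MELT)
c = H_MELT / t_f                 # Equation (1b) solved for c
degrees_of_freedom = 2.0 * c     # c*k_B*T compared with f*(1/2)*k_B*T

print(f"T_f = {t_f:.3f} kJ/mol, c = {c:.2f}, f = {degrees_of_freedom:.1f}")
```

The factor of two in the last step comes only from the comparison with the equipartition energy of 1/2 *k*B*T* per degree of freedom.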

In addition, we introduced dynamic parameters for the quantitative characterization of the ordered/disordered state of protein molecules, which goes beyond their static structural description. Before formalizing the definitions, let us take a look at Figure 2 (and, for details, Figures 3 and 4). There is a marked difference between globular and intrinsically disordered proteins. On the melting diagram of the globular protein ubiquitin (UBQ), one can see a broad, temperature-independent (or excitation-energy-independent) region (plateau). On the other hand, the plateau of the IDP ERD10 is significantly narrower. (A similar behavior was also seen for other proteins [5–10].) Significantly more information is provided by the initial (*T*fno) and ending (*T*fne) temperatures of the plateau. The region between these two temperatures shows a homogeneous bond (potential energy barrier) distribution, whereas the region above *T*fne shows heterogeneity in the protein–water bond energy distribution. After this introduction, the following quantities can be defined.

Heterogeneity ratio, *HeR*. According to our observations [5–10] and the literature quoted therein, protein molecules can be characterized and categorized by the homogeneity/heterogeneity of the energy distribution of water binding. The basis of the classification is the measurement of the ratio, for which we suggest the relation

$$HeR = (1 - T\_{\text{fne}})/(1 - T\_{\text{fno}}),\tag{2}$$

in which (1 − *T*fne) and (1 − *T*fno) give the measured distances from the melting point of ice. These values can be easily read from the novel *MD*s. *HeR* is 1 (one) for systems showing heterogeneous water binding (lacking a plateau, *T*fne = *T*fno) and 0 (zero) for homogeneous binding systems (e.g., bulk water, *T*fne = 1), and it is between 0 and 1 for partially heterogeneous systems. *HeR* is therefore an order-parameter-type quantity that specifies to what extent the surface of the protein molecule can be regarded as showing a heterogeneous potential energy distribution (disordered) in terms of water binding. It must be emphasized that this relation measures the heterogeneity ratio by comparing the extent of the two possible regions and does not measure the number of actual protein–water bonds in them.
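The limiting cases of the heterogeneity ratio can be illustrated with a short Python sketch. It assumes that Equation (2) reads *HeR* = (1 − *T*fne)/(1 − *T*fno), as implied by the definitions in the text; the function name and the numerical inputs are hypothetical and are not measured values.

```python
def heterogeneity_ratio(t_fno, t_fne):
    """HeR = (1 - T_fne) / (1 - T_fno), with both temperatures normalized to the melting point of ice."""
    if not (0.0 < t_fno < 1.0 and t_fno <= t_fne <= 1.0):
        raise ValueError("expected 0 < T_fno < 1 and T_fno <= T_fne <= 1")
    return (1.0 - t_fne) / (1.0 - t_fno)


# Hypothetical plateau limits, chosen only to show the three regimes.
no_plateau = heterogeneity_ratio(0.85, 0.85)     # plateau of zero width: fully heterogeneous
full_plateau = heterogeneity_ratio(0.85, 1.0)    # plateau reaches the melting point: homogeneous
partial = heterogeneity_ratio(0.80, 0.95)        # partially heterogeneous system

print(no_plateau, full_plateau, round(partial, 3))
```

The input check reflects the fact that the plateau can end neither before it starts nor above the melting point of ice.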

#### 1.1.3. An Analytical Description of *n*

The introduction of fundamental temperature or energy scale makes it possible to describe *MD* by power series in the form

$$n = A + B(T\_{\rm fn} - T\_{\rm fn1}) + C(T\_{\rm fn} - T\_{\rm fn2})^2 + \dots \tag{3}$$

That is, we can define the total number of water molecules, *n*, that are moving at a given thermal energy (temperature), as well as the change of *MD* on a normalized fundamental energy scale, i.e., the differential form of melting diagram, *DMD*

$$
\Delta n/\Delta T\_{\text{fn}} = B + 2C(T\_{\text{fn}} - T\_{\text{fn2}}) + \dots \tag{4}
$$

which defines the number of water molecules that begin to move at the given excitation energy. *T*fnx (with *x* = 1, 2, ..., *n*) is a fitting parameter in Equations (3) and (4), in which *x* is equal to the exponent of the respective term (this also holds for the terms with exponent ≥ 3, which are not given here in detail). The present form of the equations calls attention to the fact that each term is valid only in a given temperature range. It should be emphasized that all quantities and coefficients in these formulae are dimensionless.
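The relation between the power series for *n* and its differential form can be verified numerically. The sketch below truncates the series after the quadratic term and compares the analytic derivative with a central finite difference; the coefficients are hypothetical, dimensionless values chosen only for illustration, and the function names are our own.

```python
def n_mobile(t_fn, a, b, c, t_fn1, t_fn2):
    """Power series for n, truncated after the quadratic term (Equation (3))."""
    return a + b * (t_fn - t_fn1) + c * (t_fn - t_fn2) ** 2


def dmd(t_fn, b, c, t_fn2):
    """Differential melting diagram, the derivative of n with respect to T_fn (Equation (4))."""
    return b + 2.0 * c * (t_fn - t_fn2)


# Hypothetical fitting parameters (not taken from the measurements).
params = dict(a=0.10, b=0.5, c=20.0, t_fn1=0.90, t_fn2=0.90)

t = 0.95
h = 1e-6
numeric = (n_mobile(t + h, **params) - n_mobile(t - h, **params)) / (2.0 * h)
analytic = dmd(t, params["b"], params["c"], params["t_fn2"])

print(f"analytic = {analytic:.4f}, numeric = {numeric:.4f}")
```

Note that *A* and *T*fn1 drop out of the derivative, which is why only *B*, *C*, and *T*fn2 appear in the differential form.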

Number of protein–water bonds, *HeR*n. We can make a statement about the homogeneity/heterogeneity of bonds (potential barriers) if we ask for the exact number of protein–water bonds in the given excitation energy range. The parameters fitted to the power series provide the answer. In the simplest cases (including, in our experience, aqueous solutions in distilled water), in which there is only one wider heterogeneous range in the *MD*, the number of water bonds in the heterogeneous region depends on the fitted terms, *B*/(1 − *T*fne) and 2*C*/(1 − *T*fne); if both are present, it depends on the sum of the two terms. To simplify the determination of the number of protein–water bonds (the degree of hydration), it can be read directly from the *DMD*s, i.e., as the value or the sum of the areas colored in the figures (in principle, the definite integrals over the region from *T*fne to *T*fn ≈ 1). Let *n*ho be the number of water molecules in the first hydrate shell and *n*he the total number of water molecules in the entire heterogeneous region. In this case, the second relation suggested for the ratio of heterogeneity is

$$HeR\_{\rm n} = n\_{\rm he} / (n\_{\rm he} + n\_{\rm ho}).\tag{5}$$

The value of *n*ho (approximately) is given by the area of the rectangle at the lowest excitation energy region, whereas *n*he in our case is given by the areas of triangles (in general, those described by members of higher exponents; see Figures 3 and 4).

The numbers in the equation can be measured directly based on *MD*. *n*ho can be determined with high accuracy as the average of all *n* points measured on the plateau, and (*n*he + *n*ho) as an approximate value by the *n* value reliably measured at a temperature close to the highest temperature, *T*fn ≈ 1. The process has a self-checking potential and thus improves the reliability of the data.
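The procedure described above can be sketched in a few lines of Python: *n*ho is taken as the mean of the points measured on the plateau, and *n*he as the difference between the value measured close to *T*fn ≈ 1 and *n*ho. The data in the example are synthetic and the function name is hypothetical.

```python
def heterogeneity_ratio_n(plateau_points, n_total):
    """Return (n_ho, n_he, HeR_n) following Equation (5).

    plateau_points: n values measured on the plateau of the MD
    n_total: n measured close to T_fn ~ 1, taken as an estimate of n_he + n_ho
    """
    if not plateau_points:
        raise ValueError("at least one plateau point is required")
    n_ho = sum(plateau_points) / len(plateau_points)
    n_he = n_total - n_ho
    return n_ho, n_he, n_he / (n_he + n_ho)


# Synthetic example values (not measured data).
plateau = [98.0, 100.0, 102.0, 100.0]
n_ho, n_he, her_n = heterogeneity_ratio_n(plateau, n_total=250.0)

print(n_ho, n_he, her_n)
```

Averaging over all plateau points is what makes *n*ho the more accurately determined of the two quantities, whereas the total relies on a single reading near the melting point.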

Measure of heterogeneity, *HeM*. We proposed [9] introducing this parameter as

$$HeM = (B + 2C)/(1 - T\_{\text{fne}}).\tag{6}$$

This relationship is also dimensionally correct, and the *HeM* value is generally a positive number. Its value is zero for proteins with an almost equipotential molecular surface, so it can be considered a quasi-order parameter. The denominator, (1 − *T*fne), designates the energy range in which there are varying protein–water bonds (of heterogeneous distribution), and *B* + 2*C* (taking the series up to the second term of non-zero exponent) is the number of bonds within this range. The fraction is thus a kind of slope of the *MD* function; like the tangent function, its values are not limited to the range of 0 to 1. Non-heterogeneously binding proteins, such as globular proteins in our experience, also have a *HeM* value, which by definition applies to the region above the plateau. It is not unfounded to suggest that there is a similar dynamic difference in the hydrogen mobility (*HM* [1]) and *HeM* parameters, i.e., in the mobility of all proton–proton pairs and in the degree of heterogeneity of protein–water bonds.
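As a minimal illustration of the slope-like character of *HeM*, the following Python sketch evaluates (*B* + 2*C*)/(1 − *T*fne) for hypothetical fitting parameters (not measured values); the function name is our own.

```python
def heterogeneity_measure(b, c, t_fne):
    """HeM = (B + 2C) / (1 - T_fne); all quantities are dimensionless."""
    if not (0.0 < t_fne < 1.0):
        raise ValueError("T_fne must lie strictly between 0 and 1")
    return (b + 2.0 * c) / (1.0 - t_fne)


# Hypothetical parameters: an equipotential surface (B = C = 0) and a heterogeneous one.
equipotential = heterogeneity_measure(0.0, 0.0, 0.9)
heterogeneous = heterogeneity_measure(0.5, 1.0, 0.9)

print(equipotential, round(heterogeneous, 6))
```

The second value exceeds 1, which shows why *HeM*, unlike *HeR*, is not confined to the interval from 0 to 1.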

In the power series of *n*, Equation (3), we retained only the first two terms of non-zero exponent (which is enough to interpret the results presented in most of our examples). Heterogeneity and homogeneity can be observed in both the nature and the magnitude of the respective potentials and their distance dependence. Variants of theoretical possibilities are found in the literature [14–17].

The determination of the *MD* function and of its differential form, both of which can be described analytically, allows for the unique and individual mapping of the energy distribution of the potential barriers that inhibit the motion of water molecules bound to the protein. Having introduced the elements required for the interpretation of measured *MD*s, we can easily formulate the purpose of our present work: a deeper, thermodynamic interpretation of our results. This is illustrated through the analysis of the *MD*s of the globular standard protein ubiquitin and the intrinsically disordered ERD10.

#### **2. Results and Discussion**

In Figures 3 and 4, we show the *MD*s determined for the two proteins dissolved in double-distilled water; panel (a) also shows measurements on reference water, and panel (b) shows the derived curves, the *DMD*s (that is, Δ*n*/Δ*T*fn, the potential barrier distribution of protein–water bonds). Information on the origin of the samples, the measuring equipment, and the details of the measurements is given in our above-mentioned articles and in book chapters [4,5].

**Figure 3.** (**a**) Melting diagram (*MD*, green circles) of ubiquitin dissolved in double-distilled water and that of frozen water under identical conditions (blue squares). (**b**) *DMD* curves (that is, the potential barrier distribution of protein–water bonds). There are no reliable measured data in the range from −1 to 0 °C (0.995–1.00 *T*fn). The data are given for 50 mg/mL protein concentration.

It is worth repeating that the amplitude of the slow component of the measured FID signal, extrapolated to *t* = 0, directly gives the number *n* of resonant protons (i.e., of the protein-bound water molecules), whereas the temperature dependence of the *MD* gives the dependence of *n* on thermal excitation energy.

The information can be read from Figures 3 and 4 as follows. Bulk water (blue squares) shows the microscopic picture of the ice–water phase transition. What would we expect of an absolutely pure water sample of infinite size (in theory, one having a periodic boundary condition)? A single step of infinite slope at *T*fn = 1.00 and *E*a = 6.01 kJ/mol excitation energy (at 0.00 °C), in which all four bonds of the water molecule in its tetrahedral bonding environment "melt" simultaneously. Instead, moving water molecules are detected already below 0 °C. There are several reasons for this. First, the sample is not of infinite size, and the environment of the water molecules on the surface of the small sample is not the same as that of those in the bulk. Second, the sample is not absolutely pure, and the environment of water molecules near impurities is not the same as in pure water. Third, the temperature of the sample can be controlled and determined with limited accuracy only, especially at 0 °C.

**Figure 4.** (**a**) The melting diagram (*MD*, red stars) of ERD10 dissolved in double-distilled water and the melting curve (blue squares) of the solvent (water). (**b**) *DMD* curves (that is, the potential barrier distribution of protein–water bonds). There are no reliable measured data in the range from −1 to 0 °C (0.995–1.00 *T*fn). The data are given for 50 mg/mL protein concentration.

In Figure 3a, the "melting point" (−46 (1) °C; for the definition of the error, see Table 1) of the aqueous solution of ubiquitin shows the thermal energy investment (Δ*Q*) that is required to start to move the water molecules that are bound to the protein. The steep step (with a narrow, ≈0.01 kJ/mol energy range) shows that there are water molecules in the first hydrate shell that are bound almost identically. It is a reasonable approximation to consider these energies nearly the same and the relevant molecular surface equipotential. This potential field of nearly identical elements resembles that of H-bridges [16–19] and differs greatly from strongly distance-dependent potentials (the variants can be found in textbooks [16–19]). The number of protein–water bonds in this region is given by the area of the rectangle. As a self-check, the same quantity can be determined more accurately from the average of all *n* points on the *MD* plateau.

The next wide region is the plateau. (This region begins at *T*fno = 0.832 (4) in Figure 3b, where the value of Δ*n*/Δ*T*fn is zero.) The plateau carries very important information. No new water molecules begin to move in this excitation energy region, because there are no water molecules that are bound to the protein with a corresponding energy. We suggest that the H-bridges in this energy range are those that link the bulk of the protein molecule into a globule. Thus, this can be an ancestral form of a higher-order structure, which is represented not only by geometry but also by a bonding network of a certain energy. The heat invested within the plateau region does not set new water molecules in motion; rather, it increases the specific heat and the rotational speed of the already rotating water molecules, as we have seen in a previous work on differential scanning calorimetry (DSC) measurements and data interpretation [7]. (Based on this interpretation, these statements can be made more accurate, which we intend to do shortly.)

*T*fne is the end of the plateau, and here begins the energy region in which there are binding energies close to that of water–water bonds, presumably on parts of the protein molecule that are better exposed to water. In principle, the temperature dependence of *n* can be described by the higher-order terms of the power series; the quadratic term was sufficient in this case. All data are available; we summarize the values and the order parameters introduced by us in Table 1.

In Figure 4a, we show the "melting point" (approximately −42 (2) °C) for ERD10 (red stars). We repeat the above procedure for ERD10 with different parameters. The steep step (with a narrow, ≈0.01 kJ/mol energy range) shows the presence of water molecules of nearly identical binding energy in the first hydrate shell, but this region is followed by a plateau that is significantly narrower than that observed for globular proteins. We then observe a phase of continuous rise in the *MD*, which can be well approximated by the quadratic (or even higher) term of the series. A much larger part of the molecular surface is exposed to water than in the case of ubiquitin, i.e., about 69–77% of the protein molecule can be described as disordered. The range (1 − *T*fne) of energy barriers inhibiting water movements (which can be defined as disorder) is more than three times broader than for the selected globular protein.

Figures 3b and 4b depict the changes (differential quotient) of the mobile water fraction with normalized fundamental temperature, i.e., they are the graphical representations of Equation (4). As outlined, the bars at low temperature (around −45 °C) correspond to the relatively high differential quotient values describing the first few data points greater than zero. Here, the fraction of mobile hydration water increases within a few degrees to the level of *n*(*E*a,o) or *A* while the first mobile hydration layer forms, which gives the high differential quotient values.
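The differential quotient Δ*n*/Δ*T*fn can be estimated from discrete *MD* points by finite differences. The sketch below is one possible way to do this and is not necessarily the procedure used to produce the figures; the data are synthetic and mimic a steep step, a plateau, and a rise near the melting point.

```python
def differential_melting_diagram(t_fn, n):
    """Return interval midpoints and finite-difference quotients dn/dT_fn."""
    if len(t_fn) != len(n) or len(t_fn) < 2:
        raise ValueError("need two equally long sequences with at least two points")
    midpoints, slopes = [], []
    for i in range(len(t_fn) - 1):
        dt = t_fn[i + 1] - t_fn[i]
        if dt <= 0.0:
            raise ValueError("temperatures must be strictly increasing")
        midpoints.append(0.5 * (t_fn[i] + t_fn[i + 1]))
        slopes.append((n[i + 1] - n[i]) / dt)
    return midpoints, slopes


# Synthetic melting diagram (not measured data): step, plateau, rise.
t_fn = [0.82, 0.83, 0.84, 0.90, 0.96, 0.98]
n = [0.0, 0.1, 0.2, 0.2, 0.2, 0.3]

midpoints, slopes = differential_melting_diagram(t_fn, n)
for m, s in zip(midpoints, slopes):
    print(f"T_fn = {m:.3f}  dn/dT_fn = {s:.2f}")
```

On the plateau the quotient vanishes, whereas the narrow step at low temperature produces the largest values, in line with the description of the bars above.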

**Table 1.** Characteristic thermal quantities for the two sample proteins. *T*fno and *T*fne give the start and the end points of the plateau in the *MD*s, respectively, as normalized fundamental temperatures. *n*ho and *n*he values are given as the mobile hydration water fraction and as the number of mobile hydration water molecules per protein molecule. *HeR*, *HeR*n, and *HeM* are dynamic parameters describing heterogeneity from various aspects (see text).


\* The number in parentheses is the measurement error in the order of magnitude of the last digit; the heterogeneity ratio is defined by relation (2) or (5); \*\* Lower-limit estimate, because the measured data are uncertain close to *T*fn = 1; at the *T*fno value given in Table 1 (−43 °C), the excitation energy is 5.06 (4) kJ/mol for both proteins; at *T*fne, the excitation energy is 5.798 (2) kJ/mol (−9.9 °C) for ubiquitin and 5.31 (3) kJ/mol (−36 °C) for ERD10.
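The conversion between Celsius temperature, normalized fundamental temperature, and excitation energy can be sketched as follows. The sketch assumes that, with *c* calibrated on the melting of ice, Equation (1b) reduces to *E* = 6.01 kJ/mol × *T*fn with *T*fn = *T*/273.15 K; the function names are hypothetical.

```python
T_MELT_K = 273.15   # melting point of ice in K
H_MELT = 6.01       # melting heat of ice in kJ/mol (excitation energy at T_fn = 1)


def normalized_fundamental_temperature(t_celsius):
    """T_fn = T / 273.15 K, so that T_fn = 1 at the melting point of ice."""
    return (t_celsius + T_MELT_K) / T_MELT_K


def excitation_energy(t_celsius):
    """Excitation energy in kJ/mol, assuming E = 6.01 kJ/mol * T_fn."""
    return H_MELT * normalized_fundamental_temperature(t_celsius)


for t_c in (0.0, -43.0, -46.0):
    print(f"{t_c:6.1f} C  T_fn = {normalized_fundamental_temperature(t_c):.3f}  "
          f"E = {excitation_energy(t_c):.2f} kJ/mol")
```

Under this assumption, −43 °C corresponds to about 5.06 kJ/mol and −46 °C to *T*fn ≈ 0.832, which agrees with the values quoted in the text.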

A comparison of *HeM* values (analogous to the tangent function) shows that in globular proteins the two extremes, the conditions in the first hydrate shell and those of water–water bonding, lie very close to each other. For ERD10, a much wider distribution of potential energy barriers is characteristic of structural disorder. The typical data are summarized in Table 1. The ordered/disordered state of the two protein molecules only approximates the ideal limiting values, *HeR* = 0 and *HeR* = 1.

The plausibility of the number *n*ho of protein–water bonds in the homogeneous binding energy region (in other words, in the first hydrate shell) is better appreciated by reference to our knowledge of the hydration of protein-forming amino acids [18]. The sum of the numbers of possible H-bridges of the ubiquitin molecule is 211. According to our measurements, the number of water molecules bound in the first hydration shell by similar binding energies is *n*ho = 226 (3).

The summation of possible H-bridges within ERD10 yields 986. According to our measurements, the number of water molecules bound in the first hydration shell by similar binding energies is *n*ho = 514 (13). The difference between the measured and estimated values is surprising, especially in light of the good agreement found for ubiquitin. It is reasonable to ask whether approximately half of the H-bridges do not link with water molecules but realize some other type of bond.

Among the quantities given in Table 1, the relative number of bonds that fall into the heterogeneous region (*n*he/(*n*he + *n*ho)) deserves emphasis. The result is surprising if one thinks in terms of a globular protein molecule: ubiquitin is in contact with an additional *n*he > 102 (33) water molecules, which is approximately 36% of all bound water molecules. The bonds of these water molecules are dominated by water–protein bonds that are close in energy to water–water bonds. In the case of ERD10, the protein surface is in contact with an additional *n*he > 2200 (220) water molecules, which is approximately 73% of all bound water molecules. Among the bonds of the latter water molecules, water–protein bonds similar to water–water bonds dominate, with a substantially wider energy distribution for this intrinsically disordered protein.

It is perhaps unnecessary to emphasize that the values we suggest are derived from direct measurements, i.e., they do not rely on any hypothesis or model. They allow the number of first-neighbor water molecules per amino acid (*n*ho/amino acid) to be determined, which is 226/76 ≈ 3.0 for UBQ and 514/260 ≈ 2.0 for ERD10. The round values within an error of 1% are surprising, as is the close match of 2.0 with values observed for other globular proteins (casein, lysozyme, and BSA, to be published).
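The per-residue hydration numbers, and the comparison of *n*ho with the number of possible H-bridges, follow from simple arithmetic on the values quoted in the text. A short Python sketch (the dictionary layout and names are our own) reads:

```python
# Values quoted in the text: first-shell water molecules (n_ho), number of
# amino acids, and the summed number of possible H-bridges per molecule.
samples = {
    "ubiquitin": {"n_ho": 226, "residues": 76, "h_bridges": 211},
    "ERD10": {"n_ho": 514, "residues": 260, "h_bridges": 986},
}

per_residue = {}
per_h_bridge = {}
for name, s in samples.items():
    per_residue[name] = s["n_ho"] / s["residues"]
    per_h_bridge[name] = s["n_ho"] / s["h_bridges"]
    print(f"{name}: n_ho per residue = {per_residue[name]:.2f}, "
          f"n_ho per possible H-bridge = {per_h_bridge[name]:.2f}")
```

The second ratio makes the contrast explicit: it is close to one for ubiquitin, whereas for ERD10 only about half of the possible H-bridges are matched by first-shell water molecules.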

Therefore, the measured number of bound water molecules per ubiquitin molecule is 328 (30). An estimate by molecular dynamics simulation from the literature [19] gives a value of 379. For ERD10, the numbers per protein molecule are 2714 (263) (measured) and 881 (estimated by molecular dynamics simulation [19]). The difference between the two proteins and the reversed ratio raise many questions about the nature of protein–water bonds that are still difficult to answer.
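The totals quoted here are the sums of the first-shell and heterogeneous-region numbers given earlier, and the "reversed ratio" can be made explicit by dividing the measured totals by the simulated ones. The following Python sketch uses only values from the text (the *n*he values are lower-limit estimates); the variable names are our own.

```python
# Measured n_ho, lower-limit n_he, and the simulated totals from the text.
data = {
    "ubiquitin": {"n_ho": 226, "n_he": 102, "simulated": 379},
    "ERD10": {"n_ho": 514, "n_he": 2200, "simulated": 881},
}

totals = {}
ratios = {}
for name, d in data.items():
    totals[name] = d["n_ho"] + d["n_he"]               # measured bound water per molecule
    ratios[name] = totals[name] / d["simulated"]       # measured / simulated
    print(f"{name}: measured total = {totals[name]}, "
          f"measured/simulated = {ratios[name]:.2f}")
```

For ubiquitin the measured total lies below the simulated value, whereas for ERD10 it is about three times higher.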
