**Tiddo J. Mooibroek**

van 't Hoff Institute for Molecular Sciences, Universiteit van Amsterdam, Science Park 904, 1098 XH, Amsterdam, The Netherlands; t.j.mooibroek@uva.nl; Tel.: +31(0)205-25-72-08

Received: 28 July 2019; Accepted: 12 September 2019; Published: 16 September 2019

**Abstract:** A systematic evaluation of the CSD and the PDB in conjunction with DFT calculations reveals that non-covalent Carbon-bonding interactions with X–CH3 can be weakly directional in the solid state (*P* ≤ 1.5) when X = N or O. This is comparable to very weak CH hydrogen bonding interactions and is in line with the weak interaction energies calculated (≤ −1.5 kcal·mol<sup>−1</sup>) for typical charge neutral adducts such as [Me3N-CH3···OH2] (**2a**). The interaction energy is enhanced to ≤ −5 kcal·mol<sup>−1</sup> when X is more electron withdrawing, such as in [O2N-CH3···O=Cdme] (**20b**), and to ≤ −18 kcal·mol<sup>−1</sup> in cationic species like [Me3O<sup>+</sup>-CH3···OH2]<sup>+</sup> (**8a**).

**Keywords:** intermolecular interactions; non-covalent interactions; carbon-bonding interactions; crystal structure database analysis; density functional theory

#### **1. Introduction**

The manner in which molecules interact with one another is largely determined by non-covalent interactions [1]. So-called 'σ-hole interactions' [2–5] like hydrogen bonding are prominent identifiable interactions that bear biological significance [6]. Such σ-hole interactions have also been identified with other non-metals [7–11], for example with halogen atoms, which gives rise to halogen bonding interactions [12,13]. The impact of halogen bonding interactions on molecular biology has come into focus since about 2004 [14]. Indeed, evaluations of the protein data bank (PDB) [15] revealed that halogen bonding is structurally very similar to hydrogen bonding [12,14,16–18] and can be functionally relevant [19–22]. Relatively weak π-hole interactions [4,23–31] involving organic carbonyls [26,32–36], π-acidic aromatics [37,38], metal carbonyls [33,34,36,39] and nitro-compounds [40–45] are increasingly acknowledged as relevant drivers of molecular aggregation, such as in ligand-protein complexes.

The impact of a novel type of weak interaction on molecular recognition phenomena naturally leads one to speculate that other non-canonical interactions may play a similar role. One interesting candidate is the σ-hole interaction involving sp3-hybridized C-atoms. Such interactions have been studied since about 2013 [7,46] and are particularly interesting because sp3-C is abundant in living systems. More specifically, the methyl group (X–CH3, where X = any atom or group) is frequently encountered in natural and synthetic compounds, and 'non-covalent Carbon bonding' involving methyl groups has thus been studied by various researchers [47–59]. Most of these contributions are computational inquiries, while a small number of these articles also deal with an analysis of non-covalent Carbon bonding interactions in protein structures present in the Protein Data Bank (PDB) [47,50,56]. Interestingly, none of the studies so far have systematically evaluated the crystal structure data present in the Cambridge Structural Database (CSD) [60,61]. What is more, evaluations of the PDB were largely anecdotal or only considered structures that comply with the (rather strict) geometric criteria of a Carbon bonding geometry. Some also included *intra*molecular contact distances (which are notoriously difficult to evaluate).

In this contribution a combined CSD and PDB evaluation is presented, aimed at elucidating whether electron rich entities have a preferential orientation around a methyl group within a rather large envelope, i.e., whether intermolecular non-covalent Carbon bonding interactions with methyl groups are directional. For evaluative purposes, several Density Functional Theory (DFT) computations were conducted as well. This combined database/DFT study reaffirms that non-covalent Carbon-bonding interactions with X–CH3 can be significant, although the interaction is hardly directional, in particular when the methyl group is poorly polarized, as in most C–CH3 structures.

#### **2. Materials and Methods**

#### *2.1. General Information on Database Analyses*

The CSD [60,61] version 5.40 including two updates (until May 2019) was inspected using ConQuest [62] version 2.0.2 (build 246353, 2019). X-ray powder structures were omitted from the searches, which were further limited to structures containing 3D coordinates and those with an R-factor ≤ 0.1. The PDB was queried using Relibase [63] 3.2.3 and restricted to protein and DNA crystal structures where the packing environment was also searched. No other restrictions were imposed on the PDB search. Datasets were obtained using the general query shown in Figure 1a. The methyl groups were split into those connected to a C, N, O, P, or S atom (X in the figure, in the PDB search specified as part of a ligand). The interacting 'electron rich' partners (ElR in the figure) considered were a water, amide or carboxy-O atom, a sulphur atom or the centroid of an aryl ring (in the PDB search always specified as part of the protein). The geometric constraints imposed on the searches were that the *inter*molecular distance *d* between the methyl C-atom and ElR was ≤ 5 Å and that the X–CH3···ElR angle (α) was 90°–180°. All the data were thus confined within a hemisphere with a basal radius of 5 Å, centered on the methyl C-atoms, as is shown in Figure 1b.
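The two geometric constraints can be illustrated with a minimal sketch. This is not the ConQuest or Relibase query itself; it merely assumes that Cartesian coordinates (in Å) of X, the methyl C-atom and ElR are available, and the function names `distance_and_angle` and `in_hemisphere` are hypothetical.

```python
import math

def distance_and_angle(x, c, elr):
    """Return (d, alpha): the C...ElR distance in Angstrom and the
    X-C...ElR angle in degrees, from three (x, y, z) coordinate tuples."""
    v_cx = [a - b for a, b in zip(x, c)]    # vector from C to X
    v_ce = [a - b for a, b in zip(elr, c)]  # vector from C to ElR
    d = math.sqrt(sum(k * k for k in v_ce))
    n_cx = math.sqrt(sum(k * k for k in v_cx))
    cos_a = sum(p * q for p, q in zip(v_cx, v_ce)) / (n_cx * d)
    cos_a = max(-1.0, min(1.0, cos_a))      # guard against rounding errors
    return d, math.degrees(math.acos(cos_a))

def in_hemisphere(x, c, elr, d_max=5.0, alpha_min=90.0, alpha_max=180.0):
    """True if ElR satisfies d <= 5 Angstrom and 90 <= alpha <= 180 degrees."""
    d, alpha = distance_and_angle(x, c, elr)
    return d <= d_max and alpha_min <= alpha <= alpha_max

# Illustrative coordinates only (not taken from any database entry):
# ElR lies 3 Angstrom from C, directly opposite to X.
print(in_hemisphere((-1.5, 0.0, 0.0), (0.0, 0.0, 0.0), (3.0, 0.0, 0.0)))
```

A contact on the same side as X (α < 90°) or further away than 5 Å would be rejected by the same check.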

**Figure 1.** Representation of the method used to retrieve and analyse data from the CSD and the PDB. (**a**) General query to obtain data with *d* ≤ 5 Å, α = 90°–180°, X = C, N, O, P or S and ElR (electron rich entity) as indicated. (**b**) Illustration of the method used to assess directionality (see text for details).

#### *2.2. Methodology to Generate P(*α*) Plots*

The datasets obtained as described above (2.1) were analysed to assess whether the distribution of ElR within the methyl-centered hemisphere reflects any directionality. This method has been successfully applied to assess the directional behaviors (in the solid state) of various other weak non-covalent interactions such as anion/lone-pair-π, [29,64] CH-π, [11,65] halogen-π [66,67] and nitro π-hole interactions [42,68]. The method works by first computing the freely accessible volumes at each α-value (αfree) by subtracting the volume of a model methyl group from a spherical cone with 5 Å height and a cone angle of 180° − α. This can be achieved by using the 3D-drawing program Autodesk® Inventor® Pro [29]. This is illustrated in Figure 1b, where the spherical cones are shown at 10° intervals. The model methyl group was generated by using standard aliphatic C–H bond distances (1.06 Å) [69] and the van der Waals radii of C (1.70 Å) and H (1.09 Å) [70]. The interfering volume between each spherical cone and the model methyl group can be obtained using the 'inspect interference' option in Autodesk® Inventor® Pro; the red part in Figure 1b is the interfering volume involving a spherical cone with a cone angle of 60° (i.e., at α = 120°). The volume differences between such 'free' volumes with increasing values of α thus give the absolute volume distribution of freely accessible volume around a methyl model within the hemisphere, as a function of α: Δαfree(α). Dividing each volume (Δαfree) in this distribution by the total freely accessible volume (i.e., the volume of a hemisphere minus the interfering volume of the model methyl group in that hemisphere) thus gives the relative volume distribution as a function of α: Δrelαfree(α). This distribution is the random (or volume) distribution. The data retrieved from the CSD and the PDB can be binned as a function of α. Relating these binned data to all the data in a dataset thus gives the observed relative distribution as a function of α: Δrelαdata(α). The quotient of this relative data distribution over the random distribution is a measure of the actual probability (*P*) of finding data at a certain value of α. That is, *P*(α) is unity for a random distribution of data, while *P*-values larger than unity reveal a relative concentration of data, which is indicative of attractive interactions.
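The quotient described above can be sketched in a few lines of Python. Note the simplifying assumption: the random distribution below is that of an *empty* hemisphere, for which the volume of a spherical cone with cone angle 180° − α is proportional to 1 + cos α. The interfering volume of the model methyl group is not subtracted here; in practice the list `rel_volumes` would be replaced by the relative free volumes obtained from the 3D model. The function names `ideal_relative_volumes` and `p_alpha` are hypothetical.

```python
import math

def ideal_relative_volumes(edges_deg):
    """Relative volume of each alpha bin in an empty hemisphere.
    ASSUMPTION: no methyl group is subtracted, so each bin volume is
    proportional to cos(alpha_low) - cos(alpha_high)."""
    cosines = [math.cos(math.radians(a)) for a in edges_deg]
    total = cosines[0] - cosines[-1]
    return [(cosines[i] - cosines[i + 1]) / total
            for i in range(len(edges_deg) - 1)]

def p_alpha(alphas_deg, edges_deg, rel_volumes):
    """P per bin: observed relative frequency divided by relative volume."""
    counts = [0] * (len(edges_deg) - 1)
    for a in alphas_deg:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            # bins are half-open, except the last one which includes 180
            if edges_deg[i] <= a < edges_deg[i + 1] or (is_last and a == edges_deg[-1]):
                counts[i] += 1
                break
    n = sum(counts)
    if n == 0:
        raise ValueError("no data within the binned alpha range")
    return [(c / n) / v for c, v in zip(counts, rel_volumes)]

edges = list(range(90, 181, 10))       # 10-degree bins from 90 to 180
rel_vol = ideal_relative_volumes(edges)
# Illustrative angles only (not database values):
angles = [95.0, 112.0, 133.0, 171.0, 175.0, 178.0]
print([round(p, 2) for p in p_alpha(angles, edges, rel_vol)])
```

Because the volume available near α = 180° is small, even a few hits in the last bin yield a *P*-value well above unity, which is why the volume correction matters when judging directionality.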
