**1. Introduction**

The determination of the relative and absolute configuration of natural products is essential to understand their interactions in the biological field and to allow their procurement through total synthesis. The structure determination of natural products by NMR spectroscopy [1–3] is usually divided into two more or less "independent" approaches: (a) constitutional assignment and (b) configurational and conformational assignment (see Figure 1). The constitutional assignment will not be covered in the present manuscript. We will focus on the discussion of the assignment of the relative configuration and conformation only.

**Figure 1.** Structure elucidation of natural products by NMR spectroscopy.

### *1.1. NOEs/ROEs in Structure Elucidation*

So far, there is no general NMR method for a secure assignment of the relative configuration of non-crystallizable natural products [4–6]. Valuable information is provided by NOEs or ROEs, which allow actual interproton distances to be derived by volume integration of the cross-peaks in the NOESY or ROESY spectrum. The H,H distances are obtained by comparing each peak volume with that of a cross-peak of known distance (the so-called calibration or reference peak). The determination of the relative configuration from NOE- or ROE-derived interproton distances can be accomplished in different ways [3]. In the past, this was mainly carried out in a qualitative way using structure models derived from molecular mechanics or density functional theory (DFT). The DFT approach in particular is restricted to relatively small systems, because these calculations quickly become prohibitively expensive for larger structures or for large numbers of diastereomers that need to be considered.
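The calibration against a reference peak can be illustrated with a minimal sketch. It assumes the isolated spin-pair approximation (cross-peak volume proportional to r⁻⁶), which is the usual working assumption but is not spelled out in the text; the function name and all numbers are hypothetical.

```python
def distance_from_volume(volume, ref_volume, ref_distance_pm):
    """Estimate an interproton distance (pm) from a cross-peak volume.

    Assumes the isolated spin-pair approximation, V ~ r**-6, so that
    r = r_ref * (V_ref / V)**(1/6).
    """
    if volume <= 0 or ref_volume <= 0:
        raise ValueError("cross-peak volumes must be positive")
    return ref_distance_pm * (ref_volume / volume) ** (1.0 / 6.0)


# Hypothetical example: a geminal CH2 proton pair (about 178 pm) serves as
# the reference peak; the peak of interest is 64 times weaker.
r = distance_from_volume(volume=1.0, ref_volume=64.0, ref_distance_pm=178.0)
print(f"estimated distance: {r:.0f} pm")
```

Because of the sixth-root dependence, even a large error in the integrated volume translates into a comparatively small error in the derived distance.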

Another possibility is to run rEM (restrained energy minimization) [7] or rMD (restrained molecular dynamics) [8,9] simulations for all possible relative configurations; generally, the one with the lowest error with respect to the experimental restraints is chosen as the correct relative configuration of the investigated molecule. The disadvantage of this approach is that it is very time consuming, especially for molecules with many unknown stereogenic centers, because separate simulations need to be run for every diastereomer ($2^{n-1}$ calculations), although these may be automated in computer-assisted structure elucidation protocols [10,11]. However, MD simulations are biased by the choice of the force-field (which may even lack appropriate parameters for uncommon structural fragments) and by the user's choice of the initial geometry ("starting configuration and conformation"). Relative conformational energies obtained from DFT calculations may be inaccurate by up to ~1–2 kcal mol<sup>−1</sup> (amounting to errors in the Boltzmann weights of conformers by factors of ~0.20–0.03 at 300 K!), depending on the treatment of electron-electron correlation and/or dispersion interactions [12].
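The quoted sensitivity of Boltzmann weights to energy errors can be checked with a short calculation. The sketch below simply evaluates exp(−ΔE/RT) at 300 K; the function name is illustrative only.

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal mol^-1 K^-1


def boltzmann_factor(delta_e_kcal, temperature_k=300.0):
    """Relative Boltzmann weight exp(-dE / RT) for an energy difference dE."""
    return math.exp(-delta_e_kcal / (R_KCAL * temperature_k))


# Energy errors of 1 and 2 kcal/mol change relative populations by
# factors of roughly 0.19 and 0.03, respectively.
for err in (1.0, 2.0):
    print(f"{err:.1f} kcal/mol -> factor {boltzmann_factor(err):.3f}")
```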

One method of choice for small molecules with several stereogenic centers is the combination of distance geometry (DG) [13–16] and distance bounds driven dynamics (DDD) calculations using NOE/ROE-derived distance restraints (r) [3,5,16–19]. The most important aspect of the NOE/ROE-restrained DG/DDD method (rDG/DDD) is that configurations are allowed to change dynamically during the simulation (floating chirality, fc), so that the conformation and the relative configuration of small organic molecules are determined simultaneously (fc-rDG/DDD). The DG approach (see Figure 2) considers holonomic distance restraints as lower ($d_{min}$) and upper ($d_{max}$) bounds of atom-atom distance relations. These bounds are derived from the molecular constitution (which must be known!) as well as from the 1,2- (bonds), 1,3- (angles), and 1,4-connectivities (torsions); experimental NOE/ROE-derived restraints can be added to this set of limits. Within these restraints, structure models are generated solely on the basis of distance information, which removes the bias toward any initial input reference model, and these models are further refined in a simulated annealing approach. Chirality is incorporated in the DG approach using signed chiral volumes, which essentially describe the volume enclosed by the substituents on tetrahedral centers and which simultaneously encode opposite configurations through sign inversion (see Section 4).
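The sign inversion of the chiral volume can be made concrete with a small sketch. It uses the common triple-product definition of a signed volume from four substituent positions; this form is an assumption here, and the exact definition used by the authors is the one given in Section 4. Coordinates are arbitrary illustrative values.

```python
def chiral_volume(p1, p2, p3, p4):
    """Signed volume (p1 - p4) . [(p2 - p4) x (p3 - p4)] of four 3D points."""
    a = [p1[i] - p4[i] for i in range(3)]
    b = [p2[i] - p4[i] for i in range(3)]
    c = [p3[i] - p4[i] for i in range(3)]
    cross = (b[1] * c[2] - b[2] * c[1],
             b[2] * c[0] - b[0] * c[2],
             b[0] * c[1] - b[1] * c[0])
    return sum(a[i] * cross[i] for i in range(3))


# Substituent positions around a tetrahedral center (arbitrary units).
subs = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]
# Mirror image: reflect every position through the xy plane.
mirror = [(x, y, -z) for x, y, z in subs]

print(chiral_volume(*subs), chiral_volume(*mirror))
```

The two values have equal magnitude and opposite sign, so a restraint on the sign selects one configuration, whereas omitting the restraint leaves the center free to adopt either one.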

**Figure 2.** Workflow of rDG/DDD calculations: Based on distance restraints such as bond lengths (1,2-distances), angles and torsions (1,3- and 1,4-distances), excluded van der Waals volumes, and NOE/ROE-derived distance limits, a matrix of upper (top right triangle) and lower (bottom left triangle) distance bounds between all atom-atom pairs *i*, *j* is constructed (**a**). Based on these distance limits, initial guess structure models of arbitrary configuration and conformation are generated through a "*metrization*" procedure in 4D space (**b**, for clarity shown as 3D models). These models are subsequently refined through an automated sequence of simulated annealing steps in 4D and 3D space (**c**), by which the correct configuration finally evolves as the best-fit structures of lowest pseudo energy (total error). In particular, step (**a**) ensures that the final structures generated are not biased by any input structure and, through steps (**b**,**c**), evolve solely on the basis of experimental data. At no point of an rDG/DDD simulation is a conventional parametrized force-field involved, nor are any presumptions on conformational preferences implied. All relative configurations of stereogenic centers emerge exclusively on the basis of experimental data.

The concept of floating chirality (fc) was introduced for the assignment of diastereotopic protons or methyl groups in proteins. This approach was first applied in 1988 to distance geometry (DG) calculations [20] and in 1989 to rMD simulations [21]. In DG calculations, floating chirality is achieved by not using chiral restraints (chiral volumes) for unknown prochiral and stereogenic centers, whereas in rMD, floating chirality is achieved by reducing or removing the force constants of the angles which define the chiral centers. Even more, DG uses no energy penalty or additional out-of-plane terms to guarantee that the full set of permutations for all stereogenic centers is generated. In general, DG uses a single chiral volume restraint on one selected stereogenic center only in order to avoid enantiomeric configurations (see discussions below). However, in contrast to rMD simulations, DG does not use any physical force-field of any type, and thus removes any intrinsic bias imposed on the results by this choice. DG relies solely on experimental data like distances between atoms or anisotropic data (see below) and all stereogenic centers are allowed to adopt their relative configuration in accordance with the experimental data.

Moreover, the strength of the DG approach is that all structure models are first generated in four-dimensional (4D) space before they are transferred into "real" 3D space. The extra dimension provides additional degrees of freedom to assemble structures of different configurations and conformations within the limits of the distance bounds. Most notably, the sequence of 4D and 3D simulated annealing steps has major benefits for the robustness and quality of configurational sampling, as inversions of 3D objects (e.g., stereogenic centers) become simple rotations in 4D, and thus the "energy" barriers between alternate diastereomers are effectively lowered or even removed altogether (see Figure 2 and, in Section 4, Figure 12).
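That an inversion in 3D corresponds to a plain rotation in 4D can be verified numerically. The following sketch embeds four substituent positions in 4D (w = 0), rotates them in the z-w plane, and monitors the signed volume of the 3D projection; the triple-product volume and all coordinates are illustrative assumptions, not the authors' implementation.

```python
import math


def chiral_volume(p1, p2, p3, p4):
    """Signed volume spanned by four 3D points (triple product)."""
    a = [p1[i] - p4[i] for i in range(3)]
    b = [p2[i] - p4[i] for i in range(3)]
    c = [p3[i] - p4[i] for i in range(3)]
    cross = (b[1] * c[2] - b[2] * c[1],
             b[2] * c[0] - b[0] * c[2],
             b[0] * c[1] - b[1] * c[0])
    return sum(a[i] * cross[i] for i in range(3))


def rotate_zw(p, angle):
    """Rotate a 4D point (x, y, z, w) in the z-w plane."""
    x, y, z, w = p
    c, s = math.cos(angle), math.sin(angle)
    return (x, y, c * z - s * w, s * z + c * w)


def dist(p, q):
    """Euclidean distance in any dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Four substituent positions of a tetrahedral center, embedded in 4D (w = 0).
start = [(1, 1, 1, 0), (1, -1, -1, 0), (-1, 1, -1, 0), (-1, -1, 1, 0)]

for step in range(5):
    angle = step * math.pi / 4
    pts = [rotate_zw(p, angle) for p in start]
    vol3d = chiral_volume(*[p[:3] for p in pts])  # 3D projection (w dropped)
    d01 = dist(pts[0], pts[1])                    # 4D distance, rotation-invariant
    print(f"angle={angle:4.2f}  V(3D projection)={vol3d:7.2f}  d(0,1)={d01:.3f}")
```

The projected volume passes smoothly through zero and changes sign, while all 4D interatomic distances stay constant; no distance bound is violated along the way, which is why the barrier between the two configurations disappears in 4D.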

NMR-derived experimental data such as NOE/ROE distances, scalar couplings, residual dipolar couplings (RDCs), residual quadrupolar couplings (RQCs), and residual chemical shift anisotropies (RCSAs) can be incorporated in this DG approach. Here, all experimental parameters are accounted for as squared violations $\Delta X^2 = (X_{exp} - X_{calc})^2$ of experimental versus back-calculated values, and these deviations are added up in a harmonic approximation as pseudo energy terms $E = \frac{1}{2} K \cdot \Delta X^2$ with empirical force constants $K$. In total, the sum of these terms based on NMR data, together with violations of distance bounds and, if applicable, chiral volume restraints, defines a dimensionless total penalty or pseudo energy function, which must not be confused with an MD- or DFT-derived "real" molecular energy. The lower this pseudo energy penalty becomes, the better the restraints based on experimental data are fulfilled. A comprehensive description of all energy terms is given in Section 4. In this context, these violation energies, and in particular their partial derivatives $\partial E_{total}/\partial r$ with respect to the 4D and 3D Cartesian atomic coordinates, are considered as forces which drive the structure evolution in a simulated annealing type approach; the structures thus evolve from the data rather than merely being evaluated against pre-calculated structures.
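How such a harmonic penalty turns into a driving force can be sketched for a single distance restraint. The flat-bottomed form between a lower and an upper bound, the function name, and the numbers are assumptions made for illustration; the actual terms are those described in Section 4.

```python
import math


def bound_penalty(ri, rj, d_min, d_max, k=1.0):
    """Harmonic pseudo energy for one distance restraint.

    Returns (energy, gradient with respect to the coordinates of atom i).
    Works for 3D or 4D coordinates; no penalty inside [d_min, d_max].
    """
    diff = [a - b for a, b in zip(ri, rj)]
    d = max(math.sqrt(sum(c * c for c in diff)), 1e-12)  # avoid division by zero
    if d > d_max:
        viol = d - d_max
    elif d < d_min:
        viol = d - d_min
    else:
        return 0.0, [0.0] * len(diff)
    energy = 0.5 * k * viol ** 2
    grad_i = [k * viol * c / d for c in diff]  # dE/dr_i
    return energy, grad_i


# Two atoms in 4D that are too far apart for hypothetical bounds of 2.7-3.3.
energy, grad = bound_penalty((0, 0, 0, 0), (5, 0, 0, 0), d_min=2.7, d_max=3.3)
force_on_i = [-g for g in grad]  # the force is the negative gradient
print(energy, force_on_i)
```

Here the force on atom i points toward atom j, so following it shortens the violated distance; summing such terms over all restraints yields the total pseudo energy whose gradient steers the refinement.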

Until now, a general application of the DG approach to all different kinds of natural products has been hindered by the fact that NOEs/ROEs cover only short-range interactions (up to 400 pm for small molecules), and it has been hampered or even rendered impossible for proton-deficient structures. This limitation can now be overcome by using anisotropic NMR parameters (RDCs, RQCs, and RCSAs) of the structure under investigation.

### *1.2. RDCs in Structure Elucidation*

In contrast to NOEs/ROEs, residual dipolar couplings (RDCs) are anisotropic NMR parameters, which are global in nature and independent of the distances between the vectors connecting the coupling nuclei. RDCs, RQCs, and RCSAs are NMR observables that can now be used within the fc-rDG/DDD method using the recently published software *ConArch*+ [22,23]. In this investigation, only the use of RDCs is discussed.

Standard NMR investigations are carried out in isotropic solutions, where the dipolar couplings are usually averaged out by isotropic tumbling of the molecules. If this is not the case, owing either to the presence of paramagnetic metal ions [24], to the anisotropic susceptibility of diamagnetic macromolecules [25] or, more generally, to the presence of an anisotropic medium, the molecules are partially oriented with respect to the external magnetic field, and residual dipolar couplings (RDCs) can be measured (detailed reviews can be found in [26–31]). An anisotropic environment is generated by an alignment medium (AM); examples of AMs are stretched gels [32–40] and lyotropic liquid crystalline (LLC) phases [41–49].

The size of ${}^1D_{CH}$ RDCs depends on the time-averaged orientation of the CH vector and its averaged angle with respect to the external magnetic field $B_0$ (see Figure 3). The one-bond (CH) dipolar coupling is usually obtained by comparing HSQC-type experiments run in isotropic and anisotropic environments [50,51]. A very popular variant of these HSQC experiments is the so-called CLIP/CLAP-HSQC [52], which is run without F2 decoupling in order to observe the one-bond coupling in F2. The residual dipolar coupling adds to the scalar coupling, leading to a total coupling constant (${}^1T_{CH}$) from which the residual dipolar coupling (${}^1D_{CH}$) can be calculated (${}^1T_{CH} = {}^1J_{CH} + 2\,{}^1D_{CH}$).
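The extraction of an RDC from the two splittings is simple arithmetic, shown here as a minimal sketch that follows the convention T = J + 2D stated above. The function name and the splittings are hypothetical.

```python
def rdc_from_splittings(t_aniso_hz, j_iso_hz):
    """One-bond RDC from the convention T = J + 2D, i.e. D = (T - J) / 2."""
    return (t_aniso_hz - j_iso_hz) / 2.0


# Hypothetical splittings (Hz) read from the F2 dimension of CLIP-HSQC spectra.
j_iso = 145.0    # isotropic sample: scalar coupling only
t_aniso = 157.4  # anisotropic sample: total coupling
print(f"1D_CH = {rdc_from_splittings(t_aniso, j_iso):+.1f} Hz")
```

Note that other conventions without the factor of 2 are also in use, so the convention should be checked before RDC values from different sources are compared.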

Analysis of RDC data is less straightforward than the interpretation of isotropic data such as chemical shifts and scalar couplings. However, given a molecular geometry for the compound analyzed, RDCs can be back-calculated from the experimental data and this structural model in a parameter-free fashion using natural constants only, and an alignment tensor can be computed which describes the average orientation of the molecule in relation to the magnetic field [53]. Frequently, alternative relative configurations of analytes imply different relative orientations of CH vectors, and thus RDCs are very sensitive configuration probes, even in cases where stereogenic centers are separated by many bonds. Usually, the configuration which displays the best correlation between experimental and back-calculated RDC data ($D_{exp}$ vs. $D_{calc}$) is considered as the correct one (see Figure 3). RDC analysis is based on the assumption that the chemical shifts do not change, or change only slightly, between the isotropic and anisotropic phases. The standard procedure does not include a re-assignment of the molecule under study in the anisotropic phase, but the assignment could become questionable if larger changes in the chemical shifts are observed.
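The back-calculation against a given structure model can be sketched as a linear least-squares fit of the five independent elements of a traceless, symmetric alignment tensor. This is only an illustration under stated assumptions: the physical prefactor is absorbed into the tensor elements (so they carry units of Hz), the fit uses normal equations instead of the singular value decomposition commonly employed, the data are synthetic, and none of the function names belong to *ConArch*+.

```python
import math


def design_row(v):
    """Coefficients of (Sxx, Syy, Sxy, Sxz, Syz) for a unit bond vector v,
    with Szz = -Sxx - Syy (traceless tensor)."""
    x, y, z = v
    return [x * x - z * z, y * y - z * z, 2 * x * y, 2 * x * z, 2 * y * z]


def solve(a, b):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_and_score(vectors, d_exp):
    """Fit the tensor, back-calculate the RDCs, and return (tensor, d_calc, Q)."""
    rows = [design_row(v) for v in vectors]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(5)] for i in range(5)]
    atb = [sum(r[i] * d for r, d in zip(rows, d_exp)) for i in range(5)]
    s = solve(ata, atb)
    d_calc = [sum(c * p for c, p in zip(r, s)) for r in rows]
    num = sum((e - c) ** 2 for e, c in zip(d_exp, d_calc))
    q = math.sqrt(num / sum(e * e for e in d_exp))  # rms(dev) / rms(D_exp)
    return s, d_calc, q


def unit(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


# Synthetic "experimental" RDCs generated from a known tensor and model.
correct = [unit(v) for v in [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0),
                             (1, 0, 1), (0, 1, 1), (1, 1, 1)]]
wrong = correct[:-1] + [unit((1, 1, -1))]  # one C-H vector oriented differently
s_true = [10.0, -4.0, 2.0, -3.0, 1.0]
d_exp = [sum(c * p for c, p in zip(design_row(v), s_true)) for v in correct]

s_fit, _, q_correct = fit_and_score(correct, d_exp)
_, _, q_wrong = fit_and_score(wrong, d_exp)
print(f"Q(correct) = {q_correct:.2e}, Q(wrong) = {q_wrong:.2e}")
```

With noise-free synthetic data the correct model reproduces the input essentially exactly, whereas the model with one misoriented C-H vector leaves a residual that no alignment tensor can remove. This is the basis of the comparison of alternative diastereomers described in the text.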

**Figure 3.** Matching of RDC data against two alternative diastereomers of tubocurarine (**4**), which differ in their configuration at C-24: (**a**) correct, and (**b**) wrong relative configuration of C-1 and C-24. The stereogenic centers C-1 and C-24 are marked in green and orange, respectively, and the directionality of the corresponding methine C-H bonds is indicated by the colored vectors in both structure models. The differing average orientation of each of these C-H bond vectors relative to the external magnetic field (blue vector) of the NMR spectrometer leads to different RDCs back-calculated for both diastereomers (colored values in the plots of $D_{exp}$ vs. $D_{calc}$); the better correlation between experimental and back-calculated data identifies (**a**) as the correct relative configuration of **4**, whereas (**b**) could be ruled out.

However, crucial for the interpretation of RDC data is the fact that accurate structure proposals must be provided beforehand, which are then evaluated against the experimental NMR data, and a thorough error analysis has to be carried out in order to ascertain configurational assignments [23]. The necessity for pre-evaluation of conformational preferences may become problematic for flexible or larger molecules. Moreover, this type of analysis has to be repeated for all $2^{n-1}$ diastereomers if the molecule contains $n$ stereogenic elements. In a recent report [22], we have demonstrated how to include RDC information in DG simulations in both 4D and 3D space, using a pseudo energy penalty function $E_{RDC} = \frac{1}{2} K_{RDC} \cdot (D_{exp} - D_{calc})^2$ similar to that described above. This provides the advantage that the prerequisite of generating structures beforehand is dropped altogether. Instead, the correct configuration emerges from these RDC-driven rDG types of simulations as a direct consequence of, and within the boundaries of, these experimental restraints.

Though the mathematical details for the treatment of NOEs/ROEs and RDCs differ vastly, the pseudo energy error function allows these different types of restraints to be combined arbitrarily within DG, and structures are generated that best fulfill all experimental parameters. However, there is one additional fundamental difference between NOE and RDC data. For the former, only a single NOE "data set" can be obtained, whereas for the latter, multiple RDC "data sets" can be obtained by measuring the NMR data under different alignment conditions (i.e., different alignment media [23,54–57], multi-component multi-phase AM [46], temperature dependent AM [43,58], etc.). Though this might entail considerable experimental effort, these multi-alignment data sets can also be exploited in the DG implementation of *ConArch*+ [22]. Under the assumption that the conformational preferences of the analyte do not change significantly for alternate alignment conditions, different sets of RDCs can provide crucial additional and independent structure information, which may contribute significantly to the certainty with which configurational assignments are supported by experimental data [23,54–57].

In the following, the application of the fc-rDG/DDD method will be demonstrated on five complex natural products (see Scheme 1). The dimeric cyclic pyrrole-imidazole alkaloid (PIA) axinellamine A (**1**), isolated from the marine sponge *Axinella* sp. in 1999 [59], is the first compound to study. The second example is also a dimeric cyclic PIA from the marine sponge *Stylissa caribica*, tetrabromostyloguanidine (**2**) from 2007 [60], and the synthetic massadine derivative 3,7-*epi*-massadine chloride (**3**) is the last one of the PIA series from 2008 [61]. Finally, the terrestrial plant alkaloids tubocurarine (**4**) from *Chondrodendron tomentosum* [62] and vincristine (**5**) from *Catharanthus roseus* [63] are discussed here to illustrate the limitations of configurational analysis based on NOE/ROE data alone; only the combined approach of using distance as well as RDC data allows their configurations to be deduced unequivocally.

**Scheme 1.** Structural formulae of the investigated molecules with atom numbering: axinellamine A (**1**), tetrabromostyloguanidine (**2**), 3,7-*epi*-massadine chloride (**3**), tubocurarine (**4**), and vincristine (**5**).

### **2. Results and Discussion**

Compounds **1**–**3** are cyclic dimeric pyrrole-imidazole alkaloids (PIAs) with eight contiguous stereogenic centers each, resulting in 128 possible relative configurations (diastereomers) per compound. Axinellamine A (**1**) and 3,7-*epi*-massadine chloride (**3**) possess tetracyclic cores, whereas tetrabromostyloguanidine (**2**) features an even more complex hexacyclic core. For the PIAs **1**–**3**, only ROE-derived interproton distances were used. The interproton distances were extracted from a ROESY spectrum with a mixing time of 100 ms (300 ms in the case of **3**). For all compounds, the interproton distances ±10% were used as distance restraints in the floating chirality restrained DG/DDD calculations (fc-rDG/DDD); additional details on the calculations on **1**–**3** are given in Section 4 and the Supporting Information. As NMR can determine relative configurations only, a single stereogenic center of each of **1**–**3** was fixed in all rDG simulations by applying a chiral volume restraint in order to avoid enantiomeric structures. The number of structures generated in the fc-rDG/DDD calculations was set to 1000 to allow for reasonable sampling of the configurational and conformational space. Additional simulations applying different chiral volume restraints and/or sampling lengths, as well as in-depth analyses of the rDG runs, are described in the Supporting Information. In the following, we report the application of the fc-rDG/DDD method to assign the relative configuration of all stereogenic centers of compounds **1**–**3** simultaneously, based on ROE data alone.
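The two numerical statements of this paragraph, the count of 128 relative configurations and the ±10% distance limits, can be reproduced with a few lines. The sketch below uses hypothetical proton labels and distances, not the experimental ROE list.

```python
def n_relative_configurations(n_centers):
    """Number of diastereomers, 2**(n - 1), for n stereogenic centers."""
    return 2 ** (n_centers - 1)


def distance_bounds(distances_pm, tolerance=0.10):
    """Turn ROE-derived distances into (lower, upper) bounds of +/- tolerance."""
    return {pair: (d * (1.0 - tolerance), d * (1.0 + tolerance))
            for pair, d in distances_pm.items()}


print(n_relative_configurations(8))  # eight stereogenic centers

# Hypothetical interproton distances in pm, keyed by proton pair.
roe_distances = {("H-a", "H-b"): 250.0, ("H-a", "H-c"): 300.0}
for pair, (lo, hi) in distance_bounds(roe_distances).items():
    print(pair, f"{lo:.0f}-{hi:.0f} pm")
```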

### *2.1. Configurational Assignment with ROEs Only*

### 2.1.1. Axinellamine A (**1**)

For the configurational assignment of axinellamine A (**1**), 35 interproton distances from ROESY spectra were used (the complete list of ROEs of **1** is given in the SI, Table S1). As mentioned above, one stereogenic center of **1** (C-14) was fixed and set as reference. In the traditional approach of pre-calculating structures, this would entail the necessity to evaluate a total of 128 diastereomers. Indeed, inspection of the output of the rDG protocol shows that all 128 configurations are actually generated by the "metrization" process in 4D space, but many of these molecular geometries severely violate the restraints imposed by the ROE data even in this higher dimension, and thus do not "survive" even the 4D refinement of simulated annealing. At the end of the 4D sampling phase, 40 alternative configurations were obtained (see Supplementary Figures S2 and S3), of which only 37 finally emerged from the 3D sampling, although many of these structures display severe ROE violations.

The overall, exceedingly high efficiency of configurational sampling by rDG is illustrated in Figure 4a, which shows the results for the 1000 generated candidate structures of axinellamine A (**1**) ("best 700") as a graphical representation of the dimensionless total error of each structure, ordered by ascending total error. The first wrong structure (wrong configuration of **1**) with respect to the eight stereogenic centers is #598 (red circle in Figure 4a). This structure differs from structures #1 to #597 in the configuration of C-1. The first "pseudo-configurational" change was already observed at structure #365 (orange circle in Figure 4a). This is the alternative assignment of the diastereotopic protons at the methylene group C-1′. Mathematically, there is no difference between stereogenic and prochiral centers, which means that for axinellamine A (**1**) altogether ten centers needed to be assigned. Chemically, only the stereogenic centers are of importance for the differentiation of the stereoisomers, but the prochiral centers are important to support the configurational assignment. In this example, only C-1′ is used, whereas the second prochiral center (C-1′′) does not contribute to the results since no ROEs to both protons of C-1′′ have been observed.

Most notably, the first wrong configuration of **1** (#598) appears rather late in this sequence of energy-sorted sampled structures, which visualizes the efficiency of sampling (the total number of structures with the correct configuration of axinellamine A (**1**) is as high as 760/1000). Additionally, the second best (first wrong) alternative configuration is separated from the best-fit global energy minimum structure by significant energy steps and a large pseudo energy difference of the penalty error function ($\Delta E_{total}$ = 3.15). Within the rDG approach, both of these characteristics are indicative of an unambiguous configurational assignment of **1** based on the experimental NMR data used, and the plot in Figure 4b shows that all structures #1 to #597 indeed feature the same relative configuration of all stereogenic centers.
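The two indicators used here, the rank of the first alternative configuration and its pseudo energy gap to the best-fit structure, can be extracted from a list of generated structures as sketched below. The data layout (pairs of total error and a configuration label) and the toy values are hypothetical and do not reflect the output format of any particular program.

```python
def first_alternative(structures):
    """Analyze (total_error, configuration_label) pairs.

    Returns (rank of the first structure whose configuration differs from
    that of the best-fit structure, its error gap to the best fit, number
    of structures sharing the best-fit configuration). Ranks start at 1.
    """
    ranked = sorted(structures, key=lambda s: s[0])
    best_error, best_config = ranked[0]
    n_best = sum(1 for _, cfg in ranked if cfg == best_config)
    for rank, (err, cfg) in enumerate(ranked, start=1):
        if cfg != best_config:
            return rank, err - best_error, n_best
    return None, None, n_best


# Toy data: labels could be strings of descriptors for all stereogenic centers.
toy = [(0.4, "A"), (0.1, "A"), (3.3, "B"), (0.2, "A"), (5.0, "B")]
rank, gap, n_best = first_alternative(toy)
print(f"first alternative at rank {rank}, gap {gap:.2f}, "
      f"{n_best}/{len(toy)} structures share the best configuration")
```

A late first alternative together with a large gap corresponds to the situation described for **1**; labels that also encode prochiral centers would additionally flag the "pseudo-configurational" changes mentioned above.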

Figure 4a illustrates very well that the correct relative configuration of axinellamine A (**1**) appears in different conformations with respect to the orientation of the side chains. There are already six steps before a different configuration is observed, which originate from alternate local conformational changes that mainly include the orientation of the side chains (see Figure 4b). The inset plot in Figure 4a shows the first "energy" step in detail.

It must be stressed that this rDG simulation is actually a single, fully automated sequence of calculations (and not 128 individual calculations on alternate diastereomers), by which the correct configuration of axinellamine A (**1**) is identified quickly and highly reliably. At no point of this simulation is a physical force-field involved, and the final assignment emerges from experimental data only, irrespective of the starting configuration.
