*Review* **Non-Targeted Screening Approaches for Profiling of Volatile Organic Compounds Based on Gas Chromatography-Ion Mobility Spectroscopy (GC-IMS) and Machine Learning**

**Charlotte Capitain and Philipp Weller \***

Institute for Instrumental Analytics and Bioanalytics, Mannheim University of Applied Sciences, 68163 Mannheim, Germany; c.capitain@hs-mannheim.de

**\*** Correspondence: p.weller@hs-mannheim.de; Tel.: +49-(0)621-292-6484

**Abstract:** Due to its high sensitivity and resolving power, gas chromatography-ion mobility spectrometry (GC-IMS) is a powerful technique for the separation and sensitive detection of volatile organic compounds. It is a robust and easy-to-handle technique, which has recently gained attention for non-targeted screening (NTS) approaches. In this article, the general working principles of GC-IMS are presented. Next, the workflow for NTS using GC-IMS is described, including data acquisition, data processing and model building, model interpretation, and complementary data analysis. A detailed overview of recent NTS studies using GC-IMS follows, with several examples that have demonstrated GC-IMS to be an effective technique for various classification and quantification tasks. Lastly, a comparison of targeted and non-targeted strategies using GC-IMS is provided, highlighting the potential of GC-IMS in combination with NTS.

**Keywords:** gas chromatography ion mobility spectroscopy (GC-IMS); volatile organic compounds (VOCs); non-targeted screening (NTS) using machine learning

#### **1. Introduction**

Quality control and early detection of hazardous chemicals, allergens, or biological contaminants are critical to ensure product safety. Environmental pollutants, pesticides, or toxins, among others, can compromise food safety and pose a public health risk [1]. Furthermore, food adulteration and food fraud, accelerated by globalization, continue to cause economic losses and customer dissatisfaction and emphasize the need for robust, inexpensive, and fast analytical methods [2]. While new scientific findings continuously identify potentially hazardous or allergenic compounds [3], commonly employed methods, which focus on the detection and identification of a particular compound or class of compounds, lack the ability to identify new or unknown compounds. Due to the inherent diversity of biogenic samples, as observed in food analysis, and the chemical complexity of the sample matrices, analysis often requires advanced sample preparation strategies [4]. For systematic monitoring of product quality, it is therefore desirable to develop analytical methods capable of discovering unknown or non-targeted compounds in complex sample matrices. This approach, also referred to as NTS, requires comprehensive extraction and analysis of compounds of interest. Analysis of the volatile organic compounds (VOCs) of samples, also known as VOC profiling, allows for the detection of compounds in complex sample matrices without the need for detailed a priori knowledge of the molecular composition. Due to its high sensitivity and resolving power on the one hand and its simplicity and robustness on the other, ion mobility spectrometry (IMS) has gained popularity for the analysis of VOCs [5]. Moreover, gas chromatography coupled to ion mobility spectrometry (GC-IMS) has been shown to be an easy-to-handle and yet highly effective tool for VOC profiling [6].
As a result, non-targeted VOC profiling based on GC-IMS in combination with machine learning has emerged as a promising method for sample monitoring.

**Citation:** Capitain, C.; Weller, P. Non-Targeted Screening Approaches for Profiling of Volatile Organic Compounds Based on Gas Chromatography-Ion Mobility Spectroscopy (GC-IMS) and Machine Learning. *Molecules* **2021**, *26*, 5457. https://doi.org/10.3390/molecules26185457

Academic Editor: Thomas Letzel

Received: 24 July 2021 Accepted: 1 September 2021 Published: 8 September 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Since the 1970s, when it was first known as 'plasma chromatography', IMS has developed into a highly sensitive technique for the analysis of VOCs at ultratrace concentration levels, which provides additional information based on the mobility of the ions [7–9]. Due to the robust and easy-to-handle instrumentation, IMS has found a wide range of applications, such as food flavor analysis [5], process monitoring [10,11], and quality control [12], as well as detection and quantification of warfare agents [13] and explosives [14,15].

With IMS, analytes are first ionized in the ionization region of the instrument. The most common ionization method is atmospheric pressure chemical ionization [16] with a radioactive source, most frequently beta-emitting nickel-63 (Ni-63) [11,15,17,18], the less hazardous beta-emitting tritium (H-3) [19], or alpha-emitting americium-241 (Am-241) [20,21]. Other ionization methods are atmospheric pressure photoionization (APPI) [22], which uses ultraviolet (UV) light [23,24], and corona discharge (CD) atmospheric pressure chemical ionization [17,25–27], in which a high electric field is applied between a needle and a metal plate or discharge electrode. Yet another method is the laser desorption/ionization (LDI) technique, which employs a laser pulse as the ion source [28].

According to the European Union directive, the exemption limit for the total activity of tritium was set to 1 GBq [29]. Therefore, the use of a low-radiation tritium ion source with an activity of 300 MBq is not subject to authorization, which has led to a broad adoption of tritium ion sources in a number of commercially available systems [30–34]. Beta particles, which are emitted by the tritium source, initiate a gas-phase reaction cascade in the drift gas (nitrogen or air), resulting predominantly in proton-water clusters H<sup>+</sup>(H<sub>2</sub>O)<sub>n</sub>, which are commonly referred to as 'reactant ions' [35]. The number of water molecules (n) depends on the gas temperature and the moisture content of the gas atmosphere [8]. Depending on their proton affinity, molecules entering the ionization region react with the reactant ions to form protonated monomers MH<sup>+</sup>(H<sub>2</sub>O)<sub>n−x</sub>, while decreasing the intensity of the reactant ion peak (RIP). At higher analyte concentrations, proton-bound dimers M<sub>2</sub>H<sup>+</sup>(H<sub>2</sub>O)<sub>m−x</sub> are formed by the attachment of additional analyte molecules. When the concentration increases further, the formation of higher molecular cluster ions, such as trimers or tetramers, is possible; however, due to their low stability and short lifetime, higher molecular cluster ions are rarely observed [36]. In general, nonlinear behavior is observed for the ratio of the RIP and the distribution between the protonated monomer and the proton-bound dimer [36,37]. The principles of a drift-time IMS including an H-3 ionization source are shown in Figure 1.

**Figure 1.** Setup of a drift-time IMS with a tritium (H-3) ionization source, adopted from [38] with permission (ID5138730886281).

Subsequent to ionization, the analyte ions enter the drift region, where they are accelerated towards the detector, typically a Faraday plate, and are separated by their drift time (or mobility) in an electric field at ambient pressure. The ions are slowed down by collisions with counterflowing drift gas molecules, to an extent determined by their collision cross-section (CCS). Due to an equilibrium between the acceleration by the electric field and the deceleration by collisions with the drift gas molecules, the ions move with a constant velocity to the detector. Depending on their characteristic mass, charge, and structure, the ions are separated in the drift tube and reach the detector at different drift times [39]. For identification of the analyte, the inverse of the measured drift time is normalized to the drift length and the electric field, resulting in the ion mobility spectrum. The reduced ion mobility K<sub>0</sub> (see Equation (1)), which is independent of ambient conditions and experimental setup, is obtained after further normalization to pressure and temperature.

$$K_0 = \frac{L}{E \cdot t_D} \cdot \frac{p}{p_0} \cdot \frac{T_0}{T} \tag{1}$$

where

- K<sub>0</sub> = reduced ion mobility in cm<sup>2</sup> V<sup>−1</sup> s<sup>−1</sup>
- L = drift length in cm
- E = electric field strength in V cm<sup>−1</sup>
- t<sub>D</sub> = drift time in s
- p = pressure of the drift gas in hPa
- p<sub>0</sub> = standard pressure: p<sub>0</sub> = 1013.2 hPa
- T = temperature of the drift gas in K
- T<sub>0</sub> = standard temperature: T<sub>0</sub> = 273.2 K
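As a minimal sketch of Equation (1), the function below computes the reduced ion mobility from the quantities defined above. The function name, argument names, and the instrument settings in the usage example are illustrative assumptions and are not taken from any specific device.

```python
def reduced_mobility(
    drift_length_cm: float,
    field_v_per_cm: float,
    drift_time_s: float,
    pressure_hpa: float,
    temperature_k: float,
    p0_hpa: float = 1013.2,
    t0_k: float = 273.2,
) -> float:
    """Reduced ion mobility K0 in cm^2 V^-1 s^-1, following Equation (1)."""
    inputs = (drift_length_cm, field_v_per_cm, drift_time_s,
              pressure_hpa, temperature_k)
    if any(value <= 0 for value in inputs):
        raise ValueError("all physical quantities must be positive")
    # Mobility before normalization: K = L / (E * t_D)
    mobility = drift_length_cm / (field_v_per_cm * drift_time_s)
    # Normalize to standard pressure and temperature
    return mobility * (pressure_hpa / p0_hpa) * (t0_k / temperature_k)


# Hypothetical settings, chosen only to exercise the formula.
k0 = reduced_mobility(
    drift_length_cm=10.0,
    field_v_per_cm=500.0,
    drift_time_s=0.008,
    pressure_hpa=1013.2,
    temperature_k=318.2,
)
print(f"K0 = {k0:.3f} cm^2 V^-1 s^-1")
```

At standard pressure and temperature both correction factors equal one, so K<sub>0</sub> reduces to L/(E·t<sub>D</sub>).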

Instead of measuring temperature and pressure, the normalization is often carried out using the known mobility of the ions produced in the pure drift gas or by adding a reference analyte [40]. The signal intensity is proportional to the concentration and enables quantification at ppb<sub>v</sub> (for some compounds even ppt<sub>v</sub>) levels within a few milliseconds.
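The reference-based normalization can be sketched as follows. Because Equation (1) makes K<sub>0</sub> inversely proportional to the drift time when drift length, field strength, pressure, and temperature are identical, the reduced mobility of an analyte follows from the drift time and known reduced mobility of a reference ion recorded in the same measurement. The function name and all numerical values are hypothetical placeholders, not literature values.

```python
def reduced_mobility_from_reference(
    drift_time_s: float,
    ref_drift_time_s: float,
    ref_k0: float,
) -> float:
    """K0 of an analyte relative to a reference ion from the same run.

    With identical L, E, p and T, Equation (1) gives K0 proportional
    to 1 / t_D, hence K0 = K0_ref * t_ref / t_D.
    """
    if drift_time_s <= 0 or ref_drift_time_s <= 0 or ref_k0 <= 0:
        raise ValueError("drift times and reference mobility must be positive")
    return ref_k0 * ref_drift_time_s / drift_time_s


# Placeholder numbers for illustration only (not measured or tabulated data).
k0_analyte = reduced_mobility_from_reference(
    drift_time_s=0.010,      # analyte peak
    ref_drift_time_s=0.008,  # reference peak, e.g., the RIP
    ref_k0=2.0,              # assumed known K0 of the reference ion
)
print(f"K0 = {k0_analyte:.2f} cm^2 V^-1 s^-1")
```

An ion that arrives later than the reference therefore receives a proportionally smaller K<sub>0</sub>, without pressure or temperature having to be measured explicitly.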

State-of-the-art IMS technologies can be classified into time-dispersive, space-dispersive, and trapping technologies [41]. Time-dispersive IMS separates ions as a function of their mobility in a neutral gas, whereas space-dispersive IMS separates ions by the ratio of low-field to high-field mobilities [42]. Examples of time-dispersive IMS are drift tube ion mobility spectrometry (DTIMS) and travelling wave ion mobility spectrometry (TWIMS). High-field asymmetric waveform ion mobility spectrometry (FAIMS) and differential ion mobility spectrometry (DIMS or DMS) are examples of space-dispersive techniques [43]. The third class is represented by trapped ion mobility spectrometry (TIMS), which uses a trapping technology able to confine and release ions.

IMS alone has been applied for quantification [11,15,17,27] and classification [5] tasks in controlled environments. However, due to the inherent diversity of biogenic samples, the applications of IMS with direct sample introduction are often not sufficient, requiring prior purification or separation. The commonly used purification methods for VOC profiling in combination with IMS are solvent extraction [20,26,44] and solid-phase microextraction (SPME) [13]. SPME devices are constructed of a silica fiber coated with a thin layer of a suitable polymeric sorbent or immobilized liquid, used for the direct extraction of analytes from gaseous and liquid media [45]. While SPME coupled to IMS has been successfully used for quantification tasks, such as the detection and quantification of precursors and degradation products of chemical warfare agents [13], SPME is commonly extended by column separation techniques [18,46,47].

To avoid clustering in the ionization or drift region, IMS devices are commonly coupled to column separation techniques, such as liquid chromatography (LC) or gas chromatography (GC). Column separation coupled to drift-time IMS separates analytes along two orthogonal dimensions, first the retention time through chromatography and second the drift time or mobility through IMS, resulting in a highly resolved two-dimensional (2D) GC-IMS spectrum [6,38]. In LC analysis, any soluble compound can be separated, but sample preparation is a critical step for the data quality [48]. NTS approaches need a comprehensive extraction method that covers a wide range of compounds while minimizing potentially interfering coextractives, since no specific compounds are targeted [49]. LC-MS in combination with NTS has been applied for the detection of food contaminants and environmental hazards [50,51].
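The two-dimensional GC-IMS spectrum described above can be pictured as an intensity matrix indexed by retention time and drift time. The sketch below is one possible in-memory representation, not a vendor file format. The class name, field names, and toy intensities are assumptions made for illustration, and only the standard library is used to keep the example self-contained (an array library would be the more usual choice in practice).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class GcImsSpectrum:
    """2D spectrum: one drift-time spectrum per retention time."""
    retention_times_s: List[float]   # first dimension (GC)
    drift_times_ms: List[float]      # second dimension (IMS)
    intensities: List[List[float]]   # intensities[i][j] at (retention i, drift j)

    def drift_spectrum(self, retention_index: int) -> List[float]:
        """IMS spectrum recorded at one retention time (a matrix row)."""
        return list(self.intensities[retention_index])

    def chromatogram(self, drift_index: int) -> List[float]:
        """Intensity over retention time at one drift time (a matrix column)."""
        return [row[drift_index] for row in self.intensities]

    def strongest_signal(self) -> Tuple[float, float, float]:
        """Return (retention time, drift time, intensity) of the maximum."""
        value, i, j = max(
            (value, i, j)
            for i, row in enumerate(self.intensities)
            for j, value in enumerate(row)
        )
        return self.retention_times_s[i], self.drift_times_ms[j], value


# Synthetic toy values for illustration only, not measured data.
spectrum = GcImsSpectrum(
    retention_times_s=[60.0, 120.0, 180.0],
    drift_times_ms=[6.0, 7.0, 8.0, 9.0],
    intensities=[
        [0.1, 0.9, 0.1, 0.0],
        [0.1, 0.4, 0.1, 0.7],
        [0.1, 0.8, 0.2, 0.0],
    ],
)
print(spectrum.strongest_signal())
```

Each signal is addressed by a pair of coordinates, so two analytes that co-elute in the GC dimension can still be told apart if their drift times differ.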

In GC analysis, the volatility of the analytes is a prerequisite. Headspace (HS)-based techniques allow for the analysis of untreated samples, avoiding time-consuming sample pretreatment steps [52]. The analysis of non-volatile compounds may be achieved through derivatization, i.e., the attachment of a functional group to the molecule of interest. Although this modification enables the analysis of compounds that otherwise could not be easily monitored by GC, NTS approaches usually do not incorporate derivatization, in particular due to the high level of variance it introduces.

The main advantage of GC-IMS in comparison to established techniques, such as mass spectrometry, is its simple and inexpensive design, primarily because it is operated at atmospheric pressure and hence does not require vacuum pumps [8]. Furthermore, the use of radioactive ionization sources allows for portability, miniaturization, and mechanical robustness, which makes the technique suitable for field and benchtop applications [52]. Due to efficient ionization, in combination with its fast and sensitive detection, IMS is a universal technique for the analysis of organic and inorganic molecules, atoms, or particles [38]. One potential challenge of IMS analysis is that spectra may contain interference due to widespread ionization, which results in low selectivity. The addition of suitable dopant substances, however, has been shown to overcome these limitations [53,54]. A nonlinear concentration range was previously described for IMS, requiring careful monitoring of the sample concentration to avoid saturation. Furthermore, the separation, which is based on CCS, often provides limited information regarding the size and shape of analytes. However, the drawback of interference caused by spectral complexity and nonlinearities can be overcome by using computer-based analysis tools [55].

The complexity of biological samples results from the presence of a variety of compounds, which in their entirety provide a characteristic HS-GC-IMS spectrum, often referred to as the VOC profile or 'fingerprint' [56,57]. HS-GC-IMS has been demonstrated to be an effective technique for the evaluation of VOC profiles of biological samples due to its simple system setup, robustness, and price [44,58–61]. The chemical fingerprinting of food and beverages in combination with chemometric analysis is widely used for food authentication and ultimately to identify food adulteration and fraud [62]. Furthermore, the VOC profile is influenced by production processes as well as storage conditions. Consequently, process control and quality assurance, such as the control of food freshness or food safety, are topics of interest for NTS using HS-GC-IMS techniques [63,64].

#### **2. Motivation for Non-Targeted Screening Using HS-GC-IMS**

Labelling fraud, e.g., of organic certifications or geographic origin, is the most common type of fraud in agricultural and food markets [65,66]. According to the European Commission, honey and olive oil are particularly affected by mislabeled botanical origin, as well as dilution with inferior or less expensive products [2,67]. Moreover, food adulteration and food fraud have led to cases of economic loss and may pose health risks [2]. The detection of food fraud or adulteration often involves the identification of compounds of unknown molecular composition. Since no identified chemical markers or sets of markers are commonly accessible for a target-based analysis, an analytical approach covering a multitude of parameters in parallel, paired with strong discrimination power, is required. The currently used methods to determine quality and authenticity, such as sensory analysis and physicochemical analysis [68], are time- and resource-consuming, while lacking sensitivity as well as prediction accuracy, not least due to univariate analysis. To overcome the limitations of traditional, wet-chemistry-based assays, targeted and non-targeted approaches using chromatographic methods [69,70], often in combination with mass spectrometry [71–73], as well as infrared (IR)-based spectroscopy [74,75] and proton nuclear magnetic resonance (<sup>1</sup>H NMR) spectroscopy [76,77], have been discussed for various applications. However, to obtain the reproducibility needed for chemometric analysis, time-consuming sample preparation, including precise adjustment of pH, water content, or particle size, has been reported in combination with the mentioned methods. Furthermore, the high costs of ownership and maintenance, as well as the requirement for expert knowledge, may limit applications. Finally, high-end instrumentation also requires suitable laboratory infrastructure, which is usually not available at the point of care. Thus, robust, inexpensive, and fast analytical methods, such as HS-GC-IMS, are needed, which require little or no sample preparation but deliver high selectivity.

Application examples for HS-GC-IMS with NTS:

A plethora of studies have shown the potential of HS-GC-IMS in combination with NTS for monitoring food quality or confirmation of geographical or botanical origin, despite the complexity of the samples. For example, HS-GC-IMS with NTS has been widely applied for the classification of olive oil between high-priced type 1 extra-virgin olive oil (EVOO), medium-priced type 2 virgin olive oil (VOO or OO), and non-edible type 3 olive oil, also known as pomace olive oil (POO) or lampante (virgin) olive oil (L(V)OO) [32,33,78,79]. Furthermore, HS-GC-IMS with NTS was successfully used for reliable classification of geographical origins for both olive oil (EVOO) [34,80,81] and wine [30]. Moreover, HS-GC-IMS with NTS was applied for the classification of honey according to botanical origin [52,81,82], as well as for the detection and quantification of honey adulterated with sugar cane or corn syrups [83,84]. Recently, HS-GC-IMS with NTS has been applied to assess the freshness of food [85] and for the detection of mold formation on milled rice [86], peanut kernels [87], and wheat kernels [88]. Further examples of recent studies using HS-GC-IMS with NTS are provided in Table 1.


