*Review* **Approaching Authenticity Issues in Fish and Seafood Products by Qualitative Spectroscopy and Chemometrics**

#### **Sergio Ghidini, Maria Olga Varrà \* and Emanuela Zanardi**

Department of Food and Drug, University of Parma, Strada del Taglio 10, 43126 Parma, Italy; sergio.ghidini@unipr.it (S.G.); emanuela.zanardi@unipr.it (E.Z.)

**\*** Correspondence: mariaolga.varra@studenti.unipr.it; Tel.: +39-0521-902-761

Received: 9 April 2019; Accepted: 8 May 2019; Published: 10 May 2019

**Abstract:** The intrinsically complex nature of fish and seafood, as well as the complicated organisation of the international fish supply chain and market, make the fight against counterfeiting and falsification of fish and seafood products very difficult. The development of fast and reliable omics strategies based on spectroscopy in conjunction with multivariate data analysis has been attracting great interest from food scientists, and studies linked to fish and seafood authenticity have increased considerably in recent years. The present work reviews the most promising studies dealing with the use of qualitative spectroscopy and chemometrics for the resolution of the key authenticity issues of fish and seafood products, with a focus on species substitution, geographical origin falsification, production method or farming system misrepresentation, and fresh for frozen/thawed product substitution. Within this framework, the potential of fluorescence, vibrational, nuclear magnetic resonance, and hyperspectral imaging spectroscopies, combined with both unsupervised and supervised chemometric techniques, is highlighted, each time pointing out the trends in using one or another analytical approach and the performances achieved.

**Keywords:** fish and seafood; food authentication; chemometrics; fingerprinting; wild and farmed; geographical origin; vibrational spectroscopy; absorption/fluorescence spectroscopy; nuclear magnetic resonance; hyperspectral imaging

#### **1. Introduction**

The demand for fish and seafood products has increased notably in recent years, mostly as a consequence of the special attention consumers now pay to healthier food. The technological development that has swept through the whole fisheries sector has additionally helped to overcome the well-known obstacles to exporting fish and seafood worldwide, which derive from the high perishability of the products, to the point that today more than 35% of all caught and cultured fish is traded across national boundaries [1]. The growing competitiveness of the sector and the diversification of the fish supply chain have, in turn, led to the presence of a huge variety of look-alike products on the international market, whose overall quality features are, however, quite different. More than 700 different species of fish, 100 species of molluscs, and 100 of crustaceans are, in fact, used as food for humans [2].

In this scenario, what is remarkable is that consumers demand not only more fish, but also safer and higher-quality fish, whilst the deliberate or accidental lack of transparency about the identity of products and fraudulent or negligent activities continue to grow. Based on what has recently been reported by the Food and Agriculture Organization, fish and related products have become one of the food categories most vulnerable to fraud. Nevertheless, the effective monitoring of illicit practices in the fisheries sector is hampered by the increasing spread of highly processed fish products, in which different types of fraud can be hidden with ease [3].

The deliberate substitution of commercially valuable fish species with lower-quality ones represents the most recurrent form of fish fraud, although substitution can also take place accidentally when species look so similar that they are mistaken for each other. The geographical provenance and the production process are other current authenticity topics concerning fish and seafood products, whose falsification, which is hard to bring to light, has a negative economic impact. Despite being economically motivated, mislabelling concerning these issues may occasionally represent a risk to public health. The illegal commercialisation of poisonous fish species (*Tetraodontidae*, *Molidae*, *Diodontidae*, and *Canthigasteridae* families) or the replacement of certain kinds of raw fish fillets with gastro-intestinally toxic fish (i.e., those belonging to the *Gempylidae* family) are just some of many examples. Likewise, the occurrence of some harmful marine biotoxins may be linked to the geographical distribution of the producing organisms [4], while higher levels of heavy metals or residues of antibiotics and pesticides are more likely to be found in farmed products than in wild ones [5–7].

Ensuring a clear assessment of the authenticity of fish and seafood is of special concern today not only for consumers, but also for producers, traders, and industries. Traceability throughout the whole production chain and at all stages of the market, covered by Regulations 178/2002/EC [8], 1005/2008/EC [9], and 1224/2009/EC [10], is considered to be the starting point for the assurance of a high level of safety and quality of food and ingredients, as it represents the basic instrument not only for preventing illegal activities, but also for protecting consumers through the opportunity to access information about the exact nature and characteristics of fish. Specific regulations for the provision of information to consumers [11], and the requirement to uniquely identify fish and seafood on the label [12], also play an essential role in providing more transparency regarding the nature of the products, as they allow consumers to make informed choices and further contribute to the implementation of seafood traceability. As a matter of fact, labels of all unprocessed and some processed fishery and aquaculture products must include information on both the commercial and scientific names of the species, whether the fish has been caught or farmed, the catch or production area, the fishing gear used, whether the product has been defrosted, and the date of minimum durability (where appropriate). Many other voluntary claims can also be reported on the label, including the date of catch/harvest for wild/aquaculture products, information about production techniques and practices, and environmental and ethical information [12].

All declarations appearing on the label must always be checked to verify whether they are truthful. Therefore, in spite of the utility of the traceability system, the fisheries sector needs effective analytical methods to address the problem of fish authenticity and ensure product quality. Innovative analytical approaches based on the evaluation of total spectral properties are rapidly gaining ground at all levels of current food authenticity research, thanks to their ability to simultaneously provide a wealth of information related to the physical and chemical characteristics of the food matrix. Recent advances in chemometrics, moreover, have represented a major turning point in the dissemination of 'fingerprinting strategies', as they allow for the study of all the genetic, environmental, and other external factors influencing food identity, and bypass many obstacles related to the application of conventional techniques [13]. In this way, chemometrics can now be considered an essential tool for the differentiation of similar samples according to the authentication issue of interest.

To date, several spectroscopic techniques in conjunction with chemometrics have been used as rapid, simple, and cheap tools for fish quality and authenticity testing. Among these, vibrational (near-infrared (NIR), mid-infrared (MIR), Raman), fluorescence or absorption ultraviolet-visible (UV–Vis), and nuclear magnetic resonance (NMR) spectroscopies, together with hyperspectral imaging (HSI) spectroscopy, are the most widely used, even though they are still under development.

Against this background, the present review has been designed to highlight the uses and developments of fast and reliable omics strategies based on UV–Vis, NIR, MIR, Raman, NMR, and HSI spectroscopies, in an attempt to address the key authenticity challenges within the fish and seafood sector. To this end, a brief discussion of the basic concepts underlying these techniques is provided, accompanied by a short overview of several chemometric tools, in order to highlight their potential benefits in extracting relevant information from spectral data.

The main body of this review focuses specifically on the application, over the years, of spectroscopy and chemometrics to distinguish products according to species, production method (wild or farmed), farming system (conventional or organic; intensive, semi-intensive, or extensive), geographical provenance (different FAO areas and countries of origin), and processing technique (fresh or frozen/thawed), which at present correspond to the key authenticity concerns requiring ongoing and effective monitoring.

#### **2. A Conceptual Framework of Spectroscopy and Chemometrics**

Spectroscopy is the study of electromagnetic radiation interacting with matter, which can be absorbed, transmitted, or scattered on the basis of both the specific frequency of the radiation and the physical/chemical nature of the matter. When absorbed, radiation leads to a change in the energy states of the atoms, nuclei, molecules, or crystals that make up matter, inducing an electronic, vibrational, or rotational transition, depending on the energy of the incident radiation [14]. When radiation at a specific frequency is scattered by molecules (as in Raman spectroscopy), some changes can occur in the energy of the incident photon, which transfers part of its energy to the matter. In any case, the result of these interactions is a spectrum enclosing many features of the matter analysed, which, when properly interpreted with the help of chemometrics, can be used in a great number of different applications. In choosing the most appropriate spectroscopic method, consideration should be given to several factors beyond the purely analytical purposes: the physical state and chemical composition of the sample, the sensitivity, specificity, and overall accuracy of the technique, the scale of operation, the time of analysis, and the cost/availability of the instrumentation [15].

For the sake of conciseness, the main features of the spectroscopic techniques most used in the food authentication field are summarised in Table 1.




*(Table 1 is not reproduced here. Footnote to Table 1: 1H, 13C, and 31P are the most frequently investigated nuclei in food science-related nuclear magnetic resonance (NMR) applications.)*

#### *2.1. UV–Vis Absorption and Fluorescence Emission Spectroscopy*

UV–Vis spectroscopy involves the electronic excitation of molecules containing specific chromophore groups, which results from the absorption of photons in two wavelength regions of the electromagnetic spectrum. In the absorption mode, the amount of light retained by the sample is measured, while in the fluorescence mode the amount of light emitted after absorption is taken into consideration [15]. Typically, the UV–Vis spectrum is characterised by broad absorption or emission peaks which reflect the molecular composition of the matrix: by exploiting the unique absorption or emission patterns of the entire spectrum, or by measuring the absorbance or fluorescence intensity of the analyte at one wavelength, this spectrum can be used for many qualitative and quantitative food analytical applications, respectively [16,17].
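
As a point of reference (standard textbook material, not drawn from the works reviewed here), the single-wavelength quantitative use mentioned above rests on the Beer–Lambert law:

$$
A(\lambda) = \log_{10}\frac{I_{0}}{I} = \varepsilon(\lambda)\, l \, c
$$

where $A$ is the absorbance, $I_{0}$ and $I$ are the incident and transmitted light intensities, $\varepsilon(\lambda)$ is the molar absorptivity of the analyte at wavelength $\lambda$, $l$ is the optical path length, and $c$ is the analyte concentration.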

#### *2.2. IR Spectroscopy*

Infrared spectroscopy involves three different sub-regions of the electromagnetic spectrum, namely NIR, MIR, and far-infrared (FIR), whose absorption by samples results in vibrations of atoms in molecular bonds [18]. These vibrations give out a great amount of information related not only to chemical bonding, but also to the general molecular conformation, structure, and intermolecular interactions within the sample [19]. In this way, IR spectra capture the total sample composition: the pattern of peak distribution represents a unique signature profile, and the intensity of the bands is linked to the concentration of specific compounds [20,21].

The NIR spectrum of food samples results from absorption by molecular bonds containing prevalently light atoms and is characterised by the presence of broad and overlapping overtone and combination bands [22,23]. By contrast, the spectral signature in the MIR region is characterised by more intense and well-delineated bands, whose position and intensity are more informative of the concentration of molecules in the sample [24,25]. Here too, the spectral profile is complex and data mining is very difficult without the use of multivariate data analysis. Finally, with reference to FIR spectroscopy, no applications to food authentication are currently available, since it relates to molecules containing halogen atoms, organometallic compounds, and inorganic compounds, which are of more limited interest within the context of food research [26].

#### *2.3. Raman Spectroscopy*

Raman spectroscopy is a molecular vibration technique based on inelastic Raman scattering, a physical effect that accompanies molecular vibrations and involves a change in the polarizability of the molecule [27]. In particular, this kind of spectroscopy focuses on the measurement of the small fraction of the radiation which is scattered by specific categories of compounds at higher or lower frequencies than the incident photons. The typical Raman spectrum, showing the intensity of the scattered light versus the wavelength of the Raman shift, is characterised by sharp and well-resolved bands, which provide information about the molecular structure and composition of the matter analysed.

For a long time after its discovery, Raman spectroscopy was poorly exploited in food applications, because of several analytical disadvantages and interferences (see Table 1). These drawbacks have now been overcome thanks to the overall technological improvement of Raman equipment: by way of example, surface-enhanced Raman spectroscopy (SERS) has recently made it possible to surmount hurdles related to faint scattering signals [28].

#### *2.4. Hyperspectral Imaging*

HSI is a technique combining spectroscopy and computer vision to give useful information concerning the physicochemical characteristics of samples in relation to their spatial distribution. Briefly, HSI systems provide several hyperspectral images of the tested sample, corresponding to three-dimensional data containers, of which each sub-image is a map showing the spatial distribution of the sample constituents at each single wavelength [29,30].

In recent years, the steady growth in the use of HSI technology in the field of food research has been mainly driven by the availability of different instrumental configurations that exploit fluorescence, absorbance, or light scattering phenomena. On the other hand, spectral imaging technologies are not at all widespread in the food industry, due to a variety of factors ranging from the high cost and low availability of instrumentation, to computation speed and the expertise required of users [31].

#### *2.5. NMR Spectroscopy*

NMR spectroscopy is a very versatile technique for food analysis and its untargeted applications have become very popular. The first reason for NMR popularity is that the composition of the matter under study can be comprehensively mapped out by the overall NMR spectral profile, thus giving a complete view for the identification of all major and minor food components [32]. At the same time, the area of the NMR spectral bands is directly proportional to the number of nuclei producing the signal, so the technique is also well-suited for quantitative purposes. Additionally, despite relatively high equipment costs and difficulties in spectra interpretation, NMR spectroscopy is one of the few techniques available that can provide information about the regio- and stereochemistry of molecules [33].

Depending on the physical state of the matter and on the intended aim of the NMR application, different NMR methodologies have been optimised. Among these, high-resolution NMR, low-field NMR, solid-state NMR, liquid-state NMR, and NMR imaging are the most used, each of which requires specific instrumentation and different approaches to sample preparation, data acquisition, and processing [34].

#### *2.6. Qualitative Chemometric Methods*

Raw spectra resulting from spectroscopic analyses are usually characterised by broad and unresolved bands containing a great deal of information, some of which is certainly useful and needs to be retained, but some of which hampers correct data interpretation and needs to be removed. Recent advances in chemometrics have marked an important milestone in spectral analysis, since they have simplified the identification of hidden interrelations between variables, providing the key for the discrimination and classification of samples [20,35]. In other words, qualitative chemometric methods help to recognise similarities and dissimilarities within spectral data, which can be used to confirm the authenticity or detect the adulteration of food samples [36].

Based on the explorative or predictive nature of the methodology, qualitative chemometric techniques are usually classified into unsupervised and supervised techniques. While unsupervised techniques do not require prior knowledge of the class membership of samples to perform classification, supervised techniques call for such knowledge. Brief descriptions of the principles behind the most widely used chemometric techniques are provided below.

#### 2.6.1. Spectral Pre-Treatments

Pre-treatment of spectral data is recognised as being fully integrated into the chemometric set-up itself. Prior to the development of chemometric models, raw spectroscopic data should be pre-processed by applying corrections aimed at enhancing spectral properties and minimising the fraction of systematic variation which does not contain information relevant to the discrimination of samples. One such source of systematic variation is the sum of the different physical effects arising during instrumental acquisition of spectra (e.g., light scattering or background fluorescence phenomena), which are responsible for the appearance, especially in solid samples, of multiplicative, additive, and non-linearity effects (e.g., overlapping bands, baseline shifts/drifts, random noise) [37].

Accordingly, pre-processing algorithms are usually classified into signal correction methods (e.g., multiplicative scatter correction, MSC; standard normal variate, SNV), differentiation methods (first-, second-, or third-order derivation), and filtering-based methods (e.g., orthogonal signal correction, OSC; orthogonal wavelet correction, OWAVEC) [38]. While signal correction and filtering-based methods are conceived to retain only the relevant spectral information, mainly by suppressing light-scattering effects, derivative-based methods also help to reduce spectral complexity through the separation of broad overlapping bands.
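
To make the distinction concrete, the following minimal sketch (in Python, using NumPy and SciPy; the data, window length, and polynomial order are illustrative assumptions, not values taken from the reviewed studies) chains a signal correction method (SNV) with a differentiation method (Savitzky–Golay first derivative):

```python
# A minimal sketch of two common spectral pre-treatments, assuming spectra
# are stored as rows of a NumPy array. Window length and polynomial order
# below are illustrative choices, not values from the reviewed studies.
import numpy as np
from scipy.signal import savgol_filter

def snv(spectra: np.ndarray) -> np.ndarray:
    """Standard normal variate: centre and scale each spectrum individually."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, keepdims=True)
    return (spectra - mean) / std

def sg_derivative(spectra: np.ndarray, window: int = 15,
                  poly: int = 2, deriv: int = 1) -> np.ndarray:
    """Savitzky-Golay smoothing combined with differentiation."""
    return savgol_filter(spectra, window_length=window,
                         polyorder=poly, deriv=deriv, axis=1)

# Example: 10 synthetic "spectra" of 200 wavelengths each
X = np.random.rand(10, 200)
X_pre = sg_derivative(snv(X))   # SNV followed by a first derivative
```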

More detailed descriptions of spectral pre-processing techniques can be found in the literature [37,39,40]. Either way, it is essential to point out that spectral filters are often concatenated to exploit the effects of each one, but this concatenation may increase model complexity and background noise, resulting in inaccurate chemometric modelling of the data and, thus, wrong predictions. For this reason, it is recommended to customise the selection of pre-treatments prior to chemometric analysis according to the spectroscopic technique used and the sample characteristics, restricting their number whenever possible.

#### 2.6.2. Unsupervised Methods

Unsupervised methods study the variability among samples in order to identify their natural characteristics and possible similarities, without the need to provide any information about the class to which the samples belong.

Among the various available techniques, principal component analysis (PCA) is the most widely used. PCA is a fairly basic projection method able to reduce the original correlated variables to a smaller number of new uncorrelated latent variables (known as principal components) containing as much of the systematic variation of the original data as possible [41]. Score plots deriving from PCA show, in a simple and intuitive graphical way, the hidden structures among samples, the interrelations among variables and between samples and variables, the probable presence of any outliers, and possible groupings or dispersions of samples according to specific class memberships.
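
A minimal PCA sketch follows (Python/scikit-learn; the synthetic spectra are placeholders, not data from the reviewed studies): spectra are projected onto the first two principal components, and the resulting score plot can be inspected for groupings or outliers.

```python
# PCA sketch: project spectra onto two principal components and inspect
# the score plot for groupings; the data here are synthetic placeholders.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

X = np.random.rand(30, 200)            # 30 samples x 200 wavelengths
pca = PCA(n_components=2)
scores = pca.fit_transform(X)          # sample scores on PC1 and PC2

print(pca.explained_variance_ratio_)   # share of variance captured per PC
plt.scatter(scores[:, 0], scores[:, 1])
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.show()
```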

Hierarchical cluster analysis (HCA) is another frequently employed unsupervised method, based on the splitting of samples into different clusters. This splitting reflects the degree of similarity among samples and is generally performed by evaluating the Mahalanobis or Euclidean distance between them. The hierarchical approach is thus aimed at constructing a hierarchy in which the most closely related samples are first grouped into small clusters, which are then progressively assembled into bigger clusters including less similar samples [35]. Results of HCA are graphically expressed by tree diagrams (dendrograms) showing the relationships among clusters; nevertheless, despite being easily computed, dendrograms are often misread, since the number of clusters to be considered is arbitrary, making the interpretation of results more subjective than objective.
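
A corresponding HCA sketch (Python/SciPy; Ward linkage on Euclidean distances is one common choice, and the data are synthetic placeholders) that produces a dendrogram might look as follows:

```python
# HCA sketch: Ward linkage on Euclidean distances between spectra,
# visualised as a dendrogram; synthetic placeholder data.
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

X = np.random.rand(15, 200)                        # 15 samples x 200 wavelengths
Z = linkage(X, method="ward", metric="euclidean")  # build the cluster hierarchy
dendrogram(Z, labels=[f"s{i}" for i in range(15)])
plt.show()
```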

#### 2.6.3. Supervised Methods

Supervised techniques require prior knowledge of the class membership of the samples tested, which is used to develop predictive models able to discriminate and classify future unidentified samples. Several different chemometric techniques belong to the category of supervised methods, most of which require a training set (to find classification rules for the samples) and a test set (to assess the predictive ability of the model developed) [42].

Linear discriminant analysis (LDA) and quadratic discriminant analysis (QDA) are variance-based methods which use the Euclidean distance to find those combinations of the original variables determining maximum separation among the different groups of samples [20]. Both techniques presume that the measurements within each class are normally distributed, but while LDA supposes that the dispersion (covariance) is identical for all classes, QDA allows different dispersions to be present within different classes [35]. Although QDA is considered an extension of LDA, the two share some limitations, for instance the risk of overfitting and of misclassification, especially when the sample size for each class is unbalanced.
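
The sketch below (Python/scikit-learn, on synthetic placeholder data) fits both discriminant models; compressing the spectra with PCA beforehand, as done here, is a common workaround for the overfitting risk mentioned above, though the specific number of components is an illustrative assumption.

```python
# LDA vs. QDA sketch on synthetic placeholder data; PCA compression is
# applied first to keep the number of variables far below the number of
# samples, mitigating the overfitting risk noted in the text.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)
from sklearn.pipeline import make_pipeline

X = np.random.rand(40, 200)                 # spectra
y = np.repeat(["wild", "farmed"], 20)       # class labels

for model in (LinearDiscriminantAnalysis(), QuadraticDiscriminantAnalysis()):
    clf = make_pipeline(PCA(n_components=5), model)
    clf.fit(X, y)
    print(type(model).__name__, clf.score(X, y))   # training accuracy
```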

K-nearest neighbors (k-NN) is one of the simplest methods to discriminate samples on the basis of the distances among them. After choosing an adequate number k of neighbours, the algorithm identifies the k nearest samples of known class membership to decide the classification of unknown samples. This method, unlike LDA and QDA, does not require any prior distributional assumption and its success is independent of the homogeneity of sample numbers in each tested class [43].
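
A minimal k-NN sketch (Python/scikit-learn; the choice k = 3, the data, and the class labels are illustrative assumptions):

```python
# k-NN sketch: an unknown spectrum is assigned by majority vote among the
# k nearest training spectra; k = 3 and the labels are illustrative.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X_train = np.random.rand(30, 200)            # training spectra
y_train = np.repeat(["cod", "haddock"], 15)  # known class memberships

knn = KNeighborsClassifier(n_neighbors=3)    # purely distance-based: no
knn.fit(X_train, y_train)                    # distributional assumptions
print(knn.predict(np.random.rand(1, 200)))   # class of an "unknown" sample
```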

Among supervised machine learning approaches, support vector machines (SVM) are particularly advantageous when sample classification is complicated by non-linearity and high-dimensional space. The core of the method is the use of specific functions for pattern analysis (kernel algorithms), through which the margin of separation between classes is maximised and complex classification problems that are not linear in the initial dimension (but may be in higher-dimensional spaces) are resolved [20].
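
As an illustration, a kernel SVM can be set up in a few lines (Python/scikit-learn; the radial-basis-function kernel and the hyperparameters are illustrative choices, not those of any reviewed study):

```python
# SVM sketch: a radial-basis-function kernel implicitly maps spectra into a
# higher-dimensional space where a maximum-margin separation is sought.
import numpy as np
from sklearn.svm import SVC

X = np.random.rand(40, 200)        # spectra (synthetic placeholders)
y = np.repeat([0, 1], 20)          # two classes

svm = SVC(kernel="rbf", C=1.0, gamma="scale")  # illustrative hyperparameters
svm.fit(X, y)
print(svm.predict(X[:5]))
```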

Similarly, artificial neural networks (ANN) constitute a machine learning method characterised by the ability to adapt to the data, providing classification even in the presence of non-linear input–output relationships. Structured and organised in a less complex way than SVM, ANN usually generate a more rapid response at a lower computational cost; these advantages, however, are counterbalanced by a reduction in accuracy [20,44]. Moreover, ANN suffer from poor data generalisation and are consequently inclined to overfit. This tendency to overfitting is the main reason why accurate ANN analyses call for a very high number of samples and, at the same time, require strict internal and external validations, where the training set and the test set should enclose as similar a variability as possible [45].
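
A minimal ANN sketch follows (Python/scikit-learn multilayer perceptron; the network size, iteration limit, and train/test split are illustrative assumptions), showing the external validation step the text calls for:

```python
# ANN sketch: a small multilayer perceptron with a held-out test set for
# external validation; the tiny synthetic dataset is purely illustrative,
# whereas real applications need far more samples to avoid overfitting.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X = np.random.rand(100, 50)                  # samples x variables
y = np.random.randint(0, 2, 100)             # two classes
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

ann = MLPClassifier(hidden_layer_sizes=(20,), max_iter=2000, random_state=0)
ann.fit(X_tr, y_tr)
print("external validation accuracy:", ann.score(X_te, y_te))
```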

Soft independent modelling of class analogy (SIMCA) is an alternative pattern recognition method which first performs an individual PCA on the samples of each class to which they must be assigned, in order to compress the original variables into a smaller number of principal components. The principal components and critical distances computed are then used to delineate a confidence limit for each class. Unknown samples are then assigned to the class to which they are closest when projected into the resulting multidimensional space [36]. SIMCA is particularly useful when samples belong to several different classes; however, since maximum class separation is not sought by the method, the interpretation of the outcomes may be difficult, if not impossible [20].
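
Since SIMCA is rarely bundled in general-purpose libraries, the following simplified sketch (Python/scikit-learn) emulates its core idea with one PCA model per class; the residual-based 95th-percentile acceptance threshold is a simplifying assumption standing in for the formal critical-distance statistics of full SIMCA implementations:

```python
# SIMCA-style sketch: one PCA model per class; an unknown sample is accepted
# by every class whose reconstruction residual stays below that class's
# critical distance. The 95th-percentile threshold is a simplification of
# the formal statistics used in full SIMCA implementations.
import numpy as np
from sklearn.decomposition import PCA

def fit_class_model(X_class, n_components=3):
    pca = PCA(n_components=n_components).fit(X_class)
    resid = X_class - pca.inverse_transform(pca.transform(X_class))
    limit = np.percentile(np.linalg.norm(resid, axis=1), 95)
    return pca, limit

def simca_assign(x, models):
    hits = []
    for name, (pca, limit) in models.items():
        r = x - pca.inverse_transform(pca.transform(x.reshape(1, -1)))[0]
        if np.linalg.norm(r) <= limit:
            hits.append(name)
    return hits   # may be empty, or contain several classes

models = {"wild": fit_class_model(np.random.rand(20, 200)),
          "farmed": fit_class_model(np.random.rand(20, 200))}
print(simca_assign(np.random.rand(200), models))
```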

Regression-based supervised discriminant analyses exploit specific classification algorithms to model the interrelations existing between measured variables (i.e., spectra) and qualitative parameters (i.e., class membership), such that maximum separation between the different groups of samples is achieved. Partial least squares-discriminant analysis (PLS-DA) and orthogonal partial least squares-discriminant analysis (OPLS-DA) belong to this category of techniques. PLS-DA involves a standard PLS regression to find interrelations between the X-matrix (containing the measured variables) and the Y-matrix (containing the categorical variables) by building new latent variables. These interrelations make it possible not only to classify new samples into one of the Y-groups based on the measured spectrum, but also to identify the variables that contribute most to the classification. Although PLS-DA has the advantage of modelling noisy and highly collinear data efficiently, the technique is often unsuccessful when the non-related (orthogonal) variability in the X-matrix is substantial, since it hinders the correct interpretation of the results [20]. This drawback can be overcome by the application of OPLS-DA, through which the orthogonal variability within the X-matrix is separated from the related (predictive) variability and modelled apart. Consequently, if samples cannot be discriminated along the predictive direction, the orthogonal variability may be handled to increase the effectiveness of discrimination among classes [46].
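
A common way to emulate PLS-DA with general-purpose tools is to run a PLS regression on a dummy-coded class variable and threshold the prediction, as in this sketch (Python/scikit-learn; the number of latent variables and the 0.5 cut-off are illustrative assumptions):

```python
# PLS-DA sketch: a PLS regression fitted on a dummy-coded class variable,
# with class assignment by thresholding the predicted response at 0.5.
# The number of latent variables is an illustrative choice.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

X = np.random.rand(40, 200)        # X-matrix: measured spectra
y = np.repeat([0, 1], 20)          # Y-matrix: dummy-coded class membership

pls = PLSRegression(n_components=5)
pls.fit(X, y)
y_pred = (pls.predict(X).ravel() > 0.5).astype(int)
print("correct classification rate:", (y_pred == y).mean())
```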

#### **3. Authenticating Fish and Seafood through the Application of Qualitative Spectroscopy and Chemometrics**

Spectroscopic and chemometric analyses have been used over the years for many applications in fishery research, those in the authentication field being among the most promising. Some works concerning the flexibility of spectroscopy in fish and seafood analysis have already been reviewed by different authors [24,25,47–49], but these have mainly centred on illustrating the advances of the available techniques for quality attribute assessment, as well as the advantages and limitations of each single technique over traditional methods.

Therefore, in the following section, more attention is paid to the resolution, on a case-by-case basis, of the weightiest authentication issues in the fish and seafood sector, namely species substitution, geographical origin falsification, production method or farming system misrepresentation, and fresh for frozen/thawed product substitution. In each case, the trends in using one or another method are pointed out, together with the discrimination performances achieved, which are considered the most intuitive parameters for chemometric model diagnostics. An overview of the most frequently investigated authentication issues in the fishery sector and the trend in the use of each spectroscopic technique over the years are plotted in Figures 1 and 2, respectively.

**Figure 1.** Percentage distribution of the authenticity issues covered by the scientific literature reviewed in the present work. Data were collected in February 2019 from the web search engine Google Scholar (search criteria: time period: "any time"; keywords: "fish and/or seafood", "authenticity", "spectroscopy", "chemometrics").

**Figure 2.** Combined bar and line graph, where bars (plotted against the left Y-axis) show the cumulative number of scientific works concerning the use of spectroscopy and chemometrics for fish authentication purposes, and lines (plotted against the right Y-axis) show the cumulative number of works using each spectroscopic technique. Data were collected in February 2019 from the web search engine Google Scholar (search criteria: time period: "any time"; keywords: "fish and/or seafood", "authenticity", "spectroscopy", "chemometrics").

#### *3.1. Species Substitution*

The substitution or counterfeiting of high-value fish species with low-value ones has many quality and safety implications. Therefore, the confirmation of the scientific and commercial names declared on the label through the use of rapid and low-cost methods is increasingly popular in food research.

#### 3.1.1. Application of Vibrational Spectroscopy

An early study explored Vis-NIR spectroscopy as a tool to detect the adulteration of Atlantic blue crabmeat (*Callinectes sapidus*) with blue swimmer crabmeat (*Portunus pelagicus*) in 10% increments, given their different commercial values [50]. Qualitative chemometric analysis was performed on 400–2498 nm Vis-NIR spectra (previously subjected to different pre-treatments to evaluate the effects on model performance) by means of a full-spectrum PCA and a sequential-spectrum PCA. As a result, both the first derivative-pre-treated full spectra and the second derivative-pre-treated sequential spectra highlighted a trend of samples moving from the left to the right part of the PCA score plot with increasing adulteration levels, but the authors identified the sequential approach, using 400–1700 nm second derivative spectra, as the most informative and, thus, the most suitable [50].

Given the sharp rise in interest, over the past several years, in the portability of instruments, which may provide greater flexibility especially in on-line, in-line, and at-line routine quality control, a study by O'Brien et al. (2013) explored the ability of a hand-held NIR spectrometer to discriminate between high-value and low-value whole fish and fish fillet species [51]. In particular, the objective was to discriminate between two different species of mullet (red mullet from mullet), cod (winter cod from cod), and trout (samlet from salmon trout). NIR spectra (906–1648 nm) obtained from skin (whole fish) and meat (fish fillets) were first pre-processed and then elaborated by PCA and SIMCA analysis. Successful PCA results were achieved only in separating the whole mullet samples, but the discrimination performances improved significantly for mullet fillets as well after the application of SIMCA analysis. PCA failed to discriminate both whole cod and cod fillets, but here too, SIMCA predictions provided a correct assignment of the tested fish samples. Similar outcomes were achieved for samlet and salmon trout [51]. Thus, although PCA investigation failed, supervised SIMCA analysis clearly outlined the possibility of authenticating high-quality fish species which are potentially substitutable with lower-quality alternatives. Still in the context of hand-held and compact NIR devices, a broader attempt to distinguish fillets and patties of Atlantic cod (*Gadus morhua*) from those of haddock (*Melanogrammus aeglefinus*) was recently made [52]. Raw fillets and patties of the two fish species were scanned at 950–1650 nm (by the portable instrument) or at 800–2222 nm (by a benchtop instrument) and, after being pre-treated with SNV, MSC, or Savitzky–Golay smoothing (SG) coupled with first or second derivative, the spectra were elaborated by means of supervised LDA and SIMCA analysis. Regardless of the instrumentation used, the best LDA models were computed on the MSC spectra of both fillets and patties, since the correct classification rate in the external validation step reached 100% [52]. The SIMCA class-modelling strategy achieved 100% correct classification for the SNV, SG-first derivative, or SG-second derivative fillet spectra acquired by the benchtop NIR, and for the MSC fillet spectra acquired with the portable NIR [52]. As for patties, samples acquired by the benchtop NIR and the portable NIR were 100% correctly classified when spectra were subjected to SG-first or SG-second derivative, and to SNV or MSC, respectively. The worst SIMCA outcomes in prediction for patties and fillets were obtained for the SG-second derivative spectra acquired with the portable instrument. Despite these results, no significant differences in the performances of the two instruments were found, thus confirming equivalent discrimination powers also in processed products.

Different species of freshwater fish of the Cyprinidae family, namely black carp (*Mylopharyngodon piceus*), grass carp (*Ctenopharyngodon idellus*), silver carp (*Hypophthalmichthys molitrix*), bighead carp (*Aristichthys nobilis*), common carp (*Cyprinus carpio*), crucian carp (*Carassius auratus*), and bream (*Parabramis pekinensis*), were also investigated by NIR spectroscopy [53]. Fish samples were scanned in the 1000–1799 nm region, MSC pre-treated, and reduced in dimensionality by different methods, including PCA, PLS, and the fast Fourier transform (FFT). In this case, LDA models were built using only nine pre-selected wavelengths from the entire spectrum, and the results showed a good prediction ability of the adopted strategy: the PCA-LDA and FFT-LDA models, in fact, showed 100% accuracy, specificity, sensitivity, and precision, even though most of the spectral information was not taken into account in the calculation [53].

Zhang et al. (2017) attempted to classify marine fish surimi by 1100–2500 nm NIR spectroscopy, according to the species of which the products were composed, namely white croaker (*Argyrosomus argentatus*), hairtail (*Trichiurus haumela*), and red coat (*Nemipterus virgatus*) [54]. According to the results obtained from PCA of the pre-processed spectra, a well-defined and separate cluster associated with red coat surimi was observed, but the separation of the other two species of surimi samples was not clear [54]. As regards the LDA results, however, a 100% correct classification rate for the external validation datasets was achieved after MSC pre-treatment, demonstrating once again the greater effectiveness of supervised analyses compared to unsupervised ones.

Species authenticity was also studied by comparing FT-NIR and FT-MIR spectra of red mullet and plaice fillets (higher-value species) with those of Atlantic mullet and flounder fillets (lower-value species) [55]. LDA and SIMCA analysis applied to differently pre-treated NIR and MIR spectra (800–2500 nm and 2500–14,300 nm spectral ranges, respectively) clearly discriminated Atlantic mullet fillets from those of the more valuable red mullet. While LDA gave a 100% correct classification percentage in prediction (irrespective of the spectroscopic technique considered), sensitivities higher than 70% and specificities of 100% were calculated for the FT-NIR spectra subjected to SIMCA analysis [55]. Poorer, but acceptable, results were obtained for the discrimination of flounder and plaice fillets: in this case, FT-IR spectroscopy showed the best discrimination power, with a prediction ability higher than 83% and a specificity of 100%.

The usefulness of NIR spectroscopy was also explored to identify the different fish species used to make fishmeal under industrial conditions. The 1100–2500 nm raw or second derivative NIR spectra of samples containing salmon, blue whiting, or other fish species (i.e., mackerel or herring) were elaborated by PCA, LDA, and discriminant PLS (PLS-DA). The models developed correctly classified, on average, more than 80% of the fishmeal samples into the three groups assigned according to fish species [56].

In contrast to the multiple applications of NIR spectroscopy, only one study has explored the discrimination abilities of MIR spectroscopy [57]. This study coupled SG- and SNV-pre-treated MIR spectra (2500–20,000 nm) with chemometrics (PCA) to specifically detect the adulteration of Atlantic salmon (*Salmo salar*) mini-burgers with different percentages (from 0 to 100%, in steps of 10%) of rainbow trout (*Oncorhynchus mykiss*). The resulting 11 formulations of salmon burgers were grouped into 11 distinct clusters, even when the samples were stored for different periods of time before acquisition [57].

Only two applications of Raman spectroscopy concerning fish species authentication are available. The aim of the first study was to discriminate fillets of 12 different fish species by using pre-treated Raman spectra in the 300–3400 cm⁻¹ range (about 3940–33,333 nm) recorded by a Raman spectrometer equipped with a 532 nm laser excitation source [58]. HCA applied to the Raman spectra revealed the presence of three major clusters, one corresponding to fish from the Salmonidae family (rainbow trout and chum salmon), one corresponding to various freshwater fish (zander, Nile perch, pangasius, and European seabass), and one corresponding to various saltwater fish (Atlantic herring, Atlantic pollock, Alaska pollock, Atlantic cod, blue grenadier, and yellowfin tuna). Within these large clusters, spectra were also grouped according to their species in sub-clusters, with a high degree of accuracy of the spectral classification at the species level (95.8%) [58]. Similarly, PCA performed on 5000–50,000 nm Raman spectra (acquired using a 785 nm laser excitation source) discriminated among horse mackerel (*Trachurus trachurus*), European anchovy (*Engraulis encrasicolus*), bluefish (*Pomatomus saltatrix*), Atlantic salmon (*Salmo salar*), and flying gurnard (*Trigla lucerna*) samples. In this case, however, the study was less rapid and more elaborate, since the spectral acquisition was performed on the previously extracted lipid fraction of the fish [59].

#### 3.1.2. Application of NMR Spectroscopy

Muscle lipids of four different species of fish belonging to the Gadidae family, namely cod (*Gadus morhua*), haddock (*Melanogrammus aeglefinus*), saithe (*Pollachius virens*), and pollack (*P. pollachius*), were subjected to 13C-NMR spectroscopic analysis of their phospholipid profiles, in order to authenticate samples according to species [60]. Supervised LDA and a Bayesian belief network (BBN) applied to the resulting 13C-NMR spectral peaks provided 78% and 100% correctly classified samples, respectively [60]. No other applications of NMR and chemometrics concerning fish species discrimination have been reported in the literature to date. In our opinion, the method should be further explored in view of its several potential benefits, despite the disadvantages deriving from the need for sample preparation prior to analysis.

#### *3.2. Production Method and Farming System Misrepresentation*

The differentiation of the production method of fish and seafood is another relevant aspect in certifying authenticity and traceability. Over the last few years, wild fish catches have been decreasing compared to aquaculture production, and the supply of farmed products to the market has thus been growing very fast. From a compositional and organoleptic point of view, a wild fish is quite different from an aquacultured one, and this diversity is inevitably reflected in the different economic value of the two types of products [61–63]. By way of example, wild fish is usually characterised by higher levels of muscle protein and of saturated and polyunsaturated fatty acids, while farmed fish shows a higher content of total lipids and monounsaturated fatty acids [64,65]. Consequently, the illegal substitution of higher-value wild fish with lower-value farmed fish is not an uncommon occurrence. Additionally, aquaculture fish comprise a number of highly variable products (i.e., extensively, semi-intensively, or intensively farmed fish, as well as organic or conventionally farmed fish), whose final differences, being influenced by the husbandry environment and, above all, by the diet, are slight and very difficult to identify. This is the reason why the authentication of the production method (wild or farmed, organic or conventional), and also of the farming system of aquaculture products, is of extreme importance from the standpoint of fraud prevention and transparency towards consumers.

#### 3.2.1. Application of Vibrational Spectroscopy

Among the various vibrational spectroscopic methods applied to differentiate the production processes and farming systems of fish, NIR is once again the most widely used. To the best of our knowledge, no applications of UV or Raman spectroscopy are currently available.

Ottavian et al. (2012) proposed a comparison between the classification performances for wild and farmed European sea bass obtained by three different NIR spectroscopic/chemometric approaches and those obtained using only chemical and morphometric features [66]. The use of 1100–2500 nm raw spectra, of WPTER-pre-treated spectra (wavelet packet transform for efficient pattern recognition), or of some parameters predicted by building a regression-based model was found to be equivalent in terms of predictability assessed by PLS-DA, and no difference was observed between the classification obtained by these models and that obtained using only chemical and morphometric data. Moreover, the authors identified (by using variable influence on projection indexes, VIP) the wavelengths related to the absorbance of fat, fatty acids, and water as the most influential in differentiating the production process of the fish tested.

More recently, the production systems of European sea bass were also investigated by applying unsupervised PCA and supervised OPLS-DA to 1100–2500 nm NIR spectra [67]. PCA applied to SNV-SG-second derivative spectral data did not return a clear separation of groups, mainly as a consequence of the fact that the intra-class variability was higher than the between-class variability. A correct classification rate of 100% for both wild and farmed sea bass was instead achieved by OPLS-DA, and, in this case, the authors found the VIP indexes related to proteins to exert a greater contribution to the variance between the two types of fish. A deeper insight into the different farming systems of the aquaculture samples, moreover, showed the ability of NIR and OPLS-DA to authenticate 67%, 80%, and 100% of extensively, semi-intensively, and intensively reared subjects, respectively, thanks above all to the spectral bands associated with protein absorption [67]. Concrete tank-cultured sea bass were also successfully discriminated from sea cage-cultured sea bass during storage, by means of Vis-NIR spectroscopy coupled with PLS-DA [68]. The best performances (87% correct classification) were observed for spectral measurements performed at 48 h post mortem [68]. However, the greatest contributions of the wavelengths to the PLS discrimination of samples analysed at 48 h post mortem differed from those of samples analysed at 96 h post mortem, so classification by farming system may have been affected by other unrelated factors, such as the well-known compositional changes occurring during shelf life.

Authentication by NIR and SIMCA analysis of European sea bass raised in extensive ponds, semi-intensive ponds, intensive tanks, and intensive sea-cages was also performed on both fresh fillets and freeze-dried fillets [69]. The authors found that freeze-drying the samples gave the best classification outcomes. The same result was obtained when classifying fresh minced fillets and freeze-dried fillets of farmed European sea bass according to the semi-intensive conventional or the organic production system [70]. SIMCA classification based on second-derivative spectra (1100–2500 nm), in fact, generated good results when fitted on the freeze-dried fillets (65–75% correct classification) and worse results when performed on fresh fillets (20–25% correct classification) [70]. All these results are particularly informative about the problems posed by water when analysing high-moisture foods like fish. One of the main drawbacks of NIR spectroscopy is, in fact, the difficulty in separating relevant from useless information in spectra in which the peaks of water are predominant. These peaks, when included in chemometric calculations, may mask reliable features related to the functional groups of molecules of interest and, thus, produce misleading results, especially when samples differ only slightly, such as in the case of fish reared under different conditions.

Following these principles, NIR spectroscopy was also used to directly authenticate freeze-dried rainbow trout fillets by rearing farm and, at the same time, to check whether the NIR discriminating capability changed between raw and cooked freeze-dried fillets [71]. Rainbow trout samples came from three different aquaculture systems varying in average well-water temperature, of which one consisted of indoor rearing at 11–14 °C, one of outdoor rearing at 9–11 °C, and one of outdoor rearing at 3–14 °C. Results for classification by farm (using SNV and second derivative 1100–2500 nm spectra of raw samples) showed approximately 97–100% accuracy, with k-NN analysis giving the best overall statistical performance and PLS-DA the worst. As for the discrimination of cooked freeze-dried samples, the accuracy was approximately the same as that obtained for raw samples (90–100% for LDA, QDA, and k-NN, and 80% for PLS-DA), highlighting that the cooking process did not alter the capability of the technique to discriminate the samples by rearing farm [71].

#### 3.2.2. Application of NMR Spectroscopy

Several applications of NMR spectroscopy aimed at authenticating the production process or the farming system were found in the literature. In particular, proton (1H) NMR spectroscopy can be used to analyse lipid mixtures such as fish oil, requiring simple sample preparation and short spectral acquisition times while providing a great deal of useful information [72]. Thus, considering that flesh lipids are the main fish compounds changing on the basis of the feeding regime, many attempts have been made to use 1H-NMR to identify the production process or the farming system. One of the earliest studies used SVM to elaborate 1H-NMR spectra, and it was highly effective in predicting the wild or farmed origin of salmon from different European countries [72]. Similarly, encouraging results were achieved through the combination of 1H-NMR fingerprinting of lipids from gilthead sea bream with more complex chemometric data analyses [73]. Unsupervised PCA alone, applied to raw or processed 1H-NMR spectral profiles, returned a clear separation between wild and farmed samples, which was found to be linked to methyl and methylene protons, together with methylene and methine protons in unsaturated fatty acids [73]. Moreover, LDA with variable selection allowed the classification of 100% of the tested wild and farmed samples, and results from probabilistic neural network (PNN) analyses further reinforced the finding that such class discriminations were readily feasible.

While the previous studies were performed on fresh raw fish, other studies were intended to evaluate any differences in classification outcomes deriving from various degrees of fish processing. Lipids extracted from different types of processed Atlantic salmon products (frozen, smoked, and canned) were subjected to 1H-NMR fingerprinting to develop models for determining the labelling authenticity (wild/farmed) of these products [74]. SIMCA analysis applied to 138 pre-selected spectral peaks of the NMR data correctly classified 100% of wild and 100% of farmed samples, thanks mostly to the higher content of linoleic and oleic acid in farmed salmon compared to wild salmon [74]. A higher content of unsaturated fatty acids (and especially *n*−3 polyunsaturated fatty acids) was also found to play a special role in the discrimination between wild and farmed specimens of gilthead sea bream [75]. The influence exercised by these compounds was studied through the application of supervised OPLS-DA to the whole lipid fingerprinting data obtained by 1H-NMR spectroscopy. Just like the SIMCA classification in the previous study, OPLS-DA also led to a perfect separation of samples, but with the great advantage of being able to highlight the most effective discriminating variables in the simplest of ways.

The 1H-NMR molecular profiles of gilthead sea bream specimens produced according to different farming systems have also been investigated, to seek out differences among three different kinds of aquaculture practices (cage, tank, and lagoon), as well as any variations in the molecular patterns after a 16-day storage period under ice [76]. The PCA score plot of the pre-treated spectra showed a clear separation of fresh samples from ice-stored samples. At the same time, three distinct sub-clusters for each of the storage times, corresponding to the three farming systems investigated, highlighted the ability of the proposed method to detect the molecular changes taking place during fish storage and to exploit them for authentication purposes.

Another NMR approach retrieved from the published literature concerned the use of carbon-13 (13C) NMR instead of 1H-NMR. The authors combined 13C-NMR spectra of the muscle lipids of Atlantic salmon with PNN and SVM chemometric elaborations to discriminate between farmed and wild samples, and obtained excellent discrimination performances (98.5% and 100.0% correctly classified samples, respectively) [77]. Although 13C-NMR signals are generally much weaker than those provided by 1H-NMR (and the time of analysis is often longer), useful and complementary information can be obtained by this technique.

#### *3.3. Geographical Origin Falsification*

Proving the geographical origin of fish and seafood often involves the use of multi-disciplinary and cross-disciplinary approaches which take account of the environmental and genetic backgrounds affecting the final characteristics of fish [78]. Several published studies concerning the use of spectroscopic methods have pointed out their usefulness in classifying fish and seafood according to the country or FAO area of origin.

#### 3.3.1. Application of Vibrational Spectroscopy

Unlike the other authentication issues discussed above, NIR spectroscopy has been less explored for fish geographical origin identification. The reason, probably, is the great difficulty experienced in modelling the total variability of NIR spectra and uniquely relating it to provenance, since provenance is the sum of a huge number of different intrinsic and extrinsic factors (genetics, growth pattern, feeding regime, muscular activity, water temperature and salinity, etc.).

A traceability model able to predict the geographical origin of Chinese tilapia fillets coming from four different Chinese provinces was developed by NIR spectroscopy [79]. SIMCA analysis, performed on 1000–2500 nm spectra of the minced samples, allowed more than 80% of fillets from the Guangdong, Hainan, and Fujian provinces, and 75% of fillets from the fourth province, to be correctly and exclusively assigned to the corresponding area of origin. Several locations in the Northern China Sea and East China Sea from which sea cucumber (*Apostichopus japonicus*) originates were also identified by using NIR spectroscopy [80]. In this case, the authors found the pre-treated (SNV or MSC, and second derivative) 1000–1800 nm spectra to give the best performance in PCA, since a 100% correct classification rate was obtained both in the internal calibration model and in the external validation model. Similarly, 100% of sea cucumbers analysed by means of diffuse reflectance MIR spectroscopy (fingerprint 5800–16,600 nm region) combined with SIMCA were discriminated by Chinese geographical region of provenance [81].

The most recent available application of NIR spectroscopy concerned the authentication of European sea bass according to Western, Central, or Eastern Mediterranean Sea provenance, using OPLS-DA as the classification technique [67]. Results showed an overall discrimination performance of 89% for these geographical origins, with 100% of Eastern, 88% of Central, and 85% of Western Mediterranean Sea samples correctly classified. The VIP index analysis, moreover, identified lipid-associated bands as the variables most influential in the geographic discrimination of the samples.

#### 3.3.2. Application of NMR Spectroscopy

Masoum et al. (2007) proposed a method for the origin authentication of Atlantic salmon based on 1H-NMR and SVM analysis of spectra obtained from samples coming from Canada, Alaska, the Faroe Islands, Ireland, Iceland, Norway, Scotland, and Tasmania. SVM returned a low degree of misclassification (4.6%) and, thus, an excellent correct classification rate for all the salmon samples [72]. Likewise, Aursand et al. (2009) used NMR combined with pattern recognition techniques to assess the geographical origin of Atlantic salmon and to verify the origin of market samples [77]. Here too, muscle lipids were extracted from the tissues of fish coming from the same origins as those previously listed, but, in this case, the lipid composition was studied by 13C-NMR coupled with PNN or SVM. This time, although the PNN- and SVM-based approaches returned different correct classification rates (93.8% and 99.3%, respectively), a comparable classification accuracy between the two approaches was observed [77]. The 1H-NMR lipid fingerprint, elaborated by LDA or PNN, also allowed the differentiation of 76.2–100% of wild and farmed gilthead sea bream samples coming from Italy, Greece, Croatia, Turkey, and the Mediterranean Sea (for the wild specimens), with better classification rates when PNN was applied [73]. Farmed gilthead sea bream specimens coming from five geographically distinct sites in Sardinia (Italy) and Greece were also discriminated by means of the 1H-NMR lipid fingerprint [75]. In this case, the fraction of unwanted variability related to the different production systems of the samples (off-shore sea cages and lagoon) was successfully set aside thanks to the application of OPLS-DA, and, although the authors did not provide statistical outcomes from internal or external validation, the significance of the clusters observed in the score plot was confirmed by bootstrap statistical analysis. The highest bootstrap values (indicating a well-defined class separation) were obtained for the discrimination between Greek and Sardinian fish (100%), while lower but meaningful bootstrap values were obtained for the discrimination among samples coming from different Sardinian offshore sea cage farms (57–68%) [75].

One last interesting application of 1H-NMR dealt with the geographical authentication of bottarga, a fish-derived product consisting of salted and dried mullet (*Mugil cephalus*) roe [82]. The low-molecular-weight metabolites of aqueous extracts of the samples were analysed by PCA in order to identify clusters corresponding to the specific geographical provenances studied, namely FAO 37.1.3, FAO 34, FAO 41, FAO 31, and one unknown provenance. The results from PCA confirmed the possibility of characterising bottarga samples of different geographical origins, since samples with the same known geographical origin clustered closely in the same region of the PCA score plot, while those of different origins were far away from each other.

#### *3.4. Discrimination between Fresh and Frozen*/*Thawed Fish and Seafood*

Fish is commonly processed by freezing in order to preserve it from deterioration. Frozen fish, however, is usually characterised by a much lower quality and commercial value compared to fresh fish. Therefore, fraudulent practices consisting of the substitution of fresh products with frozen/thawed ones are not uncommon [83]. Considering that the labelling of fish must state whether the fish is fresh, frozen, or previously frozen (or refreshed), discriminating fresh from frozen/thawed products is one of the most important authenticity issues. The differentiation between fresh and frozen/thawed products is hampered by difficulties in detecting the tiny physical and chemical variations occurring during frozen storage, which, moreover, do not cause any perceptible organoleptic change [83,84]. Therefore, the rapid confirmation of fish freshness by spectroscopy has been widely studied during the last few years, and several published studies are currently available.

#### 3.4.1. Application of Fluorescence and Vibrational Spectroscopy

Front-face fluorescence spectroscopy is one of the earliest spectroscopic techniques historically applied to differentiate fresh from frozen/thawed fish. It has been demonstrated that typical changes in the fluorescence spectra of aromatic amino acids, nucleic acids, and nicotinamide adenine dinucleotide (NADH) occur during storage, as a consequence of several reactions involving free amino acids and carbonyl compounds of reducing sugars, formaldehyde (produced from trimethylamine oxide), and malondialdehyde (produced from the oxidation of fish lipids during storage). Therefore, changes in the fluorescence of fish samples may be considered as fingerprints for the identification of fresh and aged fish fillets [85]. The fluorescence emission spectra of tryptophan (305–400 nm), recorded directly on whiting fillets and elaborated by factorial discriminant analysis (FDA), led to correct classification rates of 62.5% and 70.8% in the calibration and validation sets, respectively. NADH fluorescence spectra (360–570 nm), by contrast, were found to have a higher potential to differentiate fresh from frozen/thawed products, as they allowed 100% correct discrimination to be achieved for both the calibration and validation sets [85]. More recently, the same authors confirmed the success of a similar methodology in authenticating the freshness of sea bass samples: fluorescence emission spectra at 340 and 380 nm, elaborated by FDA, led to a total correct classification rate of 94.87% [86]. Additionally, the elaboration of NADH fluorescence spectra by Fisher's linear discriminant analysis was reported as a reliable method to rapidly discriminate fresh and frozen/thawed large yellow croaker fillets, since a total correct classification rate of 100% was achieved [87].
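
As an illustration of this kind of workflow, the following sketch trains a linear discriminant classifier on simulated emission spectra and reports calibration and validation classification rates; the data, spectral grid, and split are placeholders, not the data of [85–87].

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X_fluo = rng.normal(size=(80, 120))     # simulated emission intensities, e.g. 360-570 nm
y = np.repeat([0, 1], 40)               # 0 = fresh, 1 = frozen/thawed
X_fluo[y == 1] += 0.3                   # small systematic spectral change with storage

X_cal, X_val, y_cal, y_val = train_test_split(X_fluo, y, test_size=0.3,
                                              random_state=0, stratify=y)
lda = LinearDiscriminantAnalysis().fit(X_cal, y_cal)
print("calibration classification rate:", lda.score(X_cal, y_cal))
print("validation classification rate:", lda.score(X_val, y_val))
```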

More applications of IR spectroscopy are reported in the published literature. Uddin and Okazaki (2004) used NIR reflectance spectroscopy on the dry extract of horse mackerel specimens to evaluate freshness [88]. Both PCA (using the 1100–2500 nm spectra) and SIMCA analysis (using only three selected wavelengths strongly related to protein content) successfully discriminated 100% of fresh and frozen/thawed samples. Thereafter, the same authors performed further investigations on fresh and frozen/thawed red sea bream by using Vis-NIR spectroscopy in the 400–1100 nm region [89]. In this case, raw spectra were used to build an LDA model, by which 100% classification accuracy in prediction was reached. PLS-DA of SG-smoothed spectra (670–1100 nm) of shrimps subjected to different treatments (including ice, water, and brine at various salt concentrations) also allowed 100% of fresh and frozen/thawed samples to be authenticated [90].
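
The SG (Savitzky-Golay) smoothing step mentioned above can be sketched as follows; the window length, polynomial order, and spectra are illustrative assumptions, not the settings of [90].

```python
import numpy as np
from scipy.signal import savgol_filter

spectra = np.random.default_rng(2).normal(size=(30, 431))  # simulated 670-1100 nm grid
# Row-wise polynomial smoothing of each spectrum:
smoothed = savgol_filter(spectra, window_length=11, polyorder=2, axis=1)
# Second-derivative spectra, often used to remove baseline and scatter effects:
second_deriv = savgol_filter(spectra, window_length=11, polyorder=2, deriv=2, axis=1)
```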

Another study compared the classification ability of Vis-NIR (380–1080 nm) and NIR (1100–2500 nm) spectroscopy in authenticating fresh and frozen/thawed swordfish and, through the application of PLS-DA, it was found that, in this case, the Vis-NIR spectra gave better results in the external validation (≥96.7% of correctly classified samples) [91]. Although worse outcomes were obtained by using the NIR region only, this technique, combined with SVMs, also authenticated 93% of fresh and 83% of frozen/thawed sole (*Solea vulgaris*) samples [92]. Again, high accuracy (90%) and sensitivity (80%) in prediction were observed for the discrimination of fresh and frozen/thawed tuna samples by Vis-NIR spectral analysis (350–2500 nm) combined with PLS-DA [93], while better and more homogenous SIMCA prediction results were obtained when using the MIR (2500–14,300 nm) instead of the NIR (800–2500 nm) region for the discrimination between fresh and previously frozen Atlantic mullet fillets [94].

Ottavian et al. (2013) proposed an interesting three-step approach based only on NIR spectra and latent variable modelling techniques to develop a species-independent classifier able to discriminate between fresh and frozen/thawed fish and, remarkably, the overall classification accuracy of the method ranged between 80% and 91%, depending on the strategy adopted and the instrument used [94]. By contrast, the MIR region alone was found to be useful for determining whether whiting fish fillets had been frozen/thawed: when FDA was applied to the 3300–3570 nm MIR subregion (usually related to fatty acid absorption), 87.5% of the sample spectra in the validation set were correctly identified [95].

Finally, one single application of Raman spectroscopy to the authentication of fresh fish is available so far [59]. The lipid fraction of fish of several species (horse mackerel, European anchovy, bluefish, Atlantic salmon, red mullet, and flying gurnard) was extracted from three sample batches (fresh samples, once frozen/thawed samples, and twice frozen/thawed samples), and spectra were then collected with a Raman spectrometer over the 5000–50,000 nm spectral range using a 785 nm laser excitation source. Chemometric analysis, performed by PCA, identified three different clusters in the score plot, each corresponding to one of the three batches of fish investigated [59].

#### 3.4.2. Application of Hyperspectral Imaging Spectroscopy

The discrimination between fresh and frozen/thawed cod fillets was studied by Vis-NIR/HSI, using both a handheld interactance probe and an imaging spectrometer (for automatic online analysis at typical industrial speeds) [96]. Spectra resulting from the two instruments were pre-treated (SNV and second derivative) and statistically analysed by applying Rosenblatt's perceptron linear classifier to the first and third principal components of the imaging data. Results showed that fresh cod fillets can be completely separated from frozen/thawed cod fillets using only a few wavelengths in the Vis region, mainly related to the oxidation of haemoglobin and myoglobin occurring during freezing/thawing [97]. Similarly, hyperspectral data from Vis-NIR/HSI (380–1030 nm), combined with least squares-SVMs, returned an average correct classification rate of 91.67% for fresh and frozen/thawed halibut fillets [97].
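
A minimal sketch of such a pipeline, assuming simulated mean spectra per fillet: SNV pre-treatment, PCA compression, and a linear perceptron trained on selected principal components (PC1 and PC3, as in [96]). Data and class structure are invented for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Perceptron

rng = np.random.default_rng(3)
X = rng.normal(size=(50, 250))                 # hypothetical Vis-NIR mean spectra
y = np.repeat([0, 1], 25)                      # 0 = fresh, 1 = frozen/thawed
X[y == 1, :50] += 0.5                          # simulated shape change in the Vis region

# Standard normal variate (SNV): centre and scale each spectrum individually.
X_snv = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)

scores = PCA(n_components=3).fit_transform(X_snv)
clf = Perceptron(random_state=0).fit(scores[:, [0, 2]], y)   # PC1 and PC3 only
print("training accuracy:", clf.score(scores[:, [0, 2]], y))
```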

#### 3.4.3. Application of NMR Spectroscopy

NMR spectroscopy is considered a useful and suitable tool for the discrimination of fresh from frozen/thawed fish, since NMR signals are sensitive to changes in water mobility and in its interactions with other molecules [98]. NMR spectroscopy has already been widely exploited to identify the various modifications occurring in fish tissues during the freezing and thawing of fish [99–102]; however, as far as we know, no application of this technique for fish freshness authentication is currently available.

#### **4. Critical Aspects and Limitations to Overcome**

The food scientists' interest in the development of reliable methods for the resolution of several food authenticity issues is well documented by the increasing number of scientific works which, albeit through different methodologies, have attempted to address the same problems. It is clear from the analysis of the latest literature that spectroscopy combined with chemometrics is just one of the many untargeted strategies adopted: chromatographic, MS-based, as well as bio-molecular and sensory techniques have already been widely exploited and have demonstrated their exceptional multipurpose qualities for fish authenticity testing [78,103–108].

These techniques are known to share certain common disadvantages, such as the long time needed for analysis, the high cost of the equipment, the need for sample preparation prior to analysis, destructiveness, and the demand for qualified personnel. On the other hand, being more consolidated within the research community, these techniques excel by their higher accuracy, specificity, and sensitivity compared to spectroscopic ones, to the point that many of them are used in official food controls. Despite this, the attractiveness of spectroscopy and chemometrics is evidenced not only by the large body of literature reviewed in the present work, but also by several other applications covering a wide range of foods and foodstuffs: fruits and vegetables, honey, wine, edible oils and fats, cereals and cereal-based products, milk, and dairy products [109–114] have been successfully investigated and authenticated by means of spectroscopy.

Having said that, some critical reflections should be made about the problems related to the use of spectroscopy and chemometrics, which have still not been overcome. In accordance with what has already been reported, and in our opinion, the research papers analysed were found to be highly variable in terms of analytical set-up (e.g., sample pre-processing, spectral ranges, spectral pre-treatments, resolutions, number of samples tested, and statistical elaboration). This variability, as easily understood from Section 3, is further worsened by the fact that only a few of the works analysed reported in-depth statistical outputs and, where present, these were not comparable to each other.

A critical and objective evaluation of these works is also severely hampered by a lack, in certain cases, of comprehensive data regarding the validation of the results. Alongside internal cross-validation, the external validation of a qualitative chemometric model is, in our opinion, a crucial point in assessing the overall goodness of the classifiers and avoiding misleading interpretations. The last aspect which should be emphasised is that a detailed description of the characteristics of the sample dataset was often not reported, and the lack of standardisation of external factors (e.g., storage times and conditions) may have interfered with the spectral analysis, possibly affecting the robustness of the models. In this scenario, a recommendation for future works is to consider the intrinsic natural variability of fish products (as well as that of all other foodstuffs) and to organise the sampling in such a way that as much as possible of the expected variability of the samples is captured during the calibration stage. In that way, robust models may pave the way for the spread of applications in the industrial sector as well.
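
The following sketch illustrates, on simulated data, the validation practice recommended here: internal cross-validation restricted to the calibration set, followed by external validation on samples never used for training. The classifier choice and data are arbitrary assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(4)
X = rng.normal(size=(90, 100))                              # hypothetical fingerprints
y = (X[:, 0] + 0.5 * rng.normal(size=90) > 0).astype(int)   # two authenticity classes

X_cal, X_ext, y_cal, y_ext = train_test_split(X, y, test_size=0.25,
                                              random_state=0, stratify=y)
model = SVC(kernel="linear")
# Internal cross-validation, restricted to the calibration samples:
print("internal CV accuracy:", cross_val_score(model, X_cal, y_cal, cv=5).mean())
# External validation on samples never seen during calibration:
model.fit(X_cal, y_cal)
print(confusion_matrix(y_ext, model.predict(X_ext)))
```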

As a final remark, no technique should be universally regarded as the optimal solution, and the indiscriminate choice among UV, IR, Raman, and NMR spectroscopies for food authentication purposes is still an obstacle to overcome. Nevertheless, in accordance with our experience, untargeted NIR spectroscopy represents the most versatile option thanks to its high sensitivity to the organic molecules of food, its cost-effectiveness, and its ease of use. Additionally, the use of NIR spectroscopy with supervised chemometric methods able to separate relevant from non-relevant spectral variation, such as OPLS-DA, should be encouraged, since the interpretability of the results is enhanced.

#### **5. Conclusions and Prospects for the Future**

Recent increases in the complexity and competitiveness of the fishery and seafood sectors have resulted in the presence, on the international market, of a huge variety of fresh and processed products but, at the same time, have meant that the risk of fraud deriving from substitution among look-alike products is now exponentially higher than it was even a few years ago. Thus, ensuring the truthfulness of claims concerning the quality and origin of fish and seafood has become an exceptionally important topic, firstly with a view to enabling consumers to make informed decisions.

The overview presented in this review clearly highlights the effective support provided by analytical approaches based on spectroscopy and multivariate data analysis for the evaluation and monitoring of the authenticity of fish and seafood products. Fluorescence, vibrational, NMR, and HSI spectroscopic applications have been discussed, with emphasis on the trends in their use for several authentication purposes. In this connection, IR spectroscopy has been the most exploited technique, especially in studies concerning species substitution and fresh for frozen/thawed product substitution. NMR, instead, has shown many applications in the fields of production method, farming system, and geographical origin identification. By contrast, Raman and HSI have provided very encouraging results in some fish authentication fields, but their overall potential has so far been largely unexplored.

Rapidity, non-destructive nature, ease of use, and high-throughput measurements make the spectroscopic non-targeted approach an ideal tool for quality control operations, especially in the context of daily routine and screening analysis in the food industry, and a possible substitute for traditional analytical techniques. Thanks to the technological development of spectroscopic instrumentation, the availability of miniaturized and portable devices on the market is rapidly growing, and this will contribute to an additional growth of applications in the food sector. On the other hand, these analytical strategies are still far from being effectively applied in the official control of foodstuffs, largely due to the need for strict validation to assure the reliability and robustness of results before their implementation as standalone tools. For these reasons, the standardisation of working conditions, the optimisation of chemometric software, and the creation of large databases for data-sharing and for encouraging greater cooperation between food scientists represent important current research fields and future challenges to be faced.

**Author Contributions:** All authors contributed equally to this work.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The authors declare no conflict of interest.

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Review* **Chemometrics Approaches in Forced Degradation Studies of Pharmaceutical Drugs**

#### **Benedito Roberto de Alvarenga Junior and Renato Lajarim Carneiro \***

Department of Chemistry, Federal University of São Carlos, São Carlos 13565-905, Brazil; benedito.alvarenga@outlook.com

**\*** Correspondence: renato.lajarim@ufscar.br; Tel.: +55-16-3351-9366

Academic Editor: Marcello Locatelli

Received: 11 September 2019; Accepted: 8 October 2019; Published: 22 October 2019

**Abstract:** Chemometrics is the chemistry field responsible for planning experiments and extracting the maximum amount of information from chemical data using mathematical tools (linear algebra, statistics, and so on). Active pharmaceutical ingredients (APIs) can form impurities when exposed to excipients or environmental variables such as light, high temperatures, acidic or basic conditions, humidity, and an oxidative environment. Considering that these impurities can affect the safety and efficacy of the drug product, it is necessary to know how these impurities are yielded and to establish the pathway of their formation. In this context, forced degradation studies of pharmaceutical drugs have been used for the characterization of the physicochemical stability of APIs. These studies are also essential in the validation of analytical methodologies, in order to prove the selectivity of methods for the API and its impurities and to create strategies to avoid the formation of degradation products. This review aims to demonstrate how forced degradation studies are actually performed and how chemometric tools are applied in related studies. Some papers are discussed to exemplify the chemometric applications in forced degradation studies.

**Keywords:** forced degradation; degradation products; stress test; chemometrics

#### **1. Chemometrics**

The Swedish word "kemometri" appeared for the first time in 1971 as a combination of the terms for chemistry and measurement ("-metri"). In 1972, the homologous English term chemometrics (chemo + metrics) was used by Prof. Svante Wold, who named his group Forskningsgruppen för Kemometri (Research Group for Chemometrics), or Kemometrigruppen (Chemometrics Group); the following year, the first article containing the term kemometri was published [1,2]. The International Chemometrics Society explained the term "chemometrics" for the first time in 1974. In the 1980s, international journals had special issues on chemometrics, and in 1986–1987 the publishers Wiley and Elsevier created the chemometrics journals "The Journal of Chemometrics" and "Chemometrics and Intelligent Laboratory Systems," respectively [3].

The definition of chemometrics is intimately linked to what one expects to gain from using it. This definition has presented some inconsistencies between authors over the years, since each author belongs to a field with different aims [4].

According to the International Union of Pure and Applied Chemistry (IUPAC), the full definition of chemometrics, with no preference of area, is the science of relating measurements performed on a chemical system or process to the state of the system through the application of mathematical or statistical methods. IUPAC also highlights that, in chemometrics, the data are commonly treated in a multivariate approach and, although there are cases in theoretical chemistry that use the same mathematical and statistical techniques, chemometrics should aim primarily at extracting useful chemical information from measured data [5].

This definition clearly evidences the utilization of chemometrics in all stages of the chemical measurement process, from the definition of optimal experimental conditions to data collection and data processing. Chemometrics has its roots in analytical chemistry [6], but it is totally interdisciplinary and has been applied in many different areas [7], such as food sciences [8–12], assessment of adulteration and geographical origin [13–15], metabolomics [16–18], engineering [19,20], forensics [21–25], pharmaceutical studies [26–30], cultural studies [31–33], environmental chemistry [34], etc. Chemometric tools are fundamental to solving real-life problems [35].

In fact, when chemometrics is applied appropriately, with suitable interpretations, it enables better data visualization even from experimental data of poor quality (low resolution and high level of noise), making the relations between analytical signals and experimental parameters clearer [36]. The development of methods for the analysis of degradation products is a hard, time-consuming, and expensive task. In this context, chemometric tools are an alternative approach to carrying out studies related to impurities in pharmaceutical drugs, contributing to acquiring relevant information from the system or making the analytical method greener.

#### **2. Degradation Products**

The efficacy and safety of drugs are determined by their toxicological and pharmacological profiles and by adverse side effects due to the dosage and impurities [37–39]. According to the International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use (ICH), a drug impurity is any component that is not a chemical entity defined as an active pharmaceutical ingredient or excipient [40]. The impurities can be classified according to their origin: inorganic impurities (reagents, ligands and catalysts, heavy metals or other residual metals, and inorganic salts), organic impurities (starting materials, by-products, intermediates, degradation products, reactants, ligands, and catalysts), and solvents (organic and inorganic liquids used in the preparation of solutions or in the synthesis of a new drug substance). Therefore, any extra material present in the drug, even if it does not have pharmacological activity, is considered an impurity [39]. Although the term "impurity" is commonly taken as synonymous with degradation products, it is worth highlighting that these compounds belong to a subgroup within the impurity definition [41]. The United States Pharmacopoeia adopts the term "Related Compounds" for the main degradation products and impurities from synthesis.

The yielding of degradation products depends on several variables, chemical stability being the most important one. The degradation of APIs involves the formation or breaking of covalent bonds in chemical processes such as oxidation, reduction, thermolysis, and hydrolysis reactions. These processes can usually be accelerated when the drug is exposed to light, high temperatures, acidic or basic conditions, humidity, an oxidative environment, or incompatible excipients, and even due to its contact with the packaging during its shelf life [41].

#### *2.1. The Generation of Degradation Products*

Stability of API is a critical parameter in the development of a drug product, which should be considered in the formulation, analytical methods, package, storage, shelf life determination, safety, and toxicological studies [42,43].

The degradation of an API can result in a loss of effectiveness and can also lead to adverse effects due to the degradation products [44]. Therefore, understanding the processes that contribute to the generation of degradation products is extremely important to create strategies aimed at the prevention and/or minimization of the API's degradation.

Oxidative degradation is one of the leading causes of drug degradation, since it involves the removal of an electropositive atom, radical, or electron, or the addition of an electronegative atom or radical. Most API oxidation occurs slowly through the action of molecular oxygen, and some procedures used during manufacturing and storage are employed to stabilize the API in the product. For that, it is necessary to know the variables that increase the extent of oxidation. One way of preventing the oxidation process is to replace the oxygen inside pharmaceutical containers with nitrogen or carbon dioxide. Contact of the drug with metal ions, which can catalyze oxidation, should also be avoided, as should high storage temperatures [45].

Temperature is another variable that has a significant influence on degradation and is often used in forced degradation studies. The same product can present different shelf lives depending on how and where it is stored. For example, countries in which an equatorial climate predominates have a higher average temperature than those with a tropical climate, and this difference promotes different degradation conditions and, consequently, different shelf lives [46].

Several pharmaceutical drugs have low stability in aqueous medium and must be evaluated under hydrolysis conditions. First, to evaluate the hydrolysis of an API, it is necessary to perform tests over a wide range of pH (in solution or suspension), since hydrogen and hydroxyl ions are able to influence the degradation rate [47–49]. Hydrolytic forced degradation studies are then performed by submitting the API to acidic, basic, and neutral conditions, and the experimental variables have to be adapted if high degradation of the API is observed, in order to avoid the formation of secondary degradation products [48].

Photostability studies should also be performed to demonstrate the extent of the reactions occurring when APIs are exposed to light. Photolytic reactions are caused when the drug absorbs ultraviolet/visible (UV-Vis) light (wavelengths from 300 to 800 nm), which promotes the molecule to an excited state and can increase the reactivity of some sites of the molecule. UV-Vis radiation can also lead to the cleavage of chemical bonds, yielding new molecules. The extent of photodegradation depends on the wavelength of the incident radiation and the absorptivity of the molecule; in other words, this process depends on the presence of specific functional groups [50].

Nonetheless, it is worth mentioning that even when an API is shown to be chemically stable in stress tests, the stress conditions can degrade this API when excipients are present.

#### *2.2. Forced Degradation Studies*

Since the release of the first guidelines, massive changes to the definition of quality of pharmaceutical drugs have taken place, and several countries are extending the requirements of regulatory agencies to generic drugs and already commercialized products [51]. Forced degradation studies, also called "stress tests," have been used in the pharmaceutical industry for a long time [50], but the International Conference on Harmonization (ICH) only issued the formal request Q1A, with the guideline "Stability Testing of New Drug Substances and Products," in 1993 [52]. In general terms, forced degradation studies are processes that involve the degradation of drugs under extreme conditions to accelerate the yielding of degradation products. The information obtained from these studies is usually used to determine the chemical stability and pathways of degradation, to identify the degradation products, and to establish storage conditions, shelf life, and excipient compatibility, and it also allows the development of selective analytical methods [52–54].

Today, the control of impurities is established by the ICH Q3A and Q3B guidelines, which address, for registration applications, the content and qualification of impurities classified as degradation products observed during manufacturing or stability studies of the new drug product. Furthermore, the registration application should present a validated analytical procedure suitable for the detection and quantification of degradation products, which should include or evidence the method's specificity for specified and unspecified degradation products, according to the ICH Q2A and Q2B guidelines for analytical validation. When the impurities are available during the method validation phase, the capacity to discriminate between the drug and the impurities is validated by spiking the drug substance with known levels of impurities. On the other hand, if impurity or degradation product standards are unavailable, the drug substance should be submitted to stress conditions (light, heat, humidity, acid/base hydrolysis, and oxidation). Therefore, in general, forced degradation studies are performed when developing the stability-indicating method, and the method validation should take into account the chromatographic separation of the degradation products.

Several works in the literature treat forced degradation and stability studies as synonymous, but it is worth highlighting that there are some differences between them. Stability studies consist of submitting the pharmaceutical drug to milder conditions over a long period (months or years) and, besides determining some degradation products, allow the establishment of the product's shelf life. Forced degradation studies are often performed by exposing the API or the product to drastic conditions for some hours or days. These extreme conditions are able to provide, as a general rule, substantial degradation of the API, usually from 10 to 30%. The set of all degradation products found under every degradation condition composes a "potential" degradation profile. If only a few degradation products are actually found, the degradation profile is then denominated the "real degradation profile." The method to evaluate the degradation products should be selective and developed considering the occurrence of every degradation product [55].

Forced degradation studies are critical in the development of drug products and aim at the following points:


The degradation products are commonly analyzed by high-performance liquid chromatography (HPLC) coupled with ultraviolet/visible (UV-Vis) and/or mass spectrometric (MS) detectors. UV-Vis detectors are able to provide only information related to chromophore groups, but they are excellent for quantification. MS detectors are not as robust as UV-Vis detectors for quantification, but MS presents high sensitivity (trace levels) and gives important data to characterize the degradation products through the fragmentation profile, the accurate mass (for high-resolution detectors such as Q-ToF, Orbitrap, and Fourier-transform ion cyclotron resonance (FT-ICR)), and information about the origin of fragments using multiple-stage fragmentation (MSn) and neutral loss scans. When more information is necessary to elucidate a chemical structure, the nuclear magnetic resonance (NMR) technique is required. NMR presents low sensitivity, but it is able to resolve conformational, structural, and optical isomers. All these techniques generate a great amount of data, and manual data mining is very time-consuming and expensive. In this context, chemometric tools offer a way to organize and pre-process the data, optimize the parameters of the HPLC, MS, and NMR techniques, obtain the maximum knowledge from them, and clarify a lot of useful information [51,58,59].

#### *2.3. Strategies to Select the Degradation Conditions*

Forced degradation studies are performed in batches with solutions at different pHs, in the presence of hydrogen peroxide, UV-Vis radiation, metallic cations (Fe³⁺ and Cu²⁺), and high temperatures [48].

Usually, the influence of pH is evaluated using 0.1 mol L⁻¹ HCl or NaOH [48]. Degradation by radiation is performed under UV-Vis light, with an overall illumination of not less than 1.2 million lux hours and an integrated near-ultraviolet energy of not less than 200 Wh m⁻² [60]. For the oxidative condition, the literature recommends using hydrogen peroxide (H2O2) at concentrations from 0.1% to 3.0% at room temperature (25 °C). The effect of temperature is usually evaluated between 40 and 80 °C, but higher temperatures can be used for recalcitrant APIs. Other additional variables can be taken into consideration in the global stability studies of an API or the final product, such as humidity and microbiological stability [22,57,61,62].
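
These typical starting points can be summarised as a simple configuration mapping; the sketch below merely restates the conditions cited above and is not an official protocol.

```python
# Illustrative restatement of common initial stress conditions (starting
# points only; to be adapted to the recalcitrance of each API).
STRESS_CONDITIONS = {
    "acid hydrolysis": {"reagent": "HCl", "concentration": "0.1 mol/L"},
    "base hydrolysis": {"reagent": "NaOH", "concentration": "0.1 mol/L"},
    "oxidation": {"reagent": "H2O2", "concentration": "0.1-3.0%", "temp_C": 25},
    "thermal": {"temp_C": (40, 80)},              # higher for recalcitrant APIs
    "photolytic": {"illumination": ">= 1.2 million lux h",
                   "near_UV_energy": ">= 200 Wh/m2"},
}
```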

According to the "Expert Committee on Specifications for Pharmaceutical Preparations" document, the recommended degradation is between 10% and 30% of the API. This degradation range commonly allows the evaluation of the main degradation products while avoiding the yielding of secondary degradation products [63]. In Brazil, the regulatory agency ANVISA recommends not less than 10% degradation of the API, and a technical justification is needed in cases where such degradation is not obtained [64].

It is worth highlighting that the cited conditions for forced degradation studies are just initial attempts, and the ideal condition could be more extreme or milder, depending on the chemical recalcitrance of the API. Table 1 summarizes the degradation conditions of some papers that performed forced degradation studies.



#### *2.4. Acceptable Limits of Impurities*

After obtaining the degradation profile, a critical analysis should be performed to verify the purity of the chromatographic band of the API and to evaluate the variables that can promote degradation of the API. The degradation products are analyzed according to their amount relative to the API in the final product after the regular stability time (without any stress condition). The evaluation considers the maximum amount of API administered per day, and the limits for degradation products are expressed as a percentage (or mass) relative to the API. The amount of degradation products defines whether it is necessary to perform notification, identification, or qualification [40,57,77]. Table 2 shows the acceptance criteria used by ICH, FDA, and ANVISA for the amount of impurities found in relation to the daily administered API. The acceptance criteria have the following meaning:



**Table 2.** Thresholds for degradation products.

#### **3. Applications of Chemometric Tools in Forced Degradation Studies**

#### *3.1. Design of Experiment (DoE)*

In every area, it is important to know how variables act on a system. In general, processes aim to enhance the quality of the final product, taking into account the minimization of cost and time. To achieve these goals, it is necessary to optimize the variables of the system and to gain knowledge about their behavior in order to determine the influence of each variable [78,79]. The optimization of variables in a system is most commonly performed using the one-variable-at-a-time (OVAT) approach, where one variable (also called a factor) is changed at a time, causing a change in the monitored response. However, this univariate approach does not consider the interactions between variables and, therefore, does not ensure the discovery of the optimum point in an optimization process [80]. Design of experiments arises as an alternative multivariate approach for studying the behavior of a system [81]. In this approach, the factors are evaluated simultaneously, and the experiments are performed in an organized way in order to acquire information about the whole system while performing a minimum number of experiments [82,83].

Some terms in DoE must be clear for a better understanding, such as variables, levels, and responses. Variables or factors are independent experimental inputs capable of changing the responses of the system. Typical factors are temperature, pH, irradiation time, reaction time, concentration of reactants, and so on. It is worth reiterating that the variables can be changed independently of each other, but the response depends on the synergism between them [84].

Levels are the different values that a variable can assume within the experimental domain. The variable temperature in an optimization process, for example, can be studied at three levels: 30, 50, and 70 °C.

Responses or dependent variables are the monitored parameters. Typical responses are cost, time of analysis, resolution between chromatographic peaks, percentage of API degradation, etc.

The values studied for each variable are coded in levels as high (+1), central (0), low (−1), and other levels, depending on the design. This codification normalizes the independent variables, avoiding wrong interpretations of the data. The procedures involved in DoE allow the empirical data to be fitted to a function, creating a linear or quadratic model that considers the interactions between the variables of the system [85]. Figure 1 shows the experimental domain of the most common experimental designs for the screening and optimization steps.

**Figure 1.** Experimental domain of the most common experimental designs.
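
As a minimal illustration of how such coded designs are constructed, the sketch below generates 2³ and 3³ full factorial designs from coded levels and decodes a coded level back to a real value; the temperature range is a hypothetical example.

```python
from itertools import product

full_2_3 = list(product([-1, 1], repeat=3))      # 2^3 full factorial: 8 runs
full_3_3 = list(product([-1, 0, 1], repeat=3))   # 3^3 full factorial: 27 runs
centre_points = [(0, 0, 0)] * 3                  # replicated centre runs

# Decode a coded level back to a real value, e.g. temperature studied at 30-70 C:
def decode(coded, low, high):
    return (low + high) / 2 + coded * (high - low) / 2

print(len(full_2_3), "and", len(full_3_3), "runs;",
      decode(1, 30, 70), "C corresponds to the +1 level")
```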

In sum, the DoE presents the following advantages:


In the context of forced degradation studies, the DoE has been mainly used for the development and optimization of chromatographic methods and for multivariate evaluation of stress conditions.

The use of DoE in the development and optimization of chromatographic conditions is not exclusive to forced degradation studies; instead, its application has spread to several fields that use chromatography as a tool [86–88]. Krishna et al. [89] performed forced degradation studies of eberconazole nitrate (EBZ), submitting it to hydrolytic (acid, basic, and neutral), thermal, oxidative, and photolytic degradation. In this work, a full factorial 3³ design was used to identify the best mobile phase conditions for the drug analysis. As is already well known in chromatography, the organic modifier of the mobile phase (methanol in this case), the pH (10 mM potassium dihydrogen orthophosphate buffer), and the ion-pair agent (tetrabutylammonium hydroxide, TBAH) are important variables that alter the capacity factor (k). These variables were evaluated at three levels (−1, 0, and +1) following a full factorial design with 27 experiments. Table 3 presents the real values of the variables, and Table 4 shows the 27 different experiments.


**Table 3.** Real and coded values of variables considered in design of experiment.


**Table 4.** Conditions of the experiments performed in the full factorial 3³ design.

The ranges studied in the design were selected according to previous studies and considered the physicochemical properties of EBZ. Other chromatographic parameters, such as column dimensions, flow rate, injection volume, and detection wavelength, as well as the procedure performed in each degradation condition, can be found in reference [89].

As a result, a Pareto chart of standardized effects quantified the effect of each variable on the capacity factor, with the organic phase and TBAH presenting the highest influence on the response. Both linear and quadratic regressions showed no significance for pH within its range of variation. The results of the experimental design also allowed the authors to create contour plots, and they emphasized the usefulness of studying the interaction effects of the variables on the capacity factor. It was observed through the contour plots that increasing the concentration of TBAH increased the capacity factor of EBZ, and the same behavior occurred when the organic modifier was decreased. Furthermore, pH did not affect the capacity factor in the investigated experimental domain. In the end, the optimum conditions (pH 2.8, 10 mM TBAH, and 25% (*v*/*v*) methanol) made it possible to obtain a capacity factor equal to 2.06.

Table 5 shows some papers that used experimental design to optimize the chromatographic conditions for the analysis of the degradation products yielded in forced degradation studies.


**Table 5.** Design of experiments used in some papers to optimize chromatographic conditions for analyses of degradation products.

In the papers presented in Table 5, DoEs were used to evaluate the chromatographic parameters in order to obtain the best chromatographic method. The meaning of "best chromatographic method" depends on the intention of the analyst: better resolution for the API, a higher number of peaks in order to detect all degradation compounds, cost- and time-saving methods, etc.

Another purpose found by Sonawane and Gide [101] was the application of experimental design to the optimization of the forced degradation of luliconazole (LCZ, 4-(2,4-dichlorophenyl)-1,3-dithiolan-2-ylidene-1-imidazolylacetonitrile), which is recommended for the treatment of fungal infections. LCZ was submitted to acidic (HCl), alkaline (NaOH), oxidative (H2O2), thermolytic (under reflux), and photolytic (direct sunlight) stress conditions, and a full factorial design was chosen to identify the conditions yielding a degradation of this API between 10 and 20%. The 2³ factorial design for the acid and alkaline conditions took into account the variables concentration of the degradant agent (x1), temperature (x2), and time of exposure (x3) to achieve the desired degradation. The variable temperature was not included in the oxidative degradation, so the design became a 2² factorial design. The same design was performed for dry heat and wet heat degradation, but including the variable temperature and discarding the variable concentration. For photolytic degradation, LCZ powder was exposed to direct sunlight for 48 h and compared with a control kept in the dark, but DoE was not applied. The levels of the variables for each stress condition are presented in Table 6. The 2³ factorial design comprised a total of eight experiments, and each 2² factorial design a total of four experiments per degradation condition (oxidative, dry heat, and wet heat). Table 7 shows the experiments and the results obtained by liquid chromatography.


**Table 6.** Real values of the variables used in the design of experiments.


**Table 7.** Design of experiments with coded values and % of degradation of active pharmaceutical ingredient (API) for acid, basic, and oxidative conditions.

Dry and wet heat did not produce any degradation of luliconazole, while photolytic degradation reached 8%. Concerning the acid, alkaline, and oxidative conditions, the degradation ranges were 0–41%, 0–43%, and 0–100%, respectively. Multivariate regressions were performed on the results for each degradation condition (acid, alkaline, and oxidative) in order to obtain regression models (equations) for the studied experimental domain. These regression models can be used to predict suitable conditions to achieve the desired percentage of degradation; the predicted conditions provided a degradation of 11% and, therefore, a relative error equal to 9%. More details about the equations for each degradation condition, as well as the response surfaces created for better visualization of the results, can be found in reference [101]. The DoE in this work allowed the authors to gain knowledge about the stability of LCZ, indicating the degradation conditions under which LCZ is more susceptible to degradation and the variables with the highest influence on its degradation. Finally, the chemometric tools helped to predict the values of the variables needed to obtain the desired degradation.
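
The regression step can be sketched as follows on invented response values: a first-order model with two-factor interactions is fitted to a coded 2³ design, and the fitted equation is then searched for conditions predicted to give a target degradation. Nothing here reproduces the actual models of [101].

```python
import numpy as np
from itertools import product

runs = np.array(list(product([-1, 1], repeat=3)))       # x1, x2, x3 coded levels
degr = np.array([0, 5, 8, 18, 4, 12, 20, 41], float)    # hypothetical % degradation

# Model matrix: intercept, main effects, and two-factor interactions.
X = np.column_stack([np.ones(8), runs,
                     runs[:, 0] * runs[:, 1],
                     runs[:, 0] * runs[:, 2],
                     runs[:, 1] * runs[:, 2]])
b, *_ = np.linalg.lstsq(X, degr, rcond=None)

def predict(x1, x2, x3):
    return np.array([1, x1, x2, x3, x1 * x2, x1 * x3, x2 * x3]) @ b

# Coarse grid search inside the coded experimental domain for a ~15% target:
grid = np.linspace(-1, 1, 21)
best = min((abs(predict(a, c, d) - 15), (a, c, d))
           for a in grid for c in grid for d in grid)
print("closest coded conditions:", best[1], "predicted %:", predict(*best[1]))
```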

Another example was presented by Kurmi et al. [102], who used DoE to develop the stability-indicating method and also to find the stress conditions for forced degradation of furosemide in the range of 20–30%.

Despite the fact that DoE is a very interesting tool for finding the most suitable conditions in degradation studies while avoiding the generation of secondary degradation products, there are few papers presenting such an approach.

#### *3.2. About Fusion QbD*®

As mentioned previously, forced degradation studies are performed during the development phase of the stability-indicating method. DoE is extremely useful to build sets of screening, optimization, and robustness experiments. In this context, some HPLC method development software platforms are commercially available to automatically perform the experimental design. Such software, for example Fusion QbD, uses concepts of experimental design and creates a sequence of experiments considering all relevant chromatographic parameters. It is possible to build, for example, a set of screening experiments considering more than one type of chromatographic column, multiple solvents, and other chromatographic variables. After the creation of a set of methods guided by DoE principles, and after running the sequence of experiments, the software generates mathematical models and makes predictions to find the best chromatographic method. As Fusion QbD is integrated with the chromatography system, all functions of the HPLC are explored, allowing users to reach maximum efficiency and speed in the method development process [103]. Other specialized software packages are also used to create basic designs, such as Origin [104], Matlab [105], Minitab [106], Design-Expert [107], and Statistica [108].

#### *3.3. Principal Component Analysis (PCA)*

Principal component analysis (PCA) is one of the most used chemometric tools for data exploration through the reduction of a system's dimensionality [23,109,110]. This technique allows the user to establish a numerical adjustment of a linear model describing the central relationships among process variables [111]. PCA aims mainly to extract the most useful information from the data and, besides, helps simplify the description of the data for the analysis of variables [112].

The use of PCA enables the user to represent objects with new variables that are linear combinations of the original variables. These linear combinations, denominated principal components (PCs), are calculated along the directions of maximum variance and are orthogonal to each other [23]. The first PC describes the maximum variance of the samples. The second PC describes the largest part of the variability that the first one was not able to describe. The directions along which the samples are most dispersed are generally described by the first PC, since it corresponds to the vector carrying the most information about the linear combinations of the original variables [113]. Figure 2 presents a graphical representation of PCA, where the axes are changed in order to maximize the explained variance using a smaller number of dimensions.

**Figure 2.** Representation of principal component analysis (PCA): original data on the left, PC1 × PC2 in the middle, and PC2 × PC3 on the right.
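
A compact numerical sketch of PCA as just described, assuming a simulated data matrix: mean-centring followed by singular value decomposition yields the scores, loadings, and the variance explained by each PC.

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(40, 300))            # samples x variables (simulated)
Xc = X - X.mean(axis=0)                   # column mean-centring

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = U * s                            # sample coordinates on the PCs
loadings = Vt.T                           # linear combinations of the original variables
explained = s ** 2 / np.sum(s ** 2)       # fraction of variance per component
print("variance explained by PC1-PC3:", explained[:3].round(3))
```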

In the literature, three papers were found involving PCA associated with degradation products of pharmaceutical drugs. Two of them are discussed in the next paragraphs, and the third is discussed later, in the MCR-ALS context.

Tôrres et al. [114] performed accelerated degradation studies of captopril and applied Multivariate Statistical Process Control (MSPC) for monitoring and identifying changes in the samples in order to guarantee product quality. The details of the whole data treatment procedure can be found in reference [114]. Captopril stability was evaluated by leaving 24 blisters of tablets from the same batch in a climatic chamber at 40 ± 2 °C and 75 ± 5% relative humidity. One blister per week was analyzed by liquid chromatography for six months, totaling 24 chromatograms. In order to build the process control chart, a set of captopril samples under normal operation conditions was used in the calibration (training stage), while in the validation stage both samples under normal operation conditions and samples with an expired shelf life were used. Hotelling's T² statistic and the squared prediction error (SPE) were used for sample monitoring. PCA is a useful tool for the Hotelling's T² statistic, since it reduces the number of variables to be monitored, replacing the original variables with the PCA scores without significant loss of information from the dataset. PCA, along with the multivariate control charts, contributes to identifying possible failures and changes early in the process, making this method useful for ensuring the quality control of the product [114]. The same authors also performed a similar work using the mid- (MIR) and near- (NIR) infrared techniques [115].
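
A hedged sketch of a PCA-based Hotelling's T² chart in the spirit of [114] is given below; the data are simulated, and the number of retained components and the 95% F-based control limit are conventional choices, not those of the original work.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
X_noc = rng.normal(size=(24, 50))            # normal-operation chromatograms
mu = X_noc.mean(axis=0)
U, s, Vt = np.linalg.svd(X_noc - mu, full_matrices=False)
A = 3                                        # retained principal components
P = Vt[:A].T                                 # loadings
lam = (s[:A] ** 2) / (len(X_noc) - 1)        # variance of each score

def t2(x):
    t = (x - mu) @ P                         # project a new sample onto the PCs
    return np.sum(t ** 2 / lam)

n = len(X_noc)
limit = A * (n - 1) * (n + 1) / (n * (n - A)) * stats.f.ppf(0.95, A, n - A)
x_new = rng.normal(size=50) + 0.8            # hypothetical drifted (aged) sample
print("T2 =", round(t2(x_new), 2), "control limit =", round(limit, 2))
```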

Skibinski et al. [66] performed forced degradation of toloxatone, a pharmaceutical drug used as an antidepressant. These studies were carried out under basic (0.01 M NaOH), acidic (1 M HCl), neutral (water), photolytic UV-Vis, photolytic UVC, and oxidative (0.01% H2O2) degradation conditions. The samples (including the control solution) were evaluated by LC-MS (ToF), totaling 21 chromatographic profiles. The stress conditions provided eight unique degradation products of toloxatone [66].

After alignment of the chromatographic profiles, PCA showed a visible grouping of the stressed samples. The authors noticed that the stressed basic samples gave rise to a cluster separated from the other stressed samples in the score analysis obtained from PCA, while the neutral and acidic samples were close to the control samples. It was also possible to separate into groups the samples degraded under photolytic UV-Vis, photolytic UVC, and oxidative conditions. The first three components of the PCA model were able to explain almost 71% of the total variance. This work shows that PCA can be used as a tool to characterize chromatographic profiles.

#### *3.4. Partial Least Squares (PLS)*

Partial least squares (PLS) regression is a multivariate regression technique, the most important one in chemometrics. It is used to establish quantitative relationships between a vector of information (UV-Vis, Raman, NIR, MIR, or NMR spectra, a chromatogram, a diffractogram, etc.) and properties to be quantified (concentration of an analyte, crystalline phase of an API, etc.) [116–119].

As an example, the concentrations of an analyte in the calibration samples are organized in a vector **y**, and the chemical data (spectra) are organized in a matrix X. In classic multivariate regression, the regression coefficient **b** is found by **b** = **y** × X⁺, where X⁺ is the pseudoinverse of X. The regression equation (model) can be written in matrix form as **y** = **b** × X. However, there are some issues related to the use of classical multivariate regression, such as the need for a high number of samples and the problem of correlation among the variables in the matrix X. Then, in a similar way to PCA, the PLS calculation simultaneously decomposes X and **y** in order to maximize the correlation among the scores of X and **y**. After defining the coefficients **b**, the model can be applied to determine the concentration in external samples [120].

Some algorithms have been proposed to perform PLS; the most common are PLS1 and PLS2, for one response and for multiple responses, respectively. Although PLS2 is used for multiple responses, it is recommended only in cases where there is a high correlation among the responses [121].
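
A minimal PLS1 calibration sketch on simulated spectra, with cross-validation used to choose the number of latent variables; all data below are synthetic placeholders for a real calibration set.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(7)
conc = rng.uniform(0, 10, size=30)                 # y: analyte concentrations
pure = rng.normal(size=100)                        # hypothetical pure-component spectrum
X = np.outer(conc, pure) + rng.normal(scale=0.5, size=(30, 100))  # noisy spectra

for a in (1, 2, 3, 5):                             # pick latent variables by CV
    r2 = cross_val_score(PLSRegression(n_components=a), X, conc, cv=5).mean()
    print(f"{a} latent variables: mean CV R^2 = {r2:.3f}")

model = PLSRegression(n_components=2).fit(X, conc)
print("prediction for an external-like sample:", model.predict(X[:1]).ravel())
```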

Recently, Sayed et al. [122] developed a stability-indicating method using PLS to determine mometasone furoate (MF), pure or in pharmaceutical formulation, in the presence of its degradation products. The forced degradation was performed only under basic conditions, since previous works had demonstrated the susceptibility of MF to alkaline hydrolysis. A multilevel multifactor experimental design was applied to prepare the mixtures of the calibration set, constituted by 14 samples, which were scanned over the range of 220–350 nm. The UV spectra of 11 different mixtures of MF and its degradation products were used to predict the concentration of MF. The PLS model applied to the determination of MF presented good results, with a mean recovery of 100.2% and an RMSEC of 0.002% in the calibration set, while the validation set presented a mean recovery of 97.24% and an RMSEP of 0.04%. The recoveries in pharmaceutical samples were also satisfactory (98.47–102.66%), demonstrating no interference from excipients or alkaline degradation products in the quantification and the power of the PLS method for the quantification of MF [122]. Besides, in this same work, a new TLC densitometric method was developed, and the chemometric tools CLS and PCR were also applied to develop quantification models for MF in pharmaceutical samples.

Attia et al. [123] also developed spectrometric methods for the determination of cefoxitin sodium in the presence of its alkaline degradation product using different chemometric tools. PLS was applied to quantify cefoxitin sodium in a pharmaceutical sample. To obtain the degradation product, basic forced degradation was performed using NaOH 0.1 M for 10 min, followed by neutralization with HCl 0.1 M. More details about the procedure to prepare the working solutions are given in reference [123]. The PLS model was built considering 13 mixtures as the calibration set and 12 mixtures as the validation set, obtained through experimental design. The number of factors was optimized through cross-validation, as performed in reference [122]. A genetic algorithm (GA) was coupled with PLS to improve the prediction capability of the models by eliminating uninformative variables. In fact, the calibration efficiency of GA-PLS was better than that of PLS alone, given the lower RMSEC and RMSEP values obtained for GA-PLS. The analysis of cefoxitin sodium in the presence of degradation products and in the pharmaceutical sample presented mean recoveries of 100.54% and 99.86 ± 1.347%, respectively, using GA-PLS. The proposed method presented no significant difference compared to the standard method. Different chemometric tools were proposed, and all of them showed a reduction in solvent and sample consumption, making the methods greener. Table 8 presents papers found in the literature that use the PLS tool at some stage of forced degradation studies of pharmaceutical products.


**Table 8.** Works involving forced degradation studies and the partial least squares (PLS) tool.

#### *3.5. Multivariate Curve Resolution (MCR)*

Multivariate curve resolution (MCR) has been widely used to analyze several types of data in different application fields [137–139]. MCR constitutes a bilinear model, based on classical least squares (CLS), that decomposes the data matrix into two submatrices containing the chemical information of the compounds involved in the system [137,139–141].

This approach is also known as a spectral unmixing tool, since it allows the analyte signals of a complex mixture to be solved mathematically when they are overlapped in one or more dimensions of the data, such as chromatograms and spectra of analytes in the presence of interferents, without chromatographic resolution. MCR aims to differentiate the individual contributions of the components of a mixture, providing the pure signals (spectra) and the proportions of the analytes through the concentration profiles [138,139,142]. MCR derives from Beer's law, according to which concentration is proportional to absorbance. In this way, a spectral data set can be deconvoluted into the pure spectra of the analytes and their relative concentrations. The general equation for MCR is X = C × Sᵗ, where the spectral matrix X is deconvoluted into the concentration matrix C and the pure spectra matrix Sᵗ.
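
Under the bilinear model above, MCR-ALS can be sketched in a few lines: starting from a rough guess of the pure spectra, C and S are alternately re-estimated by least squares with a crude non-negativity constraint (values clipped at zero). Real implementations add further constraints, such as unimodality and closure; the data here are simulated.

```python
import numpy as np

rng = np.random.default_rng(8)
t = np.linspace(0, 1, 50)
C_true = np.column_stack([np.exp(-3 * t), 1 - np.exp(-3 * t)])   # reactant/product
S_true = np.abs(rng.normal(size=(80, 2)))                        # pure spectra
X = C_true @ S_true.T + rng.normal(scale=0.01, size=(50, 80))    # mixture data

S = np.abs(rng.normal(size=(80, 2)))          # rough initial guess of pure spectra
for _ in range(100):                          # alternating least squares
    C = np.clip(X @ S @ np.linalg.pinv(S.T @ S), 0, None)   # update C with C >= 0
    S = np.clip(X.T @ C @ np.linalg.pinv(C.T @ C), 0, None) # update S with S >= 0

residual = np.linalg.norm(X - C @ S.T) / np.linalg.norm(X)
print("relative residual after ALS:", round(residual, 4))
```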

Most papers relating forced degradation studies and MCR-ALS aimed at the evaluation of photodegradation; except for the basic hydrolysis condition, other degradation conditions were not found in the literature.

Marín-García et al. [143] investigated the photodegradation of tamoxifen in aqueous medium using Multivariate Curve Resolution-Alternating Least Squares (MCR-ALS). The photodegradation experiments were conducted at 35 °C in a cabinet equipped with light at two different irradiation power conditions (400 and 765 W/m²), according to ICH requirements. To monitor the photodegradation of tamoxifen, UV-Vis spectra were collected from 0 to 160 min for the irradiation power of 400 W/m², and from 0 to 120 min for 765 W/m². The UV spectra allowed the evolution of the photodegradation process to be followed. MCR-ALS analysis of the UV data allowed the estimation of the kinetic profiles and suggested the presence of at least four species, three of them being degradation products. Besides, it was possible to obtain the relative concentration of each species over time.

During photodegradation, some molecules cannot be detected by UV-Vis due to the loss of chromophore groups. The authors overcame this situation using an LC-DAD-MS technique to obtain deeper knowledge about the species formed during photodegradation. In this case, MCR-ALS analysis provides the C and S matrices that contain, respectively, the elution profiles and the pure UV-Vis or MS spectra of each substance. These matrices revealed a new component, representing a fourth degradation product. This new species was not observed in the UV-Vis monitoring; it grows during photodegradation but disappears at the end of the process. Furthermore, the authors elucidated the structures of the degradation products. This work shows the ability of MCR-ALS to monitor and resolve mixtures of degradation products formed during the photodegradation process [143].

Another work reported in the literature was conducted by Feng et al. [144], who investigated the basic degradation of paracetamol using two-way UV-Vis data associated with MCR-ALS. Forced degradation was performed in a quartz cell to which paracetamol and NaOH solutions were added, and UV-Vis spectra were collected from 1 s to 24 h. Initially, PCA was applied to the UV-Vis data, and it suggested the existence of four components. Later, the concentration profiles were obtained from evolving factor analysis (EFA), which confirmed the number of chemical components involved in the degradation reaction. In the MCR-ALS deconvolution, non-negativity constraints were applied to the spectral and concentration profiles, and a unimodality constraint to the concentration profiles. Through the concentration profile and spectral profile plots, it was possible to perform a critical analysis of the formation and consumption of the species during alkaline degradation: there were one reactant, one degradation product, and two intermediates. The authors compared the results with HPLC analysis, which proved the existence of the two intermediates, and the concentration profiles were in agreement with those recovered by MCR-ALS from the UV-Vis data. Besides, the authors also proposed a degradation pathway in alkaline media. The use of MCR-ALS in forced degradation studies allowed the drug stability and the degradation kinetics of paracetamol to be verified [144]. Other papers regarding forced degradation studies and MCR-ALS are presented in Table 9.


**Table 9.** Works involving forced degradation studies and the Multivariate Curve Resolution-Alternating Least Squares (MCR-ALS) tool.

#### *3.6. Artificial Neural Network (ANN)*

Artificial neural networks (ANNs) are powerful chemometric tools based on artificial intelligence. They can model nonlinear data through learning processes in a way similar to the human brain [36,152]. ANN models are able to map input data to a set of appropriate outputs following a "learning by examples" approach; in other words, the structure of the data is learned through training algorithms [153].

To the best of our knowledge, two works combining forced degradation studies and artificial neural networks are reported in the literature, and only one of them uses ANNs as the main tool [123,154].

Golubović et al. [154] used ANNs to develop a quantitative structure-retention relationship (QSRR) model to optimize an isocratic RP-HPLC method for candesartan cilexetil in the presence of seven degradation products obtained under acid, alkaline, and neutral hydrolysis, photolysis, and oxidation conditions. QSRR models relate chromatographic retention parameters to molecular structure, making them a valuable tool for predicting the chromatographic behavior and separation of complex mixtures.

Initially, to investigate the variables that could influence the chromatographic behavior, a 2<sup>5−1</sup> fractional factorial design was performed. The following variables were included in the design: percentage of acetonitrile in the mobile phase, buffer pH and ionic strength, column temperature, and mobile phase flow rate. All variables proved to be significant and were therefore considered as inputs in the ANN modeling, except the flow rate, which was kept constant. A design matrix of this kind can be generated in a few lines, as sketched below.
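As an illustration of how such a design is built (not the authors' actual run order), the 16 runs of a 2<sup>5−1</sup> design can be generated by taking a full two-level design in four base factors and computing the fifth column as their product (generator E = ABCD):

```python
import itertools
import numpy as np

# Full two-level factorial in four base factors, coded -1/+1.
base = np.array(list(itertools.product([-1, 1], repeat=4)))
# Fifth factor generated as the product of the other four (E = ABCD),
# halving the 32 full-factorial runs to 16.
fifth = base.prod(axis=1, keepdims=True)
design = np.hstack([base, fifth])  # 16 runs x 5 factors

factors = ["%ACN", "buffer pH", "ionic strength", "column T", "flow rate"]
for run in design:
    print(dict(zip(factors, run)))
```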

The molecular structure is an essential variable in a QSRR model and is encoded by descriptors. Roughly, molecular descriptors are obtained by logical and mathematical procedures that transform the chemical information of a molecule into a useful number or into the result of a standardized experiment. The selection of molecular descriptors was based on the intermolecular interactions suggested by the theory of liquid chromatography. Descriptors showing low mutual correlation, such as polarizability, the number of H-donor and H-acceptor sites, and the octanol/water distribution coefficient, were included in the ANN modeling.
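For illustration, descriptors of this kind can be computed with the open-source RDKit toolkit. The sketch below uses paracetamol rather than candesartan cilexetil, and substitutes logP and molar refractivity as simple, commonly used surrogates for the distribution coefficient and polarizability used in the paper.

```python
from rdkit import Chem
from rdkit.Chem import Crippen, Descriptors

# Paracetamol as an illustrative molecule.
mol = Chem.MolFromSmiles("CC(=O)Nc1ccc(O)cc1")

descriptors = {
    "H-donor sites": Descriptors.NumHDonors(mol),
    "H-acceptor sites": Descriptors.NumHAcceptors(mol),
    "logP (octanol/water)": Crippen.MolLogP(mol),
    "molar refractivity (polarizability proxy)": Crippen.MolMR(mol),
}
print(descriptors)
```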

A multi-layer feedforward network, the most common type of ANN, was used, consisting of one input layer (descriptors and significant chromatographic variables), a number of hidden neurons connected to both the input and output layers, and one output neuron (retention factor). In the network training stage, the overall agreement between the computed and target outputs for the training set is maximized. To avoid overfitting, the predictive power of the network was evaluated using a validation set. Both training and validation sets were defined through a Box-Behnken design, with levels varying from −1 to +1. A total of 344 cases were obtained for the ANN optimization, divided into 280 cases for the training set, 32 for the validation set, and 32 for external validation. For the training, validation, and external validation data sets, the coefficients of determination (R<sup>2</sup>) between the experimental and predicted retention factors (K<sub>exp</sub> and K<sub>ANN</sub>, respectively) were 0.9993, 0.9969, and 0.9956, respectively. The high R<sup>2</sup> and low RMSE values demonstrated the excellent predictive ability of the model and the absence of overfitting during the training process.
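The overall workflow can be mimicked with scikit-learn's MLPRegressor on synthetic data; every number below (input count, hidden layer size, the linear toy relationship) is an assumption for illustration only, not the architecture reported in [154].

```python
import numpy as np
from sklearn.metrics import r2_score
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_inputs = 8  # e.g. 4 molecular descriptors + 4 chromatographic factors
w = rng.uniform(0.1, 0.5, n_inputs)

def retention(X):
    """Synthetic stand-in for the measured retention factor."""
    return 1.5 + X @ w + 0.01 * rng.standard_normal(len(X))

X_train = rng.uniform(-1, 1, (280, n_inputs))  # coded -1..+1 design levels
X_val = rng.uniform(-1, 1, (32, n_inputs))
k_train, k_val = retention(X_train), retention(X_val)

# Feedforward network with one tanh hidden layer; inputs standardized first.
qsrr = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(10,), activation="tanh",
                 max_iter=5000, random_state=0),
)
qsrr.fit(X_train, k_train)
print("validation R2:", round(r2_score(k_val, qsrr.predict(X_val)), 4))
```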

This kind of mathematical model is an important tool in forced degradation studies, since degradation products derive from the API and are therefore chemically similar to it. Models able to predict the behavior of the active substance and of all degradation products contribute to defining the optimal chromatographic conditions during the optimization process [154].

#### **4. Conclusions**

Chemometric tools can bring considerable gains to forced degradation studies. DoE is the most used chemometric tool in such studies, especially in the development of suitable chromatographic methods to monitor the API. However, the application of DoE directly to the stress experiments is also promising, as it makes it possible to quantify the individual effect of each stress variable as well as the synergy between them, simulating what may occur in real life. The other widely used tool is PLS, since it performs multivariate quantification and therefore allows the API to be determined directly from UV-VIS spectrophotometric data, even when the species are not spectrally resolved.

The PCA technique is rarely applied as a main tool in these studies, since it is an exploratory method whose application is more closely related to process monitoring and to classification methods for raw material identification.

The other tools, despite being very useful in such studies, are more complex, and their application remains limited among non-chemometricians.

**Author Contributions:** Both authors contributed equally to the production of this review.

**Funding:** This research was funded by the São Paulo Research Foundation (FAPESP), grant number 2017/13095-0.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
