*Article* **Predicting Dissolution Kinetics of Tricalcium Silicate Using Deep Learning and Analytical Models**

**Taihao Han 1, Sai Akshay Ponduru 1, Arianit Reka 1,2, Jie Huang 3, Gaurav Sant 4 and Aditya Kumar 1,\***


**\*** Correspondence: kumarad@mst.edu; Tel.: +1-573-341-6994; Fax: +1-573-341-6934

**Abstract:** The dissolution kinetics of Portland cement is a critical factor in controlling the hydration reaction and improving the performance of concrete. Tricalcium silicate (C3S), the primary phase in Portland cement, is known to have complex dissolution mechanisms that involve multiple reactions and changes to particle surfaces. As a result, current analytical models are unable to accurately predict the dissolution kinetics of C3S in various solvents with respect to which it is undersaturated. This paper employs a deep forest (DF) model to predict the dissolution rate of C3S in undersaturated solvents. The DF model takes into account several variables, including the measurement method (i.e., *reactor connected to inductively coupled plasma spectrometer* and *flow chamber with vertical scanning interferometry*), temperature, and the physicochemical properties of the solvents. The DF model is then used to evaluate the influence of each variable on the dissolution rate of C3S, and this information guides the development of a closed-form analytical model that can predict the dissolution rate of C3S. The coefficients and constant of the analytical model are optimized for two scenarios: *generic* and *alkaline* solvents. The results show that both the DF and analytical models produce reliable predictions of the dissolution rate of C3S when it is undersaturated and far from equilibrium.

**Keywords:** tricalcium silicate; analytical model; ion activity; dissolution kinetics; deep forest

#### **1. Introduction**

Portland cement (PC) is the fundamental material of modern infrastructure, but its production contributes significantly to global CO2 emissions, accounting for about 9% of the total [1–3]. To improve the sustainability and performance of PC, it is important to understand the hydration reaction of its primary component, tricalcium silicate (C3S). C3S is the most abundant component in PC, making up more than 50% of its composition [4–6]. When C3S reacts with water, it undergoes a series of chemical reactions that result in the dissolution of calcium and silicate ions, followed by the formation of calcium silicate hydrate and portlandite [4]. While the phase transformations that occur at later stages of hydration are well documented [4,7], the dissolution kinetics of C3S at early stages remains a controversial subject. Understanding the dissolution kinetics of C3S in solvents with respect to which it is undersaturated is nevertheless important: an undersaturated C3S solution corresponds to the initial and induction periods of cement hydration [4,6], and the dissolution mechanisms of C3S differ between undersaturated and saturated (i.e., once hydration products form) solutions [6,8]. By studying the dissolution behavior of C3S, we can gain a better understanding of the factors that affect the hydration kinetics of cement. This knowledge can be used to develop novel cement formulations and improve cement performance.


Despite many studies in recent decades that have sought to uncover the mechanisms (e.g., protective phase [9–11] and double layer theory [12]) behind the dissolution of C3S and minerals, a definitive rate-controlling mechanism remains elusive due to the complex interaction of physicochemical parameters between solids and aqueous solvents. The most widely accepted theory to explain the dissolution kinetics of C3S and minerals is the inverse crystal nucleation theory [6,13,14]. This theory posits that, similar to the process of crystal growth, the dissolution of C3S and minerals is primarily determined by the density of pre-existing steps on the surface of minerals [14]. These steps are formed by dislocation defects and by the nucleation of two-dimensional vacancy islands at impurities or homogeneous sites. The growth of vacancy islands on a surface is governed by the Gibbs–Thomson effect, a thermodynamic principle that dictates their critical size [15,16]. At the critical size, the free energy change reaches a maximum, creating an energy barrier that must be overcome for vacancy growth to continue; a vacancy that exceeds this critical size continues to grow. The energy barriers that the vacancy islands must overcome are proportional to the interfacial energy but inversely related to the degree of undersaturation [15,16]. When the solution is near equilibrium, the step density of the solid is dominated by dislocation defects, as the energy barriers are too high for vacancy islands to overcome.

Beyond surface defects, other experimental parameters—for example, solvent chemistry [8,17], surface geometry [18–23], and mineral composition [24–26]—also substantially influence the dissolution kinetics of C3S and minerals. By incorporating these parameters into analytical models, it is possible to reveal underlying relationships between dissolution kinetics and the physicochemical properties of minerals and solvents. The following review focuses on existing analytical models (shown in Table 1) that have been used to predict the dissolution kinetics of C3S and minerals; some of these models have been successful in accurately predicting the dissolution kinetics of minerals. The symbols used in these models are defined as follows: Δ*Gr* is the Gibbs free energy of the overall reaction; *T* is the temperature; *R* is the gas constant; *A* is the effective surface area of the material; *ai* is the ion activity of species *i*; *Ea* is the activation energy; *n*, *ni*, *k*, and *ki* are constants; and *g(I)* is a function of the ionic strength.

The analytical model developed by Burch et al. [27] is based on the transition state theory and the Burton–Cabrera–Frank theory. It shows that the dissolution rate of a mineral depends exponentially on the Gibbs free energy of the overall dissolution and on the temperature. However, this model cannot accurately predict the dissolution kinetics of a solid–solvent system near equilibrium, because it does not account for the transition from step retreat to dislocation-controlled dissolution. The model developed by Lasaga et al. [28] accounts for various factors such as surface area, temperature, ionic strength, H<sup>+</sup> concentration in the solvent, and the change in Gibbs free energy related to dissolution; it is widely used in the cement community to predict the dissolution kinetics of C3S [6]. In addition to modeling from a thermodynamic perspective, several studies [29–33] have explained the dissolution kinetics of minerals using the ion leaching theory. Strachan's model [29] accounts for both H<sup>+</sup> and OH<sup>−</sup> in the leaching process, as these ions leach species from mineral surfaces with different activation energies. Other studies [32–36] have found that cations (excluding H<sup>+</sup>) in solvents can also contribute to mineral surface leaching. Oelkers et al. [32] have emphasized the role of the ion activity ratio of H<sup>+</sup> to cations in mineral dissolution kinetics. Their model divides the process into two scenarios: if the ion activity ratio is small, a large number of cations remain on the material surface and dominate the leaching and dissolution processes; if the ratio is large, the dissolution rate is independent of cations. This model has been used to predict the dissolution kinetics of various minerals [33].


**Table 1.** Summary of current dissolution kinetics models for C3S and minerals.

Although previous studies have proposed various models for predicting the dissolution kinetics of minerals based on disparate theories, none of these models can predict the dissolution kinetics of C3S in a high-fidelity manner, that is, with a coefficient of determination (*R*²) above 0.90. This is because of several knowledge gaps in the state-of-the-art analytical models. First, it is not possible to account for all the influential variables (e.g., ions in the solvent, physicochemical properties of C3S particles, temperature, etc.) in a single analytical model; moreover, it is difficult to incorporate a variable into an analytical model without a clear understanding of its role in the dissolution process. Next, the coefficients are not generic, thus requiring additional calibration whenever the model is applied to a new C3S–solvent system. Lastly, some parameters (e.g., ion activity, Gibbs free energy, activation energy, etc.) are obtained from additional quantitative and qualitative analyses of experimental results, which makes the models difficult to use and increases the likelihood of human error.

Measuring the dissolution rate of C3S is a challenge because the solubility of calcium silicate hydrate is much lower than that of C3S, which means calcium silicate hydrate will precipitate before C3S completely dissolves unless a very small amount of C3S is used. As a result, only a few studies have attempted to measure the C3S dissolution rate. Those studies have applied two different methods: a *reactor connected to an inductively coupled plasma (ICP) spectrometer* [37,38]; and a *flow chamber with vertical scanning interferometry (VSI)* [39]. In the first method, C3S particles dissolve into the solvent in a reactor, and the ICP spectrometer measures the ion concentrations of the solution over the first couple of minutes to determine the dissolution rate; the change in C3S surface area can be ignored because of the short measurement duration. In the second method, the solvent is flushed over the surface of bulk C3S for a period of time, and VSI is used to measure the leaching depth and determine the dissolution rate. Because these two methods are based on different experimental principles and use different parameters, a single analytical model cannot predict the dissolution rate from both methods.

Machine learning (ML), a data-driven framework, has been employed in many studies [40–50] to predict properties of multi-component systems (e.g., cement, glass, and biomaterials) in a high-fidelity manner. ML models acquire knowledge of underlying input–output correlations (all possible correlations can be included) from a training dataset, and subsequently utilize such knowledge to produce predictions for new mixture designs, without requiring an understanding of the mechanisms governing the materials. Elçiçek et al. [49] successfully employed an artificial neural network to uncover the underlying relationship between complex dissolution environments and the dissolution kinetics of colemanite, a type of boron mineral. A decision-tree-based ensemble model has demonstrated remarkable performance (*R*² ≈ 0.98) on predictions of the dissolution rate of bioactive glasses in various pH environments [40]. ML models incorporating topological constraints of glasses have been employed to predict and extrapolate the dissolution kinetics of silicate glasses without violating fundamental material laws [43]. Although extensive studies have applied ML methods to predict material dissolution kinetics, no literature has yet shown that an ML model is a valid approach for predicting the dissolution rate of C3S when it is undersaturated with respect to the solvent.

In this study, a deep forest (DF) model is trained using a heterogeneous database of C3S dissolution rates measured by the *reactor connected to ICP spectrometer* and *flow chamber with VSI* methods. The rigorously trained model produces high-fidelity, a priori predictions of the C3S dissolution rate. Notably, ML models can predict the hydration kinetics of PC at any given age, as shown in our previous studies [51–53]. This study focuses only on the dissolution kinetics of the initial period (i.e., undersaturated solution), because hydration products precipitate and cause the solution to reach saturation shortly after C3S begins to dissolve. The influence of each input variable on the dissolution rate is then evaluated, and this knowledge is used to develop a simple, closed-form analytical model based on fundamental thermodynamic and kinetic frameworks, such as ion activity, ionic strength, and ion activity product (*IAP*). The analytical model reveals fundamental correlations underlying the C3S dissolution process—critical information that ML models cannot provide due to their "black-box" nature. Furthermore, the analytical model can be used by all end users, regardless of their background or access to ML tools. Overall, this study is the first to develop an ML model that predicts, with high fidelity, the dissolution kinetics of C3S dissolved in various solvents when it is undersaturated and far from equilibrium.

#### **2. Database Collection**

The C3S dissolution database used in this study consists of 292 data records, consolidated from Nicoleau et al. [37,38] and Juilland and Gallucci [39]. Data records from several other studies were excluded because their experimental parameters are incompatible with this database: for example, Bellmann et al. [54] measured the dissolution rate of C3S at the induction period and later ages; Damidot et al. [55] and Barret et al. [56] used the filter dissolution technique; and Robin et al. [57] used the face-specific dissolution method. The database contains 11 input parameters: temperature (°C); specific surface area (SSA) of C3S (m²/g); flow rate (mL/min/mm²); initial concentrations of Na, Cl, Ca, Si, Cs, K, and SO4 (mM); and initial pH (unitless). The output is the dissolution rate of C3S (μmol/m²/s). There are 92 data records from Nicoleau et al. [37,38] measured by the *reactor connected to ICP spectrometer* method; since the flow rate is not applicable in this method, it was set to 0. Several solvents with different ions were utilized in this method. There are 200 data records from Juilland and Gallucci [39] measured with the *flow chamber with VSI* method; since the SSA of C3S is not applicable in this method, it was set to 0. These solvents contained only calcium ions, at different concentration levels. Four statistical parameters associated with the inputs and output of the C3S dissolution database are summarized in Table 2.


**Table 2.** Four statistical parameters pertaining to the 12 parameters (11 inputs and 1 output in bold) of the C3S dissolution database. The database consists of 292 unique data records.

The ML model was trained by 219 randomly selected data records from the original database. The remaining 73 data records were used to validate the performance of the model. The prediction performance was evaluated by five statistical parameters: mean absolute error (*MAE*); coefficient of determination (*R*2); mean absolute percentage error (*MAPE*); Pearson correlation coefficient (*R*); and root mean square error (*RMSE*).
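The split and the five evaluation metrics can be reproduced with a short script. The sketch below uses placeholder arrays in place of the consolidated database; all function and variable names are ours, not the authors'.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import (mean_absolute_error, mean_absolute_percentage_error,
                             mean_squared_error, r2_score)
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((292, 11))    # placeholder for the 11 input parameters
y = rng.random(292) * 100    # placeholder for measured rates (umol/m2/s)

# 219 records (75%) for training, 73 (25%) for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=219,
                                                    random_state=42)

def evaluate(y_true, y_pred):
    """The five statistical parameters used throughout this study."""
    return {"MAE": mean_absolute_error(y_true, y_pred),
            "R2": r2_score(y_true, y_pred),
            "MAPE": mean_absolute_percentage_error(y_true, y_pred),
            "R": pearsonr(y_true, y_pred)[0],
            "RMSE": np.sqrt(mean_squared_error(y_true, y_pred))}
```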

#### **3. Deep Forest Model**

In this study, a DF model was utilized to predict C3S dissolution kinetics based on the physicochemical properties of C3S and solvents. The DF model is built on the modified classification-and-regression tree (CART) model combined with bagging and random feature selection techniques [58,59]. The DF model grows a large number of independent trees through recursive binary splits at each node [58]. Specifically, the root node receives information from a bootstrap sample extracted from the training dataset and then splits to create two child nodes; this process is repeated until the homogeneity of the child nodes cannot be improved further. Each tree grows to its maximum depth because no pruning or smoothing algorithms are applied, which helps the DF model maintain diversity among trees. The DF model usually contains hundreds of independent trees; a large forest is typically required to produce reliable predictions when the database contains thousands of data records. When a testing dataset is applied to a trained DF model, the trees produce independent outputs, and a bagging algorithm subsequently averages them to derive the final output. A unique feature (i.e., two-stage randomization) allows the DF model to reduce the variance and bias errors in predictions. First, the bootstrap randomly selects data records from the parent database. Second, at each split, several randomly selected variables, instead of all variables, are used to determine the optimal split. These randomization features ensure decorrelation between the trees. Furthermore, because a large number of trees is grown, generalization errors and the likelihood of overfitting are minimized. Owing to these features, the DF model can effectively learn input–output correlations from complex databases. Overall, the architecture of the DF model can be summarized in the following steps:

1. N bootstrap samples are randomly selected from the training dataset, where N equals the number of trees (N = 200 in this study). Each bootstrap sample contains ~66% [60–62] of the data records of the training dataset; the remaining records are "out-of-bag" (OOB) data [58].
2. Each tree is grown on its bootstrap sample through recursive binary splits, with a random subset of the input variables evaluated at each split and no pruning or smoothing applied.
3. The independent outputs of all trees are averaged (bagging) to derive the final prediction.
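As a concrete illustration, the bagged ensemble of unpruned CARTs described above maps closely onto a random-forest regressor. The sketch below (continuing from the split in Section 2) fixes only N = 200 from the text; the remaining hyperparameter values are assumptions.

```python
from sklearn.ensemble import RandomForestRegressor

# 200 unpruned CART trees: each tree sees a bootstrap sample of the
# training set, and each split evaluates a random subset of the 11 inputs,
# which decorrelates the trees.
df_model = RandomForestRegressor(
    n_estimators=200,     # N = 200 trees, as in this study
    max_features="sqrt",  # random feature subset per split (assumed value)
    bootstrap=True,       # bootstrap sampling; ~66% unique records per tree
    oob_score=True,       # R2 estimated on the "out-of-bag" records
    random_state=42,
)
df_model.fit(X_train, y_train)
y_pred = df_model.predict(X_test)  # bagging: mean of the 200 tree outputs
```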


#### **4. Predictions from Deep Forest Model**

To optimize the DF model's performance on new data records, it is crucial to meet the following criteria. First, the model requires sufficient and diverse data records to learn adequate input–output correlations (e.g., pH–dissolution rate). Second, outliers should be included in the database to ensure that the DF model comprehensively learns input–output correlations [63,64]. Herein, outliers are data records that—although measured and reported properly—do not fit the trends exhibited by the majority of neighboring data records because of some underlying (chemical, kinetic, or thermodynamic) mechanism. Third, it is important to avoid both underfitting and overfitting. Underfitting occurs when the model is unable to learn the underlying correlations in the data, often because a small training dataset does not contain enough information to learn from. Overfitting occurs when the model learns local trends instead of global ones from highly similar data, resulting in poor performance on the testing dataset. To address these issues, the hyperparameters of the DF model were tuned using the 10-fold cross-validation (CV) [41,65] and grid-search [48,52] methods. These methods help prevent underfitting and overfitting by evaluating the model's performance on multiple splits of the training data and over a range of hyperparameter settings, respectively. Predictions of the C3S dissolution rate (from the training and testing datasets), as produced by the DF model, are shown in Figure 1. The five statistical parameters listed in Table 3 provide further evidence of the model's accuracy. Overall, by meeting the aforementioned criteria, the DF model can be trained to make highly accurate predictions on new data records.
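A minimal version of the hyperparameter tuning described above, combining a grid search with 10-fold CV; the grid values are hypothetical, since the paper does not report the searched ranges.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

param_grid = {                      # hypothetical search ranges
    "n_estimators": [100, 200, 400],
    "max_features": ["sqrt", 0.5, 1.0],
    "min_samples_leaf": [1, 2, 4],
}
search = GridSearchCV(RandomForestRegressor(random_state=42), param_grid,
                      cv=10,        # 10-fold cross-validation
                      scoring="neg_mean_absolute_error")
search.fit(X_train, y_train)
print(search.best_params_)          # hyperparameters used for the final model
```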

**Figure 1.** DF model's predictions of C3S dissolution rate against experimental measurements of training and testing datasets. The coefficient of determination (*R*²) is shown in the legend, providing a measure of the prediction performance. The dashed line represents the ideal prediction.

**Table 3.** *R*, *R*², *MAE*, *MAPE*, and *RMSE* evaluating the prediction accuracy of the DF model against the testing dataset.


The predicted results from the DF model for the dissolution rate of C3S, as shown in Figure 1 and Table 3, demonstrate the model's accuracy and reliability. The *R*² and *RMSE* values for the dissolution rate predictions were 0.94 and 9.4 μmol/m²/s, respectively, indicating a strong correlation between the predicted and measured values. In Figure 1, the predictions show a larger deviation at low dissolution rates than at high dissolution rates, but this is largely due to the use of a logarithmic scale on the *y*-axis. The prediction errors, as measured by the mean absolute error (*MAE*), were 2.01 μmol/m²/s for low dissolution rates (below 20 μmol/m²/s) and 8.67 μmol/m²/s for high dissolution rates, indicating that the DF model produces reliable predictions of the dissolution rate of C3S regardless of the experimental method. This is a significant improvement over analytical models, which typically achieve a prediction accuracy of only 0.78 in terms of *R*² for silicate compounds [66]. The capability of the DF model to yield reliable predictions of the C3S dissolution rate is largely due to its inherent architecture [59,60,62]. First, by growing a large number (more than 100) of independent trees without smoothing or pruning, the model significantly reduces the variance error in its output. Next, bias error is minimized by the randomization in the bootstrap and feature selections [59], which ensures that the output of one tree does not interfere with that of others. Lastly, the 10-fold cross-validation method [65] and grid-search method [48,67] autonomously optimized the hyperparameters so as to establish optimal input–output correlations as well as account for outliers.

The DF model can also estimate the influence (in terms of importance) of the input variables on the dissolution rate of C3S. The results of this analysis are shown in Figure 2, organized in descending order of the variables' influence. This ranking also serves as a guide for feature selection in the development of the analytical model in Section 5.

**Figure 2.** The influence (importance) of input variables based on their contributions to the C3S dissolution rate. Variables are ranked in descending order; those on the left have greater influence.

As can be seen in Figure 2, the initial pH, Ca concentration, SSA of C3S, and flow rate—ranked from high to low—exhibited the strongest influences on the dissolution rate of C3S. This is expected, because the Ca and OH ions (expressed via the pH value) are known to be the main factors affecting the dissolution reaction according to the *IAP* (described in Section 5) of C3S dissolution, where a high concentration of these ions significantly reduces the dissolution rate. The SSA of C3S is the third most important variable, because an increase in the interface between C3S particles and solvent leads to a monotonic increase of the dissolution rate [20]. Similarly, the flow rate in the *flow chamber with VSI* method plays a significant role, as it determines the speed at which ions are leached from the surface of the C3S particles, with higher flow rates leading to faster leaching. Temperature is also an important variable, as previous research [27] has shown that the dissolution rate of minerals increases exponentially with temperature. Other ions in the solvent contribute less significantly to the dissolution rate. This is not surprising, because no literature has found direct correlations between the C3S dissolution rate and those ions. Interestingly, the Si ion, one of the major ions that affect the dissolution rate of C3S, was ranked much lower in terms of importance. This is likely because only three solvents in the database contain Si ions, and the dissolution rates for these systems show little variation; as a result, the Si ions appear less important than they would in a larger and more diverse dataset. It should be noted that the importance of input variables can vary depending on the dataset used. Some variables may be found to be more important in one dataset while being less significant in another. In this study, only a few variables were found to have a strong influence on the dissolution rate of C3S; in a different dataset, different variables may exhibit greater importance.
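The paper does not state which importance measure the DF model reports; the two common choices for tree ensembles are sketched below—impurity-based importances and permutation importance on the held-out data—continuing the earlier sketch.

```python
from sklearn.inspection import permutation_importance

feature_names = ["temperature", "SSA", "flow_rate", "Na", "Cl", "Ca",
                 "Si", "Cs", "K", "SO4", "initial_pH"]

# Impurity-based importances accumulated over all splits of all trees
for name, score in sorted(zip(feature_names, df_model.feature_importances_),
                          key=lambda pair: -pair[1]):
    print(f"{name:12s} {score:.3f}")

# Permutation importance: drop in test accuracy when one input is shuffled
perm = permutation_importance(df_model, X_test, y_test, n_repeats=20,
                              random_state=42)
print(perm.importances_mean)
```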

#### **5. Analytical Model Development**

The abovementioned results demonstrate that the DF model can produce predictions of the dissolution rate of C3S in a high-fidelity manner. However, the use of machine learning (ML) techniques can have some limitations, such as the "black-box" issue, where the underlying input–output correlations learned by the model are difficult to interpret. Additionally, ML models may not be accessible to end users who do not have a programming background. To address these issues, this section introduces an original, closed-form analytical model that has been distilled from the DF model. This model can be used to predict the dissolution rate of C3S and provide a better understanding of the input–output correlations involved.

The development of a reliable analytical model involves a careful selection of input variables. Including influential variables is vital to the performance of the analytical model, while excluding inconsequential variables reduces its complexity. The new analytical model is developed based on Lasaga's model [28], with some new input variables added. We selected Lasaga's model as the baseline because it is the most widely used model for predicting C3S dissolution kinetics. This model accounts for the SSA of C3S, solvent pH, temperature, and ions in solvents; the feature importance shown in Figure 2 also confirms that those parameters dominate the dissolution rate of C3S. It is worth pointing out that only data from Nicoleau et al. [37,38] were employed to develop the analytical model, because the SSA of C3S is not applicable to the *flow chamber with VSI* measurements of Juilland and Gallucci [39].

In the baseline model, the Gibbs free energy of the overall reaction is one of the major influential variables. To properly quantify this variable, it is important to understand the dissolution mechanism of C3S. The dissolution process of C3S can be considered an inverse nucleation process [13], which is controlled by two major factors: interfacial properties and the driving force. The interfacial properties include the chemical composition, chemical bonding, surface defects, and impurities in crystals. Generally, the dissolution process can be divided into three steps: (1) horizontal movement at the atomic scale to form a 2D vacancy; (2) etch pit formation at dislocations; and (3) step retreat at pre-existing roughness [6,19]. The driving force of the C3S dissolution reaction is defined as the energy to overcome the activation energy barriers of the first two steps of the dissolution process. The driving force is calculated using Equation (1) [6,68]:

$$
\sigma = \frac{\Delta \mu}{kT} = \frac{\Delta G^{*}}{RT} = \ln\left(\frac{IAP}{K_{SP}}\right) \tag{1}
$$

Here, *σ* is the undersaturation coefficient; Δ*μ* is the difference in chemical potential; *k* is the Boltzmann constant; *T* is the temperature; Δ*G*∗ is the free energy difference between the undersaturated solution and the solution at equilibrium; *R* is the gas constant; *IAP* is the ion activity product of the reactant species; and *KSP* is the mineral solubility product. The dissolution reaction of C3S is expressed in Equation (2) [8], and the *IAP* is defined in Equation (3), where *ai* is the ion activity of species *i*. The chemical equilibrium constant (*KSP*) for C3S dissolution has been estimated as 10<sup>−17.65</sup> [8,69].

$$(CaO)_3SiO_2 + 3H_2O \to 3Ca^{2+} + H_2SiO_4^{2-} + 4OH^- \tag{2}$$

$$IAP = a_{Ca^{2+}}^3 \cdot a_{OH^-}^4 \cdot a_{H_2SiO_4^{2-}} \tag{3}$$

Equation (3) shows that the value of the *IAP* is largely determined by the calcium and hydroxide ion activities. Thus, a high calcium ion activity drives the C3S dissolution toward equilibrium, resulting in a slower dissolution rate than in a solvent without calcium ions [6,70]. Similarly, a basic solvent significantly decreases the dissolution rate of C3S because it contains a large amount of hydroxide ions. In this study, only H2SiO4<sup>2−</sup> was considered in the *IAP* calculation, because H4SiO4 and H3SiO4<sup>−</sup> can deprotonate to form H2SiO4<sup>2−</sup> [71,72]. To clearly observe the influence of the *IAP* on the dissolution rate, Figure 3 shows the correlation between the degree of undersaturation (*IAP*/*Ksp*) and the dissolution rate of C3S. The general trend of the correlation and the order of magnitude of the changes in the dissolution rate observed herein are in good agreement with previous studies [6,28]. It is not surprising that the dissolution rate of C3S decreases as *IAP*/*Ksp* increases: a high value indicates that the solution is approaching equilibrium, which reduces the driving force for dissolution.
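Equations (1)–(3) reduce to a few lines of code. The helper below is a direct transcription; the ion activities would come from a speciation code such as Phreeqc (discussed later in this section), and the function name is ours.

```python
import numpy as np

K_SP = 10 ** -17.65  # C3S solubility product [8,69]

def undersaturation_sigma(a_ca, a_oh, a_h2sio4):
    """Driving force per Equations (1)-(3): sigma = ln(IAP / Ksp).

    Arguments are the (unitless) ion activities of Ca2+, OH-, and
    H2SiO4^2- in the solvent.
    """
    iap = a_ca**3 * a_oh**4 * a_h2sio4
    return np.log(iap / K_SP)
```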

**Figure 3.** The correlation between the degree of undersaturation (*IAP*/*Ksp*) and the dissolution rate of C3S. The x-axis is shown on a logarithmic scale due to the small magnitude of the degree of undersaturation.

As discussed in the Introduction, Strachan [29] demonstrated that H+ and OH− leach mineral surfaces with different activation energies. Since Lasaga's model accounts only for H+, the new model includes the ion activities of both H+ and OH<sup>−</sup> in order to describe the leaching process. Moreover, for C3S dissolution in particular, OH− is one of the main products of the dissolution reaction, as shown in Equation (2).

Previous studies [32–36] have also shown that the concentration of major cations (excluding H+) in the solvent can influence the dissolution rate, and this is supported by Figure 2, which highlights the importance of the Ca concentration for the analytical model. However, previous studies have not explored the relationship between the activity of Ca2+ and the dissolution rate of C3S. Using data from Nicoleau et al. [37,38], we show this relationship in Figure 4, which plots the natural logarithm of the dissolution rate of C3S against the initial activity of Ca2+. The correlation is observed to be linear (shown as the red line), meaning that the relationship between the C3S dissolution rate and the Ca2+ activity is exponential. Some outliers can be seen in the figure, which may be due to the influence of other parameters, such as temperature and the SSA of C3S, on the dissolution rate; if all other parameters were kept constant, a more ideal linear relationship should be observed. After incorporating OH<sup>−</sup> and Ca2+, the new analytical model, with seven input variables, takes the form of Equation (4). Here, *Ci* are the constant and coefficients; *T* is the temperature (°C); *A* is the specific surface area of C3S (m²/g); *ai,j* is the ion activity of species *i* at the initial/final state (unitless); *I* is the ionic strength at the initial state (mM); *IAP* is the ion activity product at the final state (unitless); and *Ksp* is the C3S solubility product (≈10<sup>−17.65</sup> [8,69]).

$$rate = e^{C_0} \cdot e^{\frac{C_1}{T}} \cdot A^{C_2} \cdot e^{C_3 a_{Ca,initial}} \cdot a_{OH,initial}^{C_4} \cdot a_{H,initial}^{C_5} \cdot I^{C_6} \cdot \left(\frac{IAP}{K_{sp}}\right)^{C_7} \tag{4}$$

$$\ln(rate) = C_0 + \frac{C_1}{T} + C_2 \ln(A) + C_3 a_{Ca,initial} + C_4 \ln(a_{OH,initial}) + C_5 \ln(a_{H,initial}) + C_6 \ln(I) + C_7 \ln\left(\frac{IAP}{K_{sp}}\right) \tag{5}$$
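For reference, Equation (5) is transcribed below as a prediction function; the argument order and names are ours.

```python
import numpy as np

def ln_rate(C, T, A, a_ca, a_oh, a_h, I, iap_over_ksp):
    """Equation (5): natural logarithm of the C3S dissolution rate.

    C is the coefficient vector (C0..C7); T in degrees C, A in m2/g,
    I in mM, activities and IAP/Ksp unitless, as defined in the text.
    """
    return (C[0] + C[1] / T + C[2] * np.log(A) + C[3] * a_ca
            + C[4] * np.log(a_oh) + C[5] * np.log(a_h)
            + C[6] * np.log(I) + C[7] * np.log(iap_over_ksp))
```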

**Figure 4.** The dissolution rate of C3S, expressed in terms of natural logarithm, against the ion activity of Ca2+ in solvents. The red line indicates the linear correlation.

Phreeqc version 3, a geochemical modeling package, was used in this study to simulate chemical reactions and ion transport; it is designed for natural and polluted water in laboratory and industrial settings. The program is based on the equilibrium chemistry of aqueous solutions interacting with other components, including minerals, gases, solid solutions, and sorption surfaces. It can compute the concentration of an element, the molarity of a compound, the activity of aqueous species, the pH, and the phase transformations required to achieve equilibrium based on reversible and irreversible chemical reactions [73–75]. Here, the Phreeqc code was employed to calculate the ion activities and ionic strength of the solutions. Thermodynamic data were obtained from the specific ion interaction theory (SIT) database to account for the non-ideality of aqueous solutions and were used to calculate the speciation and saturation index [73,76]. The temperature and the species concentrations are given as initial conditions, with the pH set by charge balance, to calculate the pH, ionic strength, and ion activities of Na+, Cl−, OH−, Ca2+, H2SiO4<sup>2−</sup>, Cs+, K+, and SO4<sup>2−</sup>.
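A minimal Phreeqc input deck of the kind used for such speciation runs is sketched below; the concentrations are illustrative, the database file name is assumed, and the species labels under SELECTED_OUTPUT must match those defined in the chosen database.

```
DATABASE sit.dat            # SIT database, per the text (file name assumed)
SOLUTION 1
    temp      20            # temperature (degrees C)
    units     mmol/kgw
    pH        7.0 charge    # pH adjusted to satisfy charge balance
    Ca        5             # illustrative initial concentrations (mM)
    Na        10
    Cl        10
SELECTED_OUTPUT
    -file        speciation.out
    -activities  Ca+2  OH-  H2SiO4-2
END
```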

There are seven coefficients and one constant (i.e., *Ci*) to be optimized. Two scenarios are considered: (1) C3S dissolves in a *generic* solvent with a pH of approximately 7–13, where both H+ and OH<sup>−</sup> can leach the surface of C3S; and (2) C3S dissolves in an *alkaline* solvent with a pH of approximately 11–13, where OH− is the primary leaching ion. An independent optimization for the alkaline scenario was performed in order to improve the prediction accuracy. The optimal coefficient values were derived using a nonlinear gradient-descent scheme [40,42,52,77–79] and the Nelder–Mead multi-dimensional simplex algorithm [80,81].
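A compact version of the coefficient optimization, reusing the `ln_rate` helper from Equation (5) and the Nelder–Mead simplex algorithm cited above; the calibration arrays here are placeholders for the 92 experimental records.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
# Each row: (T, A, a_Ca, a_OH, a_H, I, IAP/Ksp) for one record (placeholder)
X_rows = rng.uniform(0.1, 1.0, size=(92, 7))
ln_rates_measured = rng.normal(2.0, 0.5, size=92)   # placeholder targets

def sse(C):
    """Sum of squared errors of Equation (5) over the calibration records."""
    pred = np.array([ln_rate(C, *row) for row in X_rows])
    return np.sum((pred - ln_rates_measured) ** 2)

res = minimize(sse, x0=np.zeros(8), method="Nelder-Mead",
               options={"maxiter": 20000})
C_opt = res.x   # optimized constant and coefficients C0..C7 (cf. Tables 4 and 6)
```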

Table 4 shows the optimal coefficients of the analytical model for the *generic* solvent scenario. The predicted C3S dissolution rates produced by the analytical model with the coefficients in Table 4 are shown in Figure 5a, and five statistical parameters pertaining to the predictions are listed in Table 5. As demonstrated in Figure 5a and Table 5, the prediction accuracy for the *generic* solvent scenario was moderate, with *R*² ≈ 0.69 and *RMSE* ≈ 32.9 μmol/m²/s. This is expected, because the analytical model cannot account for all influential factors (e.g., other ions in solvents and some processing parameters), unlike the DF model. Furthermore, the large variation of the H+ concentration across neutral and alkaline solvents increases the difficulty of optimizing the simple-structure analytical model.


**Table 4.** Seven coefficients and one constant (for seven input variables corresponding to the physicochemical properties of C3S and solvents) optimized for the analytical model of the *generic* solvent scenario.

**Figure 5.** The analytical model's predictions of C3S dissolution rate against experimental measurements for (**a**) the *generic* solvent and (**b**) the *alkaline* solvent scenarios. The coefficient of determination (*R*²) is shown in the legend, providing a measure of the prediction performance. The dashed line represents the ideal prediction.

**Table 5.** *R*, *R*², *MAE*, *MAPE*, and *RMSE* evaluating the prediction performance of the analytical model for the *generic* and *alkaline* solvent scenarios against experimental measurements.


Table 6 shows the optimal coefficients of the analytical model for the *alkaline* solvent scenario. The predicted C3S dissolution rates produced by the analytical model with the coefficients in Table 6 are shown in Figure 5b, and five statistical parameters pertaining to the predictions are listed in Table 5. As shown in Figure 5b and Table 5, the predictions of the dissolution rate of C3S were high-fidelity, with an *R*² of 0.92 and an *RMSE* of 9.545 μmol/m²/s. The predictions of the *alkaline* solvent scenario are superior, in terms of *R*², to those of the *generic* solvent scenario. This high prediction quality is expected, because the *alkaline* scenario minimizes the effect of H+; in other words, the input–output correlations become simpler as the influence of H+ diminishes, and the analytical model can capture the trends of this simpler system more exactly.


**Table 6.** Seven coefficients and one constant (for seven input variables corresponding to the physicochemical properties of C3S and solvents) optimized for the analytical model of the *alkaline* solvent scenario.

#### **6. Conclusions**

In this study, DF and analytical models were developed to predict the dissolution rate of C3S. The DF model predicts the dissolution rate of C3S from the temperature, the ion concentrations in the solvent, and the pH, all of which can be directly obtained from experimental measurements. To the best of the authors' knowledge, this is the first study to employ ML to predict the dissolution rate of C3S when it is undersaturated with respect to a wide range of solvents. Another novel aspect of this study is leveraging the DF model to evaluate the influence of the input variables and using this knowledge to develop an analytical model.

The database was collected from two distinct experimental setups: *reactor connected to ICP spectrometer* and *flow chamber with VSI*. The DF model was rigorously trained on 75% of the parent database, which consisted of 292 data records, and was then tested against the remaining 25% of the data records to evaluate its prediction performance. The results demonstrated that the DF model yields reliable predictions of the C3S dissolution rate in undersaturated solutions, with an *R*² value of approximately 0.97. The DF model allows researchers to obtain the dissolution rate of C3S simply from the ion concentrations and temperature of the solvent, without cumbersome dissolution experiments. The DF model was also employed to evaluate the influence of the input variables on the dissolution rate of C3S. It was found that the pH value of the solvent and the concentration of Ca2+ exert significant influences on the dissolution process, while the concentration of silicate ions has little influence.

The analytical model (using only data from the *reactor connected to ICP spectrometer* method) was developed for two scenarios: *generic* solvent and *alkaline* solvent. The coefficients of the *generic* and *alkaline* solvent scenarios were optimized using 92 and 75 data records, respectively. The physicochemical properties used as inputs for both scenarios comprised the SSA of C3S, the temperature, the ion activities of Ca2+, OH<sup>−</sup>, and H+, the ionic strength of the solvent, and the degree of undersaturation. The results showed that the analytical model produces reliable predictions, with *R* ≈ 0.83 for the *generic* solvent and *R* ≈ 0.96 for the *alkaline* solvent, when all coefficients are rigorously optimized. Unlike ML, the analytical model can quantitatively interpret aqueous chemistry–dissolution correlations.

Overall, the DF model is an apposite platform for future studies of the dissolution kinetics of cementitious materials. A larger and more diverse database can further enhance its prediction accuracy: by incorporating a wide range of data, the model can better capture the complex dissolution behavior of cementitious materials, improving the reliability of its predictions and allowing it to be used more effectively in the design of cementitious materials.

**Author Contributions:** Conceptualization, development, training, and validation of machine learning and analytical models, and preparation of original manuscript, T.H.; development of analytical model and preparation of original manuscript, S.A.P.; manuscript review and editing, A.R.; supervision, manuscript review and editing, and funding acquisition, J.H.; supervision, manuscript review and editing, and funding acquisition, G.S.; conceptualization, manuscript review and editing, and funding acquisition, A.K. All authors have read and agreed to the published version of the manuscript.

**Funding:** This study was financially supported by the Leonard Wood Institute (LWI: W911NF-07-2-0062); the National Science Foundation (NSF-CMMI: 1661609; NSF-CMMI: 1932690; NSF-DMR: 2034856); the Federal Highway Administration (Award no: 693JJ31950021); and the Ministry of Education and Science of North Macedonia.

**Data Availability Statement:** The data used in this study are available on request.

**Acknowledgments:** The authors thank Missouri S&T for providing facilities to accomplish the experimental and computational work of this research.

**Conflicts of Interest:** The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

#### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

*Article* **RoSummary: Control Tokens for Romanian News Summarization**

**Mihai Alexandru Niculescu 1, Stefan Ruseti 1 and Mihai Dascalu 2,\***


**Abstract:** Significant progress has been achieved in text generation due to recent developments in neural architectures; nevertheless, this task remains challenging, especially for low-resource languages. This study is centered on developing a model for abstractive summarization in Romanian. A corresponding dataset for summarization is introduced, followed by multiple models based on the Romanian GPT-2, on top of which control tokens were considered to specify characteristics of the generated text, namely: counts of sentences and words, token ratio, and n-gram overlap. These are special tokens defined in the prompt received by the model to indicate traits of the text to be generated. The initial model without any control tokens was assessed using BERTScore (F1 = 73.43%) and ROUGE (ROUGE-L accuracy = 34.67%). Control tokens improved the overall BERTScore to 75.42% using <LexOverlap>, while the model was influenced more by the second token specified in the prompt when combinations of tokens were used. Six raters performed human evaluations of 45 summaries generated with different models and decoding methods. The generated texts were all grammatically correct and consistent in most cases, while the evaluations were promising in terms of main idea coverage, details, and cohesion. Paraphrasing still requires improvement, as the models mostly repeat information from the reference text. In addition, we showcase an exploratory analysis of the generated summaries using one or two specific control tokens.

**Keywords:** RoGPT2; control tokens; summarization; text generation; human evaluation

#### **1. Introduction**

A remarkable development of Natural Language Processing (NLP) towards creating models that understand human languages has been observed in recent years. Text generation is one of the main challenges in the field of NLP, and this task has seen important developments after the introduction of Transformers [1]. The Transformer uses an encoder–decoder architecture, self-attention, and positional encodings to facilitate parallel training. The GPT-2 model developed by OpenAI [2] was the first model with remarkable text generation capabilities. GPT-2 was trained to predict the next token in a sequence and could easily be adjusted for specific tasks. The follow-up GPT-3 model [3] is more than 10 times larger in terms of parameters and deduces the task only from the provided prompt. There have been several open-source variations of the model, such as GPT-Neo [4] and GPT-J [5]. Other architectures, such as the Text-To-Text Transfer Transformer (T5) [6], consider a unified framework that converts text-based language problems into a text-to-text format. This model can perform zero-shot learning and deduce the task from the context of the prompt received as input, even if the task was not presented in the training stage.

For the Romanian language, there are not many specific resources (i.e., pre-trained models and datasets), although there has been significant progress in recent years. The most notable models for Romanian consider the BERT architecture (e.g., RoBERT [7], BERT-base-ro [8], Distil-BERT [9]) and the GPT-2 architecture (e.g., RoGPT2 [10]) and were developed in the last 2 years. Romanian has only one available benchmark, namely LiRo [11]. However, the models are small compared to their English counterparts, and there are no available datasets for common NLP tasks. Overall, Romanian remains a low-resource language with low international usage (https://www.worlddata.info/languages/romanian.php; last accessed on 20 October 2022), despite recent efforts in terms of datasets and models; as such, we argue for the necessity of our efforts to develop tools tailored to this language.

Text summarization is a task of particular importance in NLP, centered on extracting critical information from a text using two approaches. First, extractive summarization involves selecting the most important phrases or sentences that carry the main ideas of a text. Second, abstractive summarization considers the generation of a new summary starting from the text. One of the most popular English datasets for this task is *CNN/Daily Mail* [12], with a total of 280,000 examples; the dataset was afterward extended to other languages, including French, German, Spanish, Russian, and Turkish, thus generating the large-scale multilingual corpus *MLSUM* [13]. Another dataset used in studies of abstractive summarization is Extreme Summarization (*X-Sum*) [14], which targets the generation of a short, one-sentence summary for each news article; *X-Sum* was derived from BBC news and consists of 220,000 examples. Yet another dataset is the *Webis-TLDR-17 Corpus* [15], with approximately three million examples constructed with the support of the Reddit community. Extractive summarization in Romanian has been previously tackled by Cioaca et al. [16] and Dutulescu et al. [17] with small evaluation datasets. We now introduce the first dataset for Romanian abstractive summarization (https://huggingface.co/datasets/readerbench/ro-text-summarization; last accessed on 20 October 2022).

A wide variety of architectures has been employed for text summarization, including general Transformer-based models [6,18–20] and specific models such as BRIO [21], ProphetNet [22], or PEGASUS [23]. We aim to provide a baseline abstractive summarizer for Romanian built on top of RoGPT2 [10] and to control the characteristics of the generated text. This is an additional step towards better imitating human capabilities by considering one or more specifications that improve the summary. As such, we assessed the extent to which text generation is influenced by control tokens specified in the prompt received by the model to induce specific characteristics of a text. The idea of specifying control tokens directly in the prompt was first exploited in MUSS [24] and CONTROL PREFIXES [25]. The GPT-2 model was also used in combination with BERT [26]; however, to our knowledge, the generation task has not previously been tackled in combination with control tokens that manipulate the characteristics of the generated summary.

Following the introduction of various models for text summarization, evaluating the quality of a generated text is a critical challenge, which can be even more difficult than the text generation task itself. Text evaluation is generally performed using synthetic metrics developed for machine translation, such as Bilingual Evaluation Understudy (BLEU) [27], Recall Oriented Understudy for Gisting Evaluation (ROUGE) [28], or Metric for Evaluation for Translation with Explicit Ordering (METEOR) [29]; however, these metrics are limited as they focus on the lexical overlap. Newer metrics based on Transformers, such as BERTScore [30], BARTScore [31], or Bilingual Evaluation Understudy with Representations from Transformers (BLEURT) [32], are much more accurate compared to the classical metrics. Still, they require more resources (i.e., pre-trained models and higher computing power) and have longer processing times. Besides comparing automated similarity metrics, Celikyilmaz et al. [33] argued that a human evaluation is the gold standard for evaluating a Natural Language Generation (NLG) task; nevertheless, it is the most expensive and cumbersome to accomplish.

Thus, our research objective is threefold: create a dataset for summarization in Romanian, train a model that generates coherent texts, and introduce control tokens to manipulate the output easily. Following this objective, our main contributions are the following:

• Publish a clean version of the dataset for Romanian text summarization (https:// huggingface.co/datasets/readerbench/AlephNews; last accessed 20 October 2022).


#### **2. Method**

This section presents the dataset created for the summarization task, the model architecture, the training method with the control tokens, as well as the methods employed to evaluate the generated text.

#### *2.1. Corpus*

The dataset for the summarization task was constructed by crawling all articles from the AlephNews website (https://alephnews.ro/; last accessed on 20 October 2022) up to July 2022. For most articles, the site presents a summary section with bullet points containing the main ideas. This peculiarity of the site enabled the automatic creation of a reasonably qualitative dataset for abstractive summarization. News articles that did not have a summary or were too short were eliminated by imposing a minimum length of 20 characters. This resulted in 42,862 collected news articles. The news and summary texts were cleaned using several heuristics: repairing diacritics, removing special characters and emoticons, fixing punctuation (collapsing repeated periods, and appending a period when a sentence lacked a final punctuation mark), and removing words such as "UPDATE", "REPORT", "AUDIO", etc. The dataset was split into 3 partitions (i.e., train, dev, and test) with proportions of 90%–5%–5%. After analyzing the dataset and considering the limitations on the sequence length of a context, the maximum size was set to 724 tokens; of these, 9 tokens were reserved for the control tokens, so articles with a maximum of 715 tokens based on the RoGPT2 tokenizer were selected for the test partition. For entries in the training and dev partitions where the combined length of the article and the summary exceeded 724 tokens, the article content was divided into a maximum of 3 distinct fragments with the last sentences removed; this was applied to approximately 10% of the entries to increase the number of examples while keeping the beginning of the news, which contains the key information. We chose not to apply this augmentation technique to the entries of the test partition, as it would have altered the content of the original texts and generated multiple artificial test entries; moreover, we limited the test texts to the first 715 tokens so that control tokens could also be added when running various configurations. The total number of examples for each partition was: 47,525 for training, 132 for validation, and 2143 for testing.
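The published dataset can be loaded with the Hugging Face `datasets` library; the split and column names should be checked against the repository, as they are not specified here.

```python
from datasets import load_dataset

# Repository from the link above; inspect splits/columns before use.
ds = load_dataset("readerbench/AlephNews")
print(ds)
```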

#### *2.2. RoGPT2 Model for Summarization*

The model was trained to predict the next token from the previous sequence, similar to the RoGPT2 [10] training for the Romanian language. The model architecture consists of several Transformer decoder layers [1], as presented in Figure 1. There are 3 versions of the model, each with a different number of decoder layers: 12 layers for the base version, 24 layers for the medium version, and 36 layers for the large version.

Control tokens were used to indicate the task and the characteristics of the generated text, which are presented in the following subsections. This assumes that the model maximizes the probability of a subword depending on the context and the previously generated subwords:

$$P(w_{1 \dots m}) = \prod_{i=1}^{m} P(w_i \mid w_1, w_2, w_3, \dots, w_{i-1}) \tag{1}$$

Cross-entropy was the loss function for the supervised learning task:

$$L_{CE} = -\sum_{i=1}^{n} t_i \log(p_i) \tag{2}$$

where *ti* is the label and *pi* is the predicted probability of the *i*th class; here, a class corresponds to the *id* of a token.

**Figure 1.** RoGPT2 architecture.

Due to the large number of parameters, the model was trained on a TPU v3-8. The batch size was limited so that entries of 724 tokens fit into memory. The Adam optimizer [34] and the ReduceLROnPlateau (https://keras.io/api/callbacks/reduce\_lr\_on\_plateau/; last accessed on 20 October 2022) and EarlyStopping (https://keras.io/api/callbacks/early\_stopping/; last accessed on 20 October 2022) callbacks were used.

Three decoder methods for text generation were considered to choose the next token depending on the tokens generated up to that point and the probability distribution over the vocabulary.

**Greedy search**: This strategy chooses a local optimum, in this case the token with the highest probability. First, the probability distribution is generated, and then the next token is selected as the one with the highest probability. The procedure continues until the desired length is reached or the end token is generated. This method is efficient and intuitive, but it does not guarantee finding a global optimum for the generated sequence and can leave higher-probability branches unexplored.

**Beam search**: Beam search [35] partially solves the global optimum problem by keeping the best *beam width* sequences with the highest total probability. Multiple candidate sequences are maintained at each step, and the sequence with the highest overall probability is ultimately chosen. The advantage of this method is that it obtains better results for relatively small beam widths, but it requires more memory for larger beam widths or longer sequences, and the generated text does not vary much, being quite monotone. Beam search also does not guarantee finding the global optimum. It works quite well when the length of the generated text can be approximated, but has issues when that length varies greatly. Holtzman et al. [36] argued that people do not choose the phrase with the highest probability, as the element of unpredictability is important.

**Top-p (nucleus) sampling**: This method involves choosing the smallest subset of words whose cumulative probability reaches *p*; the next token is then sampled from the probability distribution renormalized over this subset. The advantage of this method is that it achieves results quite close to human ones and does not require many resources. The disadvantage is that *p* is fixed rather than dynamic.
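The three decoding strategies map directly onto the `generate` API of the `transformers` library. In the sketch below, the checkpoint name is an assumption (the base Romanian GPT-2); the fine-tuned summarization model from this work would be substituted in practice.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "readerbench/RoGPT2-base"   # assumed checkpoint name
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

article = "Textul stirii ..."            # placeholder news article
inputs = tok("Text: " + article + " Summary:", return_tensors="pt")

greedy = model.generate(**inputs, max_new_tokens=128)              # greedy search
beam = model.generate(**inputs, max_new_tokens=128, num_beams=4)   # beam search
top_p = model.generate(**inputs, max_new_tokens=128,
                       do_sample=True, top_p=0.9)                  # nucleus sampling
print(tok.decode(top_p[0], skip_special_tokens=True))
```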

#### *2.3. Control Tokens*

Starting from previous studies presented in the Introduction and related to the specifics of the summarization task, we chose to specify a set of 4 control tokens representative of various characteristics of the text, namely:

- <*NoSentences*>: the number of sentences that the summary should contain;
- <*NoWords*>: the number of words imposed on the summary;
- <*RatioTokens*>: the compression ratio between the length of the news article and that of the summary;
- <*LexOverlap*>: the degree of lexical overlap between the summary and the source text.

The first 3 control tokens are purely quantitative and reflect different use-case scenarios: a summary containing at most a specific number of sentences, a summary having an imposed number of words, or a compression ratio applied globally. The last control token enforces a lower or higher degree of lexical overlap between the two texts.

The prompt for the summarization task was the following:

$$\text{Text: } \{article\} \ \text{Summary: } \{summary\} \ \text{<|endoftext|>} \tag{3}$$

The model learns that, after the control token **"Summary:"**, it must generate the summary of the text preceding that token. Control tokens are specified before the token that marks the input (i.e., the **Text** token), while the end-of-text token is placed at the end. The prompt used for an item from the training dataset is the following:

$$\text{FeatureToken: } \{value\} \ \text{Text: } \{article\} \ \text{Summary: } \{summary\} \ \text{<|endoftext|>} \tag{4}$$

where FeatureToken is <*NoSentences*>, <*NoWords*>, <*RatioTokens*>, or <*LexOverlap*>.

Following the initial experimentation, we noticed that the model learns best when subsequent entries have the same input text but different values for the control tokens and a different text to be generated; the latter is obtained by extracting fragments from the original summary and using them as the output. This variation was applied for the <NoSentences>, <NoWords>, and <RatioTokens> control tokens. Multiple variations were generated whenever the summary text had more than 3 sentences: incremental examples were produced by adding sentences one at a time and recomputing the value of the control token each time. For example, a summary comprising 4 sentences *s*1, *s*2, *s*3, *s*4 and the <NoWords> token would yield two entries in the training dataset: the first consisting of the first 3 sentences with the corresponding <NoWords> for this shorter summary, and a second in which the *s*4 sentence is added and <NoWords> is set to the total word count of the summary.
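As a concrete illustration, the following sketch builds such incremental training entries for the <NoWords> control token; the helper names are hypothetical, and the exact prompt layout follows our reading of Eq. (4), not the authors' released code.

```python
# Illustrative sketch (not the authors' code) of incremental training
# entries for <NoWords>, per Eq. (4): a summary with more than 3 sentences
# yields one entry per added sentence, with the control-token value
# recomputed for each shorter or longer summary.
def build_entry(article: str, summary_sentences: list[str]) -> str:
    summary = " ".join(summary_sentences)
    no_words = len(summary.split())
    return (f"<NoWords>: {no_words} Text: {article} "
            f"Summary: {summary} <|endoftext|>")

def incremental_entries(article: str, sentences: list[str]) -> list[str]:
    if len(sentences) <= 3:
        return [build_entry(article, sentences)]
    # Start from the first 3 sentences and add one sentence per entry.
    return [build_entry(article, sentences[:k])
            for k in range(3, len(sentences) + 1)]

entries = incremental_entries("<news text>", ["s1.", "s2.", "s3.", "s4."])
# -> two entries: one for s1..s3 and one for s1..s4, each with its own <NoWords>.
```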

Besides training the summarization model with each control token individually, we also considered combinations of 2 control tokens, namely <NoWords>-<NoSentences>, <RatioTokens>-<NoSentences>, and <LexOverlap>-<NoWords>. The combination <NoWords>-<NoSentences> was chosen because it reflects the most straightforward way for an end user to manually enforce the length of the summary (i.e., specify an approximate number of words and the number of sentences that the generated summary should have). <RatioTokens> expresses the same idea as <NoWords>, but it is much more difficult for the model to learn because it represents the ratio between the length of the news article and that of the summary. The combination <LexOverlap>-<NoWords> is interesting because it forces the model to generate a text with an approximate number of words while that text must not match the one received as input: <NoWords> indicates how many words the summary should have, while <LexOverlap> restricts the percentage of word combinations shared between the news article and the generated text. A small value for <LexOverlap> forces the model to reformulate an idea from the news, whereas a large value makes the model extract the most important phrases within a word limit.

#### *2.4. Evaluation Metrics*

Our evaluations considered both automated and human assessments of the generated summaries. We wanted the evaluation of the model to be a reliable one; to this end, three evaluation methods were used: Recall-Oriented Understudy for Gisting Evaluation (ROUGE) [28], a classic metric used in the majority of research in the field of abstractive summarization; BERTScore [30], a metric that uses a pre-trained model to understand the generated text and the reference and thus provide a better comparison; and human evaluation. To evaluate the characteristics imposed by the control tokens, the following metrics were used: the Mean Absolute Error (MAE) and Mean-Squared Error (MSE) for <NoSentences> and <NoWords>, and the Pearson and Spearman coefficients for <RatioTokens> and <LexOverlap>.

#### 2.4.1. BERTScore

Metrics based on Transformers [1], such as BERTScore [30], have been introduced to better capture the similarity between texts. BERTScore indicates how good and realistic a text generated by a model is at the semantic level (i.e., the metric considers the meaning of the text by computing the cosine similarity between token embeddings from the generated sentences and those from the reference sentences). The token embeddings are numerical representations of subwords obtained using the BERT [37] tokenizer. Precision, recall, and F1 scores are computed from the scalar products between the embeddings of the two texts. Precision refers to the generated text and is calculated as the average of the largest scalar product between each embedding of the generated sentence and those of the reference sentence; in contrast, recall is centered on the reference text and is computed in an equivalent manner, matching each reference embedding against the generated-sentence embeddings. The original paper showed good correlations with human evaluations. Even though BERTScore is more accurate than classical machine translation metrics that account for the overlap between words using n-grams or synonyms (e.g., BLEU, ROUGE), it requires a language model for the targeted language. We used the implementation offered by HuggingFace (https://huggingface.co/spaces/evaluate-metric/bertscore; last accessed on 20 October 2022), which considers mBERT [37] for the Romanian language. The performance metrics are computed as follows:

$$P_{BERT} = \frac{1}{|\hat{x}|} \sum_{\hat{x}_j \in \hat{x}} \max_{x_i \in x} x_i^{\top} \hat{x}_j \tag{5}$$

$$R_{BERT} = \frac{1}{|x|} \sum_{x_i \in x} \max_{\hat{x}_j \in \hat{x}} x_i^{\top} \hat{x}_j \tag{6}$$

$$F_{BERT} = 2 \cdot \frac{P_{BERT} \cdot R_{BERT}}{P_{BERT} + R_{BERT}} \tag{7}$$

where $x = \langle x_1, \dots, x_k \rangle$ denotes the embeddings of the reference tokens and $\hat{x} = \langle \hat{x}_1, \dots, \hat{x}_l \rangle$ denotes the embeddings of the tokens in the generated text.
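For reference, the HuggingFace implementation mentioned above can be invoked as follows; this is a minimal sketch, and the example sentences are placeholders.

```python
import evaluate

# Compute BERTScore with the HuggingFace implementation cited above;
# lang="ro" selects multilingual BERT embeddings for Romanian by default.
bertscore = evaluate.load("bertscore")
results = bertscore.compute(
    predictions=["Rezumatul generat de model."],
    references=["Rezumatul de referință scris de om."],
    lang="ro",
)
print(results["precision"], results["recall"], results["f1"])
```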


#### 2.4.2. Human Evaluation

Human evaluation is considered the gold standard in measuring the quality of generated text [33], but it is costly and difficult to achieve. The most-used method for human evaluation is to create a form in which respondents are asked to evaluate the generated text. In our case, respondents were asked to assess the generated text from the point of view of five metrics: main idea (i.e., the main idea of the article is present within the summary), details (i.e., the key information is found in the generated text, without irrelevant ideas), cohesion (i.e., phrases and ideas follow a logical order), wording/paraphrasing (i.e., the text is not identical to that of the news and the model made changes), and language beyond the source text (i.e., there is a varied range of lexical and syntactic structures). The scores ranged from 1 to 4, with 4 being the best. The summary scoring rubric is based on the studies of Taylor [38] and Westley, Culatta, Lawrence, and Hall-Kenyon [39]. The raters were asked to evaluate 5 examples chosen randomly from the texts generated using the 3 decoding methods and the 3 variants of the model; in total, 45 questions were included in the form. The Intraclass Correlation Coefficient (ICC3) [40] was calculated for each model-version-decoding-method configuration to measure the consistency of the evaluations. The form was sent to people collaborating with our research laboratory, primarily due to the complexity of the 5 metrics used, in order to obtain relevant results.
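A sketch of how ICC3 can be computed is shown below, using the pingouin library as one possible implementation (the paper does not name the software used); the rating values are invented for illustration only.

```python
import pandas as pd
import pingouin as pg

# Hypothetical sketch of measuring inter-rater consistency with ICC3;
# the ratings below are made-up data, not the study's actual scores.
ratings = pd.DataFrame({
    "summary": [1, 1, 2, 2, 3, 3],      # each summary rated by every rater
    "rater":   ["A", "B", "A", "B", "A", "B"],
    "score":   [3, 4, 2, 3, 4, 4],      # 1-4 rubric scores
})
icc = pg.intraclass_corr(data=ratings, targets="summary",
                         raters="rater", ratings="score")
print(icc[icc["Type"] == "ICC3"])       # consistency among fixed raters
```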

#### *2.5. Experimental Setup*

The Adam [34] optimizer started from a learning rate equal to 1 × 10⁻⁴, which was reduced down to 4 × 10⁻⁶ using the ReduceLROnPlateau callback, with a patience of 2 and a factor of 1/e. The patience parameter was set to 1 for combinations of control tokens due to the task's complexity and the dataset's size; the training was thus more aggressive, modifying the learning rate if there were no improvements after an epoch. Training was stopped if no improvements were noticed after 3 epochs for baseline summarization or 4 epochs for the control tokens. A context size equal to 724 was considered, and the batch size varied for each model version: 128 for the base, 24 for the medium, and 16 for the large models. Three decoding methods were used for text generation: greedy, beam search, and top-p sampling. The experiments were performed on a TPU v3-8 for training, while NVIDIA Tesla A100 and NVIDIA Tesla P100 GPUs were used for text generation and evaluation. The model received prompts that contained the summary token and those that specified the characteristics of the text to be generated.
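The configuration above can be expressed in Keras roughly as follows; this is a minimal sketch under the stated hyperparameters, with the model construction and data pipeline omitted.

```python
import math
import tensorflow as tf

# Sketch of the training setup described in Section 2.5 (Keras callbacks).
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4)

callbacks = [
    # Multiply the learning rate by 1/e when validation loss plateaus;
    # min_lr matches the 4e-6 floor mentioned in the text.
    tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss",
                                         factor=1 / math.e,
                                         patience=2, min_lr=4e-6),
    # Stop after 3 epochs without improvement (baseline summarization).
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=3),
]

# model.compile(optimizer=optimizer, loss=...) and
# model.fit(..., callbacks=callbacks) would follow; the batch size is
# 128/24/16 for the base/medium/large versions, respectively.
```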

#### **3. Results**

This section presents the results obtained by the models for the summarization task and the experiments with control tokens. In most experiments, the same configuration was used for text generation. After training, the following generation strategies were used: greedy, beam search with a width equal to 4, and top-p sampling (with top-k = 25 and *p* = 0.94). In addition, we introduce an exploratory analysis to highlight the benefits of using control tokens when generating summaries with various specificities.

#### *3.1. News Summary*

This experiment aimed to generate summaries for news articles without any particular characteristics. The model knows that it must generate text after the control token <Summary>. The evaluation of the model was performed using the following metrics: the ROUGE [28] score (the F1-score average was calculated for ROUGE-1, ROUGE-2, and ROUGE-L) and BERTScore [30]. The results are available in Table 1. The medium version using beam search achieved the best scores (74.34% for BERTScore *F*1 and 34.67% for ROUGE-L *F*1), surpassing the large version with beam search by 0.1% for BERTScore.


#### *3.2. Human Evaluations*

The next experiment evaluated the model trained on the AlephNews dataset by generating summaries for the DigiNews test dataset introduced by Niculescu et al. [10]. As the DigiNews dataset does not provide a summary for each news story, a human evaluation was performed to assess the quality of the generated text. The form was completed by six raters, and the scores from Table 2 consider the average over the five evaluated texts from each combination.

**Table 2.** Results for human evaluation (bold marks the best results).


#### *3.3. Control Tokens*

For the following experiments, control tokens were used individually or in combination to indicate the characteristics of the generated text, in addition to the token indicating the task. For the more complex scenarios, we wanted to observe whether the model learns a combination of several control tokens that was not present in the training stage and whether the order of tokens in the prompt matters. BERTScore [30] was used holistically as a means to compare different combinations; the Mean Absolute Error (MAE) and Mean-Squared Error (MSE) were considered for <NoSentences> and <NoWords>, whereas the Pearson and Spearman coefficients were used for <RatioTokens> and <LexOverlap>. Table 3 shows the best BERTScores obtained for each control token separately; the beam search and top-p sampling decoding methods were selected because they obtained the most revealing results. Detailed results for each control token are presented in Tables A1–A4. The best score was 75.42%, obtained with the <LexOverlap> control token.

Subsequently, we explored the extent to which the model succeeded in learning combinations of control tokens, having seen only examples of each one individually in the training stage. The following combinations of control tokens were chosen in line with the argumentation from the Method Section: <RatioTokens>-<NoSentences>, <NoWords>-<NoSentences>, and <NoWords>-<LexOverlap>. We decided to focus only on the condensed results that consider BERTScore for the medium and large versions using beam search and top-p sampling as the decoding methods (see Table 4). Tables A5–A10 present the full results for these combinations. The best score was achieved by the combination <NoWords>-<LexOverlap> using the medium version with beam search (F1 = 74.95%).


**Table 3.** BERTScore [30] for control tokens taken individually (bold marks the best results).

**Table 4.** BERTScore [30] for complex control tokens (bold marks the best results).


#### *3.4. Exploratory Analysis of Generated Summaries Using Control Tokens*

Besides assessing the performance of various configurations, our aim was also to explore the extent to which control tokens change the generated texts. As such, we generated summaries for the same news article while varying the values for the control token(s) and assessed the impact on the quality of the generated summary and its resemblance to the original text. Given the previous best results, the medium and large models with beam search were chosen for this experiment. We experimented with an individual control token (i.e., <NoSentences>) that is easily explainable, as well as with a more complex scenario that forces a compression/expansion of the generated text (i.e., the combination <NoSentences>-<NoWords>). The range for <NoSentences> was 2–5; there were extremely few training samples with only 1 sentence in the summary, and our model is incapable of generating such over-condensed summaries. The <NoWords> control token considered five values (−50%, −25%, 0%, +25%, and +50%), ranging from a compression with 50% fewer words than the reference summary to an expansion with 50% additional words. A sample of 100 news articles from the test partition was chosen, and BERTScore *F*1 was calculated for each value of the control token(s); the corresponding results are presented in Figures 2 and 3. An example of text generation where only <NoSentences> was varied is presented in Appendix C.1, whereas Appendix C.2 showcases an example for <NoSentences>-<NoWords>.

**Figure 2.** BERTScore for NoSentences.

**Figure 3.** BERTScore for NoSentences-NoWords.
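A sketch of the generation loop for this analysis is given below; the prompt layout follows our reading of Eq. (4), the checkpoint name is the one from the Data Availability Statement, and the article text is a placeholder.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Sketch of the exploratory analysis: generate summaries for the same
# article while varying <NoSentences>, using beam search as above.
tokenizer = AutoTokenizer.from_pretrained("readerbench/RoSummary-large")
model = AutoModelForCausalLM.from_pretrained("readerbench/RoSummary-large")

article = "<news text in Romanian>"
for n in range(2, 6):  # the 2-5 range used in the analysis
    prompt = f"<NoSentences>: {n} Text: {article} Summary:"
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs, num_beams=4, max_new_tokens=128)
    print(n, tokenizer.decode(output[0], skip_special_tokens=True))
```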

#### **4. Discussion**

The baseline model managed to achieve good results (see Table 1) for the summarization task; the best results for ROUGE-L (34.67%) and BERTScore (74.34%) were obtained by the medium version with the beam search decoding method. It is worth noting that the best results were obtained with beam search decoding regardless of the considered model. The poorer results obtained by the large version can be explained by the relatively small size of the dataset.

Results from the human evaluations (see Table 2) were also consistent, based on the obtained ICC3 scores. The best score for the main idea was obtained by the large model with greedy decoding (3.73/4), followed by the medium version with beam search (3.43/4), indicating that the models managed to identify the main idea of the news. In terms of the provided details, the best score (3.36/4) was achieved by the medium model with beam search decoding (see Appendix A.1 for an example). The models managed to produce coherent sentences with elevated language; this was also shown in the paper that introduced RoGPT2 [10]. The large model obtained the highest overall score for cohesion with greedy decoding (3.27/4), followed by the medium model with beam search (3.13/4); this lower score is justifiable since the contents of some randomly sampled news articles were challenging to summarize (see Appendix A.2 for a horoscope example). Paraphrasing was the main problem of the generated texts since the models mostly repeated information from the reference text. Nevertheless, the results obtained by the models are impressive, considering that the human-evaluated news articles originated from a dataset on which the model was not trained.

The summaries using control tokens obtained better scores than the baseline summarization task (see Table 3). The small differences indicate that a winning configuration cannot be determined with certainty, as the largest difference was up to 2%; however, we observed that beam search consistently obtained the best results. Despite being the most complex token, <LexOverlap> brought the largest improvement in BERTScore *F*1, namely 1.08%. The worst results for controlling text characteristics were obtained by <NoSentences>, whereas <RatioTokens> obtained a lower BERTScore than <NoWords> because it is a more difficult token for the model to learn.

Lower performance for combinations of tokens was expected because the dataset is relatively small and the task difficulty was higher. When comparing the performance of the models on each control token individually, we noticed that higher performance was obtained for the second token specified in the prompt; this suggests that the model was influenced more by the second token. The combination <NoWords>-<LexOverlap> obtained the best overall results, highlighting the benefits of complementarity between control tokens. Overall, the best decoding method was beam search.

When considering the exploratory analysis, the best results when varying the number of sentences were obtained for values of 2 and 3; this was expected as most summaries had 3 sentences. The example from Appendix C.1 highlights that the model seems to only extract sentences from the original text without paraphrasing. With <NoSentences> set at three, the model copied a central sentence and reiterated it based on a repetition present in the source text (i.e., the news article contained "Roxana Ispas este fondatoarea brandului Ronna Swimwear." and "Roxana Ispas, fondatoare Ronna Swimwear", which confused the model). Furthermore, there was a problem when setting the control token to 5 as the model failed to generate five sentences; nevertheless, it generated considerably longer sentences than the previous use case with only four sentences.

The best results for the experiment with the <NoSentences>-<NoWords> combination were obtained when the number of sentences was equal to 2 or 3 and the number of words was +25% or +50% more than in the original summary. The best BERTScore was obtained for the medium version with <NoSentences> = 3 and <NoWords> = +25%, followed by a similar scenario with <NoSentences> = 2 and the same value for <NoWords>. As exemplified in Appendix C.2, the model takes into account the number of words that must be generated, i.e., there is a proportional relationship between the length of the summary and the value of the control token. Furthermore, a higher compression rate given by a smaller number of words forced the model to generate one sentence fewer than specified.

#### **5. Conclusions**

This paper introduced a novel dataset, a baseline model, and control tokens for manipulating text characteristics when summarizing texts in Romanian; all of these resources have been publicly released. Our model obtained overall good results (F1-scores above 0.73 in most configurations), indicating that the models can learn even from limited samples. The generated texts were grammatically correct and mostly consistent, as highlighted by the human evaluation. Using control tokens led to improvements in BERTScore [30]. The best results were obtained when using beam search as the decoding strategy, while the medium and large models showed similar performance; however, the medium models are more suitable given the size of the dataset. Higher scores were obtained when only one control token was used; in contrast, in complex scenarios, the model emphasized the second token specified in the prompt when generating the text.

In terms of future work, we aim to increase the quality and size of our dataset with examples originating from other news websites targeting specific fields in contrast to AlephNews, which is a generalist news site. This will ensure a higher diversity of text characteristics and introduce the possibility of new control tokens specific to the new categories. Moreover, we plan to register the summarization task in the LiRo benchmark [11] to ensure the development of robust natural-language-understanding systems for Romanian.

**Author Contributions:** Conceptualization, M.D. and S.R.; methodology, M.D., S.R. and M.A.N.; software, M.A.N. and S.R.; validation, M.A.N., S.R. and M.D.; formal analysis, S.R.; investigation, M.A.N. and S.R.; resources, M.A.N.; data curation, M.A.N.; writing—original draft preparation, M.A.N.; writing—review and editing, M.D. and S.R.; visualization, M.A.N.; supervision, M.D.; project administration, M.D.; funding acquisition, M.D. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was funded by the "Innovative Solution for Optimizing User Productivity through Multi-Modal Monitoring of Activity and Profiles – OPTIMIZE"/"Soluție Inovativă de Optimizare a Productivității Utilizatorilor prin Monitorizarea Multi-Modală a Activității și a Profilelor – OPTIMIZE" project, Contract Number 366/390042/27.09.2021, MySMIS code: 121491.

**Institutional Review Board Statement:** The study was conducted in accordance with the Declaration of Helsinki and approved by the Ethics Committee of the Faculty of Automated Control and Computers, University Politehnica of Bucharest.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The dataset for Romanian text summarization is freely available on HuggingFace (https://huggingface.co/datasets/readerbench/AlephNews; last accessed on 20 October 2022); the models built on top of RoGPT2 are available on HuggingFace (https://huggingface.co/readerbench/RoSummary-large; last accessed on 20 October 2022); the corresponding code is released on GitHub (https://github.com/readerbench/RoSummary; last accessed on 20 October 2022).

**Acknowledgments:** Special thanks to the TensorFlow Research Cloud (https://www.tensorflow.org/tfrc; last accessed on 20 October 2022) program for providing the Tensor Processing Unit (TPU) (https://cloud.google.com/tpu/; last accessed on 20 October 2022) resources used to train the models.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Abbreviations**


#### **Appendix A**

#### *Appendix A.1*

**News**: "Zona Cheile Râ¸snoavei, sufocat ˘a de gunoaie Un telespectator Digi24 a trimis o sesizare la autorit˘a¸tile locale, dar reprezenta¸tii prim˘ariei ridic˘a neputincio¸si din umeri: au încercat s ˘a cure¸te, dar romii din apropiere fac mizerie din nou, peste noapte. La câ¸tiva metri de drumul care duce spre Cheile Râ¸snoavei, unul din cele mai frumoase locuri de vizitat din jude¸t, frumuse¸tea peisajului este umbrit˘a de gunoaiele aruncate pe o raz˘a de mai bine de o sut˘a de metri. Profitând de faptul c˘a zona este marcat˘a ca fiind poligon de trageri, oamenii au aruncat aici nestingheri¸ti saci întregi de gunoi, f ˘ar ˘a s˘a se gândeasc˘a la turi¸stii care trec pe aici sau la felul în care nep ˘asarea lor stric ˘a imaginea ora¸sului. Cunoscut pentru cetatea ¸t˘ar˘aneasc˘a ¸si pentru peisajele de poveste de pe Cheile Râ¸snoavei, ora¸sul Râ¸snov atrage la fiecare sfâr¸sit de s ˘apt ˘amân ˘a mii de turi¸sti. Cei care ajung îns ˘a la periferie r˘amân dezam˘agi¸ti: mormane întregi de sticle, hârtii, gunoi menajer ¸si chiar c ˘ar ˘amizi s-au adunat aici. Iar în ultima perioad ˘a, localnicii au început s ˘a duc ˘a acolo chiar ¸sI gunoaiele din grajduri. Ovidiu Dr ˘agunoiu locuie¸ste în Bra¸sov ¸si este un împ ˘atimit al drume¸tiilor. Zilele trecute a vrut s ˘a mearg ˘a cu câinele la plimbare pe drumul spre Cheile Râ¸snoavei. A filmat mizeria ¸si a anun¸tat autorit ˘a¸tile. Lua¸ti la întreb ˘ari, reprezenta¸tii prim ˘ariei au g ˘asit rapid o scuz˘a. "Am f˘acut ordine acolo, am cur˘a¸tat, am nivelat zona ca într-o s˘apt˘amân˘a zona s˘a fie la loc. Vin cu c˘aru¸tele, cu sacii, cu fel ¸sI fel de modalit˘a¸ti ei pun acolo din curte parc˘a inten¸tionat parc˘a sfid˘ator s˘a arate zona din nou murdar˘a ¸si plin˘a de mizerie", a declarat Marian Ilie, responsabil cu problemele de mediu la Prim˘aria Râ¸snov. Locuitorii ora¸sului sunt si ei afecta¸ti de mormanele de gunoaie. Nu de pu¸tine ori s-au îmboln˘avit animalele duse la p˘ascut în zon ˘a. Autorit˘a¸tie din Râ¸snov spun c ˘a toate gunoaiele vor fi adunate, iar cei care nu le depoziteaz˘a corespunz˘ator vor fi amenda¸ti. Deocamdat˘a îns˘a, nimeni nu a primit nicio sanc¸tiune."

**Summary**: Zona Cheile Râșnoavei, sufocată de gunoaie. Un telespectator Digi24 a trimis o sesizare la autoritățile locale, dar reprezentanții primăriei ridică neputincioși din umeri: au încercat să curețe, dar romii din apropiere fac mizerie din nou, peste noapte. La câțiva metri de drumul care duce spre Cheile Râșnoavei, unul din cele mai frumoase locuri de vizitat din județ, frumusețea peisajului este umbrită de gunoaiele aruncate pe o rază de mai bine de o sută de metri. La câțiva metri de drumul care duce spre Cheile Râșnoavei, unul din cele mai frumoase locuri de vizitat din județ, frumusețea peisajului este umbrită de gunoaiele aruncate pe o rază de mai bine de o sută de metri.

#### *Appendix A.2*

**News**: "HOROSCOP. Leii pleac ˘a într-o vacant,a mult-a ˘ s,teptat ˘a PES, TI Atent,ie la cheltuieli. Nu cedat,i tentat,iilor. BERBEC Se anunt,a o zi dinamic˘ ˘ a. Sunt favorizate activit˘at,ile intelectuale. TAUR V ˘a putet,i întâlni cu persoane care v ˘a îndeamn ˘a la comunicare. GEMENI Suntet,i nemult,umit de ritmul în care se desf ˘as, oar ˘a un proiect. Lucrurile vor reveni la normal. RAC Este posibil s˘a primit,i nis,te bani din colabor˘ari mai vechi. LEU Putet,i pleca într-o c˘al˘atorie pe care o as,teptat,i de mult timp. FECIOARA Petrece ˘ t,i o sear˘a special˘a cu cei dragi. Primit,i vizita copiilor. BALANT, A Sunte ˘ t,i foarte solicitat,i la birou. Avet,i o serie de responsabilit˘at,i. SCORPION Foarte implicat,i în relat,ia de iubire, Scorpionii petrec o sear˘a special˘a al˘aturi de partener. SAGETATOR Nu cump˘ ˘ arat,i tot ce v˘a iese în cale. Mai mult de jum˘atate dintre achizit,ii se vor dovedi inutile. CAPRICORN În aceste zile vet,i vedea rezultate concrete ale muncii dumneavoastr˘a s,i vet,i avea ocazia s˘a v˘a exprimat,i ideile. VARS ˘ ATOR A ˘ t,i putea primi o veste important ˘a, care v ˘a ret,ine la birou. Nu neglijat,i totus,i, familia."

**Summary**: Berbecii pleacă într-o vacanță mult-așteptată. PEȘTI Atenție la cheltuieli. Nu cedați tentațiilor.

#### **Appendix B. Results for Control Tokens**

*Appendix B.1. Simple Scenarios*

**Table A1.** Results for NoSentences (bold marks the best results).


**Table A2.** Results for NoWords (bold marks the best results).


**Table A3.** Results for RatioTokens (bold marks the best results).


**Table A4.** Results for LexOverlap (bold marks the best results).


#### *Appendix B.2. Complex Scenarios*


**Table A5.** Results for RatioTokens-NoSentences (bold marks the best results).

**Table A6.** Results for NoSentences-RatioTokens (bold marks the best results).


**Table A7.** Results for NoWords-NoSentences (bold marks the best results).


**Table A8.** Results for NoSentences-NoWords (bold marks the best results).



**Table A9.** Results for LexOverlap-NoWords (bold marks the best results).

**Table A10.** Results for NoWords-LexOverlap (bold marks the best results).


#### **Appendix C. Summaries Generated While Varying Values for Control Token(s)**

*Appendix C.1. Summaries Generated with <NoSentences>*

**News**: "O românc˘a a vândut costume de baie de lux în valoare de 2 milioane de euro în 2020. Cine a fost put,in creativ anul trecut a f˘acut bani frumos,i. Roxana Ispas este fondatoarea brandului Ronna Swimwear.A lucrat mai mult,i ani în domeniul juridic, apoi a avut un business în domeniul consultant,ei, iar acum s-a reprofilat. Face costume de baie de lux. A profitat de faptul c ˘a multe românce au mers anul trecut în vacant,e exotice s,i a f˘acut haine de plaj˘a, sutiene s,i chilot,i. S, i nu îi merge r˘au deloc. Are comenzi din toat˘a lumea s,i cliente fidele mai ales în Germania, Frant,a sau Dubai. În 2020 Roxana a avut încas˘ari de 2 milioane de euro.Roxana Ispas, fondatoare Ronna Swimwear: Profilul clientei–cea ¨ care vrea s˘a ar˘ate ca un milion dollar baby. As, a am vândut bine s,i în Madrid s,i la Monte Carlo s,i la Paris s,i Dubai. Online am vândut prima oar˘a cel mai bine, dar anul trecut a fost jumate-jumate chiar. În 2021 sper s˘a se dea drumul s˘a c˘al˘atorim, s˘a fie mult mai ok". S, i deja st˘am foarte ok la capitolul vacant,e exotice, România se afl˘a pe locul 5 în topul t,arilor care au trimis cei mai mul ˘ t,i turis,ti în Maldive. Doar în primele 10 zile din ianuarie 2021 aproape 1500 de turis,ti români au preferat aceast˘a destinat,ie. I-am dep ˘as,it pe nemt,i, francezi, elvet,ieni s,i chiar pe americani."

**Original**: "Pandemia le-a adus celor îndr ˘aznet,i vânz ˘ari urias, e în 2020. Roxana vinde costume de baie de lux în Frant,a, Germania sau Dubai. Roxana a avut vânz ˘ari de 2 milioane de euro în 2020. România a trimis mai mult,i turis,ti în Maldive decât SUA, Germania sau Elvet,ia."

**Generate for <NoSentences>=2**: "Roxana Ispas este fondatoarea brandului Ronna Swimwear. Roxana a vândut costume de baie de lux în valoare de 2 milioane de euro în 2020."

**Generate for <NoSentences>=3**: "Roxana Ispas este fondatoarea brandului Ronna Swimwear. Roxana a vândut costume de baie de lux în valoare de 2 milioane de euro în 2020. Roxana Ispas este fondatoarea brandului Ronna Swimwear."

**Generate for <NoSentences>=4**: "Roxana Ispas este fondatoarea brandului Ronna Swimwear. Roxana a vândut anul trecut costume de baie de lux în valoare de 2 milioane de euro. Roxana Ispas: "Profilul clientei–cea care vrea să arate ca un milion dollar baby. Așa am vândut bine și în Madrid și la Monte Carlo și la Paris"."

**Generate for <NoSentences>=5**: "Roxana Ispas este fondatoarea brandului Ronna Swimwear. Roxana a lucrat mai mulți ani în domeniul juridic, apoi a avut un business în domeniul consultanței. Roxana a vândut costume de baie de lux în valoare de 2 milioane de euro în 2020. Roxana Ispas: "Profilul clientei – cea care vrea să arate ca un milion dollar baby. Așa am vândut bine și în Madrid și la Monte Carlo și la Paris"."

#### *Appendix C.2. Summaries Generated with the Combination <NoSentences>-<NoWords>*

**News**: "Studen¸tii de la UMF Craiova sunt voluntari în campania de vaccinare. Când se termin˘a campania de vaccinare non-stop. Student,ii de la Universitatea de Medicin˘a s,i Farmacie din Craiova ajut ˘a la campania de vaccinare. Spun c˘a doar as, a se pot întoarce în s ˘alile de curs.Ioana Deaconu e student ˘a în anul II. Vrea s˘a devin˘a chirurg, dar pân˘a acum n-a reus,it deloc s ˘a fac ˘a practic ˘a în spital, din cauza restrict,iilor anti pandemice. Acum, s-a înscris voluntar în campania de vaccinare non-stop. Împreun˘a cu prietena ei, Amalia, îi ajut˘a pe cei care vin la vaccinare s˘a completeze formularele de triaj.Printre primii care au venit s˘a se vaccineze au fost mai mult,i str˘aini. Ana este student ˘a în Spania, dar va locui o perioad ˘a în România, unde a venit s ˘a lucreze ca voluntar la o organizat,ie de lupt ˘a împotriva traficului de persoane. Spune c˘a, atunci când a auzit c˘a se poate vaccina f ˘ar ˘a programare, nu a stat pe gânduri.Peste 100 de student,i de la Medicin ˘a particip ˘a, timp de o s ˘apt ˘amân ˘a, la campania de vaccinare non-stop de la stadionul Ion Oblemenco.Tudor B ˘als, anu, prodecanul Facult˘at,ii de Medicin ˘a, spune c˘a student,ii nu sunt obligat,i s˘a fie vaccinat,i pentru a putea participa la cursuri, dar riscurile scad atunci când sunt imunzat,i cât mai mult,i.Campania de vaccinare non-stop din Craiova se va încheia pe 17 mai. Centrul are 10 fluxuri de vaccinare ziua s,i dou ˘a noaptea."

**Original**: "Aproximativ 100 de student,i de la UMF Craiova sunt voluntari în campania de vaccinare. La Craiova, pân˘a în data de 17 mai, te pot,i vaccina non-stop, la stadionul Ion Oblemenco. S, i str ˘ainii se pot vaccina, f ˘ar ˘a programare, la centrul deschis la stadionul Ion Oblemenco."

**Generate for <NoSentences>=3 and <NoWords>-50%**: "Studenții de la UMF Craiova ajută la campania de vaccinare non-stop. Ioana Deaconu este studentă în Spania, dar va locui o perioadă în România."

**Generate for <NoSentences>=3 and <NoWords>-25%**: "Studenții de la UMF Craiova sunt voluntari în campania de vaccinare non-stop. Ioana Deaconu este studentă în Spania, dar va locui o perioadă în România, unde a venit să lucreze ca voluntar la o organizație de luptă împotriva traficului de persoane."

**Generate for <NoSentences>=3 and <NoWords> + 0%**: "Studenții de la UMF Craiova sunt voluntari în campania de vaccinare non-stop. Ioana Deaconu, studentă în anul II, spune că doar așa se pot întoarce în sălile de curs. Peste 100 de studenți de la Medicină participă, timp de o săptămână, la campania de vaccinare non-stop de la stadionul Ion Oblemenco."

**Generate for <NoSentences>=3 and <NoWords> + 25%**: "Studenții de la UMF Craiova sunt voluntari în campania de vaccinare non-stop. Ioana Deaconu, studentă în anul II, spune că doar așa se pot întoarce în sălile de curs. Peste 100 de studenți de la Medicină participă, timp de o săptămână, la campania de vaccinare non-stop de la stadionul Ion Oblemenco. Campania de vaccinare non-stop din Craiova se va încheia pe 17 mai."

**Generate for <NoSentences>=3 and <NoWords> + 50%**: "Studenții de la Universitatea de Medicină și Farmacie din Craiova ajută la campania de vaccinare non-stop. Ioana Deaconu e studentă în anul II. Vrea să devină chirurg, dar până acum n-a reușit deloc să facă practică în spital, din cauza restricțiilor antipandemice. Peste 100 de studenți de la Medicină participă, timp de o săptămână, la campania de vaccinare non-stop de la stadionul Ion Oblemenco."

#### **References**

