*Article* **Prediction of Sorption Processes Using the Deep Learning Methods (Long Short-Term Memory)**

#### **Dorian Skrobek 1,\*, Jaroslaw Krzywanski 1, Marcin Sosnowski 1, Anna Kulakowska 1, Anna Zylka 1, Karolina Grabowska 1, Katarzyna Ciesielska <sup>1</sup> and Wojciech Nowak <sup>2</sup>**


Received: 17 November 2020; Accepted: 12 December 2020; Published: 14 December 2020

**Abstract:** The paper introduces an artificial intelligence (AI) approach to modeling fluidized adsorption beds. The application of a fluidized bed significantly increases the heat transfer coefficient between the adsorption bed and the surface of a heat exchanger, improving the performance of adsorption cooling and desalination systems. The Long Short-Term Memory (LSTM) network algorithm, classified as a deep learning method, was used to predict the vapor mass quantity in the adsorption bed. The research used an LSTM network with two hidden layers. The network used in the study has seven inputs (absolute pressures in the adsorption chamber and evaporator, the temperatures in adsorption chamber and evaporator, relative pressure, the temperatures in the center of adsorption bed and 25 mm from the bed center, the kind of the solids mixture, the percentage value of the addition) and one output (mass of the sorption bed). The paper presents numerical research on mass prediction with this algorithm for three sorbents in fixed and fluidized beds. The results obtained with the developed LSTM network algorithm are in good agreement with the experimental tests, with a coefficient of determination above 0.95.

**Keywords:** sorption processes; deep learning; neural networks; Long Short-Term Memory (LSTM)

#### **1. Introduction**

The process in which chemical compounds are bound to a solid phase is generally known as sorption. Adsorption occurs when the substance is bound at the surface of the solid phase, while absorption occurs when the substance is taken up in the entire volume of the solid phase. These processes can apply to volatile substances and particles dissolved in a liquid medium associated with the solid phase particles. Molecules and atoms can attach to surfaces in two ways. In physical adsorption, van der Waals interactions act between the adsorbate and the adsorbent. In chemical adsorption, molecules or atoms join with the surface by forming chemical bonds.

The adsorption chillers [1] (Figure 1) are quiet, non-corrosive, reliable, environmentally friendly, and economical in operation. They consist of an evaporator, a condenser, separating valves, and a sorption bed. In some solutions, more than one sorption bed may be used. Adsorption chillers are capable of utilizing low-grade waste heat and renewable heat (e.g., solar energy) to produce cool and/or desalinated water. The adsorption chiller with silica gel-water, powered by a waste heat source, has been successfully commercialized in Japan [2]. Waste heat in the industry is rarely used and is currently usually discharged into the environment. The article [2] presents a three-stage adsorption chiller and a computer program that simulates the cycle to predict its operation. In scientific studies, sorption processes are most often predicted using the nonlinear autoregressive network with exogenous inputs (NARX) [2–4] or the feed-forward neural network (FFNN) [3].

**Figure 1.** Scheme of adsorption chiller.

Neural networks (NNs) are used to predict various dependencies, among others, the traffic volume [5], the efficiency and generator power of a supercritical coal-fired power plant [6,7], and the hydrogen concentration in the syngas [8], as well as to optimize heat exchangers and adsorption chillers [8–10]. They come in many variants: feed-forward NN [7,11], fuzzy NN [10,12], recurrent NN (RNN) [13], and hybrid NN [14]. RNNs, with their chain-like structure and internal memory with loops, are widely used, and deep learning models such as RNNs have recently been applied increasingly often [15]. The disadvantage of RNNs is the vanishing gradient problem, which prevents them from modeling time series with long-term relationships such as wind speed and wind direction [16]. There have been several attempts over the years to overcome the difficulty of training RNNs. These difficulties were successfully addressed by the Long Short-Term Memory networks (LSTMs) [17], a type of RNN capable of learning long-term dependencies.

Long Short-Term Memory (LSTM), as a deep learning method, can process sequential data [15] and is applied to many real-world problems, such as image captioning [18], music composition [19], COVID-19 forecasting [20], speech recognition [21], and human trajectory prediction in crowded places [22]. The papers [23,24] present algorithms in which time is one of the network inputs and the data are fed to the network in chronological order. In the presented article, no time variable was given at the input of the network. In the last few years, LSTM has gained popularity due to its ability to model long-term dependencies [25,26]. The long-term dependencies are typically learned from chronologically arranged input data, considering only forward dependencies, while dependencies learned from randomly fed input data have not been explored. NARX, FFNN, and LSTM are neural networks mainly dedicated to modeling time series. In this study, LSTM was used, which turned out to be one of the best and most easily interpreted neural networks suitable for time-series problems.

An architecture of the LSTM-based model was sought that would be capable of describing the dynamics of sorption processes. Since most of the newly proposed LSTM-based prediction models have a shallow architecture with one hidden layer [27–29], their performance is poorer than that of models with several hidden layers [30,31].

An LSTM model should make use of the whole time-series dataset during prediction. Usually, the model's dataset is chronologically arranged from time step *t*−1 to *t* [32]. However, useful information may then be filtered out or passed ineffectively through the network structure, so it may be worthwhile to randomize the data. Another reason for sampling the data randomly in our study is the periodicity of sorption cycles. Analyzing the periodicity of time-series data, especially recurring patterns, enhances predictive performance from both forward and backward temporal perspectives [33]. Based on our literature review, however, the dataset fed to an LSTM is typically arranged chronologically, and the network itself uses forward and/or backward dependencies. Feeding chronological data to the LSTM network may cause the network to memorize the training data and then predict new data incorrectly, which is why the data were entered into the network in random order in this research.

Since the literature review has already reported the advantages of the LSTM approach over other networks such as FFNN or NARX [34–36], the purpose of the paper is to use the LSTM network in the novel field of application, i.e., for adsorption processes in innovative fluidized adsorption beds. This work presents numerical research results related to predicting the adsorption bed mass using the Long Short-Term Memory. Therefore, the considered issue corresponds to the innovative concept of replacing the fixed adsorption beds in conventional adsorption chillers with fluidized beds described in detail in [37,38].

Adsorption chillers are promising appliances that make it possible to use low-grade thermal energy [39–41], including renewable sources of energy such as solar heat, wastewater, underground resources, and waste heat, instead of high-value energy sources such as electricity and fossil fuels [42–44].

The idea of fluidized bed application [45–47] significantly increases the heat transfer coefficient between the adsorption bed and the surface of a heat exchanger and the bed conductance of fluidized bed adsorption chillers, improving the performance of adsorption cooling and desalination systems [48–50]. Moreover, the set of experimental data used is unique because the advanced test stand was utilized, which allows for the fluidized state implementation into the adsorption bed under lowered pressure conditions, even up to 1500 Pa.

To the best of our knowledge, the present work is the first in the literature to apply a deep learning method such as LSTM to modeling fluidized and fixed adsorption beds. The data used in the deep learning network were recorded during experimental research on sorption processes. In the LSTM, the input dataset was given in random rather than chronological order, and the network itself uses forward dependencies. This paper deals with an innovative approach consisting of a fluidized bed application. Such an idea improves heat and mass transfer processes, which helps increase the adsorption chiller's performance.

The second chapter contains a description of the test stand and research equipment, experimental research results, and the discussion on the algorithms used during the numerical research. The third section depicts the LSTM network hyperparameters and the structure of the LSTM network inputs and outputs as well as the results and their discussion. The work is finalized with a conclusion and proposal for further research.

#### **2. Problem Formulation and Solving**

#### *2.1. Experimental Test*

The data needed to predict the adsorption bed's mass comes from the previously conducted experimental studies carried out on the innovative test stand.

The test stand (Figure 2) consists of an evaporator, adsorption chamber, vacuum pump, three valves (V1, V2, V3), and sensors: *P*1—absolute pressure sensor in the adsorption chamber, *P*2—absolute pressure sensor in the evaporator, *P*3—relative pressure sensor, *T*1—temperature sensor in the adsorption chamber, *T*2—temperature sensor in the evaporator, *T*3—temperature sensor in the adsorption bed (bed center), *T*4—temperature sensor in the adsorption bed (25 mm from the bed center).

**Figure 2.** The testing stand and diagram of the test stand.

The first stage of work on the stand is to obtain the saturation pressure in the evaporator (*P*2) at the temperature *T*2. After obtaining the appropriate pressure for the evaporator (*P*2) and the chamber (*P*1), the water begins to boil, and the steam is released through the open valve (V3) to the sorption bed, where the adsorption process takes place. The changes taking place in the bed are monitored using temperature sensors *T*3 and *T*4, the relative pressure sensor *P*3, and mass sensors measuring the sorption bed's weight. In the tests, the opening/closing times of valve V3 were set according to Table 1. The table also shows the initial test conditions. Valves V1 and V2 are used to maintain the appropriate pressure difference between the evaporator and the adsorption chamber to keep the adsorbent bed in the fluidized or fixed state.

**Table 1.** Initial research conditions.


<sup>1</sup> the opening time of valve V3, <sup>2</sup> stabilizing time of conditions in the chamber (valve V3 state—closed), <sup>3</sup> pressure in the chamber, <sup>4</sup> pressure in the evaporator, <sup>5</sup> silica gel, <sup>6</sup> aluminum, <sup>7</sup> fluidized state, <sup>8</sup> stationary state.

Commercial silica gel from Fuji Silysia Chemical Ltd. (Greenville, USA) was employed for the research. Using the Analysette 3 Spartan shaker (FRITSCH GmbH, Idar-Oberstein, Germany), the material was separated to obtain granulation 250–300 μm. In the present study, aluminum (Al, granulation 45–450 μm) particles were used as an additive to improve the thermophysical properties of a silica gel (SG) adsorption bed, due to high thermal conductivity [51].

The exemplary results of the experiment are shown in Figure 3. They concern the tests of the 85% SG + 15%Al mixture for the stationary state. In this test and the other test variants (Table 1), the valve V3 was open for 10 s. The figure below shows ten consecutive opening and closing cycles of the valve V3.

**Figure 3.** An exemplary result of the experimental test for the mixture 85%SG + 15%Al stationary state for ten consecutive cycles of valve V3 opening and closing.

Based on Table 1, experimental studies were performed, and the data from these experiments were used as inputs (*P*1, *P*2, *P*3, *T*3, *T*4, type of mixture, the percentage value of the additive) and outputs (sorption bed mass) of the LSTM network. Exemplary data that is entered into the LSTM network is shown in Figure 3; six such studies were performed as shown in Table 1. The test results from the six experiments were fed into the LSTM network as outlined above.

#### *2.2. Recurrent Neural Network (RNN)*

A Recurrent Neural Network is a deep learning model consisting of neurons. It is mainly useful for sequence data, as each neuron can use its internal memory to store information about the previous input. This action resembles a loop (Figure 4) in which the output of a neuron at one specific stage is provided to the next neuron as an input. The RNN considers two inputs: the first is the current input, and the second is the result of the previous computation [32]. As with other neural networks, RNNs contain an input layer, hidden layers, and an output layer.

All recurrent neural networks have the form of a chain of repeating modules of the neural network. In standard RNNs, this repeating module has a straightforward structure, such as a single *tanh* (hyperbolic tangent) layer (Figure 5).

**Figure 4.** An unrolled Recurrent Neural Network.

**Figure 5.** The repeating module in a standard RNN contains a single layer.
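The repeating module of a standard RNN can be illustrated with a short sketch. The function names, the list-based vectors, and the concatenation order [*h*, *x*] are assumptions made for this illustration and are not taken from the software used in the study.

```python
import math


def rnn_step(x_t, h_prev, W, b):
    """One step of a plain RNN module with a single tanh layer.

    Computes h_t = tanh(W [h_prev, x_t] + b), where W is a list of rows
    and [h_prev, x_t] is the concatenation of the two input vectors.
    """
    z = h_prev + x_t  # list concatenation: previous output, then current input
    return [math.tanh(sum(w * v for w, v in zip(row, z)) + b_j)
            for row, b_j in zip(W, b)]


def run_rnn(sequence, W, b, hidden_size):
    """Unroll the module over a sequence, passing each output to the next step."""
    h = [0.0] * hidden_size  # assumed zero initial state
    for x_t in sequence:
        h = rnn_step(x_t, h, W, b)
    return h
```

Each call to `rnn_step` corresponds to one module in the unrolled chain of Figure 4, with the returned vector playing the role of the previous computation at the next step.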

#### *2.3. Long Short-Term Memory (LSTM)*

Long Short-Term Memory networks (LSTMs) are a special kind of RNN, capable of learning long-term dependencies. They were introduced by Hochreiter and Schmidhuber in 1997 [17] and then refined and popularized by other researchers [52–54]. LSTMs have a chain-like structure shown in Figure 6; the repeating module has the structure shown in Figure 7 (where *tanh* denotes the hyperbolic tangent).

**Figure 6.** An unrolled LSTM.

**Figure 7.** Graphical representation of the LSTM cell.

In order to implement the LSTM recurrent network, the LSTM cell should be implemented first. The LSTM cell has three gates and two internal states, which should be determined to calculate the current output and current cell state. We distinguish the following LSTM cell gates:

• the forget gate *ft*,

• the input gate *it*,

• the output gate *ot*.

In addition to these three gates, the LSTM cell contains a cell update usually activated by the tanh function.

Three variables enter each LSTM cell:

• the current input *xt*,

• the previous output *ht*−1,

• the previous cell state *Ct*−1.

Calculations for the LSTM cell in its individual layers can be described as follows.

• the forget gate *ft* (sigmoid layer):

$$f_t = \sigma(W_f \circ [h_{t-1}, x_t] + b_f) \tag{1}$$

• the input gate *it* (sigmoid layer):

$$i_t = \sigma(W_i \circ [h_{t-1}, x_t] + b_i) \tag{2}$$

*Energies* **2020**, *13*, 6601

• the cell state *Ct*:

$$\hat{c}_t = \tanh(W_c \circ [h_{t-1}, x_t] + b_c) \tag{3}$$

$$C_t = f_t C_{t-1} + i_t \hat{c}_t \tag{4}$$

• the output gate *ot* (sigmoid layer):

$$o_t = \sigma(W_o \circ [h_{t-1}, x_t] + b_o) \tag{5}$$

where: *ĉt* is the cell update; *Wf*, *Wi*, *Wc*, *Wo* are the matrices of weights; *bf*, *bi*, *bc*, *bo* are the bias vectors.

The bias vectors are specified as numeric arrays and are learnable parameters. When training the network, the bias vectors are initialized with zeros in the first iteration.

The weight matrices are specified as numeric arrays and are likewise learnable parameters. The initial values of the weights are computed with the Glorot initializer [55] (also known as the Xavier initializer), which independently samples from a uniform distribution with zero mean and variance 2/(*numIn* + *numOut*), where *numIn* is the number of inputs and *numOut* is the number of outputs of the *i*-th layer.
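The Glorot initialization can be sketched as follows. A uniform distribution on [−*a*, *a*] has variance *a*²/3, so the bound *a* = sqrt(6/(*numIn* + *numOut*)) gives the stated variance 2/(*numIn* + *numOut*). The function name, the matrix layout (one row per output), and the optional `rng` argument are assumptions made for this illustration; the layer sizes used in the study are not reproduced here.

```python
import math
import random


def glorot_uniform(num_in, num_out, rng=None):
    """Sample a num_out x num_in weight matrix with the Glorot (Xavier) scheme.

    Entries are drawn independently from U(-limit, limit) with
    limit = sqrt(6 / (num_in + num_out)), which has zero mean and
    variance limit**2 / 3 = 2 / (num_in + num_out).
    """
    rng = rng or random.Random()
    limit = math.sqrt(6.0 / (num_in + num_out))
    return [[rng.uniform(-limit, limit) for _ in range(num_in)]
            for _ in range(num_out)]


def zero_bias(num_out):
    """Bias vector filled with zeros, as used before the first iteration."""
    return [0.0] * num_out
```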

The final stage of the calculations in the LSTM cell is determining the current output *ht*. The current output is calculated by multiplying the output gate *ot* by the tanh of the current cell state *Ct*:

$$h_t = o_t \cdot \tanh(C_t) \tag{6}$$

The current output *ht* passes through the network as the previous state for the next LSTM cell or as the input for the neural network output layer.
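Equations (1)–(6) can be collected into a single forward step of the LSTM cell. The following is a minimal sketch, assuming that the weights act on the concatenation [*h*, *x*] as a matrix-vector product and that gate and state products are element-wise; the function names and the `params` dictionary layout are hypothetical and do not describe the program used in the study.

```python
import math


def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def affine(W, z, b):
    """Return W z + b for a matrix W (list of rows) and vectors z, b."""
    return [sum(w * v for w, v in zip(row, z)) + b_j for row, b_j in zip(W, b)]


def lstm_cell(x_t, h_prev, c_prev, params):
    """One forward step of an LSTM cell; returns (h_t, C_t)."""
    z = h_prev + x_t  # concatenation [h_{t-1}, x_t]
    f_t = [sigmoid(a) for a in affine(params["W_f"], z, params["b_f"])]      # Eq. (1)
    i_t = [sigmoid(a) for a in affine(params["W_i"], z, params["b_i"])]      # Eq. (2)
    c_hat = [math.tanh(a) for a in affine(params["W_c"], z, params["b_c"])]  # Eq. (3)
    c_t = [f * c + i * ch
           for f, c, i, ch in zip(f_t, c_prev, i_t, c_hat)]                  # Eq. (4)
    o_t = [sigmoid(a) for a in affine(params["W_o"], z, params["b_o"])]      # Eq. (5)
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]                       # Eq. (6)
    return h_t, c_t
```

With all weights and biases set to zero, every gate equals 0.5 and the cell update is zero, so the cell state is simply halved at each step; this is a convenient sanity check for an implementation.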

The structure of the LSTM network is shown in Figure 8. The same network settings were adopted in all studies. Hyperparameters (Table 2) were selected on the basis of a series of earlier LSTM network tests not presented in this article, choosing the values that gave the best fit. Every 30 epochs (an epoch is one full pass of the training algorithm through the entire training set), the learning coefficient was multiplied by 0.2: *ilr* = 0.2 \* *lr*, where *lr* is the current value of the learning coefficient and *ilr* is its new value.
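The stepwise change of the learning coefficient can be written as a small function. This is a sketch under stated assumptions: epochs are counted from zero, so the first drop takes effect at epoch 30, and the initial value passed in is a placeholder, since the actual hyperparameter values are those of Table 2.

```python
def learning_rate(epoch, initial_lr, drop_factor=0.2, drop_period=30):
    """Piecewise-constant schedule: multiply by drop_factor every drop_period epochs.

    Epochs are assumed to be counted from 0, so epochs 0..29 use initial_lr,
    epochs 30..59 use 0.2 * initial_lr, and so on.
    """
    return initial_lr * drop_factor ** (epoch // drop_period)


# Example with a placeholder initial value (not taken from Table 2).
schedule = [learning_rate(e, 0.01) for e in (0, 29, 30, 60)]
```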

**Figure 8.** Diagram of the LSTM network.

The network input layer comprises the following inputs: *P*1—absolute pressure in the adsorption chamber, *P*2—absolute pressure in the evaporator, *P*3—relative pressure, *T*3—temperature in the adsorption bed (bed center), *T*4—the temperature in the adsorption bed (25 mm from the bed center), type of the mixture, and the percentage value of the addition. The mass of the sorption bed constitutes the output of the neural network.


**Table 2.** The values of the hyperparameters for the LSTM.

#### **3. Results of Numerical Calculations**

By adopting the assumptions, formulations, and experimental research results presented in the previous chapters, the LSTM network algorithm and a computer program were developed, which enabled predicting the sorption bed's mass during the sorption process. The experimental test results for the first ten valve V3 opening cycles (6 tests, see Table 1) have been normalized to a range of 0 to 100 and divided into three parts. The training data is presented to the network during the training stage. Validation data is exploited to improve learning and possibly to stop training. Finally, the test data do not affect training and validation, and thus, provide an independent measurement of network performance after training. These data were randomized without duplication as follows:

	- 60-20-20:
		- training data: 60% of all data,
		- validation data: 20% of all data,
		- test data: 20% of all data,
	- 70-15-15:
		- training data: 70% of all data,
		- validation data: 15% of all data,
		- test data: 15% of all data,
	- 80-10-10:
		- training data: 80% of all data,
		- validation data: 10% of all data,
		- test data: 10% of all data.
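The normalization and the randomized division without duplication can be sketched as follows. Min-max scaling is an assumption here, since the text states only the target range of 0 to 100; the function names, the fixed seed, and the rounding of subset sizes are likewise illustrative choices.

```python
import random


def normalize_0_100(values):
    """Min-max scale a list of numbers to the range 0..100 (assumed scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant signal: map everything to 0
    return [100.0 * (v - lo) / (hi - lo) for v in values]


def random_split(n_samples, fractions=(0.6, 0.2, 0.2), seed=0):
    """Shuffle sample indices and cut them into train/validation/test parts.

    Every index lands in exactly one subset, so no sample is duplicated.
    The test subset takes whatever remains after rounding the first two sizes.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(fractions[0] * n_samples)
    n_val = round(fractions[1] * n_samples)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test
```

Calling `random_split(n, (0.7, 0.15, 0.15))` or `random_split(n, (0.8, 0.1, 0.1))` gives the other two variants of the division.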

Figures 9–11 show test data as the trend line (linear fit) for all studies and the 95% prediction interval of LSTM network results.
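A linear fit and its coefficient of determination *R*<sup>2</sup> can be computed as in the sketch below. It assumes an ordinary least-squares fit of the predicted values against the experimental ones; the function name and argument order are hypothetical, and the prediction interval is not covered.

```python
def linear_fit_r2(experimental, predicted):
    """Least-squares line predicted = slope * experimental + intercept.

    Returns (slope, intercept, r2), where r2 = 1 - SS_res / SS_tot is the
    coefficient of determination of the fitted line. Both inputs must
    contain at least two distinct values.
    """
    n = len(experimental)
    mean_x = sum(experimental) / n
    mean_y = sum(predicted) / n
    sxx = sum((x - mean_x) ** 2 for x in experimental)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(experimental, predicted))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2
                 for x, y in zip(experimental, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in predicted)
    return slope, intercept, 1.0 - ss_res / ss_tot
```

A value of *R*<sup>2</sup> close to 1 means that the predicted masses lie close to a straight line when plotted against the measured ones.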

The first analysis of the prediction of mass in the sorption bed using the LSTM network concerned the division of data in the ratio of 60-20-20. The results of this study are shown in Figure 9 and Table 3.


**Table 3.** Coefficient of determination *R*<sup>2</sup> for the Linear Fit (60-20-20).

**Figure 9.** Comparison of the experimental results with the numerical results predicted by the LSTM (60-20-20).

Figure 9 shows the LSTM network operation results compared to the values obtained during the experiment. The LSTM network predicts the worst results for pure silica gel (100% SG) in a fluidized state.

Table 3 shows the fit for all data and for the individual mixtures. The coefficient of determination for all data is 0.9515. The LSTM network predicts the worst values for pure silica gel (100% SG): the coefficient of determination is 0.8934 for the fluidized bed and 0.9218 for the fixed bed, which may be caused by the low repeatability of the cycles during the experiment. The network achieves the best match for the mixture 95% SG + 5%Al, where the coefficient of determination for the fluidized and fixed bed was equal to 0.989 and 0.973, respectively.

The second analysis of mass prediction in the sorption bed using the LSTM network concerned the data division in the ratio of 70-15-15. The results of this study are presented in Figure 10 and Table 4.

**Table 4.** Coefficient of determination *R*<sup>2</sup> for the Linear Fit (70-15-15).


Figure 10 shows the result of the LSTM network in comparison with the values obtained during the experiment. As in the previous study, the LSTM network predicts the worst results for pure silica gel (100% SG) in a fluidized state.

The coefficient of determination for all data is 0.9507, which is lower than in the previous study; Table 4 also shows the coefficient of determination for the individual mixtures. The LSTM network predicts the worst values for pure silica gel (100% SG) in fluidized bed conditions. The coefficient of determination for the fixed and fluidized bed was 0.9250 and 0.8404, respectively.

The model's best accuracy, in this case, was achieved for the fluidized bed of 85% SG + 15%Al mixture with *R*<sup>2</sup> equal to 0.98.

**Figure 10.** Comparison of the experimental results with the numerical results predicted by the LSTM (70-15-15).

The third analysis of the prediction of mass in the sorption bed using the LSTM network concerned the distribution of data in the ratio of 80-10-10. The results of this study are presented in Figure 11 and Table 5.


**Table 5.** Coefficient of determination *R*<sup>2</sup> for the Linear Fit (80-10-10).

**Figure 11.** Comparison of the experimental results with the numerical results predicted by the LSTM (80-10-10).

Figure 11 shows the results of the LSTM in comparison with the values obtained during the experiment. As in previous studies, the LSTM network predicts the worst results for pure silica gel (100% SG) in the fluidized state. Of the three data divisions, this one gives the best prediction of the experimental results.

The coefficient of determination for all data is equal to 0.9554, so the accuracy of the developed model is the best of the three variants. Only a slight decrease in *R*<sup>2</sup> can be seen for the 95% SG + 5%Al(S) and 85% SG + 15%Al(S) blends. The LSTM network prediction is still worst for the fluidized bed of pure silica gel (100% SG), with *R*<sup>2</sup> equal to 0.867. The best prediction was achieved for the fluidized bed of the mixture 95% SG + 5%Al, with *R*<sup>2</sup> = 0.9915.

#### **4. Conclusions**

This paper deals with the innovative concept of applying a fluidized bed instead of the fixed adsorption beds currently employed in conventional adsorption chillers. The model developed in the study correctly predicts the vapor mass adsorbed in the adsorption bed.

In this work, Long Short-Term Memory networks, classified as a deep learning method, were used to predict the sorption bed's mass. The LSTM network is a particular kind of recurrent network that is capable of learning long-term dependencies.

The solution to predicting the results was based on the most accurate mapping of the experimental values by the LSTM network. In the mathematical model, all network inputs were normalized to the range <0–100> due to the different units of parameters used in the study.

The analysis was performed by splitting the input data set into three parts (training data, validation data, and test data), in three variants: 60-20-20, 70-15-15, and 80-10-10. The LSTM network reproduced the experimental results better as the amount of data used for training increased, so increasing the training data makes it possible to increase the accuracy of the LSTM. The division of data into training, validation, and test data in deep learning networks is problematic because increasing one of these shares reduces the others. A better solution seems to be to increase the amount of data entered into the network, but in this case, it was impossible due to the number of sorption cycles that the adopted mixtures could perform. In order to increase the amount of data, the mass of the mixture would have to be increased and the initial test conditions changed, e.g., the absolute pressure in the adsorption chamber and evaporator.

The developed model using the LSTM network and the high accuracy of the obtained numerical results confirm that the LSTM network is suitable for predicting sorption processes.

The LSTM network gave the worst predictions for pure silica gel (100% SG) in fluidized conditions, where the coefficient of determination did not exceed the threshold of 0.9, since these experimental tests are the least repeatable. The test results for 100% SG are more difficult to predict because the mixture contains no additive that would stabilize the sorption processes during the experimental test, so the sorption cycles for 100% SG are not very repeatable. Due to its high thermal conductivity, the addition of aluminum to the silica gel stabilizes the mixture, improving the sorption bed's thermophysical properties. The LSTM network achieved the best accuracy for the mixture of 95% silica gel with 5% aluminum in fluidized conditions. For the data division of 80-10-10, the highest coefficient of determination was equal to 0.9915.

Future research will include comparative studies of several deep learning methods.

**Author Contributions:** The contributions of the co-authors to this article are: conceptualization, D.S., J.K.; methodology, D.S., J.K.; software, D.S., J.K.; validation, D.S., J.K., M.S., W.N.; formal analysis, J.K., M.S.; investigation, A.K., A.Z., K.G., K.C.; resources, A.K., A.Z., K.G., D.S., K.C., M.S., J.K.; data curation, A.K., A.Z., K.G., D.S., M.S., J.K.; writing (original draft preparation), D.S., J.K.; writing (review and editing), D.S., M.S., J.K.; visualization, D.S., J.K., M.S.; supervision, J.K.; project administration, J.K., K.G.; funding acquisition, J.K., M.S., K.G. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by project No. 2018/29/B/ST8/00442, supported by the National Science Centre.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Nomenclature**


#### **References**


**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
