*Data Processing and Analysis*

The original electric bus data on the platform were not flawless. The acquired data often contained outliers and missing values caused by electromagnetic interference and the unreliability of the circuit system. As a result, the original data had to be preprocessed. Preprocessing consisted of data interpolation, outlier removal and data segmentation.

There are two main types of missing data: multiple missing rows and single missing features. In the first case, the data exhibit a discontinuity in specific intervals, and the missing data were interpolated using the mean value [12]. In the second case, the Lagrange interpolation method was used. For outliers, the first quartile and the third quartile of the data were calculated by constructing a box plot. Values above the upper edge or below the lower edge of the box plot were defined as outliers.
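The two preprocessing rules described above can be illustrated with a short sketch. It is not the authors' implementation: the whisker factor `k = 1.5` is the conventional box plot choice and is an assumption here, and the function names and the SOC samples are made up for illustration.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return the box plot edges Q1 - k*IQR and Q3 + k*IQR (k = 1.5 is assumed)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_outliers(values, k=1.5):
    """Keep only the values that lie between the two box plot edges."""
    lower, upper = iqr_bounds(values, k)
    return [v for v in values if lower <= v <= upper]


def lagrange_interpolate(xs, ys, x):
    """Evaluate the Lagrange polynomial through the points (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


# Hypothetical SOC samples (%), with one obvious spike at 12.0.
soc = [80.0, 79.5, 79.0, 12.0, 78.0, 77.5, 77.0]
cleaned = remove_outliers(soc)

# Fill a single missing sample at t = 3 from its four neighbours.
filled = lagrange_interpolate([1, 2, 4, 5], [79.5, 79.0, 78.0, 77.5], 3)
```

In practice, a low-order Lagrange polynomial over a few neighbouring samples is usually preferable to a high-order one over a long window, since high-order polynomials can oscillate between the sample points.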

To facilitate the extraction of data features, the data were divided into short segments according to the state of charge in the dataset. The electric bus data transmission process and data preprocessing results are shown in Figure 1.

**Figure 1.** Electric bus operation data transmission and preprocessing results. (**a**) Driving route, (**b**) electric vehicle big data cloud platform and (**c**) preprocessing results of battery SOC.

After preprocessing, the data were reconstructed according to the timestamp, vehicle velocity, etc., and the vehicle driving season, departure time and velocity-related features were obtained. The statistics of the vehicle energy consumption under different operating conditions are shown in Figure 2, where the ordinate is the energy consumption, abbreviated as EC. For electric vehicles, the operating temperature is related to the use of air conditioning, energy efficiency, etc. Figure 2a shows the relationship between energy consumption and operating temperature. The relationship can be approximated by a parabolic function, with the minimum energy consumption occurring at approximately 25 °C. This is consistent with the fact that, near this temperature, air conditioning is rarely used and the energy transfer efficiency is high [6]. Figure 2b shows the relationship between the variance of velocity and energy consumption, which is approximately linear: the higher the variance, the higher the energy consumption. As shown in Figure 2c, there is no obvious linear relationship between average speed and energy consumption; this is analyzed in detail in Section 3.2.1. Figure 2d shows the relationship between departure time and energy consumption. Departure times are related to road congestion and passenger load. Specifically, vehicles consume more energy between 6 a.m. and 8 p.m. Vehicle energy consumption values differ by more than 20% across departure times.

**Figure 2.** Statistical results of vehicle energy consumption after reconstruction. The relationships of (**a**) temperature, (**b**) vehicle velocity variance, (**c**) average velocity and (**d**) departure time with vehicle energy consumption.

#### **3. Vehicle Energy Consumption Modeling**

According to the powertrain dynamics of the vehicle, the energy consumption of the vehicle is mainly influenced by air resistance, rolling drag and kinetic energy changes during the driving process. Additionally, the energy consumption of the air-conditioning system in the electric bus should be taken into account [13]. The influence of these factors can be modeled with physics-based functions. However, the influences of driving habits and environmental factors are somewhat random and cannot be directly described by physical modeling. Therefore, the fluctuating energy consumption resulting from these factors is more suitable to be modeled by data-driven approaches. Data-driven modeling methods such as decision trees, support vector machines and neural networks are commonly used in many fields. The approaches have a good ability to solve complex, non-linear problems [14]. Based on the analysis, a fusion of the physical modeling and the data-driven modeling is proposed in this paper to achieve an accurate estimation of vehicle energy consumption.

#### *3.1. The Physical Energy Consumption Model*

As the energy of the electric vehicle originates from charging stations, the charging energy is regarded as the original energy of the vehicle in this paper. The charging efficiency of the battery pack is represented by *η<sub>ch</sub>*. Due to the battery internal resistance, *η<sub>ch</sub>* is less than 1, and it fluctuates with the operating temperature. During driving, the discharging efficiency of the battery pack is also influenced by the battery internal resistance. According to the powertrain dynamics of the vehicle, the chemical energy of the battery pack is converted into electrical energy, which is further converted into mechanical energy to drive the vehicle. Meanwhile, energy is also consumed by the air-conditioning system. Therefore, the vehicle energy consumption *E* can be expressed by:

$$E = \frac{1}{\eta_{ch}\eta_{mot}\eta_{bat}}\left(E_{roll} + E_{air} + (1 - \eta_{re})E_{bra}\right) + \frac{(1 - \eta_{re})}{\eta_{ch}\eta_{bat}}E_{ac} \tag{1}$$

As shown in Equation (1), the energy consumption consists of four main components: energy consumption from rolling drag *E<sub>roll</sub>*, energy consumption from air resistance *E<sub>air</sub>*, braking consumption *E<sub>bra</sub>* and energy consumption from air conditioning *E<sub>ac</sub>*. The motor efficiency *η<sub>mot</sub>* accounts for losses in the energy transmission, and *η<sub>bat</sub>* is the battery discharge efficiency. The energy recovery efficiency *η<sub>re</sub>* reflects the fact that part of the kinetic energy change during braking is fed back to recharge the battery, so the kinetic energy lost during braking is (1 − *η<sub>re</sub>*)*E<sub>bra</sub>*.

The energy consumption from rolling drag *Eroll* is influenced by the vehicle mass, velocity and other factors, and the equation is expressed as:

$$E_{roll}(i) = mgfv(i)t(i) \tag{2}$$

where *m* is the vehicle mass, *g* is the gravitational acceleration, *f* is the rolling drag coefficient, *v*(*i*) is the speed at that time, and *t*(*i*) is the sampling interval.

Considering the practicability of the energy consumption model, the number of model inputs should be as small as possible. Therefore, the terms in Equation (1) need to be simplified and approximated. Herein, the velocity *v*(*i*) in Equation (2) is approximated by the average velocity, which gives:

$$E_{roll} = mgf \overline{v} t_{total} \tag{3}$$

where *t<sub>total</sub>* is the total travel time, and *v̄* is the average velocity.

Energy consumption from air resistance is influenced by the vehicle velocity and the frontal area of the vehicle. The energy consumption can be expressed by:

$$E_{air} = \rho C A \overline{v}^3 t_{total} \tag{4}$$

where *ρ* is the air density, *C* is the air resistance coefficient, and *A* is the frontal area of the vehicle. However, according to the literature [8], vehicle speed and energy consumption are negatively correlated when the vehicle speed is lower than 45 km/h: as the speed increases, the energy consumption decreases slightly, which is inconsistent with Equation (4). As the average velocity of the electric bus used in this paper is very low, the energy consumed by air resistance was ignored in the energy consumption modeling process.

For the kinetic energy consumption of the vehicle, as the initial and end velocities of the vehicle in a driving cycle are both zero, it can be concluded that the deceleration kinetic energy and acceleration kinetic energy are roughly equal. As a result, an energy efficiency coefficient was added to the kinetic energy consumption to characterize the energy recovery performance during the driving cycle. Vehicle kinetic energy consumption can be expressed by Equation (5). In practical application scenarios, the velocity at each time point is unknown. For simplification, the variance of velocity is correlated with the change in kinetic energy, so the variance of velocity is used to replace the change in kinetic energy.

$$E_{bra} = \sum_{i=1}^{end} 0.5m\overline{v}\left(v(i+1) - v(i)\right) \approx 0.5m\,\text{var}(v) \tag{5}$$

Due to the existence of a large passenger space, the energy consumption of air conditioning should be accounted for. According to the literature [15], the energy consumption of air conditioning during driving is directly proportional to the square of the temperature difference inside and outside the vehicle. Therefore, the energy consumption can be expressed as:

$$E_{ac} = c(T - 25)^2 \tag{6}$$

where *c* is the air-conditioning coefficient, and *T* is the temperature.

Substituting Equations (3), (5) and (6) into Equation (1), the physical model can be obtained:

$$\begin{cases} E = \beta_0 F_0 + \beta_1 F_1 + \beta_2 F_2 \\ \quad \beta_0 = \frac{mgft}{\eta_{ch}\eta_{mot}\eta_{bat}} \\ \quad \beta_1 = \frac{0.5 m(1 - \eta_{re})}{\eta_{ch}\eta_{mot}\eta_{bat}} \\ \quad \beta_2 = \frac{c}{\eta_{ch}\eta_{mot}\eta_{bat}} \end{cases} \tag{7}$$

where *F*<sub>0</sub> = *t<sub>total</sub>* ∑<sub>*i*=1</sub><sup>*n*</sup> *v*, *F*<sub>1</sub> = var(*v*), and *F*<sub>2</sub> = (*T* − 25)<sup>2</sup>.

In the physical energy consumption model, the parameters related to the energy transmission efficiency are influenced by environmental factors. However, the model can be simplified by considering all energy efficiencies as fixed values. *β*<sub>0</sub>, *β*<sub>1</sub> and *β*<sub>2</sub> are then constants and can be obtained using the least-squares fitting method based on statistical vehicle data [4].
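As an illustration of the least-squares step, the following sketch fits the three coefficients of Equation (7) from per-trip features by solving the normal equations. It is a minimal example under stated assumptions, not the authors' code: the helper names, the trip features and the "true" coefficients are all hypothetical, and the model is fitted without an intercept because Equation (7) has none.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_least_squares(features, targets):
    """Fit E ~ sum_k beta_k * F_k (no intercept) via the normal equations."""
    k = len(features[0])
    ata = [[sum(row[i] * row[j] for row in features) for j in range(k)]
           for i in range(k)]
    atb = [sum(row[i] * y for row, y in zip(features, targets))
           for i in range(k)]
    return solve_linear(ata, atb)


# Hypothetical per-trip features [F0, F1, F2] and made-up coefficients.
trips = [
    [20.0, 30.0, 4.0],
    [22.0, 45.0, 25.0],
    [18.0, 25.0, 0.0],
    [21.0, 60.0, 100.0],
    [19.5, 38.0, 49.0],
]
true_betas = [0.9, 0.05, 0.02]
energy = [sum(b * f for b, f in zip(true_betas, row)) for row in trips]

betas = fit_least_squares(trips, energy)
```

Because the synthetic energies are generated without noise, the fit recovers the made-up coefficients up to rounding. With real trip data the fitted values absorb the efficiency terms, so they should be read as lumped coefficients rather than as direct estimates of the vehicle mass or drag parameters.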

#### *3.2. The Data-Driven Energy Consumption Model*

#### 3.2.1. Analysis of Influencing Factors

The factors that cause energy consumption fluctuations can be summarized into three aspects, including driving habits, environmental factors and vehicle performance [16]. In terms of driving habits, vehicle velocity, acceleration and deceleration conditions can cause fluctuations in vehicle energy consumption. The conditions can be quantified accordingly as the average vehicle velocity, vehicle velocity variance and number of accelerator pedal presses. In terms of the environmental factors, temperature, which is related to the energy transfer efficiency and air-conditioning usage, is the main factor causing fluctuations in vehicle energy consumption. In addition, the road conditions can also cause fluctuations in energy consumption, such as whether the departure time is congested and whether the departure date is on the weekend [17]. In terms of vehicle performance, the energy efficiency of the battery storage system during the charging and discharging process can also cause energy consumption fluctuations. The energy efficiency is mainly affected by the internal resistance of the battery system and is directly related to the ambient temperature and battery aging. Based on the analysis, the statistical results of the impact of various energy consumption fluctuation factors on vehicle energy consumption are shown in Figure 3. In the figure, it should be noted that the fitting curves were obtained by fitting experimental data with polynomial functions.

It can be seen in Figure 3a,c,d that the velocity variance, the accelerator pedal statistical parameter and the deceleration pedal statistical parameter are each positively correlated with vehicle energy consumption. In Figure 3b, there is no clear correlation between the average velocity and the change in vehicle energy consumption. According to a study in the literature [8], a negative correlation between vehicle speed and energy consumption is found when the vehicle speed is below 45 km/h. Because electric buses stop and idle frequently, the energy consumption at low speed is greater than that at high speed. Figure 3e shows that the effect of temperature on vehicle energy consumption is relatively large, and the relationship can be approximated by a quadratic function. In Figure 3f,g, the departure time and departure date affect vehicle energy consumption; however, at the same point in time, vehicle energy consumption fluctuates greatly. In Figure 3h, a positive correlation is shown between the internal resistance of the battery system and the energy consumption of the vehicle. Further analysis of the internal resistance showed that it is strongly correlated with the battery temperature. In addition, the battery internal resistance series fluctuates greatly during the driving cycle. As a result, the internal resistance is not taken as an input feature.

**Figure 3.** The statistical results of the impact of various energy consumption fluctuation factors on vehicle energy consumption. (**a**–**h**) The relationships between the velocity variance, average velocity, number of accelerator pedal presses, number of deceleration pedal presses, departure time, departure date, ambient temperature, internal resistance of the battery pack and energy consumption, respectively.

Based on the statistical analysis of the fluctuation factors of vehicle energy consumption, the main influencing features of vehicle energy consumption fluctuation are: velocity variance, average velocity, accelerator pedal parameter, deceleration pedal parameter, temperature and battery internal resistance. Considering that the internal resistance of the battery is mainly affected by the ambient temperature, the influencing features, except the internal resistance of the battery, are regarded as input features in the data-driven model.

#### 3.2.2. Principle of CatBoost Modeling

In this paper, the CatBoost modeling approach is used to model fluctuations in vehicle energy consumption. CatBoost is an improvement of the gradient boosting decision tree (GBDT) model [18]. The approach has the ability to improve the estimation accuracy with weak learners. Moreover, it has significant advantages in extracting important features and processing categorical features. In addition, the problem of poor model accuracy and overfitting can be avoided when the dataset is uneven. The main principle of this method is to construct many weak learners for training. The weights of the training samples are adjusted to focus on samples with large estimation errors and train the weak learners in turn. Finally, the weak learners are combined into a stronger learner model [19]. In the following content, the gradient boosting decision tree algorithm is introduced. Then, the optimization strategy of the CatBoost modeling approach is given. On the basis of the modeling approach, the vehicle's fluctuating energy consumption results can be obtained.

1. Gradient boosting decision tree

The gradient boosting decision tree is an iterative decision tree algorithm. It is composed of multiple decision trees, and the results of all trees are accumulated to obtain the final result [20]. Given a training dataset *D* = {(**x**<sub>*i*</sub>, *y<sub>i</sub>*)}<sub>*i*=1</sub><sup>*N*</sup>, **x** denotes the features affecting energy consumption, and *y* is the energy consumption to be predicted. The goal of GBDT is to find a function *F̂*(**x**) that minimizes the given loss function *L*(*y*, *F̂*(**x**)). *F̂*(**x**) is accumulated from a series of decision trees, and at each iteration *m* the model *F<sub>m</sub>*(**x**) is updated as:

$$F_m(\mathbf{x}) = F_{m-1}(\mathbf{x}) + \rho_m h_m(\mathbf{x}) \tag{8}$$

where *h<sub>m</sub>*(**x**) is the *m*th decision tree function and *ρ<sub>m</sub>* is its weight. The initial value *F*<sub>0</sub>(**x**) can be obtained by:

$$F_0(\mathbf{x}) = \operatorname*{argmin}_{\alpha} \sum_{i=1}^{N} L(y_i, \alpha) \tag{9}$$

Subsequently, the optimization process of the model is achieved by minimizing the loss functions:

$$(\rho_m, h_m(\mathbf{x})) = \operatorname*{argmin}_{\rho, h} \sum_{i=1}^{N} L(y_i, F_{m-1}(\mathbf{x}_i) + \rho h(\mathbf{x}_i)) \tag{10}$$

The gradient descent method is used to solve the above optimization problem. At each iteration *m*, a new dataset *D* = {(**x**<sub>*i*</sub>, *r<sub>mi</sub>*)}<sub>*i*=1</sub><sup>*N*</sup> is constructed, and a tree is trained on it to obtain *h<sub>m</sub>*(**x**). *r<sub>mi</sub>* can be obtained by:

$$r_{mi} = -\left[\frac{\partial L(y_i, F(\mathbf{x}_i))}{\partial F(\mathbf{x}_i)}\right]_{F(\mathbf{x}) = F_{m-1}(\mathbf{x})} \tag{11}$$

The value of *ρ<sub>m</sub>* is subsequently computed by solving a line search optimization problem. The training process is shown in Figure 4.

**Figure 4.** The training process of the gradient boosting decision tree modeling approach.
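The iteration in Equations (8) to (11) can be sketched for the squared-error loss, where the negative gradient reduces to the ordinary residual *y<sub>i</sub>* − *F*<sub>*m*−1</sub>(**x**<sub>*i*</sub>) and the line search for *ρ<sub>m</sub>* has a closed form. This is a minimal illustration only: it uses one-split trees (stumps) on a single feature, and the function names and sample data are assumptions rather than anything taken from the paper.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree on one feature by least squares."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # all x values identical: fall back to a constant
        mean = sum(residuals) / len(residuals)
        return lambda x: mean
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def fit_gbdt(xs, ys, n_trees=20):
    """Gradient boosting with squared loss, following Equations (8) to (11)."""
    f0 = sum(ys) / len(ys)  # Equation (9): the mean minimizes squared loss
    preds = [f0] * len(ys)
    trees = []
    for _ in range(n_trees):
        # Negative gradient of 0.5 * (y - F)^2 with respect to F.
        residuals = [y - p for y, p in zip(ys, preds)]
        h = fit_stump(xs, residuals)
        hx = [h(x) for x in xs]
        denom = sum(v * v for v in hx)
        if denom == 0.0:
            break
        # Closed-form line search for rho under squared loss.
        rho = sum(r * v for r, v in zip(residuals, hx)) / denom
        trees.append((rho, h))
        preds = [p + rho * v for p, v in zip(preds, hx)]
    return lambda x: f0 + sum(rho * h(x) for rho, h in trees)


# Hypothetical data: velocity variance (x) against energy consumption (y).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [1.0, 1.2, 1.9, 2.1, 3.0, 3.2, 3.9, 4.1]
model = fit_gbdt(xs, ys)

mean_y = sum(ys) / len(ys)
mse_const = sum((y - mean_y) ** 2 for y in ys) / len(ys)
mse_model = sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(ys)
```

Production libraries add shrinkage, deeper trees and regularization on top of this loop; the sketch only shows how each new tree is fitted to the residuals of the current model.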

2. The CatBoost modeling approach

CatBoost is a kind of gradient boosting decision tree algorithm that handles categorical features well [21]. Some of the variables extracted in this paper are categorical. Therefore, CatBoost was selected for energy consumption modeling. This method differs from GBDT in the following ways [22]:

(1) CatBoost can process categorical features during training [23]. First, the sample data are randomly permuted to generate multiple random sequences. Then, for each random sequence, the average label value of the preceding samples that share the same category value is calculated. When the sequence is Θ = [*σ*<sub>1</sub>, ..., *σ<sub>n</sub>*]<sub>*n*</sub><sup>T</sup>, it can be calculated by:

$$\hat{x}_{\sigma_p,k} = \frac{\sum_{j=1}^{p-1} \left[ x_{\sigma_j,k} = x_{\sigma_p,k} \right] \cdot y_{\sigma_j} + \beta \cdot P}{\sum_{j=1}^{p-1} \left[ x_{\sigma_j,k} = x_{\sigma_p,k} \right] + \beta} \tag{12}$$

where *P* is a prior value and *β* is the weight of *P*. For regression tasks, the prior value is the mean of the labels. The bracket [·] equals 1 when the condition inside it holds and 0 otherwise.

(2) Feature combination. The numerical features calculated by Equation (12) may lose some information. Combining features can mitigate this problem and produce more effective features. CatBoost uses a greedy approach to consider feature combinations. The first split does not consider combinations of categorical features, and the subsequent splits consider all feature combinations. CatBoost treats the two groups of values produced by a split as categorical features that take part in the following combinations.
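The ordered encoding in item (1) can be sketched as follows. The example is an illustration of the idea behind Equation (12), not CatBoost's internal code: the function name, its parameters and the sample data are hypothetical, and a single permutation is used where CatBoost draws several.

```python
import random


def ordered_target_statistic(categories, labels, prior, beta=1.0,
                             order=None, seed=0):
    """Encode one categorical feature using only the samples seen earlier
    in a permutation, smoothed toward `prior` with weight `beta`."""
    if order is None:
        order = list(range(len(categories)))
        random.Random(seed).shuffle(order)
    sums, counts = {}, {}
    encoded = [0.0] * len(categories)
    for idx in order:
        cat = categories[idx]
        label_sum = sums.get(cat, 0.0)
        count = counts.get(cat, 0)
        # Numerator: earlier labels of this category plus beta * prior.
        # Denominator: number of earlier samples of this category plus beta.
        encoded[idx] = (label_sum + beta * prior) / (count + beta)
        sums[cat] = label_sum + labels[idx]
        counts[cat] = count + 1
    return encoded


# Hypothetical example: departure day type against trip energy (kWh).
day_type = ["weekday", "weekend", "weekday", "weekday"]
energy = [10.0, 14.0, 12.0, 11.0]
prior = sum(energy) / len(energy)  # mean label, as used for regression

# A fixed order is passed here so that the result is reproducible.
encoded = ordered_target_statistic(day_type, energy, prior,
                                   beta=1.0, order=[0, 1, 2, 3])
```

Because each sample is encoded from earlier samples only, its own label never leaks into its encoded value, which is the motivation for using a permutation instead of a plain per-category mean.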

In the previous sections, the fluctuation factors of vehicle energy consumption are analyzed. Seven features can be obtained that are related to vehicle energy consumption. Based on a full understanding of the factors and the CatBoost modeling approach, features such as average vehicle velocity, vehicle velocity variance, number of accelerator pedal presses, number of brake pedal presses, departure time, day of the week and temperature, mentioned in Section 3.2.1, are taken as the input features of the CatBoost decision tree model. The statistical range of input vehicle features is a round trip of the vehicle. After model parameter optimization, the data-driven model of energy consumption can be obtained.

#### *3.3. A Fusion of Physical and Data-Driven Models*

After the basic energy consumption and the fluctuating energy consumption of electric buses have been modeled with the physical and data-driven approaches, respectively, the two parts need to be fused to obtain a complete vehicle energy consumption model. In this study, the ensemble learning approach from machine learning theory was used for model fusion. The reconstructed electric bus data were used to train the physical energy consumption model. The residual of this basic model was then used as the training label of the data-driven model [24]. The flow chart of the energy consumption fusion modeling approach is shown in Figure 5.

The modeling approach can be divided into three steps:

(1) Data processing. In this process, the original data are interpolated. Depending on the type of missing data, either mean interpolation or Lagrange interpolation is adopted. Outliers are removed using the quartile-based box plot rule. Then, the data are segmented according to the state of charge. After the specific driving segments are divided, data such as vehicle speed, data acquisition time, temperature, accelerator pedal value and deceleration pedal value can be obtained. The data are further processed for energy consumption modeling.

**Figure 5.** Flow chart of the energy consumption fusion modeling approach.

(2) Modeling and fusion. The original features obtained from the data processing step include vehicle speed, data acquisition time, temperature, accelerator pedal value and deceleration pedal value. These features are processed separately for each model. For the physical model, the vehicle driving distance, the speed variance and the square of the difference between the temperature in the vehicle and the standard temperature are calculated as inputs. For the data-driven model, the departure time of the vehicle, whether the day is a weekend, the temperature in the vehicle, the accelerator and deceleration pedal values, the average speed and the speed variance are extracted and input into the CatBoost model. In engineering applications, many specific data in vehicle operation are unknown in advance. Therefore, the model inputs need to meet two conditions: (i) they can be obtained before the vehicle is driven, and (ii) they include parameters that reflect the working condition information. Following these criteria, this paper simplifies the inputs to the following: the mileage of the current route, the average speed, the speed variance, the temperature, the air-conditioning condition, the departure time, the departure day of the week and the average values of the accelerator pedal and the deceleration pedal. These parameters can be planned before driving. In the fusion step, the physical model is trained first to obtain a preliminary estimate of energy consumption. The residual of the physical model is then used as the training label of the data-driven model, so that the data-driven model learns to minimize this residual. The final energy consumption result is the sum of the results of the two models.

(3) Model evaluation. Two indicators are selected to verify the model results, namely, the average relative error and the R-squared parameter. The model is first verified on a single vehicle, whose data are divided into a training set and a test set. In order to test the robustness of the model, data from different vehicles are then used for verification.
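The residual-based fusion in step (2) can be summarized in a few lines. The sketch below is a simplified stand-in, not the paper's pipeline: the coefficients are assumed to have been fitted already, the data-driven learner is replaced by a per-category mean of the residuals instead of CatBoost, and all names and numbers are hypothetical.

```python
def physical_predict(betas, feats):
    """Physical model of Equation (7): E = sum_k beta_k * F_k."""
    return sum(b * f for b, f in zip(betas, feats))


def fit_residual_model(keys, residuals):
    """Stand-in for the data-driven model: mean residual per category key."""
    groups = {}
    for key, res in zip(keys, residuals):
        groups.setdefault(key, []).append(res)
    means = {key: sum(vals) / len(vals) for key, vals in groups.items()}
    return lambda key: means.get(key, 0.0)


# Hypothetical coefficients and trips: ([F0, F1, F2], departure period, energy).
betas = [0.9, 0.05, 0.02]
trips = [
    ([20.0, 30.0, 4.0], "peak", 21.7),
    ([22.0, 45.0, 25.0], "off_peak", 21.5),
    ([18.0, 25.0, 0.0], "peak", 19.3),
    ([21.0, 60.0, 100.0], "off_peak", 23.0),
]

# Step 1: preliminary estimate from the physical model.
physical = [physical_predict(betas, feats) for feats, _, _ in trips]

# Step 2: the residual of the physical model becomes the training label.
residuals = [energy - p for (_, _, energy), p in zip(trips, physical)]
residual_model = fit_residual_model([period for _, period, _ in trips],
                                    residuals)

# Step 3: the fused estimate is the sum of the two model outputs.
fused = [p + residual_model(period)
         for (_, period, _), p in zip(trips, physical)]

mae_physical = sum(abs(e - p) for (_, _, e), p in zip(trips, physical)) / len(trips)
mae_fused = sum(abs(e - f) for (_, _, e), f in zip(trips, fused)) / len(trips)
```

The errors here are computed on the training trips purely to show the mechanics. A fair comparison requires held-out data, as in the evaluation described in step (3).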

#### **4. Results and Discussions**

#### *4.1. Analysis of the Results of the Physical Model*

The physical model used to obtain the basic energy consumption was analyzed first. In this work, data provided by the new energy vehicle big data platform were used for model training. The parameters of the physics-based basic energy consumption model were estimated for six electric buses. The results are shown in Table 2. Figure 6 shows the basic energy consumption estimation results of two electric buses (Bus 1 and Bus 2). Because plotting all data points would make the bars too narrow to read, only 20 points from Figure 6a,b were used to draw Figure 6c–f.

**Table 2.** Parameters of the physical model.


**Figure 6.** The basic energy consumption estimation results. (**a**,**b**) The fitting curves of two electric buses. (**c**,**d**) The energy consumption fitting results for each part. (**e**,**f**) The fitting errors.

It can be seen that the vehicle's rolling drag energy consumption coefficient is the largest. This shows that most of the energy consumed during driving is spent on overcoming rolling drag. Figure 6a,b compare the fitted values of the model with the real values of vehicle energy consumption. The x-axis represents the number of vehicle round trips, and the y-axis represents the energy consumption. The model-fitting results are close to the measured vehicle energy consumption values. Figure 6c,d show the proportion of each component in the total energy consumption as a histogram. Rolling drag and kinetic energy change clearly dominate the energy consumption. Since the use of an air conditioner is closely related to the temperature difference between the inside and outside of the vehicle, the energy consumption of the air conditioner fluctuates greatly. The errors of the basic energy consumption model are shown in Figure 6e,f. It can be seen that the average error of the estimation results is 7%. Since the model does not consider the influence of energy consumption fluctuations, there are large errors in the estimation of vehicle energy consumption.

#### *4.2. Analysis of the Results of the Fusion Model*

The energy consumption estimation results obtained from the physical energy consumption model only take into account energy consumption in ideal conditions. The model does not account for the influence of driving habits and environmental factors. In this context, the physical model and the data-driven model are fused to estimate vehicle energy consumption. The results of the vehicle energy consumption estimation of the fusion model are shown in Figure 7.

**Figure 7.** Energy consumption estimation results with the fusion model. (**a**–**d**) Bus 1 to Bus 4.

In Figure 7, the blue line represents the vehicle energy consumption obtained using the platform data, which can be considered a reference value for the vehicle energy consumption. The red points are the vehicle energy consumption obtained by the fusion model. The estimation results track changes in the real energy consumption of the vehicle with small estimation errors. The relative error of the fusion model on the Bus 1 dataset is 4.8%. However, not all results are equally good: in some cases, the error reaches 15%. When these cases were analyzed separately, it was found that the larger errors occurred mainly during morning peaks and severe weather periods. Modeling these situations is complex and beyond the scope of this paper. To verify the generalization capability of the model, two buses (Bus 1 and Bus 2) were selected as training samples, and other vehicle data were used as test data.

The energy consumption estimation results with multi-vehicle data and the fusion model are shown in Figure 8. The results show that the energy consumption estimation errors for multiple vehicles are within 8.1%. The statistics of the estimation results for the energy consumption of the fusion model are shown in Table 3. The average error of the vehicle energy consumption estimation results is 7.5%.

**Figure 8.** Energy consumption estimation results with multi-vehicle data and the fusion model. (**a**–**d**) Bus 1 to Bus 4.


**Table 3.** The statistics of the estimation results for energy consumption of the fusion model.

#### *4.3. Method Comparison and Verification*

To further validate the effectiveness of the fusion model, several other vehicle energy consumption estimation models are introduced for comparison with the model proposed in this paper. The algorithms for comparison include the physical model, CatBoost decision tree model and fusion models with different approaches. In terms of validity assessment, the average relative error and the coefficient of determination are regarded as indicators for evaluation. The coefficient of determination is a correlation index that measures how well the data trend fits. Herein, the coefficient of determination is obtained using the R-squared method. The assessment indicators can be calculated by:

$$\mathcal{L}_{\text{Relative}} = \frac{1}{n} \left( \sum_{i=1}^{n} \frac{|y_i - \hat{y}_i|}{y_i} \right) \tag{13}$$

$$R^2 = 1 - \frac{MSE(\hat{y}, y)}{\text{var}(y)} = 1 - \frac{\sum_{i} \left(\hat{y}_i - y_i\right)^2}{\sum_{i} \left(y_i - \overline{y}\right)^2} \tag{14}$$

where *y<sub>i</sub>* and *ŷ<sub>i</sub>* represent the real value and the estimated value, respectively, *ȳ* is the mean of the real values, and *n* is the number of samples.
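The two indicators in Equations (13) and (14) translate directly into code. The following sketch uses hypothetical function names and made-up sample values; the absolute value in the denominator of the relative error is a small defensive addition that makes no difference for positive energy values.

```python
def mean_relative_error(y_true, y_pred):
    """Equation (13): average of |y_i - y_hat_i| / y_i over all samples."""
    return sum(abs(y - p) / abs(y) for y, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Equation (14): 1 - residual sum of squares / total sum of squares."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((p - y) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up reference and estimated energy values (kWh).
y_true = [10.0, 20.0, 30.0]
y_pred = [11.0, 19.0, 30.0]

mre = mean_relative_error(y_true, y_pred)  # about 0.05, i.e., 5%
r2 = r_squared(y_true, y_pred)             # about 0.99
```

Note that R² can become negative on test data when a model performs worse than predicting the mean, so it is not restricted to the interval from 0 to 1.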

The results of the different energy consumption estimation models are shown in Table 4. These models were used to calculate the vehicle energy consumption in this paper. It can be seen that the physical model gives the worst energy consumption estimation results. In contrast, the CatBoost decision tree modeling approach has better estimation results. Ultimately, the physical-CatBoost decision tree model gives the best estimation results, with relative errors and coefficients of determination of 6.1% and 0.79, respectively.

**Table 4.** Vehicle energy consumption estimation results with different models.


The computational cost of the Physics-CatBoost fusion model was also tested. One million pieces of data were processed on a computer with an Intel® Core™ i5-10400 CPU @ 2.90 GHz (Intel, Santa Clara, CA, USA) and 32 GB of RAM. The data processing time was 6 s, and the model training time was only 0.9 s, which indicates that the complexity of the model is low.

#### **5. Conclusions**

This research focused on the energy consumption estimation of electric buses based on a physical and data-driven fusion model. In terms of physical modeling, a basic energy consumption model was constructed that considers rolling drag, kinetic energy consumption and air-conditioning factors. In terms of data-driven modeling, the main factors affecting the fluctuation of vehicle energy consumption were studied. The input features of the model were simplified so that they can be determined before the vehicle is driven. A CatBoost decision tree modeling approach was employed to construct the model for estimating fluctuating energy consumption. In the model training process, the idea of ensemble learning was used to optimize the model in a hierarchical iteration. The results show that the average relative error of the vehicle energy consumption estimation is 6.1% and the coefficient of determination is 0.79. Compared with the other energy consumption modeling methods, the fusion model performs best on both indicators and shows better accuracy and generalization ability. It provides a reference basis for the optimization of the energy consumption of electric buses, vehicle scheduling and the rational layout of charging stations.

Based on the results, most of the points with large errors are concentrated in bad weather. In order to further improve the accuracy of the model, weather factors can be added to the model in the future. In addition, vehicle mass is regarded as a constant value in the driving process, which is also a reason for the model error. Therefore, the establishment of the dynamic estimation of vehicle mass in a follow-up work can improve the accuracy of the model.

**Author Contributions:** Conceptualization, X.L. and T.W.; methodology, X.L.; software, T.W.; validation, T.W. and J.L.; formal analysis, X.L. and T.W.; investigation, T.W. and J.L.; resources, Y.T. and J.T.; data curation, J.T.; writing—original draft preparation, X.L. and T.W.; writing—review and editing, X.L. and J.T.; visualization, T.W.; supervision, Y.T. and J.T.; project administration, J.T.; funding acquisition, X.L. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported by the National Natural Science Foundation of China (No. 52177219) and the Natural Science Foundation of Guangdong Province (2021A1515010525).

**Data Availability Statement:** The National Big Data Alliance of New Energy Vehicles (NDAEV) provided research data. The source of the data is the China National New Energy Vehicle Monitoring and Management Platform.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

#### **References**


**Paolo Maria Congedo <sup>1</sup>, Cristina Baglivo <sup>1,</sup>\*, Simone Panico <sup>1,2</sup>, Domenico Mazzeo <sup>1,3</sup> and Nicoletta Matera <sup>1</sup>**


**Abstract:** Energy storage makes energy continuously available, programmable, and at power levels different from the original intensity. This study investigates the feasibility of compressed-air energy storage (CAES) systems on a small scale. In addition to the CAES systems, there are two TES (thermal energy storage) systems for the recovery of calories and frigories. The micro-CAES + TES system is designed for a single-family residential building equipped with a photovoltaic system with a nominal power of 3 kW. The system is optimized as a potential alternative to battery storage for a typical domestic photovoltaic system. The multi-objective optimization analysis is carried out with the modeFRONTIER software. Once the best configuration of the micro-CAES + TES system is identified, it is compared with electrochemical storage systems, considering costs, durability, and performance. The efficiency of CAES (8.4%) is almost one-tenth of the efficiency of the most efficient batteries on the market (70–90%). Its discharge times are also extremely short. It is shown that the advantages offered by the application of mechanical accumulation on a small scale are mainly related to the exploitation of the thermal waste of the process and the estimated useful life compared to the batteries currently on the market. The studied system proves to be non-competitive compared to batteries because of its minimal efficiency and high cost.

**Keywords:** CAES; TES; small-scale; battery storage; optimization
