Article

Entropy Ensemble Filter: A Modified Bootstrap Aggregating (Bagging) Procedure to Improve Efficiency in Ensemble Model Simulation

Department of Civil Engineering, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
* Author to whom correspondence should be addressed.
Entropy 2017, 19(10), 520; https://doi.org/10.3390/e19100520
Submission received: 11 August 2017 / Revised: 12 September 2017 / Accepted: 26 September 2017 / Published: 28 September 2017
(This article belongs to the Special Issue Information Theory in Machine Learning and Data Science)

Abstract
Over the past two decades, the Bootstrap AGGregatING (bagging) method has been widely used to improve simulation. The computational cost of this method scales with the size of the ensemble, but excessively reducing the ensemble size comes at the cost of reduced predictive performance. The novel procedure proposed in this study is the Entropy Ensemble Filter (EEF), which uses the most informative training data sets in the ensemble rather than all ensemble members created by the bagging method. The results of this study indicate the efficiency of the proposed method in application to synthetic data simulation of a sinusoidal signal, a sawtooth signal, and a composite signal. The EEF method can reduce the computational time of simulation by around 50% on average while maintaining predictive performance at the same level as the conventional method, in which all of the ensemble models are used for simulation. The analysis of the error gradient (root mean square error of ensemble averages) shows that using the 40% most informative ensemble members of the set initially defined by the user appears to be most effective.

1. Introduction

Machine learning is one of the key components of computational intelligence, and its main objective is to use computational methods to become more accurate in predicting outcomes without being explicitly programmed. Machine learning has a wide spectrum of applications in different science disciplines [1,2,3,4,5,6,7,8,9]. Advanced computational methods, including artificial neural networks (ANN), process input data in the context of previous training history on a defined sample database to produce relevant output [7]. To avoid the negative effects of over-fitting, an ensemble of models is sometimes used in prediction [10]. In machine learning jargon, an ensemble of models is often referred to as a committee [5]. Bagging (abbreviated from Bootstrap AGGregatING) [11] developed from the idea of bootstrapping [12,13] in statistics. Under bootstrap resampling, data are drawn randomly from a dataset to form a new training dataset, which has the same number of data points as the original dataset. In committee machines, bagging is widely used for its simplicity and its efficiency in enhancing the prediction power of individual models, also called experts [11]. Applications have spanned a wide range of fields. Zhu et al. [14] applied the bagging method to the forecasting of tropical cyclone tracks over the South China Sea. Fraz et al. [15] used an ensemble system of bagged and boosted decision trees for retinal blood vessel segmentation. Brenning [16] investigated the performance of bagging in spatial prediction models for landslide hazards. Dietterich [17] compared the effectiveness of bagging, boosting, and randomization methods for constructing ensembles of decision trees. A recurring question in these previous works was how to choose the ensemble of training data sets for tuning the weights in machine learning. The computational cost of ensemble-based methods scales with the size of the ensemble, but excessively reducing the ensemble size comes at the cost of reduced predictive performance. The choice of ensemble size was often based on the size of the input data and the available computational power, which can become a limiting factor for larger datasets and models.
This paper presents the Entropy Ensemble Filter (EEF) as a method to reduce ensemble size without significantly deteriorating prediction performance. Conversely, we show that for small ensemble sizes, selecting high-entropy training sets can improve performance for the same computational burden. Entropy can be defined as the uncertainty of a random variable or, conversely, the information that samples of that random variable provide. It is also known as the self-information of a random variable [18]. In this work, entropy is used as a measure of the information content of each bootstrap resample of the dataset. The method selects high-entropy bootstrap samples for ensemble model training, aiming to maximize the information content of the selected ensemble. We applied our proposed method to a simulation of synthetic data with the ANN machine learning technique. The performance of our proposed method is analyzed in comparison with that obtained by the conventional method, in which all ensemble members or a random subset are used for training.

2. Methods: Entropy Ensemble Filter

The philosophy of the EEF method is rooted in using the self-information of a random variable, as defined in Shannon’s information theory [19], to guide selection within the inherent randomness of the ensemble of models created by bootstrapping. In previous work, a weighting of model-generated ensemble members based on relative entropy was used [20] to reflect additional information available after ensemble generation. In this work, the focus is on selecting an ensemble of training datasets before ensemble model tuning (training of the ANNs). Our hypothesis is that if an ensemble of ANN models, or of any other machine learning technique, uses only the most informative ensemble members for training rather than all bootstrapped ensemble members, the computational time can be reduced substantially without negatively affecting simulation performance. We discuss the EEF algorithm based on Shannon information theory. Shannon quantifies information by calculating the smallest possible number of bits needed, on average, to communicate outcomes of a random variable, e.g., per symbol in a message (here, symbols represent bins in a probability mass function, which are defined with respect to the input data resolution) [18,19,21,22]. The Shannon entropy H, in units of bits (per symbol), of ensemble member m in the bootstrapped dataset (generated in step 1 of Algorithm 1) is given by:
H_m(Y) = -\sum_{k=1}^{K} p(y_k) \log_2 p(y_k),  (1)
where p(y_k) is the probability of occurrence, within ensemble member m whose values follow the random variable Y, of the kth possible value of that variable (K is the total number of discrete values Y can take, i.e., the number of bins in the discretization). This equation gives the entropy in units of “bits” because it uses a logarithm of base 2. Algorithm 1 illustrates the workflow of the EEF method. The EEF method can assess and cluster the ensemble members to provide the most informative ones for training, selected from the initially generated ensemble. Since model training is by far the most computationally expensive part of the procedure, overall computation time is roughly linear in the number of retained ensemble members, potentially leading to significant savings.
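To make the entropy estimate concrete, the following is a minimal sketch of Equation (1) for a single bootstrapped ensemble member, assuming Python with NumPy; the function name shannon_entropy_bits and the default of 10 equal-width bins (the discretization used later in Section 3) are our own illustrative choices, not code from the published method.

```python
import numpy as np

def shannon_entropy_bits(member, n_bins=10):
    """Shannon entropy (bits) of one bootstrapped ensemble member,
    estimated from a histogram with equal-width bins (Equation (1))."""
    counts, _ = np.histogram(member, bins=n_bins,
                             range=(np.min(member), np.max(member)))
    p = counts / counts.sum()      # empirical probability mass p(y_k) per bin
    p = p[p > 0]                   # by convention, 0 * log2(0) contributes 0
    return float(-np.sum(p * np.log2(p)))
```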
Algorithm 1. Entropy Ensemble Filter
Begin
1. Initialize the bagging procedure: generate M new training datasets from the input data using bootstrapping (M committee members). (Comment: M is the ensemble size initially defined by the user.)
2. Estimate the entropy of each ensemble member: Hm ← Equation (1).
3. Find the top L ensemble members with maximum entropy: sort the ensemble members and retain the L most informative ones. (Comment: determine L based on computational constraints; an alternative choice is to use 40% of the members, L = 0.4M, based on the analysis of the error gradient.)
4. Set up neural networks or other machine learning techniques: use the L most informative ensemble members, rather than all M ensemble members, for training (i.e., for calibrating the weights inside the model).
5. Use ensemble averages instead of individual ensemble models. (Comment: the rationale for using ensemble averages is that the expected error of the ensemble average is less than or equal to the average expected error of the individual models in the ensemble.)
End
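As an illustration of steps 1–3 of Algorithm 1, the sketch below bootstraps M training sets, scores each one with Equation (1), and keeps the L highest-entropy members. It is a sketch under stated assumptions, not the authors’ implementation: it reuses the hypothetical shannon_entropy_bits helper shown above, assumes NumPy, and defaults to L = 0.4M following the error-gradient finding.

```python
import numpy as np

def eef_select(y_input, M=100, frac=0.4, n_bins=10, seed=None):
    """Steps 1-3 of Algorithm 1: bagging, entropy scoring, and filtering.
    Returns index arrays for the L most informative bootstrap resamples."""
    rng = np.random.default_rng(seed)
    n = len(y_input)
    # Step 1: generate M bootstrap resamples (index arrays) of the noisy input
    idx = [rng.integers(0, n, size=n) for _ in range(M)]
    # Step 2: Shannon entropy of each resampled training set (Equation (1))
    H = np.array([shannon_entropy_bits(y_input[i], n_bins) for i in idx])
    # Step 3: retain the top L = frac * M highest-entropy members
    L = max(1, int(frac * M))
    keep = np.argsort(H)[::-1][:L]
    return [idx[i] for i in keep]
```

Steps 4 and 5 (training the networks on the retained members and averaging their predictions) are sketched in Section 3.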

3. Application: Synthetic Data Simulation

In this section, the EEF method is tested using synthetic data and artificial neural networks. Le et al. [23] note that “the deep learning community has reported remarkable results taking the synthetic data to train artificial neural networks”. We use artificial signals that we corrupt with noise before model training, to examine the model’s capability to capture the essence of the signal from the noisy signal. In this study, a sinusoidal signal, a non-sinusoidal periodic waveform (sawtooth wave), and a nonperiodic composite signal have been used to create signals that we interpret as the true underlying process (target signal) we wish to simulate. These signals are not directly observable for model training; instead, they are corrupted by noise that represents, e.g., measurement error or unknown external influences. These target signals were chosen for the following reasons:
  • Sinusoids are ubiquitous in physics because many physical systems that resonate or oscillate produce quasi-sinusoidal motion.
  • The performance of the method for simulation of a non-sinusoidal waveform was tested on a sawtooth signal, a classical geometric waveform.
  • A composite signal has been used to test the performance of the method for simulation of nonperiodic signals. The signal has been composed of upward steps followed by exponential decay functions, which resemble typical behaviour for river flow response to rainfall events.

Procedure

First, random noise with a normal distribution was added to the known sinusoidal, sawtooth, and composite signals (Equations (2)–(4), respectively) to produce the noisy signal (Equation (5)) presented in Figure 1, Figure 2 and Figure 3. The noisy signal was used as input to the bagging procedure to generate an ensemble of input datasets, referred to as ensemble members. Following the steps described in Algorithm 1, the members chosen by the EEF method are used for training ANNs and subsequently generating the simulation result for each member (Equation (6)).
y(x) = \sin\!\left(\frac{2\pi}{50}\, x\right),  (2)
y(x) = \frac{1}{2} - \frac{1}{\pi} \sum_{n=1}^{\infty} \left[ \frac{1}{n} \sin\!\left( \frac{2\pi n x}{0.02} - \frac{\pi}{2} \right) \right],  (3)
y(x) = \begin{cases} 0, & 0 \le x < 10 \\ e^{-(x-10)/90}, & 10 \le x < 80 \\ e^{-(x-10)/90} + 0.5\, e^{-(x-80)/10}, & 80 \le x < 120 \\ e^{-(x-10)/90} + 0.5\, e^{-(x-80)/10} + 0.3\, e^{-(x-120)/50}, & 120 \le x \le 200, \end{cases}  (4)
y_{\mathrm{input}} = y + \varepsilon, \qquad \varepsilon \sim N(0, \sigma),  (5)
y^{\mathrm{pred}}_{t,m} = \mathrm{ANN}(x_{t,m}), \qquad t \in \{1, \dots, T\},  (6)
where T is the number of data points in the signal. Subsequently, a prediction is made using the ensemble average over the selected subset of the ensemble. There are three options for the formation of the subset used in the analysis in this paper: (1) M_all: all originally generated ensemble members; (2) M_rand: a randomly selected subset of size L (reduced from the original size M); and (3) M_EEF: the EEF subset, formed by selecting the top L highest-entropy training data sets generated by bootstrapping. Equation (7) shows the case for option 3.
y^{\mathrm{EEF}}_{t} = \frac{1}{L} \sum_{m \in M_{\mathrm{EEF}}} y^{\mathrm{pred}}_{t,m},  (7)
The RMSE of the ensemble average, shown in Equation (8) for the EEF method, is calculated with respect to the original target signal y (Equations (2)–(4)).
\mathrm{RMSE}\left(y^{\mathrm{EEF}}_{t}, y_{t}\right) = \sqrt{ \frac{1}{T} \sum_{t=1}^{T} \left( y^{\mathrm{EEF}}_{t} - y_{t} \right)^{2} }.  (8)
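For reference, a minimal sketch of the synthetic-data setup in Equations (2)–(5), assuming NumPy. The 200-point domain follows the piecewise bounds of Equation (4); the noise standard deviation, the truncation of the infinite series in Equation (3), and our reading of the frequency and phase terms in Equations (2) and (3) are assumptions for illustration only, and the other signals would be generated analogously.

```python
import numpy as np

def sinusoid(x):                                    # Equation (2)
    return np.sin(2 * np.pi / 50 * x)

def sawtooth(x, n_terms=50):                        # Equation (3), truncated series
    n = np.arange(1, n_terms + 1)[:, None]
    return 0.5 - (1.0 / np.pi) * np.sum(
        np.sin(2 * np.pi * n * x / 0.02 - np.pi / 2) / n, axis=0)

def composite(x):                                   # Equation (4): steps + exponential decays
    y = np.where(x >= 10, np.exp(-(x - 10) / 90), 0.0)
    y += np.where(x >= 80, 0.5 * np.exp(-(x - 80) / 10), 0.0)
    y += np.where(x >= 120, 0.3 * np.exp(-(x - 120) / 50), 0.0)
    return y

x = np.arange(200, dtype=float)
rng = np.random.default_rng(0)
y_true = composite(x)                               # target signal (never seen in training)
y_input = y_true + rng.normal(0.0, 0.05, size=x.size)   # Equation (5), sigma assumed
```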
The entropy calculations for each ensemble member are performed in a discretized space, where the signals are processed using 10 equal-width bins arranged between the signal’s minimum and maximum values. This bin count was chosen to strike a balance: fine enough to capture the distribution of the values in the time series, yet coarse enough that enough data points fall in each bin to yield a representative histogram. The entropies of all training datasets in the ensemble are then calculated with the Shannon entropy equation (Equation (1)). Since entropy is calculated empirically, the method can be applied regardless of the data distribution type. The index of the highest-entropy ensemble member found is used to determine the new ensemble size (see Appendix A). The ensemble of training data sets is then filtered, and only the highest-entropy training data sets are retained.
ANN models were then trained on all bootstrapped noisy data sets retained in the ensemble, and on all original ensemble members for reference. In the experiments, the ANN used was a feed-forward multilayer perceptron model (with a hyperbolic tangent activation function) with one input and one output layer (the bootstrapped datasets), and 10, 50, and 20 hidden neurons, fitted to the bootstrapped noisy sinusoidal, sawtooth, and composite signals, respectively, using the early stopping procedure. For each ensemble, the predictions of the ANNs were averaged to yield an ensemble prediction. The distribution of the ensemble predictions is not forced to any parametric form, and, in general, bagging and our proposed modification are not sensitive to distribution type. The predictions were evaluated by calculating the RMSE against the target signal, i.e., the synthetic data before corruption by noise. Note that the true signal was not available to the ANN during training.
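The training, averaging, and evaluation steps described above can be sketched as follows, assuming scikit-learn’s MLPRegressor as one possible feed-forward implementation (the paper does not prescribe a specific library). The helpers eef_select, y_input, y_true, and x come from the earlier sketches, and 20 hidden neurons are used as reported for the composite signal.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def ensemble_predict(x, y_input, index_sets, n_hidden):
    """Train one tanh MLP with early stopping per retained bootstrap resample
    and return the ensemble-average prediction (Equations (6) and (7))."""
    X = x.reshape(-1, 1)
    preds = []
    for idx in index_sets:
        net = MLPRegressor(hidden_layer_sizes=(n_hidden,), activation="tanh",
                           early_stopping=True, max_iter=2000, random_state=0)
        net.fit(X[idx], y_input[idx])
        preds.append(net.predict(X))
    return np.mean(preds, axis=0)

def rmse(y_hat, y):                                  # Equation (8)
    return float(np.sqrt(np.mean((y_hat - y) ** 2)))

# EEF subset (L = 0.4 M) versus the full bagging ensemble, for the composite signal
eef_sets = eef_select(y_input, M=100, frac=0.4, seed=1)
all_sets = eef_select(y_input, M=100, frac=1.0, seed=1)
print("RMSE (EEF):", rmse(ensemble_predict(x, y_input, eef_sets, 20), y_true))
print("RMSE (all):", rmse(ensemble_predict(x, y_input, all_sets, 20), y_true))
```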

4. Results and Analysis

The variations of information content of each ensemble member’s training data set for the sinusoidal signal, sawtooth wave, and composite signal are shown in Figure 4, Figure 5 and Figure 6, respectively. The figures show that bootstrapping leads to significant variability in the entropies of the training datasets.
After the most informative ensemble members are chosen to train ANNs and their outputs have been processed through ensemble averaging, the predictions are plotted in Figure 7, Figure 8 and Figure 9. For comparison, the conventional bagging method, based on all ensemble members, is used to train a separate ensemble of neural networks. The prediction from these ensemble averages is included in the same figures. As illustrated in Figure 7, Figure 8 and Figure 9, the simulation results obtained using all ensemble members and those obtained using the members chosen by the EEF method closely resemble each other, which indicates that filtering the ensemble models can be a reliable approach.
To gain insight into the trade-off between ensemble size (i.e., computation time) and accuracy in terms of RMSE, an analysis of the error gradient with growing ensemble size was conducted. In this analysis, the decrease in error with increasing ensemble size was compared between the EEF method and conventional bagging. To filter out some of the inherent randomness in the results, the whole process was repeated ten times with different realizations of the random noise, and the resulting RMSEs were averaged over these 10 realizations. The error gradient shows the effect of varying the final ensemble size after selection. The initial ensemble size also plays a role in the prediction accuracy, since selecting from a larger initial pool of ensemble members yields higher entropy values in the selection. In current practice, the user decides how many ensemble models are needed for training and tuning the weights in machine learning. Therefore, we show the results of the error gradient analysis for 100 and 1000 initial bootstrap members in Figure 10, Figure 11 and Figure 12 and Figure 13, Figure 14 and Figure 15, respectively. The idea of ranking the ensemble by the EEF method and subsequently using it for machine learning shows its advantages in Figure 10 and Figure 13, for the sinusoidal signal. For the other signals, the advantages lie mostly at the smallest ensemble sizes, visible in Figure 10, Figure 11, Figure 12, Figure 13, Figure 14 and Figure 15. The results show that using the 40% most informative ensemble members of the set initially defined by the user appears to be most effective.
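A sketch of one realization of this error-gradient analysis is shown below, under the same assumptions and hypothetical helpers as the earlier sketches (NumPy, shannon_entropy_bits, ensemble_predict, and rmse). It traces the RMSE of the running ensemble average as members are added one by one, in entropy-ranked order versus the original bootstrap order; in the study this would be repeated over noise realizations and averaged.

```python
import numpy as np

def error_gradient(x, y_true, y_input, M=100, n_hidden=20, n_bins=10, seed=None):
    """RMSE of the ensemble average as members are added one by one,
    in EEF (entropy-ranked) order and in the conventional bagging order."""
    rng = np.random.default_rng(seed)
    n = len(y_input)
    idx = [rng.integers(0, n, size=n) for _ in range(M)]
    H = np.array([shannon_entropy_bits(y_input[i], n_bins) for i in idx])
    # One prediction per individual member (each network trained once, then reused)
    preds = np.array([ensemble_predict(x, y_input, [i], n_hidden) for i in idx])
    curves = {}
    for name, order in [("EEF", np.argsort(H)[::-1]), ("conventional", np.arange(M))]:
        running = np.cumsum(preds[order], axis=0) / np.arange(1, M + 1)[:, None]
        curves[name] = [rmse(y_hat, y_true) for y_hat in running]
    return curves
```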
An upward jump in RMSE, such as seen for conventional bagging in Figure 10 and Figure 13, indicates that an ensemble member (training data set) was picked that led to an ANN that does not perform well in prediction, deteriorating the ensemble average when added to the ensemble. The effect of adding such an ensemble member is larger when the selected ensemble is still small, since the relative weight of the new member in the average is higher. In the entropy-based ordering of the EEF, those ensemble members would also be picked eventually, but generally later in the sequence, when the effect on the total ensemble is small enough not to cause a noticeable upward jump in RMSE. Since the EEF reduces the ensemble size, in many cases some of the poorly performing members are eliminated from the ensemble altogether. In the limit of using the full ensemble, the EEF and the conventional method converge (as seen at the extreme right of Figure 10, Figure 11, Figure 12, Figure 13, Figure 14 and Figure 15), since the full ensembles are identical. The fact that such jumps do not appear in the EEF results indicates that these poorly performing ANNs are not among those trained on the highest-entropy training data sets, which are the ones typically retained by the EEF method.
Furthermore, the EEF method has been tested with different initial numbers of committee members, as shown in Table A1, Table A2 and Table A3 (see Appendix A). The results of the sinusoidal signal, sawtooth wave, and composite signal simulations indicate that the EEF method can improve the simulation error by 3% on average for the sinusoidal signal, and maintain error performance at approximately the same level for the sawtooth wave and the composite signal. More importantly, empirical testing showed that it can reduce the simulation time by 54%, 56%, and 45% on average, respectively.

Protection against Overfitting

There are several layers in the procedure that offer protection against overfitting. Firstly, it is important to note that the prediction procedure never sees the original data set against which performance is tested, since only the noise-corrupted version of the data is used for training; the final evaluation of performance, however, is against the non-noisy original data set.
Secondly, for both compared methods, the individual ensemble member ANNs are trained on bootstraps of these noise-corrupted data. For each individual data set in the selected ensemble, the ANN training uses the standard and well-tested early stopping (also known as stopped training) procedure to prevent overfitting. In this procedure, the data are divided into training and validation sets, and training continues until validation performance starts to deteriorate [5].
Thirdly, the bagging procedure adds another layer of protection against overfitting by averaging the outcomes of several fitted models, reducing reliance on any single model. As can be seen in Figure 10, Figure 11, Figure 12, Figure 13, Figure 14 and Figure 15, larger ensembles improve prediction up to a certain ensemble size. Therefore, a trade-off between accuracy and ensemble size exists for smaller ensembles. The EEF method provides a way to reduce ensemble size (and computational cost) with a smaller decrease in performance, or, conversely, to improve performance for fixed small ensemble sizes. In that sense, the EEF method is a Pareto improvement over the conventional method. The EEF selects ensemble members before any model is trained and therefore does not have access to the original signal or to predictive performance. In summary, the EEF does not increase overfitting issues compared to conventional bagging, which already has safeguards in place at several levels.

5. Conclusions

In this article, we introduced a novel procedure to assess and cluster ensemble members for bootstrap aggregating (bagging). Fundamentally, we assert that the EEF method can reduce the computational time of simulation very substantially while maintaining error performance at the same level as the conventional method, in which all of the ensemble models are used for simulation. The idea of ranking and selecting the ensemble with the EEF method and subsequently using it for machine learning shows its advantages in Figure 10 and Figure 13. Figure 10, Figure 11, Figure 12, Figure 13, Figure 14 and Figure 15 show a clear effect of ensemble size on prediction quality for the smaller ensemble sizes, and the positive effects of using the EEF method are most pronounced at the smallest ensemble sizes. The EEF method can be useful for meeting computational power constraints under the continual arrival of new data, which necessitates frequent model updating in atmospheric science. Peng et al. [24] note that computational expense is one of the difficulties in air quality forecasting. Although the results of this study indicated the efficiency of the proposed framework in application to synthetic data simulation, further evaluation of the proposed framework is still necessary, especially in applications to data assimilation problems with real data and numerous observations.

Acknowledgments

This research was supported with funding from Hossein Foroozand’s NSERC CGS M award and Steven V. Weijs’s NSERC discovery grant.

Author Contributions

Hossein Foroozand and Steven V. Weijs designed the method and experiments; Hossein Foroozand and Steven V. Weijs performed the experiments; Hossein Foroozand and Steven V. Weijs wrote the paper.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Results of EEF Method for Three Different Signals

Table A1. Sinusoidal signal outputs of the EEF method for different initial committee members.

| All Ensemble | EEF Ensemble | Run Time EEF Ensemble (s) | Run Time All Ensemble (s) | RMSE EEF Ensemble | RMSE All Ensemble | Average Fraction of Time Saved | Average Rate of Change in Error |
|---|---|---|---|---|---|---|---|
| 100 | 56 | 14.7 | 25.0 | 0.374 | 0.361 | 0.46 | 0.05 |
| 100 | 94 | 23.0 | 24.0 | 0.372 | 0.375 | | |
| 100 | 23 | 6.0 | 24.3 | 0.361 | 0.400 | | |
| 100 | 55 | 14.0 | 23.7 | 0.367 | 0.380 | | |
| 100 | 86 | 21.0 | 24.6 | 0.340 | 0.395 | | |
| 100 | 44 | 11.0 | 24.6 | 0.364 | 0.397 | | |
| 100 | 35 | 8.5 | 26.3 | 0.342 | 0.377 | | |
| 100 | 21 | 4.6 | 25.1 | 0.362 | 0.389 | | |
| 100 | 52 | 11.8 | 25.1 | 0.357 | 0.380 | | |
| 100 | 83 | 19.8 | 24.9 | 0.386 | 0.385 | | |
| 200 | 151 | 40.7 | 57.4 | 0.349 | 0.398 | 0.56 | 0.04 |
| 200 | 54 | 13.4 | 48.8 | 0.374 | 0.375 | | |
| 200 | 103 | 23.7 | 47.6 | 0.340 | 0.376 | | |
| 200 | 97 | 22.3 | 49.4 | 0.355 | 0.371 | | |
| 200 | 166 | 39.1 | 47.7 | 0.374 | 0.369 | | |
| 200 | 61 | 15.4 | 52.5 | 0.401 | 0.411 | | |
| 200 | 28 | 6.9 | 46.8 | 0.389 | 0.387 | | |
| 200 | 106 | 23.8 | 49.3 | 0.363 | 0.366 | | |
| 200 | 69 | 16.0 | 47.6 | 0.383 | 0.394 | | |
| 200 | 86 | 19.4 | 48.1 | 0.381 | 0.404 | | |
| 1000 | 246 | 62.1 | 250.4 | 0.379 | 0.383 | 0.62 | 0.01 |
| 1000 | 647 | 161.7 | 257.1 | 0.371 | 0.368 | | |
| 1000 | 373 | 91.2 | 246.7 | 0.369 | 0.388 | | |
| 1000 | 413 | 100.9 | 251.3 | 0.363 | 0.374 | | |
| 1000 | 395 | 98.0 | 248.7 | 0.391 | 0.388 | | |
| 1000 | 624 | 156.0 | 251.1 | 0.381 | 0.382 | | |
| 1000 | 91 | 21.8 | 248.8 | 0.378 | 0.382 | | |
| 1000 | 6 | 1.4 | 250.2 | 0.378 | 0.384 | | |
| 1000 | 627 | 153.7 | 249.2 | 0.373 | 0.379 | | |
Table A2. Sawtooth wave outputs of the EEF method for different initial committee members.

| All Ensemble | EEF Ensemble | Run Time EEF Ensemble (s) | Run Time All Ensemble (s) | RMSE EEF Ensemble | RMSE All Ensemble | Average Fraction of Time Saved | Average Rate of Change in Error |
|---|---|---|---|---|---|---|---|
| 100 | 40 | 15.9 | 34.9 | 0.371 | 0.343 | 0.62 | −0.012 |
| 100 | 38 | 13.1 | 35.9 | 0.340 | 0.353 | | |
| 100 | 43 | 15.3 | 33.1 | 0.351 | 0.358 | | |
| 100 | 61 | 21.3 | 33.3 | 0.354 | 0.341 | | |
| 100 | 93 | 29.7 | 30.2 | 0.342 | 0.351 | | |
| 100 | 9 | 3.1 | 31.1 | 0.338 | 0.354 | | |
| 100 | 14 | 4.6 | 34.1 | 0.341 | 0.348 | | |
| 100 | 37 | 11.8 | 33.0 | 0.340 | 0.349 | | |
| 100 | 16 | 4.8 | 32.5 | 0.337 | 0.349 | | |
| 100 | 16 | 5.7 | 34.2 | 0.342 | 0.353 | | |
| 200 | 144 | 49.1 | 64.0 | 0.351 | 0.349 | 0.57 | −0.001 |
| 200 | 125 | 42.0 | 67.0 | 0.347 | 0.357 | | |
| 200 | 60 | 18.2 | 64.5 | 0.347 | 0.341 | | |
| 200 | 68 | 20.6 | 69.0 | 0.353 | 0.349 | | |
| 200 | 64 | 20.3 | 70.8 | 0.352 | 0.349 | | |
| 200 | 84 | 27.9 | 69.2 | 0.351 | 0.350 | | |
| 200 | 109 | 37.3 | 66.8 | 0.343 | 0.351 | | |
| 200 | 6 | 2.4 | 73.7 | 0.341 | 0.344 | | |
| 200 | 73 | 25.3 | 69.4 | 0.349 | 0.348 | | |
| 200 | 148 | 47.9 | 70.4 | 0.345 | 0.344 | | |
| 1000 | 861 | 278.5 | 312.6 | 0.346 | 0.342 | 0.50 | −0.004 |
| 1000 | 409 | 122.1 | 308.2 | 0.347 | 0.346 | | |
| 1000 | 142 | 41.0 | 313.3 | 0.344 | 0.345 | | |
| 1000 | 285 | 88.0 | 320.7 | 0.348 | 0.347 | | |
| 1000 | 511 | 154.8 | 313.4 | 0.347 | 0.343 | | |
| 1000 | 282 | 89.0 | 310.2 | 0.347 | 0.343 | | |
| 1000 | 743 | 222.5 | 311.5 | 0.343 | 0.343 | | |
| 1000 | 689 | 214.4 | 316.1 | 0.344 | 0.346 | | |
| 1000 | 948 | 306.1 | 320.5 | 0.346 | 0.344 | | |
Table A3. Composite signal outputs of the EEF method for different initial committee members.

| All Ensemble | EEF Ensemble | Run Time EEF Ensemble (s) | Run Time All Ensemble (s) | RMSE EEF Ensemble | RMSE All Ensemble | Average Fraction of Time Saved | Average Rate of Change in Error |
|---|---|---|---|---|---|---|---|
| 100 | 91 | 28.6 | 34.3 | 0.09 | 0.089 | 0.4 | −0.019 |
| 100 | 32 | 10.4 | 44.1 | 0.089 | 0.092 | | |
| 100 | 78 | 24.7 | 37.4 | 0.091 | 0.09 | | |
| 100 | 91 | 34.5 | 41.9 | 0.093 | 0.09 | | |
| 100 | 55 | 18.9 | 34.7 | 0.092 | 0.09 | | |
| 100 | 72 | 23.1 | 34.3 | 0.09 | 0.089 | | |
| 100 | 76 | 25.6 | 33.7 | 0.091 | 0.089 | | |
| 100 | 26 | 9.2 | 36.5 | 0.092 | 0.089 | | |
| 100 | 83 | 29.3 | 36.6 | 0.095 | 0.093 | | |
| 100 | 36 | 13.7 | 33.8 | 0.094 | 0.089 | | |
| 200 | 69 | 22.2 | 61.9 | 0.088 | 0.088 | 0.48 | −0.001 |
| 200 | 76 | 24.2 | 64.6 | 0.092 | 0.092 | | |
| 200 | 175 | 56.5 | 63.6 | 0.089 | 0.089 | | |
| 200 | 29 | 10.4 | 65.9 | 0.089 | 0.088 | | |
| 200 | 189 | 59.4 | 65.9 | 0.092 | 0.092 | | |
| 200 | 112 | 38.5 | 62.3 | 0.092 | 0.09 | | |
| 200 | 60 | 19 | 66 | 0.091 | 0.09 | | |
| 200 | 174 | 54.6 | 67.4 | 0.091 | 0.091 | | |
| 200 | 55 | 16.1 | 66.7 | 0.09 | 0.09 | | |
| 200 | 115 | 36.7 | 68.4 | 0.089 | 0.091 | | |
| 1000 | 861 | 264.4 | 293.4 | 0.089 | 0.09 | 0.48 | 0.011 |
| 1000 | 317 | 93.7 | 291.6 | 0.089 | 0.09 | | |
| 1000 | 194 | 57.5 | 290 | 0.089 | 0.09 | | |
| 1000 | 71 | 19.7 | 294.7 | 0.089 | 0.092 | | |
| 1000 | 489 | 141.8 | 293.6 | 0.089 | 0.091 | | |
| 1000 | 534 | 155.3 | 290.7 | 0.09 | 0.09 | | |
| 1000 | 655 | 188.2 | 286.6 | 0.089 | 0.09 | | |
| 1000 | 673 | 196.5 | 288.3 | 0.089 | 0.091 | | |
| 1000 | 878 | 252.4 | 293.8 | 0.09 | 0.09 | | |

References

  1. Lazebnik, S.; Raginsky, M. Supervised Learning of Quantizer Codebooks by Information Loss Minimization. IEEE Trans. Pattern Anal. Mach. Intell. 2009, 31, 1294–1309. [Google Scholar] [CrossRef] [PubMed]
  2. Raginsky, M.; Rakhlin, A.; Tsao, M.; Wu, Y.; Xu, A. Information-Theoretic Analysis of Stability and Bias of Learning Algorithms. In Proceedings of the IEEE Information Theory Workshop (ITW), Cambridge, UK, 11–14 September 2016; pp. 26–30. [Google Scholar]
  3. Giffin, A.; Urniezius, R. Simultaneous State and Parameter Estimation Using Maximum Relative Entropy with Nonhomogenous Differential Equation Constraints. Entropy 2014, 16, 4974–4991. [Google Scholar] [CrossRef]
  4. Zaky, M.A.; Machado, J.A.T. On the Formulation and Numerical Simulation of Distributed-Order Fractional Optimal Control Problems. Commun. Nonlinear Sci. Numer. Simul. 2017, 52, 177–189. [Google Scholar] [CrossRef]
  5. Hsieh, W.W. Machine Learning Methods in the Environmental Sciences: Neural Networks and Kernels, 1st ed.; Cambridge University Press: Cambridge, UK; New York, NY, USA, 2009. [Google Scholar]
  6. Huang, S.; Ming, B.; Huang, Q.; Leng, G.; Hou, B. A Case Study on a Combination NDVI Forecasting Model Based on the Entropy Weight Method. Water Resour. Manag. 2017, 31, 3667–3681. [Google Scholar] [CrossRef]
  7. Amato, F.; López, A.; Peña-Méndez, E.M.; Vaňhara, P.; Hampl, A.; Havel, J. Artificial Neural Networks in Medical Diagnosis. J. Appl. Biomed. 2013, 11, 47–58. [Google Scholar] [CrossRef]
  8. Foroozand, H.; Afzali, S.H. A Comparative Study of Honey-Bee Mating Optimization Algorithm and Support Vector Regression System Approach for River Discharge Prediction. Case Study: Kashkan River Basin. In Proceedings of the International Conference on Civil Engineering Architecture and Urban Infrastructure (CIVILICA; COI: ICICA01_0049), Tabriz, Iran, 29–30 July 2015; Volume 1. [Google Scholar]
  9. Ghahramani, A.; Karvigh, S.A.; Becerik-Gerber, B. HVAC System Energy Optimization Using an Adaptive Hybrid Metaheuristic. Energy Build. 2017, 152, 149–161. [Google Scholar] [CrossRef]
  10. Elshorbagy, A.; Corzo, G.; Srinivasulu, S.; Solomatine, D.P. Experimental Investigation of the Predictive Capabilities of Data Driven Modeling Techniques in Hydrology—Part 2: Application. Hydrol. Earth Syst. Sci. 2010, 14, 1943–1961. [Google Scholar] [CrossRef]
  11. Breiman, L. Bagging Predictors. Mach. Learn. 1996, 24, 123–140. [Google Scholar] [CrossRef]
  12. Efron, B. Bootstrap Methods: Another Look at the Jackknife. Ann. Stat. 1979, 7, 1–26. [Google Scholar] [CrossRef]
  13. Efron, B.; Tibshirani, R.J. An Introduction to the Bootstrap, Softcover Reprint of the Original, 1st ed.; Chapman and Hall/CRC: New York, NY, USA, 1993. [Google Scholar]
  14. Zhu, L.; Jin, J.; Cannon, A.J.; Hsieh, W.W. Bayesian Neural Networks Based Bootstrap Aggregating for Tropical Cyclone Tracks Prediction in South China Sea. In Neural Information Processing; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2016; pp. 475–482. [Google Scholar]
  15. Fraz, M.M.; Remagnino, P.; Hoppe, A.; Uyyanonvara, B.; Rudnicka, A.R.; Owen, C.G.; Barman, S.A. An Ensemble Classification-Based Approach Applied to Retinal Blood Vessel Segmentation. IEEE Trans. Biomed. Eng. 2012, 59, 2538–2548. [Google Scholar] [CrossRef] [PubMed]
  16. Brenning, A. Spatial Prediction Models for Landslide Hazards: Review, Comparison and Evaluation. Nat. Hazards Earth Syst. Sci. 2005, 5, 853–862. [Google Scholar] [CrossRef]
  17. Dietterich, T.G. An Experimental Comparison of Three Methods for Constructing Ensembles of Decision Trees: Bagging, Boosting, and Randomization. Mach. Learn. 2000, 40, 139–157. [Google Scholar] [CrossRef]
  18. Cover, T.M.; Thomas, J.A. Elements of Information Theory, 2nd ed.; Wiley-Interscience: Hoboken, NJ, USA, 2006. [Google Scholar]
  19. Shannon, C.E. A Mathematical Theory of Communication. Bell Syst. Tech. J. 1948, 27, 379–423. [Google Scholar] [CrossRef]
  20. Weijs, S.V.; van de Giesen, N. An Information-Theoretical Perspective on Weighted Ensemble Forecasts. J. Hydrol. 2013, 498, 177–190. [Google Scholar] [CrossRef]
  21. Shannon, C.E. Communication in the Presence of Noise. Proc. IRE 1949, 37, 10–21. [Google Scholar] [CrossRef]
  22. Weijs, S.V.; van de Giesen, N.; Parlange, M.B. HydroZIP: How Hydrological Knowledge Can Be Used to Improve Compression of Hydrological Data. Entropy 2013, 15, 1289–1310. [Google Scholar] [CrossRef] [Green Version]
  23. Le, T.A.; Baydin, A.G.; Zinkov, R.; Wood, F. Using Synthetic Data to Train Neural Networks Is Model-Based Reasoning. In Proceedings of the International Joint Conference on Neural Networks (IJCNN), Anchorage, AK, USA, 9–14 May 2017; pp. 3514–3521. [Google Scholar]
  24. Peng, H.; Lima, A.R.; Teakles, A.; Jin, J.; Cannon, A.J.; Hsieh, W.W. Evaluating Hourly Air Quality Forecasting in Canada with Nonlinear Updatable Machine Learning Methods. Air Qual. Atmos. Health 2017, 10, 195–211. [Google Scholar] [CrossRef]
Figure 1. Sinusoidal signal and unknown noisy signal.
Figure 2. Sawtooth signal and noisy signal.
Figure 3. Composite signal and noisy signal.
Figure 4. Variation of ensemble models’ entropy (sinusoidal signal).
Figure 5. Variation of ensemble models’ entropy (sawtooth wave).
Figure 6. Variation of ensemble models’ entropy (composite signal).
Figure 7. Sinusoidal signal simulation results.
Figure 8. Sawtooth signal simulation results.
Figure 9. Composite signal simulation results.
Figure 10. The error gradient analysis for sinusoidal signal and 100 initial bootstrapped ensembles.
Figure 11. The error gradient analysis for sawtooth signal and 100 initial bootstrapped ensembles.
Figure 12. The error gradient analysis for composite signal and 100 initial bootstrapped ensembles.
Figure 13. The error gradient analysis for sinusoidal signal and 1000 initial bootstrapped ensembles.
Figure 14. The error gradient analysis for sawtooth signal and 1000 initial bootstrapped ensembles.
Figure 15. The error gradient analysis for composite signal and 1000 initial bootstrapped ensembles.
