In recent years, various machine-learning-based methods have been proposed for diagnosing bearing faults [4]. Feature extraction and fault pattern recognition are the two fundamental processes in bearing fault diagnosis. In the feature extraction process, features from different domains, such as the time domain, frequency domain, and time–frequency domain, have been used to enhance fault diagnosis performance [5,6]. Time domain features can be conveniently extracted through statistical calculations, including the mean, variance, and standard deviation [7]. They are suitable for fault diagnosis and for feature extraction from stationary signals. However, time domain features can be sensitive to differences in the data, and their non-linearity may cause further difficulties for diagnosis in real applications [8]. Frequency domain techniques are therefore used as an alternative that describes fault patterns from another perspective, as they are better able to identify and separate frequency components. The most widely used technique in this class is the fast Fourier transform (FFT) [9,10]. Frequency domain features extracted by FFT, including the root variance of frequency, the frequency root mean square, and the frequency center, have been applied to bearing fault diagnosis. The major limitation of the above methods, however, is their inability to handle non-stationary signals [11]. Features that examine signals in both the time and frequency domains are known as time–frequency features, and they are regarded as a powerful means of analyzing non-stationary signals [8]. Short-time Fourier transform, empirical mode decomposition (EMD), and wavelet packet transform (WPT) are three methods commonly applied in previous studies to extract time–frequency domain features [12]. All of these features reflect faults from different aspects and contribute to the final fault diagnosis results. Obtaining these statistical features, however, requires appropriate feature extraction approaches and manual feature engineering, which in turn demand expertise and domain knowledge. Moreover, statistical features extracted through signal processing methods capture only superficial information about fault patterns, which limits the fault diagnosis performance [13].
To better represent the fault patterns, deeper information about the faults should be considered in the feature extraction process. Deep learning methods can capture more hidden knowledge through their hierarchical structures [14,15]. In bearing fault diagnosis, commonly used deep learning methods include the convolutional neural network (CNN), long short-term memory network (LSTM), deep belief network (DBN), and stacked auto-encoder (SAE). These methods can take vibration signals directly as inputs and automatically learn complex diagnostic information from them [16,17]. Zhang et al., for instance, proposed a CNN-based network that processes two-dimensional image features in order to examine the whole process of feature learning and fault classification in the CNN model [18]. Qiao et al. developed a dual-input time–frequency model based on an LSTM network for rolling bearing fault diagnosis and demonstrated the effectiveness of the LSTM method [19]. Shao et al. proposed an approach termed optimization DBN for bearing diagnosis, whose effectiveness was validated with simulation and experimental signal data [20]. Although these deep learning methods have achieved remarkable diagnosis performance, they usually require labeled data in the learning process, and insufficient labeled data limits their use in industrial applications. To address this problem, an autoencoder (AE) is a better choice, since it automatically learns representations of its input in an unsupervised way. By stacking several AEs, an SAE can extract high-level representation features by setting the target values equal to the inputs, and, compared with other networks, it can be trained conveniently and efficiently. For example, Liu et al. analyzed how the number of hidden layers and the number of neurons in each hidden layer affect the performance of the SAE network [21]. Similarly, Lee et al. noted that SAEs with non-linear activation functions can extract highly complex features and can therefore be considered more useful for practical applications [22].
In brief, statistical features and deep representation features capture specific fault information from different perspectives, which also indicates that they are heterogeneous. However, the complementarity of these heterogeneous features has rarely been explored in bearing fault diagnosis, leaving considerable room for improving diagnostic performance. Fusing statistical and deep representation features may therefore be a promising strategy for fault diagnosis. In this study, we adopt such a fusion technique, combining statistical and deep representation features to describe the fault information comprehensively.
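As an illustration of the statistical features discussed above, the following minimal sketch computes the time domain statistics (mean, variance, standard deviation) and three frequency domain statistics (frequency center, frequency root mean square, root variance of frequency) from an FFT amplitude spectrum. The function names and the amplitude-weighted definitions of the frequency features are assumptions chosen for illustration; they follow one common convention and are not necessarily the exact formulas used in the cited works.

```python
import numpy as np


def time_domain_features(signal):
    """Basic time domain statistics of a one-dimensional signal."""
    signal = np.asarray(signal, dtype=float)
    return {
        "mean": signal.mean(),
        "variance": signal.var(),  # population variance (ddof=0)
        "std": signal.std(),
    }


def frequency_domain_features(signal, sampling_rate):
    """Amplitude-weighted spectral statistics (assumed definitions).

    Assumes a non-zero signal, so that the spectrum does not sum to zero.
    """
    signal = np.asarray(signal, dtype=float)
    spectrum = np.abs(np.fft.rfft(signal))  # one-sided amplitude spectrum
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / sampling_rate)
    total = spectrum.sum()
    center = (freqs * spectrum).sum() / total
    rms = np.sqrt((freqs ** 2 * spectrum).sum() / total)
    root_variance = np.sqrt(((freqs - center) ** 2 * spectrum).sum() / total)
    return {
        "frequency_center": center,
        "frequency_rms": rms,
        "frequency_root_variance": root_variance,
    }


# Hypothetical test signal: one second of a 50 Hz sine sampled at 1 kHz.
fs = 1000.0
t = np.arange(1000) / fs
x = np.sin(2 * np.pi * 50.0 * t)
print(time_domain_features(x))
print(frequency_domain_features(x, fs))
```

For a pure tone such as this test signal, the frequency center coincides with the tone frequency and the root variance of frequency is close to zero; a real vibration signal would spread energy over many frequency components.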
In the fault pattern recognition process, machine learning methods such as Artificial Neural Networks (ANNs), Support Vector Machines (SVMs), and Decision Trees (DTs) have been successfully applied to the fault diagnosis of bearings [23]. Nonetheless, relying on a single classification method has drawbacks that affect bearing fault diagnosis performance, such as low generalization capability caused by the complicated states of bearing systems [24]. To deal with such issues, ensemble learning methods have been used, in which diagnostic decisions are derived from the consensus of several classifiers. According to how the base learners are generated, ensemble learning methods can be divided into feature partitioning and instance partitioning methods. Instance partitioning methods, such as Bagging and Boosting, have recently been widely used in fault diagnosis [25,26]. However, combining features extracted from different domains results in a high-dimensional feature set with redundant features, which may cause Bagging and Boosting methods to perform poorly. In contrast, feature partitioning methods, such as random subspace (RS), have proven capable of coping with high-dimensional problems [27]. Consequently, the RS method is employed for bearing fault diagnosis in the present study. Nevertheless, redundant features may be selected into the same feature subset in RS, which adversely affects the precision of the base learners. Fortunately, the Least Absolute Shrinkage and Selection Operator (lasso), a sparse method, can filter features out of high-dimensional feature sets through L1 regularization and thereby improve the prediction performance [28,29]. Owing to this capability, lasso has been widely adopted in previous research. For example, Lateko et al. introduced lasso into their method to optimize the learner parameters effectively, and the experimental results confirmed the effectiveness of the method [30]. Duque-Perez et al. improved the traditional logistic regression classifier with the help of lasso to enhance model performance in bearing fault diagnosis, and the experimental results confirmed its effectiveness [31]. However, these methods focus on using lasso to optimize the parameters of the base classifier, without explicitly incorporating the time domain, frequency domain, and deep representation features related to bearing faults. To overcome these limitations, the RS method and the lasso method are combined in this study to better capture the relationship between multi-domain features and different fault types.
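The idea of combining RS with lasso can be sketched as follows: each base learner receives a random subset of the features, and an L1-regularized fit within that subset discards the features whose coefficients shrink to zero. This is a minimal sketch under stated assumptions, not the modified lasso proposed in this study: it uses plain lasso solved by cyclic coordinate descent, a numeric target `y`, and hypothetical function names and parameters (`alpha`, `subspace_size`).

```python
import numpy as np


def soft_threshold(value, threshold):
    """Soft-thresholding operator that produces exact zeros under L1 penalties."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)


def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Minimize (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1 (no intercept)."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    col_scale = (X ** 2).sum(axis=0) / n_samples
    for _ in range(n_iter):
        for j in range(n_features):
            if col_scale[j] == 0.0:
                continue  # a constant-zero column carries no information
            # Residual with the current contribution of feature j removed.
            partial_residual = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ partial_residual / n_samples
            w[j] = soft_threshold(rho, alpha) / col_scale[j]
    return w


def lasso_filtered_subspaces(X, y, n_subspaces, subspace_size, alpha, rng):
    """Draw random feature subsets and keep the features lasso leaves non-zero."""
    subspaces = []
    for _ in range(n_subspaces):
        candidate = rng.choice(X.shape[1], size=subspace_size, replace=False)
        w = lasso_coordinate_descent(X[:, candidate], y, alpha)
        kept = candidate[w != 0.0]
        # Fall back to the unfiltered subset if lasso removes every feature.
        subspaces.append(kept if kept.size > 0 else candidate)
    return subspaces


# Hypothetical data: only feature 0 is related to the target.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 6))
y = 3.0 * X[:, 0]
for subset in lasso_filtered_subspaces(X, y, 5, 3, alpha=0.5, rng=rng):
    print(sorted(subset.tolist()))
```

Each returned index array would then be used to train one base learner on the corresponding columns of the feature matrix. Filtering inside each subspace, rather than once over the full feature set, preserves the diversity among base learners that RS relies on.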
In the current study, a novel random subspace method, IHF-RS, is proposed that fuses statistical and deep representation features for the precise diagnosis of bearing faults. Firstly, heterogeneous features, including statistical features and deep representation features, are extracted using statistical methods in the time, frequency, and time–frequency domains, together with a deep learning method. Secondly, taking the different predictive power of the feature sets into account, a modified lasso is introduced into the RS method for better base classifier construction. Finally, to improve the diagnosis accuracy for bearing faults, a majority voting strategy is employed to aggregate the outputs of the base learners. To verify the performance of the proposed IHF-RS, comprehensive experiments are performed on the datasets provided by the bearing data center of Case Western Reserve University (CWRU) and by Paderborn University. The experimental results show that the proposed IHF-RS improves bearing fault diagnosis in comparison with other methods.
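The final aggregation step, majority voting, can be illustrated with a short sketch. It assumes that each base learner outputs one non-negative integer class label per sample and that ties are broken in favor of the smallest label; both the function name and the tie-breaking rule are assumptions made for this example.

```python
import numpy as np


def majority_vote(predictions):
    """Aggregate base learner outputs by majority voting.

    predictions: array-like of shape (n_learners, n_samples) holding
    non-negative integer class labels. Ties go to the smallest label.
    """
    predictions = np.asarray(predictions)
    n_classes = int(predictions.max()) + 1
    # votes[c, i] counts how many learners assigned class c to sample i.
    votes = np.stack([(predictions == c).sum(axis=0) for c in range(n_classes)])
    return votes.argmax(axis=0)


# Hypothetical outputs of three base learners on four samples.
outputs = [
    [0, 1, 2, 1],
    [0, 1, 1, 1],
    [1, 2, 2, 0],
]
print(majority_vote(outputs))
```

In this example, every sample receives the label chosen by at least two of the three learners, so a single misclassifying learner does not change the ensemble decision.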