1. Introduction
Induction motors (IMs) are widely used rotating machines in the manufacturing and power industries because of their low cost, simple control mechanism, robust design, high efficiency, and reliability. However, the likelihood of faults cannot be overlooked, as the motors experience significant electrical and mechanical loads over prolonged working periods ^{[1]}. A failure may stem from an intrinsic flaw in the machine or from adverse surrounding conditions. If the initial erratic behaviour is not identified, it can lead to motor failure, which results in downtime and increased operating losses. Condition monitoring of rotating machines has thus attracted growing research interest, given the inherent vulnerability of these machines to damage and failure. To improve the accuracy and capabilities of fault diagnosis systems, researchers are currently analyzing weak fault signals to extract fault features and classify them, enabling real-time monitoring and diagnosis ^{[2]}. It is important to diagnose and monitor faults accurately and in a timely manner to prevent significant damage, extend the life of machines, increase accessibility, and lessen maintenance costs ^{[3]}.
Depending on the affected component, IM faults can be classified as bearing faults, rotor faults, stator faults, etc. Approximately 44% of these faults occur in bearings ^{[4]}^{[5]}. In the case of bearing faults, the damage can occur in any of the four main components: the inner race, the outer race, the balls, and the cage. However, 90% of these faults occur in the inner and outer races ^{[6]}.
In attempts to avoid dangerous accidents due to electric motor failure, breakdown maintenance methods were initially replaced by time-based preventive maintenance techniques. These were performed in accordance with working time periods, regardless of whether the machine needed a maintenance checkup or not. This approach is not only expensive but also interrupts the continuous workflow. Therefore, non-invasive condition-based maintenance techniques are currently considered more effective because they can reduce the amount of unnecessary scheduled preventive maintenance and lower the operating cost ^{[7]}. Numerous studies have been conducted on bearing fault diagnosis to develop new advanced approaches by utilizing innovative technologies and industrial equipment. Model-based ^{[8]} and data-driven ^{[9]} approaches are the two basic techniques utilized in fault diagnosis. Model-based methods require precise modelling of the dynamics of a system with a comparatively small dataset, which is crucial when designing approaches for highly nonlinear and ambiguous circumstances. On the other hand, data-driven approaches have become popular as data acquisition has become easier thanks to improvements in sensor technology. A data-driven approach requires less engineering and design effort, and it is possible to extract useful information about a system’s current condition using modern feature engineering techniques ^{[10]}.
Various types of sensor data are available for bearing fault diagnosis, such as vibration signals, acoustic emission signals, current signals, stray flux, thermal images, etc. ^{[11]}. Vibration signal-based analysis is a popular approach because vibration signals are highly sensitive to bearing faults and immediately reflect any sudden change in intrinsic information. The main limitation of using this type of signal is the high cost and high maintenance requirements of vibration sensors ^{[12]}. Fault analysis using acoustic emissions can be effective for early fault detection with a low-energy signal, but it requires a large amount of data to provide a good result, which increases the computational complexity of the overall method ^{[13]}. Motor current signals have been used to effectively diagnose electrical faults (broken rotor bar faults, stator winding faults) and bearing faults. The acquisition of the current signal does not require any external sensors, which reduces the overall installation and data collection costs of the system.
Furthermore, current transducers can be used to measure the stator current from a single input source if frequency inverters and current transformers are not available. In addition to being highly reliable and non-invasive, motor current signal analysis (MCSA) is also considered one of the most effective condition monitoring methods in bearing fault diagnosis ^{[14]}^{[15]}^{[16]}. MCSA has been applied to analyze both bearing faults and fault severity in IMs using fault frequency analysis ^{[1]}^{[17]}.
Generally, the original signal acquired from sensors is not enough to spot the existence of a fault and classify fault conditions, due to the presence of surrounding noise. To avoid ambiguity, extracting effective features from the sensor data by applying signal processing techniques is essential. There are diverse techniques for feature extraction. In fault diagnosis, time-domain features such as the root mean square (RMS), peak-to-peak value, etc., are calculated using statistical formulas on the sensor signal; frequency-domain feature extraction involves the fast Fourier transform, envelope analysis, and high-order spectral analysis of the time-series signal ^{[18]}; and time-frequency domain features are derived using the wavelet transform, short-time Fourier analysis, Hilbert–Huang transform, etc. ^{[6]}^{[19]}. Based on the processing gain and the ability to separate the fault characteristic frequency from the noise, frequency-domain analysis can provide a better understanding of fault frequencies than time-domain analysis. However, in many cases, methods based on the frequency domain do not perform well with non-stationary signals, whereas time-frequency-based methods can deal effectively with both stationary and non-stationary signals ^{[20]}.
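As a minimal sketch of the time-domain and frequency-domain features mentioned above, the following standard-library Python computes the RMS, the peak-to-peak value, and a single-sided DFT magnitude spectrum. The function names and the synthetic 50 Hz test tone are illustrative assumptions, not part of the original study; a practical pipeline would normally use an FFT library rather than the naive transform shown here.

```python
import cmath
import math


def time_domain_features(signal):
    """Return the RMS and peak-to-peak value of a sampled signal."""
    n = len(signal)
    rms = math.sqrt(sum(v * v for v in signal) / n)
    peak_to_peak = max(signal) - min(signal)
    return {"rms": rms, "peak_to_peak": peak_to_peak}


def magnitude_spectrum(signal):
    """Single-sided DFT magnitude (naive O(n^2) transform, fine for short signals)."""
    n = len(signal)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc) / n)
    return spectrum


def dominant_frequency(signal, sample_rate):
    """Frequency in Hz of the largest non-DC spectral line."""
    spectrum = magnitude_spectrum(signal)
    k_max = max(range(1, len(spectrum)), key=lambda k: spectrum[k])
    return k_max * sample_rate / len(signal)


# Synthetic test tone (assumed values, not data from the paper):
# a 50 Hz sine of amplitude 2 sampled at 1 kHz for 0.2 s.
SAMPLE_RATE = 1000
tone = [2.0 * math.sin(2 * math.pi * 50 * t / SAMPLE_RATE) for t in range(200)]
features = time_domain_features(tone)
peak_hz = dominant_frequency(tone, SAMPLE_RATE)
```

For this tone the RMS equals the amplitude divided by √2 and the spectral peak falls at 50 Hz, which illustrates why a narrowband fault frequency is easier to isolate in the frequency domain than from time-domain statistics alone.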
The main drawback of the Fourier transform-based feature extraction process is that it becomes unstable at high frequencies. In such cases, the wavelet transform is considered an effective signal processing technique for fault classification of rotating machinery ^{[21]}^{[22]}^{[23]}. To create a time shift, the discrete wavelet transform (DWT) and the second-generation wavelet transform (SGWT) perform splitting or downshifting operations, which produce erroneous output due to the aliasing effect and hamper reflection of the original state of the system ^{[24]}. Another variant of the wavelet transform, named the dual-tree complex wavelet transform (DTCWT), reduces the aliasing effect due to its time-shift invariance and parity sampling properties. Although the wavelet transform is stable to signal deformation, it is not translation invariant when subsampling is involved. For these reasons, neither the Fourier transform nor the wavelet transform can be considered an ideal feature extractor.
Recently, a knowledge-based feature extraction technique named the wavelet scattering transform (WST) has been developed by Bruna and Mallat, which utilizes complex wavelets to balance the discrimination ability and stability of the time-frequency domain signal ^{[25]}. This method filters the signal by assembling a cascade of wavelet decomposition coefficients, complex moduli, and low-pass filtering operations. The WST approach applies the modulus and averaging operations to the wavelet coefficients to acquire stable features. After that, the cascaded wavelet transform is employed to recover the high-frequency information lost in the preceding modulus and averaging operations. The resultant scattering coefficients possess local stability and translation invariance, and they have shown good performance in different application areas, such as image processing ^{[26]}, sound classification ^{[27]}, and heart sound classification ^{[28]}. The WST-based feature extraction process provides two advantages compared to other approaches in the fault diagnosis field. Firstly, the complex wavelet decompositions at multiple scales can provide rich descriptors of complicated structures for fault diagnosis through the co-occurrence of coefficients. Secondly, by using local weighted averaging, it is possible to reduce feature variability and preserve the local consistency of the class labels. It can also reduce the impact of noise originating from acquisition signals. For these reasons, researchers have become interested in this method and started implementing the WST in bearing and gearbox fault signal analysis. In ^{[29]}, a bearing fault was classified from vibration signals by an SVM using the extracted scattering coefficients, with 99% accuracy. A gearbox fault was analyzed in ^{[30]} from an acoustic emission signal by utilizing the WST with linear discriminant analysis (LDA); this approach had an affordable computational cost.
Additionally, in ^{[31]}, single and compound fault conditions were diagnosed by combining a denoising approach with WST coefficients to analyze rolling element bearing faults.
With the help of an effective feature extraction process, the original signal from sensors is transformed into a compact, significant representation, which can be used as the input of machine learning (ML) classifiers for further training and optimizing decision functions. Common ML classifiers for fault diagnosis include support vector machine (SVM) ^{[32]}, gradient boosting decision trees, k-nearest neighbours (KNN) ^{[33]}, random forest (RF) ^{[34]}^{[35]}, and neural network approaches ^{[36]}^{[37]}. Furthermore, deep learning (DL) methods have been implemented in multiple research areas, including bearing fault analysis, and provide very good performance ^{[38]}^{[39]}. Recently, unsupervised cross-domain diagnosis based on a joint transfer network ^{[40]} and a modified auxiliary classifier GAN (MACGAN) ^{[41]} were implemented to generate multi-mode fault samples where the fault samples are limited.
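To illustrate how a compact feature representation feeds a classifier, the sketch below implements a plain k-nearest-neighbour vote in standard-library Python. It is only an illustration of the general idea: the two-element feature vectors, the class labels, and the function name `knn_predict` are made-up placeholders, not features or results from any cited study.

```python
import math
from collections import Counter


def knn_predict(train_features, train_labels, query, k=3):
    """Majority vote among the k training vectors closest to query (Euclidean)."""
    ranked = sorted(
        zip(train_features, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Placeholder feature vectors, e.g. [RMS, peak-to-peak]; values are invented
# purely to exercise the function.
train_features = [
    [0.10, 0.30], [0.12, 0.35], [0.11, 0.28],   # hypothetical healthy samples
    [0.40, 1.20], [0.45, 1.10], [0.42, 1.30],   # hypothetical faulty samples
]
train_labels = ["healthy"] * 3 + ["outer_race_fault"] * 3

prediction = knn_predict(train_features, train_labels, [0.41, 1.15])
```

With these placeholder values the query lies among the second group, so the vote returns that group's label. The quality of such a classifier depends almost entirely on how well the features separate the classes, which is the motivation for the feature extraction step.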
2. Wavelet Scattering Transform (WST)
A wavelet transform is a widely applied time-frequency analysis method that has the advantage of being stable and multiscale in the presence of local deformation. It can effectively extract local features from signals, but its output changes over time and it can easily exclude significant signal features. A better time-frequency analysis technique built on the wavelet transform is the wavelet scattering transform (WST), which was proposed by Mallat ^{[42]}. The procedure is structured like a deep convolutional network: it iteratively combines a complex wavelet transform, a modulus operation, and low-pass filter averaging ^{[36]}. With the additional advantages of translation invariance, local deformation stability, and rich feature information representation, it also addresses the drawback of changing over time. For any given time-domain signal, x, the operation of the WST can be described as follows:
1. At first, x is convolved with the dilated mother wavelet ψ, which has a center frequency of λ, to calculate the WST. This operation can be expressed as x*ψ_{λ}. Here, the average of the convolved signal, which oscillates at a scale of 2^{j}, is zero.
2. After that, a nonlinear operator, such as a modulus, is applied to the convolved signal to eliminate these oscillations (i.e., |x*ψ_{λ}|). This procedure is used to make up for the information lost due to downsampling by doubling the frequency of the given signal.
3. Finally, a low-pass filter φ is applied to the resultant absolute convolved signal, which is equivalent to |x*ψ_{λ}|*φ.
Therefore, for any scale j (1 ≤ j ≤ J), the first-order scattering coefficients are calculated as the average absolute amplitudes of wavelet coefficients over a half-overlapping time window of size 2^{j}. This can be written as (1):
$${S}_{1x}\left(t,{\lambda}_{1}\right)=\left|x*{\psi}_{{\lambda}_{1}}\right|*\phi $$
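The three steps (wavelet convolution, modulus, low-pass averaging) that define the first-order coefficients can be sketched in standard-library Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: the complex Gabor atom stands in for the mother wavelet, the Gaussian window stands in for φ, convolution is circular, and all function names and parameter values are hypothetical.

```python
import cmath
import math


def gabor_wavelet(center_freq, sigma):
    """Complex Gabor atom: Gaussian envelope times a complex exponential.

    center_freq is in cycles per sample, sigma is the envelope width in samples.
    Taps are scaled so the filter has unit gain at center_freq.
    """
    half = int(4 * sigma)
    offsets = range(-half, half + 1)
    envelope = [math.exp(-0.5 * (n / sigma) ** 2) for n in offsets]
    total = sum(envelope)
    return [
        g / total * cmath.exp(2j * math.pi * center_freq * n)
        for g, n in zip(envelope, offsets)
    ]


def gaussian_lowpass(sigma):
    """Gaussian averaging window phi with unit DC gain."""
    half = int(4 * sigma)
    taps = [math.exp(-0.5 * (n / sigma) ** 2) for n in range(-half, half + 1)]
    total = sum(taps)
    return [v / total for v in taps]


def circular_convolve(signal, kernel):
    """Circular convolution with a centred, odd-length kernel."""
    n = len(signal)
    half = len(kernel) // 2
    out = []
    for t in range(n):
        acc = 0
        for k, tap in enumerate(kernel):
            acc += tap * signal[(t - (k - half)) % n]
        out.append(acc)
    return out


def first_order_scattering(signal, wavelet, lowpass):
    """S1 x(t, lambda) = |x * psi_lambda| * phi."""
    band = circular_convolve(signal, wavelet)       # step 1: x * psi
    envelope = [abs(v) for v in band]               # step 2: modulus
    return circular_convolve(envelope, lowpass)     # step 3: low-pass average


# Synthetic unit-amplitude cosine at 0.125 cycles/sample (assumed test input).
N = 128
tone = [math.cos(2 * math.pi * 0.125 * t) for t in range(N)]
phi = gaussian_lowpass(8.0)
s1_in_band = first_order_scattering(tone, gabor_wavelet(0.125, 8.0), phi)
s1_off_band = first_order_scattering(tone, gabor_wavelet(0.3, 8.0), phi)
```

In this sketch the coefficients of the wavelet tuned to the tone settle near 0.5, because the complex wavelet retains only the positive-frequency half of the cosine, while the coefficients of the detuned wavelet stay close to zero.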
The invariance ability will undoubtedly decrease when the high-frequency components are restored as a result of the aforementioned approach. By repeating the discussed steps on |x*ψ_{λ_{1}}|, the scattering coefficients for the second order can be calculated as (2):
$${S}_{2x}\left(t,{\lambda}_{1},{\lambda}_{2}\right)=\left|\left|x*{\psi}_{{\lambda}_{1}}\right|*{\psi}_{{\lambda}_{2}}\right|*\phi $$
The wavelet scattering coefficients for higher orders, where m ≥ 2, can be computed by iterating the mentioned process. This can be expressed as (3):
$${S}_{mx}\left(t,{\lambda}_{1},{\lambda}_{2},\dots ,{\lambda}_{m}\right)=\left|\left|\left|x*{\psi}_{{\lambda}_{1}}\right|*{\psi}_{{\lambda}_{2}}\right|\dots *{\psi}_{{\lambda}_{m}}\right|*\phi $$
The resultant scattering coefficients can be found by accumulating all of the coefficient sets of the scattering transform generated from the 0th to mth order, as shown in Equation (4) ^{[25]}.
$${S}_{x}=\left\{{S}_{0x},{S}_{1x},\dots ,{S}_{mx}\right\}$$
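The cascade up to the second order, and the accumulation of all coefficient sets into one feature matrix, can be sketched as below. This is an illustrative standard-library Python sketch rather than the implementation used in this work. The dilation rule (centre frequency 0.5·2^{-j} cycles per sample, envelope width 2·2^{j} samples), the Gabor wavelets, the circular convolution, the restriction of second-order paths to coarser scales, and the amplitude-modulated test signal are all assumptions made for the example.

```python
import cmath
import math


def gabor_wavelet(scale):
    """Dilated complex Gabor wavelet for scale index j = scale (assumed rule)."""
    freq, sigma = 0.5 * 2.0 ** -scale, 2.0 * 2.0 ** scale
    half = int(4 * sigma)
    taps = [
        math.exp(-0.5 * (n / sigma) ** 2) * cmath.exp(2j * math.pi * freq * n)
        for n in range(-half, half + 1)
    ]
    gain = sum(abs(v) for v in taps)
    return [v / gain for v in taps]


def gaussian_lowpass(sigma):
    """Gaussian averaging window phi with unit DC gain."""
    half = int(4 * sigma)
    taps = [math.exp(-0.5 * (n / sigma) ** 2) for n in range(-half, half + 1)]
    total = sum(taps)
    return [v / total for v in taps]


def circular_convolve(signal, kernel):
    """Circular convolution with a centred, odd-length kernel."""
    n, half = len(signal), len(kernel) // 2
    return [
        sum(tap * signal[(t - (k - half)) % n] for k, tap in enumerate(kernel))
        for t in range(n)
    ]


def scattering(signal, num_scales):
    """Return (S0, S1, S2); S1 is keyed by j1 and S2 by (j1, j2) with j2 > j1."""
    phi = gaussian_lowpass(2.0 * 2.0 ** num_scales)
    psis = {s: gabor_wavelet(s) for s in range(1, num_scales + 1)}
    s0 = circular_convolve(signal, phi)                       # S0 = x * phi
    s1, s2 = {}, {}
    for a, psi_a in psis.items():
        u1 = [abs(v) for v in circular_convolve(signal, psi_a)]
        s1[a] = circular_convolve(u1, phi)                    # |x*psi1| * phi
        for b, psi_b in psis.items():
            if b > a:  # the envelope varies more slowly than its carrier
                u2 = [abs(v) for v in circular_convolve(u1, psi_b)]
                s2[(a, b)] = circular_convolve(u2, phi)       # ||x*psi1|*psi2| * phi
    return s0, s1, s2


# Assumed test input: a carrier at 0.25 cycles/sample whose amplitude is
# modulated once every 16 samples.
N = 256
am_tone = [
    (1 + 0.5 * math.cos(2 * math.pi * t / 16)) * math.cos(2 * math.pi * 0.25 * t)
    for t in range(N)
]
s0, s1, s2 = scattering(am_tone, num_scales=3)

# Accumulate the coefficient sets of orders 0 to 2 into one feature matrix.
feature_matrix = [s0] + [s1[k] for k in sorted(s1)] + [s2[k] for k in sorted(s2)]
```

In this example the carrier is captured by the first-order path j1 = 1, while the amplitude modulation, which the averaging removes from the first order, reappears in the second-order path (1, 3), whose wavelet is tuned to the modulation rate.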
The basic steps of computing the wavelet scattering coefficients up to level 2 are illustrated in Figure 1. Here, the final feature matrix will be found by accumulating all the features from levels S_{0x}, S_{1x}, and S_{2x}.
Figure 1. The schematic diagram of the feature extraction procedure with the second-order WST.
Here, S_{0x} represents the zero-order scattering coefficients, which evaluate the local translation invariance of the given input signal. The high-frequency components of the convolved signal are lost during each stage’s averaging operation, but they can be recovered in the following stage’s convolution operation with the wavelet. The WST is stable to time-warping deformation, conserves energy, and is contractive, which makes the overall system robust in a noisy environment and appropriate for many classification tasks ^{[30]}.
As a result of implementing the low-pass filter, φ, the network is invariant to translations up to a certain invariance scale. The resultant features from S_{x} inherit properties of wavelet transforms, which make them stable against local deformations. This also allows the scattering decomposition to detect subtle changes in bearing signals’ amplitudes under different conditions and makes the classification task easier. Therefore, the wavelet scattering network can be used as an effective way to create robust representations of different bearing conditions that minimize the differences under the same condition and maintain enough discriminability to distinguish among different bearing conditions.
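The effect of the averaging filter on translation invariance can be checked numerically with a small standard-library Python sketch. It is an illustration under assumptions, not a result from this work: the decaying burst is a synthetic stand-in for an impact-like transient, the Gabor wavelet and Gaussian window are assumed filter choices, and the two-sample delay is applied circularly.

```python
import cmath
import math


def gabor_wavelet(center_freq, sigma):
    """Complex Gabor atom with unit gain at center_freq (cycles per sample)."""
    half = int(4 * sigma)
    offsets = range(-half, half + 1)
    envelope = [math.exp(-0.5 * (n / sigma) ** 2) for n in offsets]
    total = sum(envelope)
    return [
        g / total * cmath.exp(2j * math.pi * center_freq * n)
        for g, n in zip(envelope, offsets)
    ]


def gaussian_lowpass(sigma):
    """Gaussian averaging window phi with unit DC gain."""
    half = int(4 * sigma)
    taps = [math.exp(-0.5 * (n / sigma) ** 2) for n in range(-half, half + 1)]
    total = sum(taps)
    return [v / total for v in taps]


def circular_convolve(signal, kernel):
    """Circular convolution with a centred, odd-length kernel."""
    n, half = len(signal), len(kernel) // 2
    return [
        sum(tap * signal[(t - (k - half)) % n] for k, tap in enumerate(kernel))
        for t in range(n)
    ]


def first_order(signal, psi, phi):
    """|x * psi| * phi."""
    modulus = [abs(v) for v in circular_convolve(signal, psi)]
    return circular_convolve(modulus, phi)


def relative_change(a, b):
    """Euclidean distance between a and b, relative to the norm of a."""
    num = math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))
    den = math.sqrt(sum(p * p for p in a))
    return num / den


# Synthetic decaying burst starting at sample 100 (assumed test input).
N, ONSET, SHIFT = 256, 100, 2
burst = [
    math.exp(-(t - ONSET) / 10.0) * math.sin(2 * math.pi * 0.25 * (t - ONSET))
    if t >= ONSET else 0.0
    for t in range(N)
]
shifted = burst[-SHIFT:] + burst[:-SHIFT]   # circular delay by SHIFT samples

psi, phi = gabor_wavelet(0.25, 4.0), gaussian_lowpass(16.0)
raw_change = relative_change(burst, shifted)
s1_change = relative_change(first_order(burst, psi, phi),
                            first_order(shifted, psi, phi))
```

A delay of two samples is half a carrier period, so the raw waveform changes by more than its own norm, whereas the averaged first-order coefficients change only slightly because the delay is small compared with the width of φ. Widening φ increases the invariance scale at the cost of time resolution.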
Despite the similarity in structure between wavelet scattering networks and CNNs, there exist two main differences: the filters are predetermined rather than learned, and the features are not just the outputs of the final convolution layer but the outputs of all layers combined. Based on previous research, nearly 99% of the scattering coefficient energy is contained within the first two layers of the scattering network, with the energy decreasing rapidly as the layer level increases ^{[25]}^{[43]}. The WST applied in this research also considers scattering coefficients for two orders, which are represented as S_{1x} and S_{2x}. Through the cascaded wavelet decomposition, the WST can extract detailed feature information, and the local averaging technique can lessen the impact of noise. For these reasons, the WST can be considered a useful technique for extracting features in order to identify fault features in signals.