2. Machine Learning in Physical Layer of IVLC
2.1. Channel Emulator
The end-to-end channel in visible light communication is exceptionally complex. In atmospheric environments, for example, gas molecules and aerosol particles absorb and scatter light radiation in the near-infrared band, which reduces the received signal power. In addition, atmospheric turbulence severely distorts the optical signal. In underwater environments, the attenuation of light depends on the wavelength, and the attenuation of the signal increases with frequency. Other propagation effects include temperature fluctuations, salinity, scattering, dispersion, and beam steering. For underwater VLC applications with moderate bandwidth (tens of MHz), the power attenuation can be approximated as linear in frequency, which allows underwater VLC multipath channels to be modeled using the compressive sensing (CS) method
[14]. Traditional methods for high-speed point-to-point VLC cannot support accurate VLC end-to-end channel modeling, but machine learning is able to simulate the complicated nonlinear dynamics of VLC channels
[15]. In massive multiple-input multiple-output (m-MIMO) VLC, the machine learning-based methods enable accurate estimation of the channel matrix
[16].
2.1.1. TTHNet
Underwater transmission experiments are costly, and no accurate analytic model exists as a reference for underwater high-speed VLC. To reduce the cost of testing underwater VLC systems, a machine learning method is needed to model the underwater channel. The two-tributaries heterogeneous neural network (TTHnet) uses a convolutional neural network (CNN) to model the linearity of the underwater VLC channel and a two-layer multilayer perceptron (MLP) with a hollow layer to model its nonlinearity
[15]. The two-branch heterogeneous structure exploits the CNN's shared parameters to reduce system complexity, while the MLP's strong nonlinear fitting capability captures the nonlinearity in the channel. Experiments show that the channel modeled by TTHnet is very close to the real channel: its average spectrum mismatch is only 36.2% of that of the MLP-based channel emulator and 44.3% of that of the CNN-based channel emulator.
2.1.2. FFDNet
Since the modulation bandwidth of a single LED is limited, the use of m-MIMO LED and PD arrays is expected to substantially increase the capacity and transmission rate of VLC systems. However, due to the complexity of VLC channels, the m-MIMO channel matrix is extremely difficult to estimate, which motivates the use of deep learning methods. The fast and flexible denoising convolutional neural network (FFDnet) has recently been used for channel estimation in millimeter-wave communication
[17][18], which is also applicable in VLC
[16]. As a machine learning tool for image denoising, FFDnet can recover an almost noiseless channel matrix from a noisy input channel matrix. Compared with the minimum mean square error (MMSE) method, FFDnet has a stronger denoising effect, which increases the peak signal-to-noise ratio (PSNR) of the recovered channel matrix image. Unlike the nonlinear channel modeling used for point-to-point high-speed VLC links, this approach treats the channel matrix as an image and processes it with image-processing machine learning methods, which is of great importance for channel estimation in m-MIMO-VLC.
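The PSNR criterion mentioned above can be illustrated with a minimal sketch. The channel matrix, the noise levels, and the function name `psnr` below are illustrative assumptions and are not taken from [16]; the second noisy matrix is only a stand-in for a denoised estimate, not the output of FFDnet.

```python
import numpy as np

def psnr(reference, estimate):
    """Peak signal-to-noise ratio in dB; the peak is taken from the reference."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")
    peak = np.max(np.abs(reference))
    return 10.0 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(0)
H = rng.uniform(0.0, 1.0, size=(16, 16))             # hypothetical channel gains
H_noisy = H + rng.normal(0.0, 0.1, size=H.shape)     # noisy channel estimate
H_cleaner = H + rng.normal(0.0, 0.01, size=H.shape)  # stand-in for a denoised estimate

print(f"PSNR noisy:   {psnr(H, H_noisy):.1f} dB")
print(f"PSNR cleaner: {psnr(H, H_cleaner):.1f} dB")
```

A better estimate of the channel matrix yields a smaller mean squared error and therefore a higher PSNR, which is the sense in which a denoiser is said to outperform a baseline.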
2.1.3. Conclusions
The channel capacity determines the upper bound of the communication system rate, and therefore the accuracy of the channel estimation determines the communication efficiency of the actual system. With the widespread use of powerful ML techniques in channel estimation, complex VLC channels can be predicted accurately. ML algorithms will help IVLC break through its bottlenecks and integrate high-speed communication with large-scale heterogeneous networking, providing technical solutions for next-generation communication.
2.2. Channel Equalization
Channel equalization techniques generally estimate the transfer function of communication channels and try to remove the channel distortion by an adaptive filter
[19]. However, the common equalizers with linear adaptive algorithms become powerless in the field of high-speed VLC, because of the intrinsically limited modulation bandwidth of LEDs
[20] and nonlinear distortion introduced by photoelectric devices and VLC channels. Recently, ML-based equalizers, such as artificial neural networks (ANN)
[21], etc., have been developed for VLC systems. ML-based equalizers have shown outstanding equalizing performance, especially on modeling nonlinear phenomena, by adopting neural-network-based algorithms. Despite this, challenges such as massive computational complexity, slow convergence speed, and relatively poor generalization still prevent the further practical application of ML-based equalizers for VLC systems. Therefore, researchers have developed many variants, as presented next, to overcome those challenges.
2.2.1. Pre-Equalization GK-DNN
Postequalization can be replaced with pre-equalization to reduce the computational complexity and power consumption at the receiver side. Methods such as the weighted lookup table (WLUT) have been proposed to mitigate the nonlinear distortion in VLC systems
[22]. However, LUT-based pre-equalization methods suffer from a massive increase in computational complexity when dealing with high-order and high-ISI communication scenarios. Therefore, researchers have come up with ML-based pre-equalization methods in the field of VLC systems to provide a new way of solving computational problems of LUTs.
In
[23], a pre-equalization method, namely Gaussian kernel-aided deep neural network pre-distortion (GK-DNN-PD), is proposed for a high-order modulated high-speed VLC system. GK-DNN-PD outperforms the LUT-PD in terms of memory depth (MD) and the required training dataset, which leads to lower computational complexity. The experimental results show a 1.56 dB Q-factor gain compared with LUT-PD.
The proposed GK-DNN-PD method consists of two phases: a training phase and a communication testing phase. In the training phase, the received signal, which has not been pre-distorted, is linearly equalized to produce the label set of the GK-DNN channel estimator, and the clean transmitted signal with a certain MD serves as the feature set. The GK-DNN channel estimator is then trained to obtain its weights and biases. In the communication testing phase, the weights and biases obtained in the first phase are used to pre-distort the clean signal to be transmitted. In addition to the weights and biases, the pre-distortion process also takes into account the difference between the clean signal and the output of the GK-DNN channel estimator. Finally, a clipping operation is applied to reduce the peak-to-average power ratio (PAPR), which in turn reduces the nonlinear degradation.
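The effect of the clipping step on the PAPR can be sketched as follows. This is only an illustration of clipping in general: the Gaussian test signal, the clipping ratio, and the function names `papr_db` and `clip_to_rms_ratio` are assumptions, and the clipping rule actually used in [23] is not reproduced here.

```python
import numpy as np

def papr_db(x):
    """Peak-to-average power ratio of a real signal, in dB."""
    x = np.asarray(x, dtype=float)
    return 10.0 * np.log10(np.max(x ** 2) / np.mean(x ** 2))

def clip_to_rms_ratio(x, ratio):
    """Clip the signal symmetrically at +/- ratio times its RMS value."""
    x = np.asarray(x, dtype=float)
    limit = ratio * np.sqrt(np.mean(x ** 2))
    return np.clip(x, -limit, limit)

rng = np.random.default_rng(1)
signal = rng.normal(size=4096)                  # hypothetical pre-distorted waveform
clipped = clip_to_rms_ratio(signal, ratio=2.0)  # hypothetical clipping ratio

print(f"PAPR before clipping: {papr_db(signal):.2f} dB")
print(f"PAPR after clipping:  {papr_db(clipped):.2f} dB")
```

Clipping limits the peaks while leaving most samples untouched, so the peak power falls much faster than the average power and the PAPR decreases.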
Moreover, an NN-based pre-equalizer is proposed in
[24] to mitigate the semiconductor optical amplifier (SOA) pattern effect for 50G PON, confirming the feasibility of NN-based pre-equalizers in intensity modulation and direct detection (IM/DD) systems.
2.2.2. Postequalization GK-DNN
Since conventional nonlinear postequalization methods based on the Volterra series suffer from a massive increase in computational complexity when dealing with high-order nonlinearity, researchers have turned to ML for new ideas. However, the time-consuming training process of most ML-based postequalizers limits their practical application. To accelerate training and greatly relieve the computational complexity of the equalizer at the receiver side, researchers have proposed the Gaussian kernel-aided deep neural network (GK-DNN)
[25] in the field of VLC systems.
Compared with the classical MLP, the distinguishing feature of GK-DNN is that the input data pass through a functional mapping based on the Gaussian function, namely the Gaussian kernel, which maps the windowed input data to a nonlinear space in order to reduce the number of iterations and the time consumed by the fitting process. The underlying assumption is that the influence of adjacent symbols on the central (or current) symbol follows a Gaussian distribution, so the mapping operation accelerates training. The expression of the Gaussian kernel is given in
[25]. It should be noted that the scope-controlling parameter of the Gaussian kernel greatly affects the equalization performance of GK-DNN. Generally, the larger the parameter is, the faster the training process becomes. However, there is a trade-off between training acceleration and equalization performance, so the selection of the Gaussian kernel parameter is vital to obtaining the best performance. The number of hidden layer nodes is equally important, since it directly determines the computational complexity of the equalizer. According to the experimental results in
[25], the GK-DNN equalizer could efficiently realize the postequalization in the VLC system with the aid of Gaussian kernel, which reduces the iteration epochs of the neural network by 47.06%.
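The idea of weighting a window of received samples so that neighbors matter less than the central symbol can be sketched as below. This is not the kernel of [25], whose exact expression is not reproduced in this section: the Gaussian weight profile, the parameter `sigma` (which need not correspond to the scope-controlling parameter of [25]), and both function names are assumptions made purely for illustration.

```python
import numpy as np

def gaussian_window_weights(num_taps, sigma):
    """Gaussian weights centered on the middle tap (assumed form, not from [25])."""
    offsets = np.arange(num_taps) - (num_taps - 1) / 2.0
    return np.exp(-(offsets ** 2) / (2.0 * sigma ** 2))

def sliding_windows(x, num_taps):
    """Zero-padded windows so that row i is centered on sample x[i] (odd num_taps)."""
    x = np.asarray(x, dtype=float)
    padded = np.pad(x, num_taps // 2)
    idx = np.arange(len(x))[:, None] + np.arange(num_taps)[None, :]
    return padded[idx]

received = np.array([0.9, -1.1, 1.2, 0.8, -0.7, -1.0])  # hypothetical received samples
weights = gaussian_window_weights(num_taps=5, sigma=1.0)
windows = sliding_windows(received, num_taps=5)
features = windows * weights  # neighbors are attenuated, the central tap is kept as is

print(weights.round(3))
print(features.shape)
```

In this sketch a smaller `sigma` concentrates the input on the current symbol, while a larger one lets more of the neighboring symbols through to the network.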
2.2.3. Postequalization FSDNN
The frequency-slicing deep neural network (FSDNN) is a variant application of DNN that could be used in a high-speed VLC system
[26]. It processes the high- and low-frequency components of the signal separately, which decreases the computational complexity by 11.15% compared with the traditional MLP equalizer in a VLC system.
To address the nonlinear spectral fading of the received signal after the VLC channel, a DNN can be used as a postequalizer to compensate for both linear and nonlinear distortion. However, the DNN structure must be sufficiently complex to handle complicated distortions, which means that more layers and nodes are needed and the computational complexity increases. To relieve this burden on the DNN, it is worth noting that the high- and low-frequency components suffer different degrees of fading: in the received VLC signal, the high-frequency spectrum suffers more severe amplitude attenuation, while the low-frequency spectrum suffers less fading, so a complex MLP structure is unnecessary for the low-frequency band. Therefore, the received signal can be separated into high-frequency and low-frequency bands, each processed by a DNN equalizer of different complexity.
The received wide-band signal is split into two narrow-band parts in the frequency domain: its spectrum is separated into two sub-bands using a low-pass filter and a high-pass filter. The two sub-band signals are then fed into two MLPs that are trained individually. The main hyperparameters of the two-MLP network, including the number of layers, the number of nodes in every layer, the number of taps, and the number of epochs, must be tuned manually to their optimal values. Once the MLPs have been trained and the weights are fixed, the sum of the outputs of the two MLPs is the equalized and recovered signal.
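The band-splitting step can be sketched with ideal (brick-wall) filters in the FFT domain. The cutoff bin, the two-tone test signal, and the names `split_bands` and `identity_equalizer` are assumptions; the actual filters and MLP equalizers of [26] are not reproduced, and the identity function merely stands in for a trained MLP.

```python
import numpy as np

def split_bands(x, cutoff_bin):
    """Split a real signal into low and high sub-bands with ideal FFT masks."""
    x = np.asarray(x, dtype=float)
    spectrum = np.fft.rfft(x)
    low_mask = np.arange(spectrum.size) < cutoff_bin
    low = np.fft.irfft(np.where(low_mask, spectrum, 0), n=len(x))
    high = np.fft.irfft(np.where(low_mask, 0, spectrum), n=len(x))
    return low, high

def identity_equalizer(band):
    """Placeholder for a trained per-band MLP equalizer."""
    return band

n = np.arange(1024)
low_tone = np.sin(2 * np.pi * 10 * n / 1024)          # component below the cutoff
high_tone = 0.3 * np.sin(2 * np.pi * 200 * n / 1024)  # attenuated component above it
x = low_tone + high_tone

low, high = split_bands(x, cutoff_bin=100)
recovered = identity_equalizer(low) + identity_equalizer(high)
print(np.allclose(recovered, x))
```

Because the two masks are complementary, the sub-bands sum back to the original signal, so any change in the recovered signal comes only from the per-band equalizers.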
2.2.4. Postequalization TFDNet
The ML-based equalizers commonly used in VLC systems aim to fit the waveform of the transmitted signal, which is a serial time-domain signal. Ideally, the equalized signal should have the same spectrum as the transmitted one. However, waveform-fitting ML equalizers sometimes introduce a spectral difference between the equalized signal and the original one. This suggests that both time- and frequency-domain information should be taken into account to obtain better equalization performance.
A novel postequalizer, namely joint time-frequency deep neural network (TFDNet), is reported in
[27] to compensate for the nonlinear distortions in the VLC system. TFDNet reveals comprehensive information about the nonstationary signals received in the VLC system by considering time- and frequency-domain information simultaneously. TFDNet can be divided into three main procedures: (1) the received one-dimensional (1D, time-domain) signal goes through a short-time Fourier transform (STFT) and is converted into a two-dimensional (2D, time-frequency domain) signal, which is a matrix denoted as Y; (2) the STFT matrix Y is fed into the NN for training, with the labels obtained by applying the same transformation to the original transmitted signal. If each row of Y represents a certain frequency component, then Y is fed into the network column by column; (3) finally, after the NN finishes the training process, the reconstructed transmitted signal is obtained by carrying out the inverse STFT (ISTFT), where the analysis window must satisfy the constant overlap-add (COLA) constraint
[28]. Experimental results in
[27] also confirm that the proposed TFDNet can resist severe nonlinear distortions and achieve data rate gains of 0.1 Gbps and 0.2 Gbps in a VLC system compared with other nonlinear compensators such as Volterra and DNN equalizers.
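The STFT and ISTFT steps that bracket the network can be sketched with NumPy alone. The window type, window length, hop size, and test signal are assumptions, and the neural network between the two transforms is omitted; the sketch only shows that a COLA-compliant window allows the time-domain signal to be reconstructed by overlap-add.

```python
import numpy as np

def stft(x, win, hop):
    """Rows are frequency bins and columns are time frames."""
    n = len(win)
    num_frames = 1 + (len(x) - n) // hop
    frames = np.stack([x[i * hop:i * hop + n] * win for i in range(num_frames)], axis=1)
    return np.fft.rfft(frames, axis=0)

def overlap_sum(win, hop, num_frames):
    """Sum of the shifted analysis windows; constant in the interior under COLA."""
    n = len(win)
    total = np.zeros(n + hop * (num_frames - 1))
    for i in range(num_frames):
        total[i * hop:i * hop + n] += win
    return total

def istft(Y, win, hop):
    """Overlap-add reconstruction, normalized by the window overlap sum."""
    n = len(win)
    frames = np.fft.irfft(Y, n=n, axis=0)
    out = np.zeros(n + hop * (Y.shape[1] - 1))
    for i in range(Y.shape[1]):
        out[i * hop:i * hop + n] += frames[:, i]
    norm = overlap_sum(win, hop, Y.shape[1])
    return np.divide(out, norm, out=np.zeros_like(out), where=norm > 1e-12)

n_win, hop = 64, 32
win = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n_win) / n_win)  # periodic Hann window
rng = np.random.default_rng(3)
x = rng.normal(size=1024)  # hypothetical received 1D signal

Y = stft(x, win, hop)      # 2D time-frequency matrix, processed column by column
x_rec = istft(Y, win, hop)
print(Y.shape, np.allclose(x_rec[hop:-hop], x[hop:-hop]))
```

With a periodic Hann window and 50% overlap, the shifted windows sum to one away from the signal edges, which is why the interior of the signal is recovered exactly when Y is left unmodified.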
2.2.5. Postequalization DBMLP
To improve the practicality of NN equalizers, researchers have proposed a modified double-branch multilayer perceptron (DBMLP) postequalization algorithm
[15] that reduces the consumption of energy and computational resources. DBMLP restructures the MLP postequalization algorithm using the Volterra series postequalization algorithm as a template. It combines the advantages of linear adaptive filters and the MLP, improving the BER performance while reducing the complexity of the algorithm by 74.1%. The core structure of DBMLP consists of a linear branch and a nonlinear branch. The first branch is a CNN with a convolutional layer and a dense layer, which models the linear distortion within the signal bandwidth. The second branch is a hollow MLP with a hollow layer and two dense layers, which models the nonlinear distortion outside the signal bandwidth. The output of the second branch corrects the nonlinearity in the output of the first branch, and the hollow layer can ignore the effect of the intermediate signal on the signals on both sides.
To further reduce power consumption and complexity, a pruning algorithm based on DBMLP is proposed
[29]. The algorithm prunes connections by setting the weights with the smallest absolute values to 0 according to a target sparsity. Only the weights of the nonlinear branch are pruned; the weights of the linear branch are left intact. The experimental results confirm the superiority of this approach.
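Magnitude-based pruning of this kind can be sketched as follows. The weight matrices, the sparsity value, and the function name `prune_by_magnitude` are assumptions; the sketch shows generic magnitude pruning applied to a stand-in nonlinear-branch weight matrix and is not the exact procedure of [29].

```python
import numpy as np

def prune_by_magnitude(weights, sparsity):
    """Set the fraction `sparsity` of weights with the smallest |w| to zero."""
    weights = np.asarray(weights, dtype=float)
    k = int(np.floor(sparsity * weights.size))
    pruned = weights.copy()
    if k == 0:
        return pruned
    threshold = np.sort(np.abs(weights), axis=None)[k - 1]
    pruned[np.abs(weights) <= threshold] = 0.0
    return pruned

rng = np.random.default_rng(2)
W_linear = rng.normal(size=(1, 9))     # hypothetical linear-branch weights, left intact
W_nonlinear = rng.normal(size=(8, 8))  # hypothetical nonlinear-branch weights

W_nonlinear_pruned = prune_by_magnitude(W_nonlinear, sparsity=0.5)
print(np.count_nonzero(W_nonlinear_pruned), "of", W_nonlinear.size, "weights kept")
```

Zeroed weights correspond to removed connections, so the number of multiplications in the nonlinear branch shrinks in proportion to the sparsity.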
2.2.6. Postequalization PCVNN
To improve the SNR in an underwater visible light communication (UVLC) system, high LED power is required because of the LED's incoherent characteristic and the considerable attenuation of the water medium. The nonlinearity grows more severe as the signal amplitude increases. Consequently, symbols on the outside of the constellation suffer more nonlinear distortion than those on the inside. Based on the complex-valued neural network (CVNN)
[30], an adaptive partition equalizer (PCVNN)
[31] has been presented, which reduces the complexity and has superior performance.
In PCVNN, the constellation is segmented into two areas by a suitable threshold to distinguish large-amplitude signals from small-amplitude signals. The large- and small-amplitude signals are then fed into two separate complex-valued neural networks. Finally, a fully connected neural network combines the signals into a complete one. Since large and small signals experience different nonlinear impairments, such a network structure can recover the signal more accurately and can greatly reduce the complexity of the model for small signals. The experimental results verify this conjecture
[31]. PCVNN achieves up to 56.1% computational complexity reduction compared with the standard CVNN at the same performance.
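The partition step can be sketched on an ideal 16-QAM constellation. The constellation, the threshold value, and the function name `partition_by_amplitude` are assumptions; the two complex-valued networks and the fully connected combiner of [31] are omitted, and the final step only puts the two groups back in their original order.

```python
import numpy as np

def partition_by_amplitude(symbols, threshold):
    """Split complex symbols into small- and large-amplitude groups."""
    symbols = np.asarray(symbols, dtype=complex)
    is_large = np.abs(symbols) > threshold
    return symbols[~is_large], symbols[is_large], is_large

levels = np.array([-3.0, -1.0, 1.0, 3.0])
constellation = (levels[:, None] + 1j * levels[None, :]).ravel()  # ideal 16-QAM points

small, large, is_large = partition_by_amplitude(constellation, threshold=2.0)

# Placeholder for the combiner: restore the original symbol order.
merged = np.empty_like(constellation)
merged[~is_large] = small
merged[is_large] = large

print(len(small), "inner symbols,", len(large), "outer symbols")
```

With this threshold only the four innermost points fall into the small-amplitude group, which could then be handled by a simpler network than the outer points.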
2.2.7. Postequalization LSTM-Equalizer
High-speed VLC is limited by inherent nonlinear effects. Linear equalizers with limited taps seem powerless, and the Volterra series schemes suffer from high computational complexity when the high-order taps are required. With the rise of ML in solving nonlinear problems, long short-term memory (LSTM) networks are studied for VLC systems.
In
[32], researchers proposed a memory-controlled LSTM NN equalizer for both linear and nonlinear compensation, which outperforms the conventional Volterra-based and FIR-based equalizers. LSTM carries out channel equalization as a pattern classifier where the output of LSTM cells is activated by a specially designed function. Training data with high priority would be assigned by LSTM to the latest training sequence. The proposed LSTM equalizer in
[32] contains an input layer, a logical hidden layer with long and short-term memory, a classification layer, and an output layer with a merge node. A standard LSTM cell structure is used for long/short-term memory links. Moreover, a batch random resequencing procedure is adopted to control the memory effect.
Recently, the variants of LSTM have also drawn the attention of researchers because the simple LSTMs have a slow convergence speed. This is because the LSTM unit’s inner parameters prolong the training period. A convolution-enhanced LSTM (CE-LSTM) equalizer, which extracts the features by using a convolutional layer, is proposed in
[33] to reduce the complexity of the LSTM network and speed up convergence. The experimental results confirm the feasibility of the proposed CE-LSTM equalizer.
2.2.8. Postequalization MPANN
Although the ML-based equalizers for mitigating both the linear and nonlinear distortions in VLC systems have been booming recently, the computational complexity is still a problem that needs to be further solved. Therefore, an ML-based equalizer with relatively optimal equalization performance while still maintaining a low complexity is needed in the field of VLC. One promising way is to greatly relieve the equalizer’s complexity by moderately sacrificing partial performance.
Researchers have developed a simplified ML-based equalizer, namely the memory-polynomial artificial neural network (MPANN)
[34], which prunes the network structure while maintaining equalization performance similar to that of an MLP or other NNs. As with other equalizers, the input data fed into MPANN are obtained by windowing the received time-series signal. The length of the window is usually called the memory length, which also determines the dimension of the features. The major characteristic of MPANN is that its input layer, namely the memory-polynomial layer (MP layer), expands the input features with a specific function, the memory polynomial expansion. Gaussian functions, Fourier bases, and other orthogonal polynomials (e.g., Legendre, Chebyshev, etc.) could also serve as the function in the input layer. The number of nodes required by the modified NN structure can be decreased significantly if prior knowledge of the nonlinear model is provided. Therefore, the memory polynomial expansion is adopted to map the input features to a higher-dimensional data space. The output of the MP layer is then multiplied by the corresponding weights and fed into the following hidden layer of the NN. A regular activation (ReLU) and weighting process are applied in the hidden layer, and the back propagation (BP) algorithm is used to update the parameters. Finally, the output layer produces the equalized symbol. The experimental results confirm that MPANN can achieve the same equalization performance as regular MLPs while requiring less than a quarter of the complexity
[34].
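A memory polynomial expansion of windowed input can be sketched as below. The term x·|x|^(p-1) follows the commonly used memory polynomial form for real-valued signals and is an assumption here, as are the memory length, the polynomial order, and the function name; the exact MP layer of [34] is not reproduced.

```python
import numpy as np

def memory_polynomial_features(x, memory_length, order):
    """Expand each window of `memory_length` samples into terms x * |x|**(p - 1)."""
    x = np.asarray(x, dtype=float)
    padded = np.pad(x, memory_length // 2)  # zero padding, odd memory_length assumed
    idx = np.arange(len(x))[:, None] + np.arange(memory_length)[None, :]
    windows = padded[idx]                   # shape (N, memory_length)
    terms = [windows * np.abs(windows) ** (p - 1) for p in range(1, order + 1)]
    return np.concatenate(terms, axis=1)    # shape (N, memory_length * order)

received = np.array([0.5, -1.0, 2.0, 0.0, 1.5])  # hypothetical received samples
features = memory_polynomial_features(received, memory_length=3, order=3)
print(features.shape)
```

Each received sample is thus represented by memory length times polynomial order features, so the nonlinear terms are supplied to the network directly instead of being learned by additional hidden nodes.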
2.2.9. Conclusions
As the above presentation shows, the use of neural networks in channel equalization has gone beyond straightforward application, and the integration of neural networks with communication systems is starting to emerge. Different branches of neural networks are appearing, and many of them extract communication-specific features from the input data. Moreover, the fast development of computational resources makes it promising to implement ML-based modules in VLC. ML-based methods, with their powerful ability to model nonlinear phenomena, open a new route to solving the inherent nonlinear problems of VLC systems. However, these ML-based equalizers still need further optimization in terms of computational complexity, convergence speed, and generalization. Table 1 compares the equalizers mentioned above.
Table 1. Summary of machine learning algorithms for channel equalization.