Multiple Input Multiple Output (MIMO) systems have been gaining significant attention from the research community due to their potential to improve data rates.
1. Introduction
Higher data rates and reliability are among the challenges faced by 5G and beyond technologies. MIMO systems are becoming one of the significant pillars of wireless communication due to their spectral efficiency and diversity gains
[1]. In addition, multi-user MIMO (MU-MIMO) systems further exploit both spatial multiplexing and spatial diversity to achieve reliable communication links and higher data rates
[2]. The performance of MU-MIMO systems is superior to that of single-user MIMO communication networks because they can serve more than one user in the same Transmission Time Slot (TTS) and frequency band
[3].
Moreover, MU-MIMO systems provide a notable advantage over conventional communication systems: their multiplexing gain is proportional to the number of transmit antennas, even though the user devices do not need to have many antennas. This eases the burden imposed by the size and cost limitations of user equipment
[3]. Another benefit is a smaller impact of propagation problems, namely antenna correlation and channel rank, since multi-user diversity addresses these issues. On the other hand, MU-MIMO is sensitive to the accuracy of Channel State Information (CSI) because of inter-user interference, which may be mitigated as in
[4][5].
Recent advances in Deep Neural Networks (DNNs) have made Reinforcement Learning (RL)
[6] one of the most important and attractive Artificial Intelligence (AI) technologies. RL is a branch of machine learning in which an agent interacts with a given (mostly unknown) environment, chooses actions, and gradually explores the environment’s characteristics. RL and DNNs have been used in diverse research and real-life areas such as computer vision, the configuration of resources, self-organized systems, games, natural language processing, communication and networking, robotics, and autonomous control
[7][8][9][10][11].
Machine Learning (ML) techniques have recently been used in different aspects of communication systems, e.g., Resource Allocation (RA). For example, deep learning has been applied in
[12] for RA in massive MIMO systems. Similarly, applications of RL to RA management in MIMO systems have been reported in
[13][14][15][16], but RL-based methods have not been used for scheduling multiple users in uplink MU-MIMO systems.
The proposed work aims at investigating the potential use of RL in MIMO communications, with a focus on MU-MIMO systems. The motivation behind this work stems from three aspects: (1) the importance of allocating radio resources using a scheduling mechanism at the base station in future wireless communication networks; (2) the limitations of existing state-of-the-art user scheduling techniques; and (3) the robustness of RL methods in alleviating these shortcomings and delivering efficient performance.
Moreover, RL, being one of the most appealing technologies, learns the dynamics of an environment (MU-MIMO scheduling in this case) from its own experience. One does not need to set the values of different parameters, as standard scheduling algorithms require. Instead, owing to its decision-making ability, the agent can learn the optimal combination of wireless channel parameters and optimally select the group of users for transmission out of all candidate users.
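To make the idea of an agent learning which user group to schedule more concrete, the following is a minimal sketch only. It swaps in a stateless epsilon-greedy bandit, which is far simpler than a full RL scheduler and is not the method proposed in this work. The function names, the toy channel model, the log-det sum-capacity reward, and all parameter values are assumptions made for illustration.

```python
import itertools
import numpy as np


def sum_capacity(H, group, snr=10.0):
    """Toy reward (assumed): log2 det(I + snr * Hs^H Hs) for the chosen users."""
    Hs = H[:, list(group)]
    gram = Hs.conj().T @ Hs
    _, logdet = np.linalg.slogdet(np.eye(len(group)) + snr * gram)
    return float(logdet / np.log(2.0))


def epsilon_greedy_scheduler(n_rx=4, n_users=6, group_size=2,
                             episodes=2000, epsilon=0.1, seed=0):
    """Learn a value estimate per user group purely from observed rewards."""
    rng = np.random.default_rng(seed)
    # Each action is one candidate group of users.
    groups = list(itertools.combinations(range(n_users), group_size))
    # Hypothetical environment: a fixed mean channel plus small random fading.
    shape = (n_rx, n_users)
    H_mean = (rng.standard_normal(shape)
              + 1j * rng.standard_normal(shape)) / np.sqrt(2)
    q = np.zeros(len(groups))            # running mean reward per group
    counts = np.zeros(len(groups), dtype=int)
    for _ in range(episodes):
        if rng.random() < epsilon:       # explore: random group
            a = int(rng.integers(len(groups)))
        else:                            # exploit: best group so far
            a = int(np.argmax(q))
        fading = 0.1 * (rng.standard_normal(shape)
                        + 1j * rng.standard_normal(shape))
        reward = sum_capacity(H_mean + fading, groups[a])
        counts[a] += 1
        q[a] += (reward - q[a]) / counts[a]   # incremental mean update
    return groups, q, counts


if __name__ == "__main__":
    groups, q, counts = epsilon_greedy_scheduler()
    print("preferred group:", groups[int(np.argmax(q))])
```

The agent is never told the channel statistics or any scheduling threshold; it only observes the reward of the group it chose. A practical scheduler would also condition its choice on the observed channel state, which this bandit sketch deliberately omits.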
2. Multiple Input Multiple Output Systems
Optimal use of radio resources is essential to enhance system capacity, and user scheduling can play a crucial role
[17]. Several user selection schemes, built on various selection mechanisms, have been reviewed in the literature. These methods can be broadly categorized into three types. The first category uses a specific system parameter to select users, e.g., the signal-to-leakage-and-noise ratio (SLNR) or the signal-to-interference-plus-noise ratio (SINR)
[18]. The second category considers the scenario where the CSI is not available at the Base Station (BS). The third category addresses fairness, i.e., it treats fairness as the only performance metric for ensuring quality of service
[19].
From a game-theoretic perspective, user-centric access point scheduling for cell-free massive MIMO systems has been investigated in
[20]. The authors model the user-centric access-point cluster as a local altruistic game. Moreover, a maximum non-neighbor-set-based concurrent spatial adaptive play technique is used to obtain the Nash equilibrium.
Similarly, a user-selection mechanism for MU-MIMO systems in uplink mode is presented in
[21], where the authors used antennas and a zero-forcing (ZF) detector at the receiver in the BS. They consider imperfect channel estimation with additive white Gaussian noise (AWGN) and Rician fading channels. The objective of the user selection is to maximize the SNR.
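As a rough illustration of SNR-driven user selection with a ZF receiver, the sketch below uses the standard post-detection SNR of ZF, snr / [(H^H H)^-1]_kk, and searches exhaustively for the group whose weakest user has the highest SNR. It assumes perfect CSI, equal per-user transmit SNR, and a max-min selection rule. These choices, and the function names, are assumptions for illustration and are not taken from [21].

```python
import itertools
import numpy as np


def zf_post_snr(H, snr):
    """Per-user post-detection SNR of a ZF receiver.

    H has one column per user (receive antennas x users); `snr` is the
    per-user transmit SNR (assumed equal for all users).
    """
    gram_inv = np.linalg.inv(H.conj().T @ H)
    return snr / np.real(np.diag(gram_inv))


def select_users(H, group_size, snr=10.0):
    """Exhaustively pick the group that maximizes the minimum ZF SNR."""
    best_group, best_score = None, -np.inf
    for group in itertools.combinations(range(H.shape[1]), group_size):
        score = zf_post_snr(H[:, list(group)], snr).min()
        if score > best_score:
            best_group, best_score = group, score
    return best_group, best_score


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    H = (rng.standard_normal((4, 6)) + 1j * rng.standard_normal((4, 6))) / np.sqrt(2)
    print(select_users(H, group_size=3))
```

The exhaustive search grows combinatorially with the number of candidate users, which is one reason lower-complexity selection rules are of interest.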
Another user scheduling framework for a cooperative nonorthogonal multiple access scenario is developed in
[22]. Deep learning is employed to recognize and classify channels with imperfect CSI and to enhance the accuracy of the CSI. The authors of
[23] consider an end-to-end design of MU-MIMO systems in a downlink scenario, including precoding, limited feedback, and pilot sequences. A deep learning (DL) method is then used to jointly optimize the precoder design at the BS and the generation of user feedback information. A neural network at the BS produces the pilot sequences and assists the users in obtaining CSI.
A machine-learning-based beam-user selection scheme with a low-complexity hybrid beamforming structure for the multiuser massive MIMO downlink is presented in
[24]. Householder reflectors are employed to produce the orthogonal analog beamforming matrix. The proposed scheme also uses a feedforward neural network and shows reasonable energy efficiency in an ill-conditioned massive MIMO environment. Joint user selection, optimal transmit power, and antenna selection are discussed in
[25], where the joint problem is formulated to address inter-cell interference in multi-cell massive MIMO networks. A novel power consumption technique is also used to analyze power consumption precisely.
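The use of Householder reflectors in [24] relies on a general property: each reflector I - 2vv^H / (v^H v) is unitary, so any product of reflectors has orthonormal columns. The sketch below demonstrates only that property. How [24] chooses the reflector vectors is not stated here, so random vectors are used as a stand-in, and the constant-modulus constraint typical of analog beamformers is ignored. Both are simplifying assumptions, as are the function names.

```python
import numpy as np


def householder(v):
    """Householder reflector I - 2 v v^H / (v^H v) for a nonzero vector v."""
    v = np.asarray(v, dtype=complex).reshape(-1, 1)
    return np.eye(v.shape[0]) - 2.0 * (v @ v.conj().T) / np.vdot(v, v).real


def orthogonal_beams(vectors, n_beams):
    """Multiply reflectors together and keep the first n_beams columns."""
    n = len(vectors[0])
    Q = np.eye(n, dtype=complex)
    for v in vectors:
        Q = Q @ householder(v)
    return Q[:, :n_beams]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical reflector vectors; a real design would derive them
    # from the channel rather than draw them at random.
    vecs = [rng.standard_normal(8) + 1j * rng.standard_normal(8) for _ in range(3)]
    F = orthogonal_beams(vecs, n_beams=4)
    print(np.allclose(F.conj().T @ F, np.eye(4)))
```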
The problems of max-product and max–min power allocation have been formulated in
[26][27] using SINR and SLNR mechanisms for linear precoder design. A DNN is deployed to predict the optimal power allocation from each user’s location, which helps to reduce the processing time the system needs to identify the optimal power allocation.
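For orientation, the two power allocation objectives named above can be written in a generic form. The notation below (channel h_k, unit-norm linear precoder w_k, power p_k, noise power sigma^2, total power budget P) and the sum-power constraint are assumptions chosen for illustration; the exact formulations in [26][27] may differ.

```latex
% Assumed downlink SINR of user k under a linear precoder
\[
\gamma_k(\mathbf{p}) =
  \frac{p_k\,\lvert \mathbf{h}_k^{H}\mathbf{w}_k \rvert^{2}}
       {\sum_{j \neq k} p_j\,\lvert \mathbf{h}_k^{H}\mathbf{w}_j \rvert^{2} + \sigma^{2}}
\]

% Max-min power allocation: raise the worst user's SINR
\[
\max_{\mathbf{p} \ge 0}\ \min_{k}\ \gamma_k(\mathbf{p})
  \quad \text{s.t.} \quad \sum_{k} p_k \le P
\]

% Max-product power allocation: maximize the product of SINRs
\[
\max_{\mathbf{p} \ge 0}\ \prod_{k} \gamma_k(\mathbf{p})
  \quad \text{s.t.} \quad \sum_{k} p_k \le P
\]
```

Max-min favors fairness across users, whereas max-product is equivalent to maximizing the sum of log-SINRs and therefore trades some fairness for aggregate performance.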
An SLNR-based user scheduling approach is presented in
[28], where a user’s leakage power to other users is treated as the main parameter in deciding whether the user should be selected. A similar approach that also addresses user selection is proposed in
[29]. A modification of the leakage-based method concerning the selection of the first user was presented in
[30]. A block-diagonalization-based technique is proposed in
[31], whose authors claim to achieve reasonable fairness and capacity among users. The authors of
[2] developed a receiver for data detection and joint maximum-likelihood modulation classification of the co-scheduled users. In
[32], the authors propose a near-optimal scheduling algorithm that considers both fairness and sum rate as performance metrics.
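The leakage-based selection idea discussed for [28] can be sketched with a generic greedy procedure. This is not the algorithm of [28], [29], or [30]: the matched-filter beam, the rule of starting from the user with the largest channel norm, and the max-min SLNR growth criterion are all assumptions made for illustration, as are the function names.

```python
import numpy as np


def slnr(H, k, members, noise_power=1.0):
    """SLNR of user k: own signal power over leakage to other members plus noise.

    H has one column per user; the beam is the matched filter
    w_k = h_k / ||h_k|| (an assumed, simple choice).
    """
    h_k = H[:, k]
    w_k = h_k / np.linalg.norm(h_k)
    signal = np.abs(np.vdot(h_k, w_k)) ** 2
    leakage = sum(np.abs(np.vdot(H[:, j], w_k)) ** 2 for j in members if j != k)
    return signal / (leakage + noise_power)


def greedy_slnr_schedule(H, group_size, noise_power=1.0):
    """Grow the scheduled set one user at a time, keeping the worst SLNR high."""
    n_users = H.shape[1]
    # Assumed first-user rule: start from the strongest channel.
    selected = [int(np.argmax(np.linalg.norm(H, axis=0)))]
    while len(selected) < group_size:
        candidates = [k for k in range(n_users) if k not in selected]

        def worst_slnr(k):
            group = selected + [k]
            return min(slnr(H, u, group, noise_power) for u in group)

        selected.append(max(candidates, key=worst_slnr))
    return selected


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    H = (rng.standard_normal((4, 8)) + 1j * rng.standard_normal((4, 8))) / np.sqrt(2)
    print(greedy_slnr_schedule(H, group_size=3))
```

A candidate whose channel is nearly parallel to an already scheduled user causes high leakage and is therefore passed over in favor of a more orthogonal one.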
A resource allocation mechanism is developed in
[33] using a partially observable Markov decision process (POMDP) for downlink transmit beamforming at a multi-antenna BS. The authors adopt a myopic policy in designing the scheme to avoid the high computational complexity of value iteration. A binary FPA is used in
[34] for both antenna and user scheduling to achieve sum-rate performance at reduced computational complexity. An ML-based joint framework for hybrid precoding and user scheduling is proposed in
[35] for MU-MIMO to improve the sum rate. The hybrid precoder is based on cross-entropy, while the user scheduling is based on a correlation factor. Keysight’s electronic system-level software is used to produce the channel matrix.