Single-Agent Reinforcement Learning and Multi-Agent Reinforcement Learning

Flexible job shop scheduling (FJSP) is regarded as an effective means of meeting the challenge of mass personalized and customized manufacturing in the era of Industry 4.0, and it has been widely extended to many real-world applications. Single-Agent Reinforcement Learning (SARL) refers to algorithms in which a single agent makes all the decisions for a control system. Multi-Agent Reinforcement Learning (MARL) refers to algorithms comprising multiple agents that interact with the environment through their respective policies.

Keywords: production planning and scheduling; multi-agent reinforcement learning; flexible job shop; path flexibility

1. Single-Agent Reinforcement Learning for Scheduling

Single-agent reinforcement learning (SARL) models a single agent that interacts with the scheduling environment, learns a scheduling policy, and then makes decisions. The earliest application of SARL to job shop scheduling (JSP) can be traced back to Zhang and Dietterich (1995), who learned a heuristic evaluation function over states [1]. Subsequently, Aydin and Öztemel (2000) [2] applied reinforcement learning to choose dispatching rules depending on the current state of a production system. Since the proposal of the Deep Q-Network (DQN), using SARL to solve JSPs has attracted increasing attention.

1.1. Single-Agent Reinforcement Learning with Value Iteration

Waschneck et al. (2018) [3] applied the DQN algorithm to a dynamic and flexible production problem with the objective of maximizing plant throughput. The proposed model took machine availability and processing characteristics as states and mapped these states to station selections. Luo (2020) [4] developed an optimization algorithm based on Double DQN (DDQN) for dynamic FJSP with new job insertions. The algorithm selects appropriate scheduling rules according to the job state and obtains plans better than those produced by general scheduling rules. Lang et al. (2020) [5] combined the DQN algorithm with discrete-event simulation to solve a flexible job shop problem with process planning; two independent DQN agents are trained, one selecting operation sequences and the other assigning jobs to machines. Du et al. (2021) [6] considered an FJSP with a time-of-use electricity price constraint and dual-objective optimization of makespan and total electricity cost, and proposed a hybrid multi-objective optimization algorithm combining an estimation of distribution algorithm with DQN to solve the problem. Li et al. (2022) [7] presented dynamic FJSPs with insufficient transportation resources (DFJSP-ITR) and proposed a hybrid DQN (HDQN) that includes double Q-learning, prioritized replay, and a soft target-network update policy to minimize the makespan and total energy consumption. Gu et al. (2023) [8] integrated the DQN method into a salp swarm algorithm (SSA) framework to dynamically tune the population parameters of the SSA for solving JSPs.
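As a concrete illustration of how a value-based SARL agent can be coupled to a scheduling environment, the following is a minimal DQN-style sketch in which the agent observes a small shop-state vector and selects one of several dispatching rules, in the spirit of the rule-selection approaches above [2][4]. The environment interface, state dimension, and rule set are illustrative assumptions, not the formulations used in the cited works.

```python
import random
from collections import deque

import torch
import torch.nn as nn

# Hypothetical environment interface (an assumption, not from the cited works):
# reset() -> state vector; step(rule_idx) -> (next_state, reward, done).
N_STATE = 8            # e.g., machine utilizations and queue lengths
N_RULES = 4            # e.g., SPT, LPT, FIFO, EDD dispatching rules

q_net = nn.Sequential(nn.Linear(N_STATE, 64), nn.ReLU(), nn.Linear(64, N_RULES))
target_net = nn.Sequential(nn.Linear(N_STATE, 64), nn.ReLU(), nn.Linear(64, N_RULES))
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
replay = deque(maxlen=10_000)   # filled with (s, a, r, s_next, done) tuples during rollouts
gamma, eps = 0.99, 0.1

def select_rule(state):
    """Epsilon-greedy choice over dispatching rules."""
    if random.random() < eps:
        return random.randrange(N_RULES)
    with torch.no_grad():
        return int(q_net(torch.as_tensor(state, dtype=torch.float32)).argmax())

def train_step(batch_size=64):
    """One DQN update: TD targets use the (periodically synced) target network."""
    if len(replay) < batch_size:
        return
    batch = random.sample(replay, batch_size)
    s, a, r, s2, d = map(torch.as_tensor, zip(*batch))
    s, s2, r, d = s.float(), s2.float(), r.float(), d.float()
    q = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + gamma * (1 - d) * target_net(s2).max(dim=1).values
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```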

1.2. Single-Agent Reinforcement Learning with Policy Iteration

Wang et al. (2021) [9] considered uncertainties such as machine failures in the job shop and proposed a dynamic scheduling method based on proximal policy optimization (PPO) to find the optimal scheduling policy. The states are defined by the job processing-state matrix, the designated machine matrix, and the processing-time matrix of the operations; the action set consists of operations selected from the candidate operation set, and the reward is related to machine utilization. The results showed that the proposed reinforcement-learning-based approach obtained competitive solutions and achieved adaptive, real-time scheduling.
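To make these state, action, and reward definitions concrete, the following is a minimal sketch of how the three state matrices, the candidate-operation action set, and a machine-utilization reward might be encoded. The matrix shapes and the `utilization_reward` helper are assumptions for illustration, not the exact encoding of [9].

```python
import numpy as np

n_jobs, n_machines, max_ops = 3, 2, 4

# State, following the spirit of [9]: three matrices describing the shop.
# 1 if operation (j, o) is finished, else 0.
processing_state = np.zeros((n_jobs, max_ops), dtype=np.int8)
# Machine index assigned to operation (j, o); -1 if not yet assigned.
designated_machine = -np.ones((n_jobs, max_ops), dtype=np.int8)
# Processing time of operation (j, o) on its assigned machine; 0 if undefined.
processing_time = np.zeros((n_jobs, max_ops), dtype=np.float32)

def candidate_operations(processing_state):
    """Action set: the first unfinished operation of every job."""
    actions = []
    for j in range(n_jobs):
        unfinished = np.flatnonzero(processing_state[j] == 0)
        if unfinished.size > 0:
            actions.append((j, int(unfinished[0])))
    return actions

def utilization_reward(busy_time, elapsed_time):
    """Reward tied to machine utilization: mean busy fraction across machines."""
    return float(np.mean(busy_time / max(elapsed_time, 1e-9)))

# Flatten the matrices into one observation vector for the policy network.
state = np.concatenate([processing_state.ravel(),
                        designated_machine.ravel(),
                        processing_time.ravel()]).astype(np.float32)
print(candidate_operations(processing_state))          # all first operations are candidates
print(utilization_reward(np.array([5.0, 3.0]), 10.0))  # -> 0.4
```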
Applying SARL to scheduling problems has some limitations. Firstly, FJSP-DT occurs in an uncertain environment, and the information available to the agent is likely to be incomplete, which is difficult for SARL to handle since SARL depends on global information. Secondly, SARL does not address communication and collaboration between jobs or machines, resulting in the loss of important scheduling information [10]. Thirdly, the action space of SARL expands with the number of jobs or machines [11], and a high action dimension poses a challenge to policy learning; research shows that the performance of policy gradient methods gradually degrades as the action dimension increases [12].

2. Multi-Agent Reinforcement Learning for Scheduling

Multi-agent reinforcement learning (MARL) aims to model complex worlds in which each agent can make adaptive decisions, realizing competition and cooperation with humans and other agents, and it is attracting increasing attention in academia and industry [13].
From the perspective of the multi-agent system’s training paradigm, agents’ training can be broadly divided into distributed and centralized schemes.
  • Distributed Training Paradigm (DTP): In the distributed paradigm, agents learn independently of other agents and do not rely on explicit information exchange.
  • Centralized Training Paradigm (CTP): The centralized paradigm allows agents to exchange additional information during training; this extra information is then discarded at execution time, when each agent receives only its local observations and independently determines its actions according to its own policy (a minimal sketch of this structure is given after this list).
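The centralized-training, decentralized-execution structure can be sketched as per-agent actors that see only local observations, plus a centralized critic that sees the joint observation and joint action during training only, in the spirit of actor-critic CTP methods such as MADDPG [21]. The network sizes and names below are assumptions made for illustration.

```python
import torch
import torch.nn as nn

N_AGENTS, OBS_DIM, ACT_DIM = 3, 10, 5

# Decentralized actors: each maps its own local observation to an action.
actors = nn.ModuleList(
    nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(), nn.Linear(64, ACT_DIM))
    for _ in range(N_AGENTS)
)

# Centralized critic: used only during training; it scores the joint
# observation-action pair and is discarded at execution time.
critic = nn.Sequential(
    nn.Linear(N_AGENTS * (OBS_DIM + ACT_DIM), 128), nn.ReLU(), nn.Linear(128, 1)
)

def execute(observations):
    """Decentralized execution: each agent acts only from its own observation."""
    return [actor(obs) for actor, obs in zip(actors, observations)]

def centralized_value(observations, actions):
    """Centralized training signal: the critic sees all observations and actions."""
    joint = torch.cat([torch.cat([o, a]) for o, a in zip(observations, actions)])
    return critic(joint)

obs = [torch.randn(OBS_DIM) for _ in range(N_AGENTS)]
acts = execute(obs)
print(centralized_value(obs, acts).shape)  # torch.Size([1])
```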

2.1. Multi-Agent Reinforcement Learning with Distributed Training Paradigm 

Regarding works on MARL with DTP, Aissani et al. (2009) [14] applied MARL to adaptive scheduling in multi-site companies. In the company's multi-agent system, a supervisor agent sends requests for solutions to inventory agents and resource agents at different sites. The inventory agent asks the resource agent to propose a solution; the resource agent then runs its decision-making algorithm, based on the SARSA (state–action–reward–state–action) algorithm, using system data (the resource state, task durations, etc.) and sends back a solution. Martínez et al. (2020) [15] proposed a MARL tool for JSPs in which machines are regarded as agents. The tool allows the user either to keep the best schedule obtained by a Q-learning algorithm or to modify it by fixing some operations to satisfy certain constraints; the tool then optimizes the modified solution, taking into account the user's preferences and the possible alternatives. Hameed et al. (2020) [16] presented a distributed reinforcement learning approach for JSPs; the innovation of this research is that the authors modeled various relationships within the manufacturing environments (robot manufacturing cells) as graph neural networks (GNNs). Zhou et al. (2021) [17] proposed a new distributed architecture with multiple artificial intelligence (AI) schedulers for the online scheduling of orders in smart factories. Each machine is treated as a scheduler agent that collects the scheduling states of all machines as its training input and executes its own scheduling policy. Popper et al. (2021) [18] proposed a distributed MARL scheduling method for the multi-objective problem of minimizing energy consumption and delivery delay in the production process. The underlying algorithm is PPO, which regulates the agents' joint behavior through a common reward function, and the method can schedule any number of operations. Burggräf et al. (2022) [19] presented a deep MARL approach with a distributed actor-critic architecture to solve dynamic FJSPs; the novelty of this research lies in the parameterization of the state and action spaces. Zhang et al. (2022) [20] constructed a multi-agent manufacturing system (MAMS) with the capability of online scheduling and policy optimization for the dynamic FJSP, in which various machine tools are modeled as agents capable of environmental perception, information sharing, and autonomous decision-making.
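For intuition about the distributed paradigm, here is a minimal sketch in which each machine agent maintains its own tabular Q-function and updates it from its local experience only, with no information exchanged between agents. The state and action abstractions are assumptions made for illustration, not the formulation of any specific work cited above.

```python
import random
from collections import defaultdict

class IndependentMachineAgent:
    """One Q-learning agent per machine; no communication with other agents."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.95, eps=0.1):
        self.q = defaultdict(lambda: [0.0] * n_actions)
        self.n_actions = n_actions
        self.alpha, self.gamma, self.eps = alpha, gamma, eps

    def act(self, local_state):
        # Epsilon-greedy over this machine's own action set
        # (e.g., which queued job to process next).
        if random.random() < self.eps:
            return random.randrange(self.n_actions)
        values = self.q[local_state]
        return values.index(max(values))

    def learn(self, s, a, r, s_next):
        # Standard Q-learning update from this agent's local experience only.
        best_next = max(self.q[s_next])
        self.q[s][a] += self.alpha * (r + self.gamma * best_next - self.q[s][a])

# Distributed training: every machine learns independently.
agents = [IndependentMachineAgent(n_actions=3) for _ in range(4)]
```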

2.2. Multi-Agent Reinforcement Learning with Centralized Training Paradigm

Wang et al. (2021) [21] proposed a flexible and hybrid production scheduling problem and applied a multi-agent deep reinforcement learning (MADRL) scheduling method based on the Multi-Agent Deep Deterministic Policy Gradient (MADDPG). Similarly, Wang et al. (2022) [22] introduced a modeling method based on the decentralized partially observable Markov decision process (Dec-POMDP) for the resource preemption environment (PRE) and applied QMIX to solve the PRE scheduling problem, where each job is an agent that selects its action according to its current observation. Jing et al. (2022) [23] designed a MARL scheduling framework based on graph convolutional networks (GCNs), namely a graph-based multi-agent system (GMAS), to solve the FJSP. First, a probabilistic model of the FJSP's directed acyclic graph is constructed from the product processing network and the workshop environment. Then, the authors modeled the FJSP as a topological graph prediction process and adjusted the scheduling policy by predicting the connection probabilities of the edges.
In contrast to DTP, CTP shares information among agents through a centralized evaluation function, which makes learning more stable and convergence faster. Therefore, as a solution method for FJSP-DT, MARL integrated with the CTP paradigm offers great potential in terms of agility, adaptability, and accuracy, and is therefore the focus of this study.
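As an illustration of the value-factorization flavor of CTP mentioned above (e.g., QMIX [22]), the sketch below combines per-agent Q-values through a mixing network whose weights are generated from the global state and constrained to be non-negative, so that the joint value is monotonic in each agent's value. Layer sizes and dimensions are assumptions, and the hypernetworks are simplified to single linear layers.

```python
import torch
import torch.nn as nn

N_AGENTS, STATE_DIM, EMBED = 3, 12, 32

class MonotonicMixer(nn.Module):
    """QMIX-style mixer: state-conditioned, non-negative mixing weights."""

    def __init__(self):
        super().__init__()
        self.w1 = nn.Linear(STATE_DIM, N_AGENTS * EMBED)  # hypernetwork for layer-1 weights
        self.b1 = nn.Linear(STATE_DIM, EMBED)
        self.w2 = nn.Linear(STATE_DIM, EMBED)              # hypernetwork for layer-2 weights
        self.b2 = nn.Linear(STATE_DIM, 1)

    def forward(self, agent_qs, state):
        # agent_qs: (batch, N_AGENTS); state: (batch, STATE_DIM)
        w1 = torch.abs(self.w1(state)).view(-1, N_AGENTS, EMBED)  # abs() enforces monotonicity
        b1 = self.b1(state).view(-1, 1, EMBED)
        hidden = torch.relu(torch.bmm(agent_qs.unsqueeze(1), w1) + b1)
        w2 = torch.abs(self.w2(state)).view(-1, EMBED, 1)
        b2 = self.b2(state).view(-1, 1, 1)
        q_total = torch.bmm(hidden, w2) + b2
        return q_total.view(-1)  # one joint value per batch element

mixer = MonotonicMixer()
q_total = mixer(torch.randn(5, N_AGENTS), torch.randn(5, STATE_DIM))
print(q_total.shape)  # torch.Size([5])
```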

References

  1. Zhang, W.; Dietterich, T.G. A reinforcement learning approach to job-shop scheduling. In Proceedings of the IJCAI, Montreal, QC, Canada, 20–25 August 1995; Volume 95, pp. 1114–1120.
  2. Aydin, M.E.; Öztemel, E. Dynamic job-shop scheduling using reinforcement learning agents. Robot. Auton. Syst. 2000, 33, 169–178.
  3. Waschneck, B.; Reichstaller, A.; Belzner, L.; Altenmüller, T.; Bauernhansl, T.; Knapp, A.; Kyek, A. Optimization of global production scheduling with deep reinforcement learning. Procedia CIRP 2018, 72, 1264–1269.
  4. Luo, S. Dynamic scheduling for flexible job shop with new job insertions by deep reinforcement learning. Appl. Soft Comput. 2020, 91, 106208.
  5. Lang, S.; Behrendt, F.; Lanzerath, N.; Reggelin, T.; Müller, M. Integration of deep reinforcement learning and discrete-event simulation for real-time scheduling of a flexible job shop production. In Proceedings of the 2020 Winter Simulation Conference (WSC), Orlando, FL, USA, 14–18 December 2020; pp. 3057–3068.
  6. Du, Y.; Li, J.Q.; Chen, X.L.; Duan, P.Y.; Pan, Q.K. Knowledge-based reinforcement learning and estimation of distribution algorithm for flexible job shop scheduling problem. IEEE Trans. Emerg. Top. Comput. Intell. 2023, 7, 1036–1050.
  7. Li, Y.; Gu, W.; Yuan, M.; Tang, Y. Real-time data-driven dynamic scheduling for flexible job shop with insufficient transportation resources using hybrid deep Q network. Robot. Comput.-Integr. Manuf. 2022, 74, 102283.
  8. Gu, Y.; Chen, M.; Wang, L. A self-learning discrete salp swarm algorithm based on deep reinforcement learning for dynamic job shop scheduling problem. Appl. Intell. 2023, 53, 18925–18958.
  9. Wang, L.; Hu, X.; Wang, Y.; Xu, S.; Ma, S.; Yang, K.; Liu, Z.; Wang, W. Dynamic job-shop scheduling in smart manufacturing using deep reinforcement learning. Comput. Netw. 2021, 190, 107969.
  10. Wang, X.; Zhang, L.; Ren, L.; Xie, K.; Wang, K.; Ye, F.; Chen, Z. Brief review on applying reinforcement learning to job shop scheduling problems. J. Syst. Simul. 2021, 33, 2782.
  11. Gronauer, S.; Diepold, K. Multi-agent deep reinforcement learning: A survey. Artif. Intell. Rev. 2022, 55, 895–943.
  12. Chandak, Y.; Theocharous, G.; Kostas, J.; Jordan, S.; Thomas, P. Learning action representations for reinforcement learning. In Proceedings of the International Conference on Machine Learning, PMLR, Long Beach, CA, USA, 9–15 June 2019; pp. 941–950.
  13. Liu, Y.; Li, Z.; Jiang, Z.; He, Y. Prospects for multi-agent collaboration and gaming: Challenge, technology, and application. Front. Inf. Technol. Electron. Eng. 2022, 23, 1002–1009.
  14. Aissani, N.; Trentesaux, D.; Beldjilali, B. Multi-agent reinforcement learning for adaptive scheduling: Application to multi-site company. IFAC Proc. Vol. 2009, 42, 1102–1107.
  15. Martínez Jiménez, Y.; Coto Palacio, J.; Nowé, A. Multi-agent reinforcement learning tool for job shop scheduling problems. In Proceedings of the International Conference on Optimization and Learning, Cadiz, Spain, 17–19 February 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 3–12.
  16. Hameed, M.S.A.; Schwung, A. Reinforcement learning on job shop scheduling problems using graph networks. arXiv 2020, arXiv:2009.03836.
  17. Zhou, T.; Tang, D.; Zhu, H.; Zhang, Z. Multi-agent reinforcement learning for online scheduling in smart factories. Robot. Comput.-Integr. Manuf. 2021, 72, 102202.
  18. Popper, J.; Motsch, W.; David, A.; Petzsche, T.; Ruskowski, M. Utilizing multi-agent deep reinforcement learning for flexible job shop scheduling under sustainable viewpoints. In Proceedings of the 2021 International Conference on Electrical, Computer, Communications and Mechatronics Engineering (ICECCME), Mauritius, 7–8 October 2021; pp. 1–6.
  19. Burggräf, P.; Wagner, J.; Saßmannshausen, T.; Ohrndorf, D.; Subramani, K. Multi-agent-based deep reinforcement learning for dynamic flexible job shop scheduling. Procedia CIRP 2022, 112, 57–62.
  20. Zhang, Y.; Zhu, H.; Tang, D.; Zhou, T.; Gui, Y. Dynamic job shop scheduling based on deep reinforcement learning for multi-agent manufacturing systems. Robot. Comput.-Integr. Manuf. 2022, 78, 102412.
  21. Wang, S.; Li, J.; Luo, Y. Smart scheduling for flexible and hybrid production with multi-agent deep reinforcement learning. In Proceedings of the 2021 IEEE 2nd International Conference on Information Technology, Big Data and Artificial Intelligence (ICIBA), Chongqing, China, 17–19 December 2021; Volume 2, pp. 288–294.
  22. Wang, X.; Zhang, L.; Lin, T.; Zhao, C.; Wang, K.; Chen, Z. Solving job scheduling problems in a resource preemption environment with multi-agent reinforcement learning. Robot. Comput.-Integr. Manuf. 2022, 77, 102324.
  23. Jing, X.; Yao, X.; Liu, M.; Zhou, J. Multi-agent reinforcement learning based on graph convolutional network for flexible job shop scheduling. J. Intell. Manuf. 2022, 1–19.