Conversational Chatbots

A conversational chatbot or dialogue system is a computer program designed to simulate conversation with human users, especially over the Internet. These chatbots can be integrated into messaging apps, mobile apps, or websites, and are designed to engage in natural language conversations with users.

  • conversational chatbot
  • chatbot
  • dialogue system
  • dialogue response
  • dialogue strategies
  • dialogue generation
  • machine learning
  • conversational agents

1. Overview of Conversational Chatbots

To explore the institutions implementing conversational chatbots, Table 1 shows the number of papers published by country. Most of the institutions were located in the United States, with the others being in the UK, China, Germany, and elsewhere. Conversational chatbots can generally be divided into closed-domain and open-domain chatbots. Of the 32 screened papers, 9 and 23 were related to closed-domain and open-domain conversational chatbots, respectively. Typically, closed-domain chatbots are built to support specific conversational goals. A typical example of a closed-domain chatbot is found in a customer contact center when a customer requests support from customer service [1].
The other main closed-domain chatbot application is educational support, since this kind of chatbot has very clear conversation goals from a learning perspective. In a closed-domain chatbot, users care more about information accuracy than about whether the response sounds human. Open-domain chatbots focus more on chatting naturally with users, so the goal is to maintain the consistency of the conversational context and the users’ engagement. This is why techniques such as deep reinforcement learning have been widely used to build conversational chatbots in recent years [7].
Table 1.
Country distribution of surveyed papers.

2. Objectives of Conversational Chatbots

Since chatbots are usually built around specific objectives, Table 2 shows the five categories of objectives retrieved from the 32 screened papers: (1) technical improvement, (2) context maintenance, (3) business support, (4) educational objectives, and (5) other specific objectives. A single paper may pursue multiple objectives. The objective of technical improvement means providing correct information as well as coming closer to a human-like conversation style. Accurate responses are a practical and basic requirement of chatbots, since users may not care whether the other side is a human or a machine as long as they can obtain the information they need. Closed-domain conversational chatbots have already achieved quite good performance [1][4][28], so accurate responses are primarily the key objective of open-domain chatbot research. The key techniques are to modify the objective function [14] or to integrate external domain knowledge into the responses [3].
Table 2.
Objectives of conversational chatbot research.
To respond with correct and appropriate answers, chatbots have to identify users’ statements well; context maintenance, which aims to find and keep a good dialogue strategy, is another major objective that chatbots, especially open-domain chatbots, need to achieve. Among the surveyed papers, most conversational chatbots in recent years have adopted four directions to find and maintain the dialogue context through a good strategy and policy: dialogue context identification, dialogue strategy optimization, word embedding enhancement, and user engagement or connection maintenance. Some researchers focused on identifying the dialogue context [15][25], and some focused on optimizing the strategy that keeps this context [12][18]. A meaningful conversation always revolves around a theme, and maintaining that theme is what this type of chatbot aims to do. For open-domain chatbots, however, the topic may change during the conversation, so some papers focus not only on identifying the context but also on detecting changes in it. Typically, a dialogue system can use a probabilistic model such as a Markov decision process (MDP) [13][27] or deep learning [9][10][15] to predict the direction of the dialogue. The purpose is to continuously engage users’ interests so that the conversation can continue. Moreover, many authors are working on optimizing dialogue strategies and policies [12][13][18] to make the responses less conservative, thus increasing the variability of the dialogue.
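To make the idea of predicting the direction of a dialogue concrete, the following minimal Python sketch uses a first-order Markov chain over dialogue topics; the topics and transition probabilities are invented purely for illustration and are a much-simplified stand-in for the MDP and deep learning approaches cited above.

    # Toy first-order Markov model over dialogue topics; the topics and
    # probabilities below are invented purely for illustration.
    transitions = {
        "sports":  {"sports": 0.7, "weather": 0.2, "travel": 0.1},
        "weather": {"sports": 0.1, "weather": 0.6, "travel": 0.3},
        "travel":  {"sports": 0.2, "weather": 0.2, "travel": 0.6},
    }

    def predict_next_topic(current_topic):
        # Most probable next topic given the current one.
        options = transitions[current_topic]
        return max(options, key=options.get)

    def context_shift_likely(current_topic, threshold=0.5):
        # If staying on the current topic is less likely than the threshold,
        # the chatbot should prepare for a change in context.
        return transitions[current_topic][current_topic] < threshold

    print(predict_next_topic("weather"))    # -> "weather"
    print(context_shift_likely("sports"))   # -> False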
In the area of educational objectives, a common practice is to use a chatbot as a simulated learning companion during the learning phase [7][20]. One of the main reasons for this is that it is almost impossible for a teacher to follow every single student’s learning progress according to their proficiency. Alternatively, researchers can leverage a chatbot to simulate a student so that preservice teachers can be trained [5].
In recent years, emotional topics have gradually become a focus of artificial intelligence research. For this reason, several studies have begun to explore emotional aspects of conversational chatbots [18][29], either trying to detect users’ feelings or aiming to build a cognitive, user-friendly, interactive, and empathetic system. Such goals focus on enabling conversational chatbots not only to serve as problem-solving tools but also to chat with humans like friends, thereby meeting human emotional needs. For example, through communication with chatbots, researchers hope not only to relieve the loneliness of elderly people living alone but also to stimulate their brains.
In summary, in terms of the objectives of building conversational chatbots, much of the focus is on improving the response accuracy of the dialogue system in order to provide more human-like conversations. For open-domain dialogue systems, there are also papers focusing on how to identify the key dialogue context and maintain it so that users stay engaged.

3. Methods and Datasets of Conversational Chatbots

To answer the question "What are the methods and datasets used to build a conversational chatbot?", Table 3 and Table 4 show, respectively, the methods and the datasets used in the surveyed papers to build conversational chatbots. In terms of the methods applied, reinforcement learning was the most frequently used [2][9][10][12][18][25]; it is one of the three basic machine learning paradigms alongside supervised learning and unsupervised learning. Reinforcement learning differs from supervised learning in that it does not need to be presented with labeled input/output pairs, nor does it need suboptimal actions to be explicitly corrected. Instead, the focus is on finding a balance between exploring uncharted territory and exploiting current knowledge. One of the most successful recently published cases is ChatGPT [2], which has attracted a great deal of attention and is trained in part with reinforcement learning from human feedback. This is why reinforcement learning is commonly used to determine the conversation context as well as to keep the dialogue consistent. Early on, reinforcement learning models were used to teach machines to play computer games, from which researchers learned a great deal about the policy adjustments needed to reach a goal.
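As a toy illustration of the exploration-exploitation balance described above, the sketch below uses an epsilon-greedy rule to choose among candidate response strategies; the strategy names and reward values are invented and do not come from any of the surveyed papers.

    import random

    # Hypothetical response strategies and their running value estimates.
    actions = ["ask_clarifying_question", "give_direct_answer", "offer_related_topic"]
    values = {a: 0.0 for a in actions}   # estimated long-term reward per strategy
    counts = {a: 0 for a in actions}
    epsilon = 0.1                        # fraction of turns spent exploring

    def choose_action():
        # Explore with probability epsilon, otherwise exploit the best-known strategy.
        if random.random() < epsilon:
            return random.choice(actions)
        return max(actions, key=lambda a: values[a])

    def update(action, reward):
        # Incremental average: pull the estimate toward the observed reward.
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]

    # Simulated interaction loop; the reward stands in for user engagement feedback.
    true_reward = {"ask_clarifying_question": 0.2, "give_direct_answer": 0.5,
                   "offer_related_topic": 0.3}
    for turn in range(1000):
        a = choose_action()
        update(a, random.gauss(true_reward[a], 0.1))

    print(max(values, key=values.get))   # the strategy the agent learned to exploit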
Table 3.
Methods of conversational chatbot research.
Table 4.
Datasets used by conversational chatbots.
The second most frequently used method is long short-term memory (LSTM), which improves on the memory design of the original RNN by adding four units (an input gate, an output gate, a memory cell, and a forget gate) so that it can process and predict significant events across long intervals and delays in a time series. This is where research began to focus on ensuring that the dialogue system remembers the conversation context and that responses stay consistent with earlier conversation goals [14][18][24][29]. The main reason so much research focuses on this area is simply that users will lose interest and become disengaged if the dialogue system responds with something inconsistent with the original goal or context. Although LSTM is an enhancement over the RNN, the selected papers spanned approximately 20 years, and some older or special-purpose studies still chose to use the original or an enhanced RNN as their method [3][15].
Several methods were used in only one or a few cases. Bidirectional Encoder Representations from Transformers (BERT), proposed by Google in 2018 as a pre-training technique for natural language processing (NLP), became a ubiquitous baseline in NLP experiments within just over a year of its publication. BERT’s overall performance is outstanding in many areas, such as question answering and next-sentence prediction, which makes it popular for use in dialogue systems. It offers a strong baseline for researchers to start with, which can then be fine-tuned with specific domain knowledge [19][28][30]. Other methods adopted in these papers include ELMo, supervised learning, transfer learning, Seq2Seq, MDP, HRED, GAN, UniLM, GPT, Dialogflow, and NBT; most of these are also common algorithms in natural language processing. For some closed-domain chatbots, especially those used for educational support [5][7][20], the methods focus more on how to conduct the experiment than on the system architecture of how the chatbots are built.
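For readers unfamiliar with the LSTM gating described above, the following minimal NumPy sketch shows how the input, forget, and output gates and the memory cell interact in a single step; the random parameters and input sequence are purely illustrative.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def lstm_step(x_t, h_prev, c_prev, W, U, b):
        # W, U, b hold stacked parameters for the input (i), forget (f),
        # and output (o) gates plus the candidate cell content (g).
        z = W @ x_t + U @ h_prev + b
        i, f, o, g = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)   # gate activations in [0, 1]
        g = np.tanh(g)                                 # candidate memory content
        c_t = f * c_prev + i * g        # forget part of the old memory, write new memory
        h_t = o * np.tanh(c_t)          # expose part of the memory cell as the output
        return h_t, c_t

    # Tiny demo with random parameters: 3-dimensional inputs, 4-dimensional hidden state.
    rng = np.random.default_rng(0)
    hidden, inp = 4, 3
    W = rng.normal(size=(4 * hidden, inp))
    U = rng.normal(size=(4 * hidden, hidden))
    b = np.zeros(4 * hidden)
    h, c = np.zeros(hidden), np.zeros(hidden)
    for x in rng.normal(size=(5, inp)):   # a sequence of five input vectors
        h, c = lstm_step(x, h, c, W, U, b)
    print(h)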
Table 4 presents the datasets used to support conversational chatbot research. The way researchers leverage datasets depends on the purpose of the chatbot. Obviously, if the chatbot is built for general conversation purposes, popular public Internet datasets are the best choice. However, if the chatbot is a closed-domain one made for a unique purpose, researchers may collect and use their own dataset.
Many of the datasets are Twitter-related, including conversation data, the Firehose stream, Persona, dialogues, and posts, since social media undoubtedly contain a large amount of valuable daily social information. The second most used type of dataset is the subtitles and scripts of movies or dramas; although these are written by screenwriters, they still have high reference value. In addition, social media usually focus on the latest popular topics, while movies and dramas can cover historical stories or past classics.
Other popular datasets are Wikipedia, the Book Corpus, and Amazon reviews and QA; as mentioned above, some research tried to include domain knowledge in the responses [27][30], and such informative sources greatly benefit this goal. Moreover, two special datasets, CamRest676 and KVRET, were collected via the Amazon Mechanical Turk platform. Finally, a few types of data were used in only a few studies, including SEMEVAL15, Foursquare, CoQA, and other application-specific datasets (ATIS, NJ System, TOOT, ELVIS, ALE OXE, Contact Center).
As mentioned earlier, application-specific datasets are normally used by closed-domain chatbots because these kinds of chatbots have a very clear goal when a conversation takes place. Good examples are SuperAgent [4] and a customer contact system [1]; they leverage pre-defined or historical system-generated data to train the system, since such data are unique to the chatbot. Another example is a chatbot built as a learning companion to improve users’ reading comprehension skills [7][20]; the dataset used to train the system includes the books the chatbot will use, which means that the conversation scope is fixed.
In summary, there was an obvious trend that matched the evolution of artificial intelligence technology over the last decade. Implementations started with the development of specific closed-domain chatbot systems and then shifted to modern machine learning models, including Seq2Seq RNN, LSTM, and BERT. Along with this trend, researchers have also tried to provide more reasonable responses than the legacy RNN models, which mainly use a maximum likelihood objective function to generate a response. This discussion and enhancement relate to the main weak point of the Seq2Seq RNN model when using the maximum likelihood objective, which leads to generic responses such as ‘I don’t know’ when the dialogue system does not know how to respond. Some researchers enhanced the generative model [18] to enrich the output of the conversations. Other chatbot systems leverage reinforcement learning to maintain dialogue context consistency; the key point is to identify a strategy that finds the dialogue context and engages users during the conversation. Since the conversation context might change during a conversation, some researchers emphasize maintaining a consistent dialogue context.
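To make the ‘I don’t know’ weakness concrete, the sketch below contrasts plain maximum likelihood selection with a maximum-mutual-information style re-ranking in the spirit of the diversity-promoting objective of [14]; the candidate responses and probability values are invented stand-ins for scores that would come from trained models.

    # Stand-in log-probabilities; in practice these would come from a trained
    # Seq2Seq model and a separate language model over responses.
    def log_p_response_given_context(response, context):
        return {"i don't know": -2.0, "try restarting the router": -2.5}[response]

    def log_p_response(response):
        return {"i don't know": -0.5, "try restarting the router": -4.0}[response]

    candidates = ["i don't know", "try restarting the router"]
    context = "my internet keeps dropping"

    # Plain maximum likelihood: the generic reply wins.
    ml_choice = max(candidates, key=lambda r: log_p_response_given_context(r, context))

    # MMI-style re-ranking, log p(r|c) - lambda * log p(r), demotes generic replies.
    lam = 0.5
    mmi_choice = max(candidates, key=lambda r: log_p_response_given_context(r, context)
                     - lam * log_p_response(r))

    print(ml_choice)    # "i don't know"
    print(mmi_choice)   # "try restarting the router"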

4. Outcomes and Challenges of Conversational Chatbots

In response to "What are the outcomes and challenges of conversational chatbots?", in terms of conversational chatbots, Table 5 shows the outcomes corresponding to different objectives, while Table 6 shows the challenges of building chatbots. The corresponding outcomes can be divided into three categories: (1) technical improvement, which focuses on output optimization; (2) context maintenance, which focuses on algorithm or model optimization; (3) educational support; (4) business support; and (5) other. There were a total of 15 papers focused on optimizing the output of chatbots. In terms of technical improvement as the construction objective, the optimization direction includes adding context [1[2][9][28],13,30], external knowledge [2][3] and skills content [3[26][30],15], in order to make the responses of chatbots more accurate. For the other objectives, optimization should be researched in a more humane and emotional direction [12,19][18][29].
Table 5.
Outcomes of conversational chatbots.
Table 6.
Challenges of conversational chatbots.
The outcomes related to algorithm or model optimization focus on improving the techniques applied in chatbots, such as reinforcement learning, self-attention, and transfer learning, or models such as LSTM, BERT, GPT-3, and NBT. This portion of the outcomes is more focused on achieving the objective of context maintenance in conversational chatbots. To improve reinforcement learning so that it finds and maintains the dialogue context, several studies [2][10][12][25] have focused on enhancing the system architecture or re-designing the machine learning model; the ultimate objective is to show that the enhanced dialogue system generates better, more human-like responses. The recently released ChatGPT [2], which has received a great deal of attention from researchers, is a good example of leveraging reinforcement learning.
Other outcomes from the surveyed papers include improved educational support, optimized inputs, and a dedicated model for e-commerce. Leveraging a chatbot as a learning companion [7][20] improves students’ reading skills and continuously maintains their engagement level. When simulating a chatbot as a student to train preservice teachers on school violence topics [5] or mathematics [6], only some areas were improved. The work in [1] focused on the input side of the model; the researchers tried to minimize input noise through various noise-removal algorithms to maximize the usability of the information fed into the model. In that case, which involved a customer contact center [1], unnecessary information was removed so that the dialogue system could be trained on meaningful information. Another article created a dedicated model for e-commerce [4].
In response to the challenges faced by chatbots, Table 6 shows the challenges reported in research on conversational chatbots. Notably, up to seven papers faced challenges in model selection and modification. A typical situation was that researchers were not satisfied with the output and were looking for a better machine learning model or further enhancements [2][13][24][26]. From the immature natural language processing technology of the early stages of development, which limited the resources researchers could rely on, to its relative maturity in recent years, the selection of algorithms has always been the most critical and challenging part.
How to select the model that best suits the research from among numerous algorithms, modify and optimize it, or even combine multiple algorithms in the design and training to obtain the highest accuracy is a big problem. Luckily, this challenge did not last too long, as machine learning and natural language processing technology have experienced major breakthroughs in the last decade. It has almost become a standard configuration to choose the most powerful NLP model, combining reinforcement learning with a strong pretrained model such as BERT [19][28][30][31] or GPT-3 [2]; this provides a baseline of natural conversational responses, to which application- or domain-specific knowledge can then be added.
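The following sketch illustrates this ‘pretrained baseline plus domain-specific fine-tuning’ recipe with a single training step of a BERT-based response-selection classifier using the Hugging Face transformers library; the label set and example sentences are invented, and the exact setups in the surveyed papers differ.

    # pip install transformers torch
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    # Pretrained BERT as the baseline; two labels, e.g., "appropriate response" vs. "not".
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

    # One (context, candidate response) pair with an invented label.
    enc = tokenizer("my order has not arrived yet",
                    "let me check the shipping status for you",
                    return_tensors="pt")
    labels = torch.tensor([1])

    # A single fine-tuning step on domain-specific data (normally run over a full dataset).
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
    model.train()
    loss = model(**enc, labels=labels).loss
    loss.backward()
    optimizer.step()
    print(float(loss))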
In addition to model selection, inefficient pre-work is definitely another challenge: the data collected in practice are rarely neat, and researchers have to put in a lot of effort to make them trainable. Typical examples are an application-specific dialogue system [4] or a contact center [1], in which the training dataset might not be sufficient for a machine learning model to mature enough to respond to users’ requests. Although leveraging public datasets can make the system respond naturally, the real purpose of a closed-domain dialogue system is to provide accurate service and information. A typical method for overcoming this challenge is data augmentation, in which the raw data are translated into another language and then translated back into the original language. This can help enrich the dataset to some extent, but it is still believed that the dialogue system becomes more experienced as more real conversation data are fed in.
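A minimal sketch of the back-translation augmentation described above, assuming the publicly available Helsinki-NLP MarianMT checkpoints from the transformers library; any other translation service or language pair could be substituted.

    # pip install transformers sentencepiece torch
    from transformers import MarianMTModel, MarianTokenizer

    def load(name):
        return MarianTokenizer.from_pretrained(name), MarianMTModel.from_pretrained(name)

    def translate(texts, tokenizer, model):
        batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
        return tokenizer.batch_decode(model.generate(**batch), skip_special_tokens=True)

    # English -> French -> English round trip to paraphrase scarce training utterances.
    en_fr_tok, en_fr = load("Helsinki-NLP/opus-mt-en-fr")
    fr_en_tok, fr_en = load("Helsinki-NLP/opus-mt-fr-en")

    original = ["I would like to reset my account password."]
    french = translate(original, en_fr_tok, en_fr)
    augmented = translate(french, fr_en_tok, fr_en)
    print(augmented)   # a paraphrase that can be added to the training set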
When using chatbots to provide educational support during the learning cycle, a common challenge is making the chatbot dynamically adjust the difficulty level by itself during a conversation. When a chatbot is used as a learning companion [7][20], enabling it to dynamically detect the learner’s progress and adjust its profile to continuously push the learner to the next level is a potential area for enhancement. A similar challenge was also reported when using a chatbot as a simulated student [5]; researchers likewise expect to improve the chatbot’s profile dynamically.
After selecting an algorithm and finishing all the pre-work, the next challenge to conquer is the extraction and classification of useful data; this was the third most commonly encountered challenge among the selected papers. As mentioned in the previous section, researchers tried to remove noise through various algorithms to prevent unnecessary data from being fed into the model and causing interference. Once most of the noise is filtered out, data classification is also key. This part of the challenge may also be related to information slotting [23], where the main purpose of the dialogue system is to extract key topics and feed them back to the system for response generation.
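As a very small illustration of extracting key information ‘slots’ from a user utterance before feeding them back into response generation, the hand-written pattern matcher below uses invented slot names for a toy booking domain; it is only a stand-in for the learned belief trackers such as [23].

    import re

    # Invented slot patterns for a toy restaurant-booking domain.
    SLOT_PATTERNS = {
        "cuisine": r"\b(italian|chinese|mexican|thai)\b",
        "party_size": r"\bfor (\d+) (?:people|persons)\b",
        "time": r"\bat (\d{1,2}(?::\d{2})?\s?(?:am|pm))\b",
    }

    def extract_slots(utterance):
        slots = {}
        text = utterance.lower()
        for name, pattern in SLOT_PATTERNS.items():
            match = re.search(pattern, text)
            if match:
                slots[name] = match.group(1)   # every pattern has one capturing group
        return slots

    print(extract_slots("Book an Italian place for 4 people at 7:30 pm"))
    # -> {'cuisine': 'italian', 'party_size': '4', 'time': '7:30 pm'}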
These challenges are likely to be important factors in resolving the difficulties of conversational chatbots. In addition, there are some lesser challenges listed in Table 6, such as inefficient pre-work, a lack of diversity and quantity in the training data, objective function formulation, feature selection, and enhancements in humanization and morality. Some papers also mentioned multilingual support [19], which might be an interesting area. Although translation systems are quite mature, when introducing a Seq2Seq model, additional factors may need to be considered, such as cultural differences in how ideas are expressed in conversation.

References

  1. Pawlik, Ł.; Płaza, M.; Deniziak, S.; Boksa, E. A method for improving bot effectiveness by recognising implicit customer intent in contact centre conversations. Speech Commun. 2022, 143, 33–45.
  2. ChatGPT: Optimizing Language Models for Dialogue—OpenAI. Available online: https://openai.com/blog/chatgpt/ (accessed on 28 December 2022).
  3. Ghazvininejad, M.; Brockett, C.; Chang, M.W.; Dolan, B.; Gao, J.; Yih, W.T.; Galley, M. A knowledge-grounded neural conversation model. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; Volume 32.
  4. Cui, L.; Huang, S.; Wei, F.; Tan, C.; Duan, C.; Zhou, M. Superagent: A customer service chatbot for e-commerce websites. In Proceedings of the ACL 2017, System Demonstrations, Vancouver, BC, Canada, 30 July–4 August 2017; pp. 97–102.
  5. Song, D.; Oh, E.Y.; Hong, H. The Impact of Teaching Simulation Using Student Chatbots with Different Attitudes on Preservice Teachers’ Efficacy. Educ. Technol. Soc. 2022, 25, 46–59.
  6. Lee, D.; Yeo, S. Developing an AI-based chatbot for practicing responsive teaching in mathematics. Comput. Educ. 2022, 191, 104646.
  7. Hollander, J.; Sabatini, J.; Graesser, A. How Item and Learner Characteristics Matter in Intelligent Tutoring Systems Data. In Artificial Intelligence in Education. Posters and Late Breaking Results, Workshops and Tutorials, Industry and Innovation Tracks, Practitioners’ and Doctoral Consortium: 23rd International Conference, AIED 2022, Durham, UK, 27–31 July 2022, Proceedings, Part II; Springer International Publishing: Cham, Switzerland, 2022; pp. 520–523.
  8. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019; Volume 1, pp. 4171–4186.
  9. Li, J.; Monroe, W.; Ritter, A.; Galley, M.; Gao, J.; Jurafsky, D. Deep reinforcement learning for dialogue generation. arXiv preprint 2016, arXiv:1606.01541.
  10. Singh, S.; Kearns, M.; Litman, D.; Walker, M. Reinforcement learning for spoken dialogue systems. Adv. Neural Inf. Process. Syst. 1999, 12, 956–962.
  11. Zhou, L.; Gao, J.; Li, D.; Shum, H.Y. The design and implementation of xiaoice, an empathetic social chatbot. Comput. Linguist. 2020, 46, 53–93.
  12. Singh, S.; Litman, D.; Kearns, M.; Walker, M. Optimizing dialogue management with reinforcement learning: Experiments with the NJFun system. J. Artif. Intell. Res. 2002, 16, 105–133.
  13. Levin, E.; Pieraccini, R.; Eckert, W. A stochastic model of human–machine interaction for learning dialog strategies. IEEE Trans. Speech Audio Process. 2000, 8, 11–23.
  14. Li, J.; Galley, M.; Brockett, C.; Gao, J.; Dolan, B. A diversity-promoting objective function for neural conversation models. arXiv preprint 2015, arXiv:1510.03055.
  15. Sordoni, A.; Galley, M.; Auli, M.; Brockett, C.; Ji, Y.; Mitchell, M.; Dolan, B. A neural network approach to context-sensitive generation of conversational responses. arXiv preprint 2015, arXiv:1506.06714.
  16. Gao, J.; Galley, M.; Li, L. Neural approaches to conversational AI. Found. Trends Inf. Retr. 2019, 13, 127–298.
  17. Serban, I.; Sordoni, A.; Bengio, Y.; Courville, A.; Pineau, J. Building end-to-end dialogue systems using generative hierarchical neural network models. In Proceedings of the AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016; Volume 30.
  18. Sun, X.; Chen, X.; Pei, Z.; Ren, F. Emotional human machine conversation generation based on SeqGAN. In Proceedings of the 2018 First Asian Conference on Affective Computing and Intelligent Interaction (ACII Asia), Beijing, China, 20–22 May 2018; pp. 1–6.
  19. Dong, L.; Yang, N.; Wang, W.; Wei, F.; Liu, X.; Wang, Y.; Hon, H.W. Unified language model pre-training for natural language understanding and generation. Adv. Neural Inf. Process. Syst. 2019, 33, 13063–13075.
  20. Liu, C.C.; Liao, M.G.; Chang, C.H.; Lin, H.M. An analysis of children’s interaction with an AI chatbot and its impact on their interest in reading. Comput. Educ. 2022, 189, 104576.
  21. Lin, C.J.; Mubarok, H. Learning analytics for investigating the mind map-guided AI chatbot approach in an EFL flipped speaking classroom. Educ. Technol. Soc. 2021, 24, 16–35.
  22. Sato, S.; Yoshinaga, N.; Toyoda, M.; Kitsuregawa, M. Modeling situations in neural chat bots. In Proceedings of ACL 2017, Student Research Workshop, Vancouver, Canada, 30 July–4 August 2017; pp. 120–127.
  23. Lei, W.; Jin, X.; Kan, M.Y.; Ren, Z.; He, X.; Yin, D. Sequicity: Simplifying task-oriented dialogue systems with single sequence-to-sequence architectures. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, Melbourne, Australia, 15–20 July 2018; Volume 1, pp. 1437–1447.
  24. Anki, P.; Bustamam, A.; Al-Ash, H.S.; Sarwinda, D. High Accuracy Conversational AI Chatbot Using Deep Recurrent Neural Networks Based on BiLSTM Model. In Proceedings of the 2020 3rd International Conference on Information and Communications Technology (ICOIACT), Yogyakarta, Indonesia, 24–25 November 2020; pp. 382–387.
  25. Keerthana, R.R.; Fathima, G.; Florence, L. Evaluating the Performance of Various Deep Reinforcement Learning Algorithms for a Conversational Chatbot. In Proceedings of the 2021 2nd International Conference for Emerging Technology (INCET), Belgaum, India, 21–23 May 2021; pp. 1–8.
  26. Mrkšić, N.; Séaghdha, D.Ó.; Wen, T.H.; Thomson, B.; Young, S. Neural Belief Tracker: Data-Driven Dialogue State Tracking. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vancouver, BC, Canada, 30 July–4 August 2017; Volume 1, pp. 1777–1788.
  27. Gasic, M.; Breslin, C.; Henderson, M.; Kim, D.; Szummer, M.; Thomson, B.; Young, S. POMDP-based dialogue manager adaptation to extended domains. In Proceedings of the SIGDIAL 2013 Conference, Metz, France, 22–24 August 2013; pp. 214–222.
  28. Henderson, M.; Vulić, I.; Gerz, D.; Casanueva, I.; Budzianowski, P.; Coope, S.; Su, P.H. Training neural response selection for task-oriented dialogue systems. arXiv preprint 2019, arXiv:1906.01543.
  29. Li, J.; Galley, M.; Brockett, C.; Spithourakis, G.P.; Gao, J.; Dolan, B. A persona-based neural conversation model. arXiv preprint 2016, arXiv:1603.06155.
  30. Kanodia, N.; Ahmed, K.; Miao, Y. Question Answering Model Based Conversational Chatbot using BERT Model and Google Dialogflow. In Proceedings of the 2021 31st International Telecommunication Networks and Applications Conference (ITNAC), Sydney, Australia, 24–26 November 2021; pp. 19–22.
  31. Syed, Z.H.; Trabelsi, A.; Helbert, E.; Bailleau, V.; Muths, C. Question answering chatbot for troubleshooting queries based on transfer learning. Procedia Comput. Sci. 2021, 192, 941–950.
  32. Tegos, S.; Demetriadis, S.; Karakostas, A. MentorChat: Introducing a configurable conversational agent as a tool for adaptive online collaboration support. In Proceedings of the 2011 15th Panhellenic Conference on Informatics, Kastoria, Greece, 30 September–2 October 2011; pp. 13–17.