Conversational Chatbots

A conversational chatbot, or dialogue system, is a computer program designed to simulate conversation with human users, especially over the Internet. These chatbots can be integrated into messaging apps, mobile apps, or websites and engage users in natural language conversations.

  • conversational chatbot
  • chatbot
  • dialogue system
  • dialogue response
  • dialogue strategies
  • dialogue generation
  • machine learning
  • conversational agents

1. Overview of Conversational Chatbots

To explore which institutions are implementing conversational chatbots, Table 1 shows the number of surveyed papers published by country. Most of the institutions were located in the United States, with the others in the UK, China, Germany, etc. Conversational chatbots can generally be divided into closed-domain and open-domain chatbots. Of the 32 screened papers, 9 and 23 were related to closed-domain and open-domain conversational chatbots, respectively. Closed-domain chatbots are primarily built to support specific conversational goals; a typical example is a customer contact center, where a customer requests support from customer service [1][5].
Table 1. Country distribution of surveyed papers.
Since chatbots are mostly built around specific objectives, Table 2 groups the objectives retrieved from the 32 screened papers into five categories: (1) technical improvement, (2) context maintenance, (3) business support, (4) educational objectives, and (5) other specific objectives. A single paper may pursue several objectives at once. The objective of technical improvement means providing correct information as well as coming closer to a human-like conversation style. Accurate responses are a practical, basic requirement of chatbots, since users may not care whether the other side is a human or a machine as long as they can obtain the information they need. Closed-domain conversational chatbots have already achieved quite good performance [1][4][28], so accurate responses are primarily a key objective of open-domain chatbot research. The key techniques are to modify the objective function [14][21] or to integrate external domain knowledge into the responses [3][2].
Table 2. Objectives of conversational chatbot research.
Table 4 presents the datasets used to support conversational chatbot research. How researchers leverage datasets depends on the purpose of the chatbot. If the chatbot is built for general conversation, popular public internet datasets are the best choice; however, if the chatbot is closed-domain and built for a unique purpose, researchers may collect and use their own dataset.
Many of the datasets are Twitter-related, including conversation data, Firehose, Persona, dialogue, and posts, since social media undoubtedly contain a wealth of valuable everyday social information. The second most used type of dataset is the subtitles and scripts of movies or dramas; although these are written by screenwriters, they still have high reference value. In addition, social media usually contain information focused on the latest popular topics, while movies and dramas can cover historical stories or past classics.
Other popular datasets are Wikipedia, Book Corpus, and Amazon reviews and QA; as mentioned above, some research tried to include domain knowledge in the responses [27][30], and such informative sources are of great benefit in this case. Moreover, two special datasets, CamRest676 and KVRET, were collected through the Amazon Mechanical Turk platform. Finally, a few types of data were used in only one or a few studies, including SEMEVAL15, Foursquare, CoQA, and other application-specific datasets (ATIS, NJ System, TOOT, ELVIS, ALE OXE, Contact Center).
As mentioned earlier, application-specific datasets are normally used by closed-domain chatbots because these chatbots have a very clear goal for each conversation. Good examples are SuperAgent [4] and a customer contact system [1][5]; they leverage pre-defined or historical system-generated data to train the system, since such data are unique to the chatbot. Another example is a chatbot built as a learning companion to improve users’ reading comprehension skills [7][20]; the dataset used to train the system includes the books the chatbot will work with, which means that the conversation scope is fixed.
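As a minimal sketch of how dialogue-style data such as movie subtitles or scripts are commonly turned into training examples (the pairing scheme and the toy excerpt below are illustrative assumptions, not the preprocessing of any specific surveyed paper), consecutive utterances can be paired as (context, response) examples:

```python
from typing import List, Tuple

def build_context_response_pairs(turns: List[str]) -> List[Tuple[str, str]]:
    """Pair each utterance with the following one as a (context, response) example."""
    pairs = []
    for prev_turn, next_turn in zip(turns, turns[1:]):
        prev_turn, next_turn = prev_turn.strip(), next_turn.strip()
        if prev_turn and next_turn:
            pairs.append((prev_turn, next_turn))
    return pairs

# Tiny hypothetical script excerpt used only to show the pairing
script = [
    "Where were you last night?",
    "I was at the library, I swear.",
    "The library closes at nine.",
]
print(build_context_response_pairs(script))
# [('Where were you last night?', 'I was at the library, I swear.'),
#  ('I was at the library, I swear.', 'The library closes at nine.')]
```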
In summary, there is an obvious trend that matches the evolution of artificial intelligence technology over the last decade. Implementations started with the development of specific closed-domain chatbot systems and then shifted to modern machine learning models, including Seq2Seq RNN, LSTM, and BERT. Along with this trend, researchers have tried to address a main weak point of the Seq2Seq RNN model: when trained with a maximum-likelihood objective, it tends to fall back on generic responses such as ‘I don’t know’ whenever the dialogue system is unsure how to respond. Some researchers enhanced the generative model [18][12] to enrich the output of the conversations. Other chatbot systems leverage reinforcement learning to maintain dialogue context consistency; the key point is to identify a strategy that tracks the dialogue context and keeps users engaged during the conversation. Since the context may shift over the course of a conversation, some researchers emphasize maintaining a consistent dialogue context.

4. Outcomes and Challenges of Conversational Chatbots

In response to the question "What are the outcomes and challenges of conversational chatbots?", Table 5 shows the outcomes corresponding to the different objectives, while Table 6 shows the challenges of building chatbots. The outcomes can be divided into five categories: (1) technical improvement, which focuses on output optimization; (2) context maintenance, which focuses on algorithm or model optimization; (3) educational support; (4) business support; and (5) other. A total of 15 papers focused on optimizing the output of chatbots. When technical improvement is the construction objective, the optimization directions include adding context [2][9][28], external knowledge [3][2], and skills content [26][30] in order to make the chatbots’ responses more accurate. For the other objectives, optimization moves in a more humane and emotional direction [18][29].
Table 5. Outcomes of conversational chatbots.
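One simple way to picture "adding context and external knowledge" to a response generator is to retrieve a few relevant facts and prepend them to the dialogue history before generation; the keyword-overlap retrieval and the [FACT]/[TURN] markers below are assumptions made for this sketch, not the mechanisms of the cited papers:

```python
def retrieve_facts(user_turn: str, knowledge_base: dict, top_k: int = 2) -> list:
    """Score each stored fact by keyword overlap with the user turn and keep the best ones."""
    words = set(user_turn.lower().split())
    scored = [(len(words & set(key.lower().split())), fact)
              for key, fact in knowledge_base.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [fact for score, fact in scored[:top_k] if score > 0]

def build_model_input(history: list, user_turn: str, knowledge_base: dict) -> str:
    """Prepend retrieved facts to the dialogue history so the generator can condition on them."""
    facts = retrieve_facts(user_turn, knowledge_base)
    knowledge_block = " ".join(f"[FACT] {f}" for f in facts)
    dialogue_block = " ".join(f"[TURN] {t}" for t in history + [user_turn])
    return f"{knowledge_block} {dialogue_block}".strip()

# Toy knowledge base for a hypothetical customer-support domain
kb = {"return policy": "Items can be returned within 30 days with a receipt.",
      "shipping time": "Standard shipping takes 3-5 business days."}
print(build_model_input(["Hi, I need some help."],
                        "What is your return policy?", kb))
```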
The other main closed-domain chatbot application is educational support, since this kind of chatbot has very clear conversation goals from a learning perspective. With a closed-domain chatbot, users care more about information accuracy than about whether the response sounds human. Open-domain chatbots focus on chatting naturally with users, so the goal is to keep the conversational context consistent and the users engaged. This is why techniques such as reinforcement learning and deep learning have been widely used to build conversational chatbots in recent years [7][9].

2. Objectives of Conversational Chatbots

To respond with correct and appropriate answers, chatbots have to interpret users’ statements well. Context maintenance, which aims to find and maintain a good dialogue strategy, is another major objective that chatbots, especially open-domain chatbots, need to achieve. The surveyed papers show that most conversational chatbots in recent years have pursued four directions to find and maintain the dialogue context through a good strategy and policy: dialogue context identification, dialogue strategy optimization, word embedding enhancement, and user engagement or connection maintenance. Some researchers focused on identifying the dialogue context [15][25], and some focused on optimizing the strategy that keeps this context [12][18]. A conversation is meaningful only when it stays around a theme, and this is what this type of chatbot aims to preserve. For open-domain chatbots, unfortunately, the topic may shift during the conversation, so some papers focus not only on identifying the context but also on detecting changes in it. Normally, a probabilistic model such as a Markov decision process (MDP) [13][27] or deep learning techniques [9][10][15] are used to predict the direction of the dialogue; the purpose is to keep engaging the user’s interest so that the conversation can continue. Moreover, many authors are working on optimizing dialogue strategies and policies [12][13][18] to make the responses less conservative, thus increasing the variability of the dialogue.
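To make the MDP framing concrete, the sketch below models dialogue-state tracking as a small Markov decision process and scores candidate system actions with a hand-written transition and reward table; the states, actions, and numbers are purely illustrative assumptions, not values taken from [13][27] or the other cited work:

```python
# Illustrative dialogue states and system actions (assumptions for this sketch)
ACTIONS = ["ask_followup", "introduce_related_topic", "clarify", "wrap_up"]

# Hand-written transition model P(next_state | state, action) and per-state reward
TRANSITIONS = {
    ("on_topic", "ask_followup"): {"on_topic": 0.8, "topic_drift": 0.2},
    ("on_topic", "introduce_related_topic"): {"on_topic": 0.5, "topic_drift": 0.5},
    ("topic_drift", "clarify"): {"on_topic": 0.6, "topic_drift": 0.4},
    ("topic_drift", "ask_followup"): {"on_topic": 0.3, "topic_drift": 0.7},
}
REWARD = {"on_topic": 1.0, "topic_drift": -0.5, "closing": 0.0, "greeting": 0.0}

def expected_reward(state: str, action: str) -> float:
    """One-step expected reward of taking `action` in `state` (no lookahead)."""
    dist = TRANSITIONS.get((state, action), {})
    return sum(p * REWARD[s] for s, p in dist.items())

def choose_action(state: str) -> str:
    """Greedy one-step policy: pick the action with the highest expected reward."""
    return max(ACTIONS, key=lambda a: expected_reward(state, a))

print(choose_action("topic_drift"))  # -> 'clarify' under the toy numbers above
```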
In the area of educational objectives, the common practice is to use a chatbot to simulate a learning companion during the learning phase [7][20]. One of the main reasons for this is that it is almost impossible for a teacher to track every single student’s learning progress according to their proficiency. Alternatively, researchers can leverage a chatbot to simulate a student so that preservice teachers can be trained [5][6].
In recent years, emotional topics have gradually become a focus of artificial intelligence research. Several studies have therefore begun to explore emotional aspects of conversational chatbots [18][29], either trying to detect users’ feelings or aiming to build cognitive, user-friendly, interactive, and empathetic systems. Such goals focus on enabling conversational chatbots not only to serve as problem-solving tools but also to chat with humans like friends, thereby meeting human emotional needs. For example, through communication with chatbots, researchers hope not only to ease the loneliness of elderly people living alone but also to provide cognitive stimulation.
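As a minimal, purely illustrative sketch of the emotion-detection idea (a tiny keyword lexicon and canned replies rather than the trained affect models used in [18][29]), the user’s utterance can be scored and the reply style conditioned on the result:

```python
# Tiny illustrative sentiment lexicon; real systems use trained affect classifiers
NEGATIVE = {"lonely", "sad", "tired", "worried", "afraid"}
POSITIVE = {"happy", "great", "glad", "excited", "wonderful"}

def detect_mood(utterance: str) -> str:
    """Classify the utterance as positive, negative, or neutral by keyword counts."""
    words = set(utterance.lower().replace(".", "").replace(",", "").split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

def respond(utterance: str) -> str:
    """Pick a reply style that matches the detected mood."""
    mood = detect_mood(utterance)
    if mood == "negative":
        return "That sounds hard. I'm here to chat whenever you like."
    if mood == "positive":
        return "That's wonderful to hear! Tell me more."
    return "I see. What would you like to talk about?"

print(respond("I feel lonely in the evenings."))
```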
In summary, regarding the objectives of building conversational chatbots, much of the focus is on improving the response accuracy of the dialogue system in order to provide more human-like conversations. For open-domain dialogue systems, there are also papers focusing on how to identify the key dialogue context and maintain it so that users remain interested.

3. Methods and Datasets of Conversational Chatbots

To explore how chatbots are built and answer the question "What are the methods and datasets used to build a conversational chatbot?", Table 3 and Table 4 show, respectively, the methods and datasets used in the surveyed papers. In terms of the methods applied, reinforcement learning was the most frequently used [2][9][10][12][18][25]; it is one of the three basic machine learning paradigms, alongside supervised and unsupervised learning. Reinforcement learning differs from supervised learning in that it does not need labeled input/output pairs, nor does it need suboptimal actions to be explicitly corrected. Instead, the focus is on finding a balance between exploring uncharted territory and exploiting current knowledge. One of the most successful recently published cases is ChatGPT [2][1], which has gained a great deal of attention and is fine-tuned with reinforcement learning from human feedback. Reinforcement learning is therefore commonly used to determine the conversation context as well as to keep the dialogue consistent. Early on, reinforcement learning was used to teach machines to play computer games, from which researchers learned a great deal about adjusting a policy to reach a goal.
Table 3. Methods of conversational chatbot research.
Table 4. Datasets used by conversational chatbots.
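The balance between exploration and exploitation described above is often illustrated with an epsilon-greedy rule; the sketch below applies it to choosing among candidate response strategies and is a generic textbook illustration, not the training procedure of ChatGPT or of any surveyed system:

```python
import random

def epsilon_greedy(q_values: dict, epsilon: float = 0.1) -> str:
    """With probability epsilon explore a random action; otherwise exploit the best-known one."""
    if random.random() < epsilon:
        return random.choice(list(q_values))      # explore uncharted territory
    return max(q_values, key=q_values.get)        # exploit current knowledge

def update_q(q_values: dict, action: str, reward: float, lr: float = 0.1) -> None:
    """Incrementally update the action-value estimate from the observed reward."""
    q_values[action] += lr * (reward - q_values[action])

# Toy example: estimated value of three hypothetical response strategies
q = {"ask_question": 0.2, "give_fact": 0.5, "tell_joke": 0.1}
action = epsilon_greedy(q)
update_q(q, action, reward=1.0)   # e.g., the user kept the conversation going
print(action, round(q[action], 3))
```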
The second most frequently used method is long short-term memory (LSTM), which improves on the memory design of the original RNN by controlling four units (input gate, output gate, memory cell, and forget gate) so that it can process and predict significant events across long intervals and delays in time series. This is where research started to aim at ensuring that the dialogue system remembers the conversation context and that responses stay consistent with earlier conversation goals [14][18][24][29]. The main reason so much research focuses on this area is simply that users will lose interest and disengage if the dialogue system responds in a way that is inconsistent with the original goal or context. Although LSTM is an enhancement over the plain RNN, the selected papers span approximately 20 years, and some older or special-purpose studies still chose the original or an enhanced RNN as their method [3][15].
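To make the four-gate description concrete, here is a single LSTM time step written with NumPy; the toy shapes and random weights are placeholders for illustration only, and a real system would use a library implementation (e.g., torch.nn.LSTM) rather than this sketch:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step: input, forget, and output gates plus the memory cell update.

    W, U, b hold the stacked parameters for the four gates (i, f, o, g) of hidden
    size H, so W is (4H, input_dim), U is (4H, H), and b is (4H,).
    """
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0*H:1*H])      # input gate: how much new information to write
    f = sigmoid(z[1*H:2*H])      # forget gate: how much old memory to keep
    o = sigmoid(z[2*H:3*H])      # output gate: how much memory to expose
    g = np.tanh(z[3*H:4*H])      # candidate memory content
    c = f * c_prev + i * g       # memory cell update
    h = o * np.tanh(c)           # new hidden state
    return h, c

# Toy dimensions: input size 5, hidden size 3
rng = np.random.default_rng(0)
x, h0, c0 = rng.normal(size=5), np.zeros(3), np.zeros(3)
W, U, b = rng.normal(size=(12, 5)), rng.normal(size=(12, 3)), np.zeros(12)
h1, c1 = lstm_step(x, h0, c0, W, U, b)
print(h1.shape, c1.shape)  # (3,) (3,)
```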
Several methods were used in only one or a few cases. Bidirectional Encoder Representations from Transformers (BERT), proposed by Google in 2018 as a pre-training technology for natural language processing (NLP), became a ubiquitous baseline in NLP experiments within just over a year of its publication. BERT performs outstandingly in many areas, such as question answering and next-sentence prediction, which makes it popular for dialogue systems: it offers a good baseline for researchers to start from, which can then be fine-tuned with domain-specific knowledge [19][28][30]. Other methods adopted in these papers include ELMO, supervised learning, transfer learning, Seq2Seq, MDP, HRED, GAN, UNILM, GPT, Dialogflow, and NBT; most of these are also common algorithms in natural language processing. For some closed-domain chatbots, especially those used for educational support [5][7][20], the methods focus more on how to conduct the experiment than on the system architecture of how the chatbots are built.
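Returning to the BERT baseline mentioned above, the sketch below shows how a pre-trained checkpoint can rank candidate replies by next-sentence coherence using the Hugging Face transformers library; the checkpoint name and the scoring scheme are illustrative choices, not the exact setups of the cited papers:

```python
import torch
from transformers import BertTokenizer, BertForNextSentencePrediction

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")
model.eval()

def coherence_score(context: str, candidate: str) -> float:
    """Probability that `candidate` is a natural continuation of `context`."""
    inputs = tokenizer(context, candidate, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits       # label 0 = "is the next sentence"
    return torch.softmax(logits, dim=-1)[0, 0].item()

context = "I just adopted a puppy and she keeps chewing my shoes."
candidates = ["Have you tried giving her chew toys instead?",
              "The stock market closed lower today."]
best = max(candidates, key=lambda c: coherence_score(context, c))
print(best)
```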
Table 6. Challenges of conversational chatbots.
The outcomes related to algorithm or model optimization focus on improving the techniques applied in chatbots, such as reinforcement learning, self-attention, and transfer learning, or models such as LSTM, BERT, GPT-3, and NBT. These outcomes mainly serve the objective of context maintenance in conversational chatbots. To improve how reinforcement learning finds and maintains the dialogue context, studies [2][10][12][25] have focused on enhancing the system architecture or redesigning the machine learning model; the ultimate objective is to show that the enhanced dialogue system generates better, more human-like responses. The recently published ChatGPT [2][1], which has received a great deal of attention from researchers, is a good example of leveraging reinforcement learning.
Other outcomes from the surveyed papers include improved educational support, optimized inputs, and a dedicated model for e-commerce. Leveraging a chatbot as a learning companion [7][20] improves students’ reading skills and keeps students continuously engaged. When a chatbot simulates a student to train preservice teachers in handling school violence topics [5][6] or in mathematics [6][7], only some areas were improved. The work in [1][5] focused on the input side of the model; the researchers tried to minimize input noise through various noise-removal algorithms to maximize the usability of the information fed into the model. The case involved a customer contact center [1][5], where unnecessary information was removed so that the dialogue system could be trained on meaningful data. Another article created a dedicated model for e-commerce [4].
Regarding the challenges faced by chatbots, Table 6 shows those reported in research on conversational chatbots. Notably, up to seven papers faced challenges in model selection and modification. A typical situation was that the researchers were not satisfied with the output and were looking for a better machine learning model or further enhancements [2][13][24][26]. From the immature natural language processing technology of the early development stage, which limited the resources researchers could rely on, to its relative maturity in recent years, algorithm selection has always been the most critical and challenging part.
Selecting the model that best suits the research from among numerous algorithms, modifying and optimizing it, or even combining multiple algorithms in the design and training to obtain the highest accuracy, is a big problem. Luckily, this challenge did not last too long, as machine learning and natural language processing technology have seen major breakthroughs in the last decade. It has almost become standard practice to choose a powerful NLP model such as BERT [19][28][30][31] or GPT-3 [2][1], often combined with reinforcement learning; this provides a baseline of natural conversational responses, to which application- or domain-specific knowledge is then added.
In addition to model selection, inefficient pre-work is definitely another challenge: data collected in the real world are rarely neat, and researchers have to put in a lot of effort to make them trainable. A typical example is an application-specific dialogue system [4] or a contact center [1][5], in which the training dataset might not be sufficient for a machine learning model to mature enough to respond to users’ requests. Although leveraging public datasets can make the system respond naturally, the real purpose of a closed-domain dialogue system is to provide accurate service and information. A typical method to address this challenge is data augmentation, in which the raw data are translated into another language and then back into the original language. This can enrich the dataset to some extent, but the dialogue system is still expected to become more capable as more real conversation data are fed in.
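A common way to implement this round-trip (back) translation idea is sketched below; only the control flow is shown, and `translate` is a hypothetical placeholder for whatever translation model or service is available, not a real API:

```python
from typing import Callable, List

def back_translate(sentences: List[str],
                   translate: Callable[[str, str, str], str],
                   pivot_lang: str = "de",
                   source_lang: str = "en") -> List[str]:
    """Round-trip each sentence through a pivot language to create paraphrased variants.

    `translate(text, src, tgt)` is a hypothetical hook for any MT backend.
    """
    augmented = []
    for sentence in sentences:
        pivoted = translate(sentence, source_lang, pivot_lang)
        round_trip = translate(pivoted, pivot_lang, source_lang)
        if round_trip.strip().lower() != sentence.strip().lower():
            augmented.append(round_trip)   # keep only genuinely new surface forms
    return augmented

# Usage with a dummy backend that just tags the text (stands in for a real MT system)
dummy_mt = lambda text, src, tgt: f"{text} [{src}->{tgt}]"
print(back_translate(["How do I reset my password?"], dummy_mt))
```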
When chatbots provide educational support during the learning cycle, one common challenge is making the chatbot dynamically adjust the difficulty level by itself during a conversation. When a chatbot is used as a learning companion [7][20], enabling it to dynamically detect the learner’s progress and adjust the learner profile so as to continuously push the learner to the next level is a potential area for enhancement. A similar challenge was also noted when a chatbot is used as a simulated student [5][6]; researchers likewise expect to improve the chatbot’s profile dynamically.
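As a minimal sketch of what such dynamic difficulty adjustment could look like (the thresholds, level range, and accuracy window are assumptions for illustration, not parameters from the cited studies):

```python
def adjust_difficulty(level: int, recent_correct: list,
                      raise_at: float = 0.8, lower_at: float = 0.4) -> int:
    """Move the question difficulty up or down based on the learner's recent accuracy.

    `recent_correct` is a list of booleans for the last few answers; the thresholds
    and the 1-5 level range are illustrative assumptions.
    """
    if not recent_correct:
        return level
    accuracy = sum(recent_correct) / len(recent_correct)
    if accuracy >= raise_at:
        return min(level + 1, 5)   # push the learner to the next level
    if accuracy <= lower_at:
        return max(level - 1, 1)   # ease off to keep the learner engaged
    return level

print(adjust_difficulty(2, [True, True, True, False, True]))  # -> 3
```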
After selecting an algorithm and finishing all the pre-work, the next challenge to conquer is the extraction and classification of useful data; this was the third most commonly encountered challenge among the selected papers. As mentioned in the previous section, researchers tried to remove noise through various algorithms to prevent unnecessary data from being fed into the model and causing interference. Once most of the noise is filtered out, data classification is also key. This part of the challenge may also relate to information slotting [23][26], where the main purpose of the dialogue system is to extract key topics and feed them back to the system for response generation.
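As a minimal illustration of the information-slotting idea (hand-written regular expressions over a toy restaurant-booking domain, rather than the learned slot-filling approaches of [23][26]):

```python
import re

# Illustrative slot patterns for a toy restaurant-booking domain (an assumption for this sketch)
SLOT_PATTERNS = {
    "party_size": r"\bfor (\d+)\b",
    "time": r"\b(\d{1,2}(?::\d{2})?\s?(?:am|pm))\b",
    "cuisine": r"\b(italian|thai|mexican|japanese)\b",
}

def extract_slots(utterance: str) -> dict:
    """Fill each slot with the first pattern match found in the user's utterance."""
    slots = {}
    for slot, pattern in SLOT_PATTERNS.items():
        match = re.search(pattern, utterance.lower())
        if match:
            slots[slot] = match.group(1)
    return slots

print(extract_slots("Can you book a table for 4 at an Italian place around 7pm?"))
# {'party_size': '4', 'time': '7pm', 'cuisine': 'italian'}
```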
These challenges are likely to be important factors in resolving the difficulties of conversational chatbots. In addition, there are other, lesser challenges listed in Table 6, such as inefficient pre-work, a lack of diversity and quantity in training data, objective function formulation, feature selection, and enhancements in humanization and morality. Some papers also mentioned multi-lingual support [19][24], which might be a potentially interesting area. Although translation systems are quite mature, when a Seq2Seq model is introduced there may be additional factors to consider, such as cultural differences in how ideas are expressed in conversation.