Predicting Online Complaining Behavior in the Hospitality Industry: Comparison
Please note this is a comparison between Version 2 by Nora Tang and Version 1 by Raksmey Sann.

Sensitivity analysis revealed that Hotel Size is the most important online complaint attribute, while Service Encounter and Room Space emerged as the second and third most important factors in each of the four decision tree models. The CHAID analysis findings also revealed that guests at higher-star-rating hotels are most likely to leave online complaints about (i) Service Encounter, when staying at large hotels; (ii) Value for Money and Service Encounter, when staying at medium-sized hotels; and (iii) Room Space and Service Encounter, when staying at small hotels. Additionally, guests of lower-star-rating hotels are most likely to write online complaints about Cleanliness, rather than Value for Money, Room Space, or Service Encounter, and are most likely to be staying at small hotels.

  • online complaining behavior
  • decision trees
  • hotel class
  • online complaining attributes
  • big data analytics
  • data mining algorithms

1. Online Review and Complaining Behavior in the Hospitality Industry

According to a 2011 survey by the Tourism Industry Wire Organization, 60% of U.S. travelers take online suggestions into account, at least in part, when booking a vacation. One main reason is that travel websites let customers readily discover what other consumers think about hotels and restaurants, as well as other tourism products or services (as cited in [1][2]). Previous studies have argued that customers give more weight to negative reviews than to positive ones in both reputation-building and decision-making tasks [3][4][5][6][7]. Customer complaint behavior is most often considered to be a set of multiple responses emerging as a result of purchase dissatisfaction [8][9]. However, the “locus of causality”, or complaint attribution, is one of the least-studied topics within customer complaint behavior research in the hospitality and tourism industries [10]. Practitioners will benefit from understanding the causes of guests’ complaints in terms of problem-solving, guest satisfaction enhancement, and service quality improvement [11][12].
Furthermore, negative reviews or comments may lead to a negative impact on all aspects of the business [13]. For instance, guests complain about their hotels on issues ranging from poor service delivery to dated or inadequate décor [7]. Fernandes and Fernandes (2018) reported that hotel guests tend to complain more than once. This indicates that chains of complaints do occur in the hospitality industry. An unsatisfied customer often makes a series of complaints. This study aims to investigate how an online review negatively impacts consumer behavior so that hoteliers can improve service quality, referencing a complaint route through electronic word-of-mouth (eWOM). 

2. Big Data

Big data analytics, as a research paradigm, uses a variety of data sources to make inferences and predictions about reality [14]. Textual data or content from the web offers a huge shared cognitive and cultural context, and advanced language processing and machine learning capabilities have been applied to this web data to analyze various domains [15]. Big data analytics is defined as “the extraction of hidden insight about consumer behavior from big data and the exploitation of that insight through advantageous interpretations” [16], p. 897. The immensity of data generated, its relentless rapidity, and diverse richness are all transforming marketing decision-making [16]. The aforementioned dimensions help define big data via three distinctive features: volume, velocity, and variety [16][17], and two additional essential characteristics when collecting, analyzing, and extracting insights from big data: veracity and value [16]. Volume refers to the quantity of data, velocity describes the speed of data processing, and variety means the type of data [17]. Meanwhile, veracity refers to data quality (e.g., accuracy), and value describes clean, useful data that excludes or eliminates unimportant and irrelevant data [16]. By utilizing big data analytics as research methodology, researchers are able to work backward, starting with data collection, then analyzing it in order to gain insights. Despite the advantages and potential of big data analytics, very few recently published studies apply this approach to the tourism and hospitality industry [17]. This study will try to fill that gap.

3. Data Mining

Recently, some researchers have been utilizing data mining (DM) procedures in their studies of the tourism and hospitality industry. For instance, Golmohammadi, Jahandideh, and O’Gorman (2012) [18] studied the application of DM, specifically using decision trees (DT) to model tourists’ behavior in the online environment. DM has also been studied in terms of its importance and influence in the hotel marketing field, and how this approach can help companies to reach their potential customers by understanding their behavior [19]. Thus, DM techniques that focus on an analysis of the textual contents of travelers’ reviews and feedback have been used in a number of published papers [19]. With the help of a DM approach, hoteliers can receive invaluable information that enables them to gain better insight regarding customer behavior and to develop effective customer retention strategies [18].
While data retrieved from customer feedback is usually unstructured textual data, most DM approaches deal only with structured data. Retrieved data is often voluminous but of low value and has little direct usefulness in its raw form. It is the hidden information in the data that has value [20][21]. Retrieved data must be reorganized and stored according to clear field structures before DM can be carried out efficiently and accurately [22]. Using different techniques, DM can identify nuggets of information in bodies of data. It extracts information that can be used in areas such as decision support, prediction, forecasts, and estimation. Few researchers have studied new artificial intelligence (AI) algorithms and mining techniques for unstructured data and information, however, resulting in the frequent loss of valuable customer-related information [22]. DM’s advantage lies in combining researcher knowledge (or expertise) of the data with advanced, active analysis techniques in which algorithms identify the underlying relationships and features in the data.
The process of DM generates models from historical data that are later used for predictions, pattern detection, and more. DM techniques offer feasible methods by which to detect causal relationships, specify which variables have significant dependence on the problem of interest, and expand models that forecast the future [23]. DM can be classified into three types of modeling methods: (i) classification, (ii) association, and (iii) segmentation [24]. In some contexts, DM can be termed as knowledge discovery in databases (KDD) since “it generates hidden and interesting patterns, and it also comprises the amalgamation of methodologies from various disciplines, such as statistics, neural networks, database technology, machine learning and information retrieval, etc.” [25], p. 645. This study furthers its aims by applying DT algorithms. The following sub-sections briefly review the main components of the proposed method. 

4. Decision Tree

DT is one of the most popular DM techniques. With the objective of building classification models, DT can predict the value of a target attribute, based on the input attributes [26]. DT constructs a tree structure using three components: internal nodes, branches, and leaves [27]. Each internal node denotes one input variable [26], each branch is set to equal a number of possible values of the input variable [28], and each leaf node is the terminal node that holds a class label [29] or a value of a target attribute [26]. In DT, the influential factors in determining the value of the target attribute are the primary splitters that are connected with the leaf nodes [26]. Due to DT’s many advantages—for instance, it is easy to understand and interpret, needs little data preparation, can handle both numerical and categorical data, performs very well with a large dataset in a short time, and, most importantly, can create excellent visualizations of results and their relationships—DT has become increasingly prevalent in DM [30]. There are many specific DT algorithms; however, the C5.0, C&R tree, QUEST, and CHAID algorithms are the most widely used.
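The three components described above (internal nodes, branches, and leaves) can be illustrated with a minimal Python sketch. The class names `Node` and `Leaf`, the `predict` helper, and the toy tree are hypothetical and chosen only for illustration; they do not reproduce any tree reported in the study.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Leaf:
    """Terminal node holding a class label (a value of the target attribute)."""
    label: str


@dataclass
class Node:
    """Internal node testing one input variable, with one branch per value."""
    attribute: str
    branches: Dict[str, Union["Node", Leaf]] = field(default_factory=dict)


def predict(tree, record):
    """Follow the branches matching the record until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.branches[record[tree.attribute]]
    return tree.label


# Hypothetical toy tree; attribute names and labels are illustrative only.
toy_tree = Node("hotel_size", {
    "large": Leaf("service_encounter"),
    "small": Node("star_rating", {
        "higher": Leaf("room_space"),
        "lower": Leaf("cleanliness"),
    }),
})

print(predict(toy_tree, {"hotel_size": "small", "star_rating": "lower"}))
```

Reading a path from the root to a leaf in such a structure gives one decision rule, which is why DT output is straightforward to interpret.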

4.1. C5.0 Algorithm

Developed by Quinlan in 1993, C5.0 is one of the most popular DT inducers, based on the ID3 (iterative dichotomiser 3) classification algorithm [31]. The C5.0 model splits the sample based on the field that provides the maximum information gain at each level [24]. The input can be either categorical or continuous, but the output or target field must be categorical. C5.0 is significantly faster and more memory-efficient than other DT algorithms, and it can also produce more accurate rules [32]. It also uses a pruning strategy (e.g., pre-pruning and post-pruning methods) in which a branch is pruned to establish a DT, starting from the top level of the tree [30][31].
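The splitting criterion named above, maximum information gain, can be sketched as follows. This is a simplified illustration for categorical fields only, not the C5.0 implementation; the function names and the toy records are assumptions, and refinements such as handling of continuous fields and pruning are omitted.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, field, target):
    """Entropy of the target minus the weighted entropy after splitting on field."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[field], []).append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


def best_field(rows, fields, target):
    """Return the candidate field with the maximum information gain."""
    return max(fields, key=lambda f: information_gain(rows, f, target))


# Hypothetical toy records, not data from the study.
rows = [
    {"size": "large", "stars": "higher", "complaint": "yes"},
    {"size": "large", "stars": "lower", "complaint": "yes"},
    {"size": "small", "stars": "higher", "complaint": "no"},
    {"size": "small", "stars": "lower", "complaint": "no"},
]
print(best_field(rows, ["size", "stars"], "complaint"))
```

In this toy data, splitting on `size` separates the two classes completely, while splitting on `stars` leaves each group evenly mixed, so `size` is selected.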

4.2. CHAID Algorithm

The chi-squared automatic interaction detector (CHAID) is “a powerful technique for partitioning data into more homogeneous groups” [33], p. 125. CHAID is a highly efficient statistical technique for segmentation, or tree growing, developed by Kass in 1980 [34]. CHAID makes predictions in the same way for regression analysis and classification, as well as detecting interactions between variables [30]. CHAID uses multi-level splits [35], which can generate nonbinary trees, meaning that some trees have more than two branches [24]. It works for every type of variable due to its acceptance of both case weights and frequency variables [34]. More importantly, CHAID handles missing values by treating them all as a single valid category.
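The chi-squared test at the core of CHAID can be sketched as below. This is only the Pearson statistic for a predictor-by-target contingency table; the full procedure described by Kass [33] also merges similar categories and compares adjusted p-values, which are omitted here. The function name and the counts are hypothetical, and the sketch assumes every expected cell count is nonzero.

```python
def chi_square(table):
    """Pearson chi-squared statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of predictor and target.
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical counts: one row per predictor category,
# columns = (complained, did not complain).
by_size = [[30, 10], [20, 20], [10, 30]]   # three categories: a multi-level split
by_season = [[20, 20], [20, 20]]           # no association with the target

print(chi_square(by_size), chi_square(by_season))
```

A predictor whose categories differ strongly in their target distribution yields a large statistic, and because the table may have any number of rows, the resulting split need not be binary.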

4.3. QUEST Algorithm

The quick, unbiased, efficient statistical tree algorithm (QUEST) is a relatively new binary tree-growing algorithm for classification and DM [34]. QUEST is similar to the classification and regression trees (C&RT) algorithm [30]; however, it is designed to reduce the processing time required for large C&RT analyses, while also reducing the tendency found in classification tree methods to favor inputs that allow more splits [24]. QUEST deals with field selection and split-point selection separately. The univariate split in QUEST performs unbiased field selection; that is, if all predictor fields are equally informative with respect to the target field, QUEST selects any of them with equal probability [34]. It can produce unwieldy trees, but automatic cost-complexity pruning reduces their size [30]. Input fields in QUEST can be numeric ranges (when continuous), but the target field must be categorical, and all splits are binary [24].
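The separation of field selection from split-point selection can be sketched as follows. This is a heavily simplified illustration for numeric fields and a two-class target: it assumes a one-way ANOVA F statistic as the association test for choosing the field, and it uses the midpoint between class means as a stand-in for QUEST's discriminant-based split-point rule. All names and data are hypothetical, and the sketch assumes some within-class variation in every field.

```python
def group_by_class(values, labels):
    """Collect the field values observed for each target class."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    return groups


def f_statistic(values, labels):
    """One-way ANOVA F statistic of a numeric field across target classes."""
    groups = group_by_class(values, labels)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                  for g in groups.values())
    within = sum((v - sum(g) / len(g)) ** 2
                 for g in groups.values() for v in g)
    return (between / (k - 1)) / (within / (n - k))


def select_field(fields, labels):
    """Step 1: choose the field most strongly associated with the target."""
    return max(fields, key=lambda name: f_statistic(fields[name], labels))


def split_point(values, labels):
    """Step 2 (simplified): midpoint between the class means of the chosen field."""
    means = [sum(g) / len(g) for g in group_by_class(values, labels).values()]
    return sum(means) / len(means)


# Hypothetical toy data, not taken from the study.
labels = ["no", "no", "no", "yes", "yes", "yes"]
fields = {"nights": [1, 2, 3, 7, 8, 9], "age": [30, 50, 40, 35, 45, 55]}

chosen = select_field(fields, labels)
print(chosen, split_point(fields[chosen], labels))
```

Because the field is chosen by a significance-style test rather than by searching over every possible split, a field with many candidate split points gains no automatic advantage.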

4.4. C&RT Algorithm

The C&RT algorithm splits the tree on a binary level into only two subgroups [35] and generates a DT that allows researchers to predict or classify future observations [24]. The C&RT algorithm was created by Breiman, Friedman, Olshen, and Stone in 1984 [36]. The method uses recursive partitioning: the data is partitioned into two subsets so that the records within each subset are more homogeneous than in the previous subset. Then, each of those two subsets is split again, and the process repeats until the homogeneity criterion is reached (i.e., the subsets are considered “pure”) or until some other stopping criterion is satisfied [37]. The same predictor field may be used many times at different levels in the tree. The most essential aim of splitting is to determine the right variable associated with the right threshold to maximize the homogeneity of the sample subgroups [30]. C&RT uses surrogate splitting to make the best use of data with missing values [34]. In the C&RT model, target and input fields can be numeric ranges or categorical (nominal, ordinal, or flags) [24]. C&RT allows unequal misclassification costs to be considered in the tree-growing process and allows researchers to specify the prior probability distribution in a classification problem. Applying automatic cost-complexity pruning to a C&RT tree yields a more generalizable tree [30][34].
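One step of this recursive partitioning, finding the threshold that makes the two subgroups most homogeneous, can be sketched as follows. The sketch assumes Gini impurity as the homogeneity measure and a single numeric field; surrogate splits, misclassification costs, priors, and pruning are omitted. The function names and data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def best_binary_split(values, labels):
    """Return (threshold, weighted_gini) for the best split 'value <= threshold'."""
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical toy data, not taken from the study.
values = [1, 2, 3, 7, 8, 9]
labels = ["no", "no", "no", "yes", "yes", "yes"]
print(best_binary_split(values, labels))
```

A full tree would apply the same search to every field, keep the best split overall, and then repeat on each resulting subset until a stopping criterion is met.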

5. Conclusions

DT techniques can be used to effectively predict the online complaining behavior (OCB) of different hotel-class guests. The CHAID approach is superior to other statistical methods, in that one dependent variable with two or more levels is directly connected to independent variables with two or more levels, forming one tree that explicitly accounts for relationships among variables [38]. Moreover, CHAID offers cumulative statistics [31] and is adept at finding online complaining attributes (OCAs) by taking the best segments of the sample. Importantly, since the CHAID results are presented graphically, they are easier to understand and interpret [38]. The rules generated by such trees reveal the most influential factors affecting OCB by guests from various classes of hotels, thus helping hoteliers to identify the most likely complaint areas and subsequently take the required measures to manage them effectively.

References

  1. Casaló, L.V.; Flavián, C.; Guinalíu, M.; Ekinci, Y. Avoiding the dark side of positive online consumer reviews: Enhancing reviews’ usefulness for high risk-averse travelers. J. Bus. Res. 2015, 68, 1829–1835.
  2. Casaló, L.V.; Flavián, C.; Guinalíu, M.; Ekinci, Y. Do online hotel rating schemes influence booking behaviors? Int J. Hosp. Manag. 2015, 49, 28–36.
  3. Ladhari, R.; Michaud, M. eWOM effects on hotel booking intentions, attitudes, trust, and website perceptions. Int J. Hosp. Manag. 2015, 46, 36–45.
  4. Tsao, W.C.; Hsieh, M.T.; Shih, L.W.; Lin, T.M.Y. Compliance with eWOM: The influence of hotel reviews on booking intention from the perspective of consumer conformity. Int J. Hosp. Manag. 2015, 46, 99–111.
  5. Sparks, B.A.; Browning, V. The impact of online reviews on hotel booking intentions and perception of trust. Tour. Manag. 2011, 32, 1310–1323.
  6. Zhang, Z.; Zhang, Z.; Yang, Y. The power of expert identity: How website-recognized expert reviews influence travelers′ online rating behavior. Tour. Manag. 2016, 55, 15–24.
  7. Dinçer, M.Z.; Alrawadieh, Z. Negative word of mouse in the hotel industry: A content analysis of online reviews on luxury hotels in Jordan. J. Hosp. Mark. Manag. 2017, 26, 785–804.
  8. Yuksel, A.; Kilinc, U.; Yuksel, F. Cross-national analysis of hotel customers’ attitudes toward complaining and their complaining behaviours. Tour. Manag. 2006, 27, 11–24.
  9. Ngai, E.W.T.; Heung, V.C.S.; Wong, Y.H.; Chan, F.K.Y. Consumer complaint behaviour of Asians and non-Asians about hotel services—An empirical analysis. Eur. J. Mark. 2007, 41, 1375–1391.
  10. Koc, E. Service failures and recovery in hospitality and tourism: A review of literature and recommendations for future research. J. Hosp. Mark. Manag. 2019, 28, 513–537.
  11. Arora, S.D.; Chakraborty, A. Intellectual structure of consumer complaining behavior (CCB) research: A bibliometric analysis. J. Bus. Res. 2021, 122, 60–74.
  12. Tosun, P.; Sezgin, S.; Uray, N. Consumer complaining behavior in hospitality management. J. Hosp. Mark. Manag. 2021, 31, 247–264.
  13. Cantallops, A.S.; Salvi, F. New consumer behavior: A review of research on eWOM and hotels. Int J. Hosp. Manag. 2014, 36, 41–51.
  14. Xiang, Z.; Du, Q.; Ma, Y.; Fan, W. A comparative analysis of major online review platforms: Implications for social media analytics in hospitality and tourism. Tour. Manag. 2017, 58, 51–65.
  15. Xiang, Z.; Fesenmaier, D.R.; Werthner, H. Knowledge Creation in Information Technology and Tourism: A Critical Reflection and an Outlook for the Future. J. Travel Res. 2021, 60, 1371–1376.
  16. Erevelles, S.; Fukawa, N.; Swayne, L. Big Data consumer analytics and the transformation of marketing. J. Bus. Res. 2016, 69, 897–904.
  17. Cheng, M.; Jin, X. What do Airbnb users care about? An analysis of online review comments. Int J. Hosp. Manag. 2019, 76, 58–70.
  18. Golmohammadi, A.R.; Jahandideh, B.; O′Gorman, K.D. Booking on-line or not: A decision rule approach. Tour. Manag. Perspect. 2012, 2, 85–88.
  19. Moro, S.; Rita, P.; Coelho, J. Stripping customers′ feedback on hotels through data mining: The case of Las Vegas Strip. Tour. Manag. Perspect. 2017, 23, 41–52.
  20. Khade, A.A. Performing customer behavior analysis using big data analytics. Procedia Comput. Sci. 2016, 79, 986–992.
  21. Christodoulou, E.; Gregoriades, A.; Pampaka, M.; Herodotou, H. Combination of Topic Modelling and Decision Tree Classification for Tourist Destination Marketing; Springer International Publishing: Cham, Switzerland, 2020; pp. 95–108.
  22. Liu, J.W. Using big data database to construct new GFuzzy text mining and decision algorithm for targeting and classifying customers. Comput. Ind. Eng. 2019, 128, 1088–1095.
  23. Nourani, V.; Molajou, A. Application of a hybrid association rules/decision tree model for drought monitoring. Glob. Planet. Change 2017, 159, 37–45.
  24. SPSS. IBM SPSS Modeler 16 User′s Guide; SPSS: Chicago, IL, USA, 2013.
  25. Bhandari, A.; Gupta, A.; Das, D. Improvised Apriori Algorithm Using Frequent Pattern Tree for Real Time Applications in Data Mining. Procedia Comput. Sci. 2015, 46, 644–651.
  26. Taamneh, M. Investigating the role of socio-economic factors in comprehension of traffic signs using decision tree algorithm. J. Saf. Res. 2018, 66, 121–129.
  27. Lee, P.J.; Hu, Y.H.; Lu, K.T. Assessing the helpfulness of online hotel reviews: A classification-based approach. Telemat. Inform. 2018, 35, 436–445.
  28. Lee, W.H.; Cheng, C.C. Less is more: A new insight for measuring service quality of green hotels. Int J. Hosp. Manag. 2018, 68, 32–40.
  29. Lan, T.; Zhang, Y.N.; Jiang, C.H.; Yang, G.B.; Zhao, Z.Y. Automatic identification of Spread F using decision trees. J. Atmos. Sol. Terr. Phys. 2018, 179, 389–395.
  30. Delen, D.; Kuzey, C.; Uyar, A. Measuring firm performance using financial ratios: A decision tree approach. Expert Syst. Appl. 2013, 40, 3970–3983.
  31. Chae, Y.M.; Ho, S.H.; Cho, K.W.; Lee, D.H.; Ji, S.H. Data mining approach to policy analysis in a health insurance domain. Int J. Med. Inform. 2001, 62, 103–111.
  32. Yu, F.; Li, G.; Chen, H.; Guo, Y.; Yuan, Y.; Coulton, B. A VRF charge fault diagnosis method based on expert modification C5.0 decision tree. Int. J. Refrig. 2018, 92, 106–112.
  33. Kass, G.V. An Exploratory Technique for Investigating Large Quantities of Categorical Data. Appl. Stat. 1980, 29, 119–127.
  34. SPSS. IBM SPSS Modeler 18.0 Algorithms Guide; SPSS: Chicago, IL, USA, 2016.
  35. Hung, C. Tree Model: CHAID, C&RT, Boosted Trees & Random; The Data-Shack Limited: London, UK, 2018d; pp. DM0002–DM0003 & DM0005–DM0006.
  36. Breiman, L.; Friedman, J.; Olshen, R.; Stone, C. Classification and Regression Trees; Wadsworth and Brooks/Cole: Monterey, CA, USA, 1984.
  37. SPSS. IBM SPSS Modeler 18.0 Modeling: Nodes; SPSS: Chicago, IL, USA, 2016.
  38. Kim, S.S.; Timothy, D.J.; Hwang, J.S. Understanding Japanese tourists’ shopping preferences using the Decision Tree Analysis method. Tour. Manag. 2011, 32, 544–554.