Topic Classification in the Tourism Field

In the past, classifying, deconstructing, and analyzing relevant documents in the tourism field required significant time and resources from domain experts. As this body of work continues to grow and diversify, text mining technology can help to better understand and promote the leisure industry, so that the general public becomes more willing to understand, recognize, support, and participate in achieving the goals of sustainable development. Text mining can extract valuable knowledge from large volumes of unstructured text. Early text mining techniques were used in document classification.

  • cluster analysis
  • text mining
  • word cloud
  • co-word analysis
  • strategic diagram

1. Introduction

In recent years, smart technology, including artificial intelligence, big data, and the sharing economy, has become an important trend leading to the development of the global smart industry. In particular, big data and artificial intelligence have become dominant in various industries, especially knowledge-intensive ones such as tourism [1]. In the era of big data, firms use artificial intelligence to analyze the huge amounts of messy data they capture to identify useful knowledge that can help them innovate business models and value propositions. By utilizing big data analysis and artificial intelligence, the tourism and catering industry can provide real-time feedback, as well as improved transparency, market segmentation, decision-making, and product and service innovation, among other aspects [2], and thereby increase the value of the industry.
Tourism is generally defined as the activities of persons traveling to and staying in places outside their usual environment for not more than one consecutive year for leisure, business, or other purposes [3]. Target 8.9 of the Sustainable Development Goals (SDGs) in the 2030 Agenda for Sustainable Development sets the following aim for 2030: “devise and implement policies to promote sustainable tourism that creates jobs and promotes local culture and products” [4]. The concept of sustainable tourism has been progressively enriched: early attention focused on environmental issues, whereas a growing number of definitions stress the balanced development of its economic, social, and environmental aspects. It is widely held that sustainable tourism emphasizes the connection between tourism activities and society with respect to the long-term coordinated development of the economy, resources, and the environment [5]. Its goals include both economic development and a reduction in the negative impacts of tourism activities, that is, continued development of the tourism industry while natural and cultural resources are protected. Coordinating and balancing the relationships among different stakeholders is therefore vital in the process of tourism development [6].
Information technology is part of the lifeblood of the tourism industry [7]. Combining knowledge gathered through statistics with the input of domain experts from the tourism industry can help verify the results of visualization analysis. Information technology can apply natural language processing to classify documents automatically by topic, identifying representative documents quickly and objectively; co-word analysis and association rule analysis can then be used to analyze the importance and relevance of specific words. This article has four main research aims: (1) to carry out the topic classification of academic articles in the tourism field; (2) to assess the characteristics of the resulting topic classification and confirm its consistency; (3) to use co-word analysis and strategic diagrams to understand the importance and relevance of specific marketing strategy vocabulary; and (4) to recognize the research tendencies of distinct topics in the tourism field.
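Association rule analysis between keywords is usually summarized by three measures: support (share of documents containing both words), confidence (how often the second word appears given the first), and lift (confidence relative to the second word's base rate). The sketch below computes them from scratch, assuming each article has already been reduced to a set of keywords; the keyword sets are hypothetical, and the thresholds are arbitrary illustrative defaults, not values from the study.

```python
from itertools import combinations

def rule_metrics(docs, antecedent, consequent):
    """Support, confidence and lift of the keyword rule antecedent -> consequent.

    docs: list of keyword sets, one per document (one "transaction" each).
    """
    n = len(docs)
    n_a = sum(1 for d in docs if antecedent in d)
    n_c = sum(1 for d in docs if consequent in d)
    n_ac = sum(1 for d in docs if antecedent in d and consequent in d)
    support = n_ac / n                             # share of documents with both words
    confidence = n_ac / n_a if n_a else 0.0        # estimate of P(consequent | antecedent)
    lift = confidence / (n_c / n) if n_c else 0.0  # > 1: co-occur more often than chance
    return support, confidence, lift

def mine_pair_rules(docs, min_support=0.2, min_confidence=0.5):
    """All single-keyword rules X -> Y that meet both (illustrative) thresholds."""
    vocab = sorted(set().union(*docs))
    rules = []
    for x, y in combinations(vocab, 2):
        for a, c in ((x, y), (y, x)):
            s, conf, lift = rule_metrics(docs, a, c)
            if s >= min_support and conf >= min_confidence:
                rules.append((a, c, s, conf, lift))
    return rules

# Hypothetical keyword sets for four articles.
docs = [
    {"sustainable", "tourism", "marketing"},
    {"sustainable", "tourism", "stakeholder"},
    {"tourism", "marketing", "social media"},
    {"sustainable", "stakeholder"},
]
print(rule_metrics(docs, "sustainable", "stakeholder"))  # support 0.5, confidence ~0.667, lift ~1.333
for rule in mine_pair_rules(docs):
    print(rule)
```

Pairwise rules keep the example short; full Apriori-style mining over larger itemsets follows the same support and confidence logic.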

2. Topic Classification in the Tourism Industry

In the past, classifying, deconstructing, and analyzing relevant documents in this area required significant time and resources from domain experts. As this body of work continues to grow and diversify, text mining technology can help to better understand and promote the leisure industry, so that the general public becomes more willing to understand, recognize, support, and participate in achieving the goals of sustainable development. Text mining can extract valuable knowledge from large volumes of unstructured text. Early text mining techniques were used in document classification [8]. As the volume and variety of information keep increasing, including e-books, web pages, online news pages, blog articles, images, sounds, and videos, manual processing becomes infeasible, and the need for topic models that classify content automatically becomes apparent. Regarding the application of text mining in the tourism field, Okumus et al. [9] examined research on the catering and tourism industry published from 1976 to 2016: they analyzed the evolution of food and gastronomy research and identified emerging research topics, methods, and areas of national or interdisciplinary cooperation. Most of the 462 articles focused on gourmet topics and were quantitative and practical in orientation. Sainaghi et al. [10] used a cross-reference network analysis to evaluate the literature on hotel performance published between 1996 and 2015 and identified the most cross-cited papers, authors, and journals. Their sample of 734 papers showed dramatic growth in output, with the most recent period (2011–2015) contributing 56% of it; in total, 1% of the sample accounted for 14% of the cross-references.
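The simplest form of automatic classification matches each document against a keyword library and assigns the topic whose keywords make up the largest share of its words. The sketch below illustrates that idea only; the topic names, keyword lists, tokenizer, and the 5% minimum ratio are all hypothetical choices, not the classification scheme of the cited studies.

```python
import re
from collections import Counter

# Hypothetical keyword library: topic -> indicative keywords (illustrative only).
KEYWORD_LIBRARY = {
    "sustainability": {"sustainable", "environment", "environmental", "ecological"},
    "marketing": {"marketing", "brand", "promotion", "price"},
    "hospitality": {"hotel", "restaurant", "food", "gastronomy"},
}

def tokenize(text):
    """Lower-case a document and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def keyword_ratios(text, library=KEYWORD_LIBRARY):
    """Share of the document's tokens that belong to each topic's keyword list."""
    tokens = tokenize(text)
    if not tokens:
        return {topic: 0.0 for topic in library}
    counts = Counter(tokens)
    return {topic: sum(counts[w] for w in words) / len(tokens)
            for topic, words in library.items()}

def classify(text, library=KEYWORD_LIBRARY, min_ratio=0.05):
    """Return the topic with the highest keyword ratio, or None if no topic is strong enough."""
    topic, best = max(keyword_ratios(text, library).items(), key=lambda kv: kv[1])
    return topic if best >= min_ratio else None

print(classify("Hotel food and restaurant reviews reveal gastronomy trends"))  # hospitality
print(classify("Sustainable tourism and environmental policy"))               # sustainability
```

Hand-built libraries are transparent but brittle; the statistical topic models discussed next learn the word groupings from the data instead.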
A topic model can apply text mining algorithms, keyword libraries, and keyword occurrence ratios to large amounts of unstructured text data to assign documents to subjective or objective categories. Topic classification algorithms commonly used in text mining include cluster analysis, logistic regression, boosted tree regression, hierarchical K-means, K-means, latent Dirichlet allocation (LDA), and support vector machines (SVM) [11,12]. Hierarchical K-means is very commonly applied in tourism management, mainly to classification problems such as classifying tourists by attributes [13,14,15] and by motivation [16,17]. Applying hierarchical K-means cluster analysis to 30 motivational statements, Suni and Komppula [16] classified respondents into one of five groups: controllers, indifferent, nostalgia, comfort seekers, and novelty seekers. Lee and Kim [14] divided older adult volunteers at an international sporting event into two distinct segments based on serious leisure characteristics, while Michèle et al. [15] analyzed the social and leisure activity profiles of older adults and divided the respondents into seven clusters. Finally, Jiao et al. [17] classified cruise ship tourists into four main categories: psychometric tourists, traditional tourists, pioneers, and sightseers.
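"Hierarchical K-means" is used loosely in the literature. One common reading in survey-based segmentation is a two-stage procedure: a hierarchical (here Ward) clustering fixes the number of groups and supplies starting centroids, and K-means then refines the assignment. The sketch below implements that reading in plain Python under this assumption; the Likert-scale answers are hypothetical, and it is not necessarily the exact procedure of the cited studies.

```python
def mean(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(p[i] for p in points) / len(points) for i in range(len(points[0]))]

def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def ward_seeds(data, k):
    """Stage 1: agglomerate with Ward's criterion down to k clusters; return their centroids."""
    clusters = [[p] for p in data]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[j]
                # Increase in within-cluster sum of squares if a and b were merged.
                cost = len(a) * len(b) / (len(a) + len(b)) * sq_dist(mean(a), mean(b))
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return [mean(c) for c in clusters]

def kmeans(data, centroids, max_iter=100):
    """Stage 2: Lloyd's K-means starting from the given centroids."""
    centroids = [list(c) for c in centroids]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
                      for p in data]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(len(centroids)):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = mean(members)
    return labels, centroids

# Hypothetical 1-5 Likert answers of six respondents to three motivational statements.
answers = [[5, 5, 1], [5, 4, 1], [4, 5, 2], [1, 2, 5], [2, 1, 5], [1, 1, 4]]
labels, centroids = kmeans(answers, ward_seeds(answers, k=2))
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Seeding K-means from a hierarchical solution avoids the sensitivity of random initialization, which is one reason the two-stage approach is popular for segmenting respondents.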
With the help of the LDA model, Jia [18] randomly selected 100 yoga centers in Shanghai and, from their review comments, identified 15 topics, each described by its top 10 words. Vu et al. [19] used LDA topic modeling to analyze travel itineraries and clarify information on itineraries and tourist preferences. Working with a total of 12,446 daily itineraries, they set the optimal number of topics at 24 on the basis of validation perplexity and computation time for different topic numbers; each topic was visualized with a word cloud and a heat map. Shafqat and Byun [20] applied the LDA model to a travel blog database, extracted the top 150 blogs on tourism in Jeju, Korea, and identified the top 11 topics: location, timing, food, weather, entertainment, environment, accommodations, transportation, expense, services, and rental cars. Sutherland and Kiatkawsin [21] applied the LDA approach to a dataset of 1,086,800 Airbnb reviews and identified 43 topics of interest that drive customer experience and satisfaction; they grouped these into four categories: evaluation, location, unit, and management characteristics. Choosing a suitable number of topics and interpreting the topic categories soundly are therefore important in topic modeling.
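To make the LDA workflow concrete, the sketch below is a small from-scratch collapsed Gibbs sampler plus a perplexity function that can be compared across topic numbers. It is an illustration, not the toolkit used in the cited studies: the hyperparameters alpha and beta, the iteration count, and the toy corpus are arbitrary assumptions, and perplexity is computed on the training documents for brevity, whereas Vu et al. [19] used validation perplexity on held-out data (which requires inferring topic proportions for unseen documents).

```python
import math
import random

def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: list of documents, each a list of integer word ids in range(V).
    Returns (theta, phi): per-document topic proportions, per-topic word distributions.
    """
    rng = random.Random(seed)
    V = 1 + max(w for doc in docs for w in doc)
    topics = range(n_topics)
    ndk = [[0] * n_topics for _ in docs]  # tokens of topic k in document d
    nkw = [[0] * V for _ in topics]       # tokens of word w assigned to topic k
    nk = [0] * n_topics                   # tokens assigned to topic k
    z = []                                # current topic of every token

    def move(d, w, k, delta):
        ndk[d][k] += delta
        nkw[k][w] += delta
        nk[k] += delta

    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        z.append([rng.randrange(n_topics) for _ in doc])
        for w, k in zip(doc, z[d]):
            move(d, w, k, +1)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                move(d, w, z[d][i], -1)  # take the token out of the counts
                # Full conditional p(z = k | all other assignments), unnormalized.
                weights = [(ndk[d][k] + alpha) * (nkw[k][w] + beta) / (nk[k] + V * beta)
                           for k in topics]
                z[d][i] = rng.choices(topics, weights=weights)[0]
                move(d, w, z[d][i], +1)

    theta = [[(ndk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in topics]
             for d, doc in enumerate(docs)]
    phi = [[(nkw[k][w] + beta) / (nk[k] + V * beta) for w in range(V)] for k in topics]
    return theta, phi

def perplexity(docs, theta, phi):
    """exp(-mean log-likelihood per token); lower means a better fit."""
    log_lik = sum(math.log(sum(theta[d][k] * phi[k][w] for k in range(len(phi))))
                  for d, doc in enumerate(docs) for w in doc)
    n_tokens = sum(len(doc) for doc in docs)
    return math.exp(-log_lik / n_tokens)

# Hypothetical toy corpus: word ids index into `vocab`.
vocab = ["beach", "hotel", "room", "food", "temple", "culture", "museum", "price"]
docs = [[0, 1, 2, 1, 7], [3, 1, 2, 7], [4, 5, 6, 5], [6, 4, 5]]
for k in (2, 3):
    theta, phi = lda_gibbs(docs, n_topics=k, n_iter=100)
    top = [[vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:3]]
           for row in phi]
    print(k, round(perplexity(docs, theta, phi), 2), top)
```

Printing the top words per topic mirrors how the cited studies label topics (for example, by their top 10 words), and the per-k perplexity values are the quantity one would compare, ideally on held-out documents, when choosing the number of topics.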
Pleumarom [22] studied the tourism industry in the Mekong region and stated that local governments must adopt a cohesive management approach to achieve sustainable tourism in areas where multiculturalism and government institutions coexist. Sustainable tourism is an important topic in the tourism and hospitality industries: it can improve organizational performance, help firms gain competitive advantage, and serve as a commercial marketing theme. Research on sustainable tourism covers many themes and influencing factors, such as sense of place, pro-environmental behaviors [23,24,25], and human health [26]. The interpretive capacity of the local tourism industry could increase income from sustainable tourism: local interpreters can meet customer needs, create local employment, and promote economic sustainability, and they can also act as on-site supervisors of visitors, influencing visitors' understanding of local perspectives, social protection, and environmental issues [27,28].

3. Marketing Strategy in the Tourism Industry

Marketing strategy plays a very important role in tourism. The traditional 4P marketing strategy proposed by McCarthy [29], which focuses on promotion, place, product, and price, has been widely applied in various business fields. Booms and Bitner [30] extended McCarthy’s [29] 4P to 7P by adding people, physical evidence, and process. Kotler [31] extended McCarthy’s [29] 4P to 6P by adding politics and public opinion, to provide marketing strategies for a complex and diverse society. Pomering, Noble, and Johnson [32] applied 10 marketing foundations (promotion, place, product, price, physical evidence, process, packaging, participation, programming, and partnership) to a marketing model for sustainable tourism that considers economic, environmental, and social issues. Dudensing et al. [33] indicated that the different marketing strategies of stakeholders in the tourism industry led to heterogeneous or conflicting stakeholder expectations. Wray [34] emphasized the interactive nature of sustainable tourism, namely ensuring that profits remain with local operators and sites. Lozano-Oyola et al. [35] also supported this view, advocating that stakeholders review economic indicators before making sustainable tourism decisions. Because the expectations of various stakeholder groups differ and may conflict, destination marketing must take into account the views of numerous stakeholders [36]. Generally, marketing and sustainability can work together through a sustainable tourism marketing model, for example, by managing tourism travel routes according to their ecological footprint.
Big data plays a catalytic role in determining consumer preferences. Obtaining accurate data enables meaningful analysis, which can lead to structural changes in consumer behavior models and marketing strategies [37]. Samara et al. [38] noted that the benefits of adopting big data and artificial intelligence strategies, when appropriately disseminated, include increased efficiency, productivity, and profitability for tourism suppliers, combined with an extremely rich and personalized experience for travelers. Katsikari et al. [39] proposed that the rapid expansion of the Internet and social media gives marketers simple and cost-effective opportunities to reach potential tourists; their study investigated which destination elements are deemed attractive by tourists who use social media.
Text mining a large number of academic articles can reveal important words related to tourism marketing and, at the same time, show their relevance and relative importance. Based on existing information technology, the tourism industry can expand consumerism-based IT and tourism research in order to participate in a wider dialogue; emphasize the use of technology to achieve a better quality of life, economic prosperity, social well-being, and sustainability; and use open data and shared social knowledge as a basis for tourism experiences and innovation [40].
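A first step toward finding "important words" in a corpus of abstracts is counting them: total term frequency drives word-cloud sizes, and document frequency shows how widely a term is spread. The sketch below assumes a regex tokenizer and a tiny hypothetical stop-word list; the example abstracts are invented, and real studies would add lemmatization and a much larger stop-word list.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption); real studies use far larger ones.
STOP_WORDS = {"the", "of", "and", "in", "a", "an", "to", "for", "on", "is", "with"}

def top_terms(texts, n=10):
    """Return (term, total frequency, document frequency) for the n most frequent terms."""
    tf, df = Counter(), Counter()
    for text in texts:
        tokens = [t for t in re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
                  if t not in STOP_WORDS]
        tf.update(tokens)       # every occurrence counts
        df.update(set(tokens))  # each document counts once
    return [(term, tf[term], df[term]) for term, _ in tf.most_common(n)]

# Hypothetical abstracts.
abstracts = [
    "Sustainable tourism marketing and social media",
    "Destination marketing on social media",
    "Sustainable tourism and stakeholders",
]
for term, freq, n_docs in top_terms(abstracts, n=5):
    print(term, freq, n_docs)
```

Ties in `Counter.most_common` keep first-seen order, so equally frequent terms are listed in the order they first appear in the corpus.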

4. Co-Word Analysis and Strategic Diagram

Co-word analysis is a quantitative technique that scans document content to identify where specified words co-occur. Co-occurrence analysis can express various relationships within a body of documents, such as co-citation, co-authorship, and co-word relationships [41]. A strategic diagram can be developed from the results to help identify evolutionary trends and relationships between thematic groups [42].
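The raw material of co-word analysis is a count of how many documents contain each keyword (c_i) and each keyword pair (c_ij). A normalization often used in the co-word literature is the equivalence index e_ij = c_ij^2 / (c_i × c_j), which equals 1 when two keywords always appear together. The sketch below computes both from keyword lists; the keyword lists are hypothetical, and counting each keyword at most once per document is an assumption of this sketch.

```python
from collections import Counter
from itertools import combinations

def coword_counts(docs):
    """Occurrence counts c_i and co-occurrence counts c_ij of keywords.

    docs: iterable of keyword lists, one per document; each keyword is
    counted at most once per document.
    """
    occ, co = Counter(), Counter()
    for keywords in docs:
        kws = sorted(set(keywords))
        occ.update(kws)
        co.update(combinations(kws, 2))  # pair keys are alphabetically ordered
    return occ, co

def equivalence_index(occ, co):
    """e_ij = c_ij**2 / (c_i * c_j); 1 means the two keywords always appear together."""
    return {(i, j): c * c / (occ[i] * occ[j]) for (i, j), c in co.items()}

# Hypothetical author keywords of four articles.
docs = [
    ["tourism", "sustainability", "marketing"],
    ["tourism", "sustainability"],
    ["tourism", "social media"],
    ["marketing", "social media"],
]
occ, co = coword_counts(docs)
for pair, e in sorted(equivalence_index(occ, co).items(), key=lambda kv: -kv[1]):
    print(pair, co[pair], round(e, 3))
```

Normalizing by the individual frequencies keeps very common keywords such as "tourism" from dominating every link simply because they occur everywhere.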
In an application of co-word analysis, Guo et al. [43] surveyed 1138 articles and reviews published from 1980 to 2016 and used 52 high-frequency keywords related to company restrictions to investigate the current situation and trends of company restrictions. The central terms were “restrictions”, “learning”, “institutions”, and “behavior”, with “restrictions” having the highest degree of importance. The 52 high-frequency keywords could be divided into six categories, and indicators of company development (such as innovation, supply chain, decision-making, performance, sustainability, and employee behavior) were significantly related to company restrictions. Khasseh et al. [44] used co-word analysis to describe the topic characteristics of two journals, Scientometrics and Journal of Informetrics, from 1978 to 2014; they grouped these into 11 representative topics using hierarchical cluster analysis and used a strategic diagram to illustrate the structure, maturity, and cohesion of each topic. Corrales-Garay et al. [45] applied co-word analysis to map the main themes identified in the knowledge areas and determined their importance and relevance.
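Grouping keywords into thematic clusters, as in the six categories of Guo et al. [43] or the 11 topics of Khasseh et al. [44], is commonly done with hierarchical clustering on a keyword similarity such as the equivalence index. The sketch below uses average linkage and stops at a chosen number of clusters; the linkage choice, the stopping rule, and the similarity values are illustrative assumptions, not the settings of the cited studies.

```python
def cluster_keywords(keywords, sim, n_clusters):
    """Average-linkage agglomerative clustering of keywords.

    sim: dict mapping frozenset({a, b}) to a similarity in [0, 1]
         (e.g. an equivalence index); missing pairs count as 0.
    """
    clusters = [[k] for k in keywords]

    def linkage(a, b):
        # Mean pairwise similarity between the members of two clusters.
        return sum(sim.get(frozenset((x, y)), 0.0) for x in a for y in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Merge the most similar pair of clusters.
        i, j = max(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] += clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters

# Hypothetical keyword similarities.
keywords = ["sustainability", "stakeholder", "ecotourism", "social media", "marketing", "online review"]
sim = {
    frozenset(("sustainability", "stakeholder")): 0.6,
    frozenset(("sustainability", "ecotourism")): 0.5,
    frozenset(("stakeholder", "ecotourism")): 0.3,
    frozenset(("social media", "marketing")): 0.7,
    frozenset(("social media", "online review")): 0.4,
    frozenset(("marketing", "online review")): 0.2,
    frozenset(("marketing", "sustainability")): 0.05,
}
for theme in cluster_keywords(keywords, sim, n_clusters=2):
    print(theme)
```

Average linkage is a middle ground between single linkage, which tends to chain loosely related keywords together, and complete linkage, which favors small, tight clusters.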
Leung et al. [46] sampled 406 publications related to social media from 2007 to 2016 across 16 business and hospitality/tourism journals and applied co-word analysis to identify the evolution of research themes over time. Shen et al. [47] collected 29 years of academic journal data from online databases and used co-word analysis and bibliographic analysis techniques to analyze trends, core authors, degrees of cooperation, core journals, and the distribution of publishing institutions. This allowed them to establish 10 important but unevenly distributed research trends and to propose information search and information security as potential new research themes. De la Hoz-Correa et al. [48] used co-word analysis to identify six thematic clusters in published research listed in the Web of Science (WoS) and Scopus databases; this type of analysis offers powerful insights into the conceptual structure of medical tourism research. As these references show, co-word analysis is an effective way to identify the content, importance, and relevance of different themes.
Rodríguez-López et al. [49] applied a strategic diagram to show the importance of themes identified in a bibliometric analysis of published academic research dealing with restaurants in the fields of hospitality, leisure, sport, and tourism during the period from 2000 to 2018. Muñoz-Leiva et al. [50] conducted data mining on 759 papers related to blockchain technology in the financial field, employing co-word analysis and strategic diagrams to explore hot topics and predict future development trends. Rodríguez-López et al. [51] selected documents whose titles included specific terms from two online databases, Web of Science (WoS) and Scopus, and used a keyword strategic diagram to determine the importance of keywords and their levels of development. Terán-Yépez et al. [52] sampled 216 articles on sustainable entrepreneurship and identified the most significant research tendencies, proposing several future research directions through the graphic mapping of strategic diagrams. Finally, Jiménez-García et al. [53] applied bibliometric techniques to investigate research trends in 214 articles related to sports tourism and sustainability and used strategic diagrams to identify the most significant research tendencies across distinct topics.
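A strategic diagram places each theme on two axes: centrality (how strongly the theme's keywords link to keywords in other themes) and density (how strongly its own keywords link to each other). The sketch below uses one widely used formulation, centrality as the sum of external link strengths and density as the mean internal link strength, and splits the plane at the median of each axis. Scaling constants and split points differ between studies, and the themes and link strengths shown are hypothetical.

```python
from statistics import median

def centrality_density(themes, links):
    """Return {theme: (centrality, density)}.

    themes: dict theme name -> set of keywords.
    links:  dict frozenset({a, b}) -> link strength (e.g. an equivalence index).
    """
    result = {}
    for name, members in themes.items():
        internal, external = [], []
        for pair, strength in links.items():
            inside = len(pair & members)  # how many ends of the link lie in this theme
            if inside == 2:
                internal.append(strength)
            elif inside == 1:
                external.append(strength)
        density = sum(internal) / len(internal) if internal else 0.0
        result[name] = (sum(external), density)
    return result

def quadrants(scores):
    """Assign each theme to a strategic-diagram quadrant using median splits."""
    c_med = median(c for c, _ in scores.values())
    d_med = median(d for _, d in scores.values())
    labels = {
        (True, True): "motor theme",
        (False, True): "highly developed and isolated theme",
        (False, False): "emerging or declining theme",
        (True, False): "basic and transversal theme",
    }
    return {name: labels[(c >= c_med, d >= d_med)] for name, (c, d) in scores.items()}

# Hypothetical themes and keyword link strengths.
themes = {
    "A": {"sustainability", "stakeholder"},
    "B": {"social media", "marketing"},
    "C": {"blockchain", "security"},
    "D": {"gastronomy", "culture"},
}
links = {
    frozenset(("sustainability", "stakeholder")): 0.8,
    frozenset(("social media", "marketing")): 0.2,
    frozenset(("blockchain", "security")): 0.9,
    frozenset(("gastronomy", "culture")): 0.1,
    frozenset(("sustainability", "marketing")): 0.5,
    frozenset(("stakeholder", "gastronomy")): 0.4,
    frozenset(("marketing", "culture")): 0.3,
}
for theme, quadrant in quadrants(centrality_density(themes, links)).items():
    print(theme, quadrant)
```

In this toy case, theme A is both central and cohesive (a motor theme), while C is cohesive but disconnected from the rest of the field.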

This entry is adapted from the peer-reviewed paper 10.3390/su14074053
