Human Behavior Analysis by Data

The goal of this study was to conduct a literature review of current approaches and techniques for identifying, understanding, and predicting human behaviors through mining a variety of sources of textual data with a focus on enabling classification of psychological behaviors regarding emotion, cognition, and social empathy.

text mining
human behavior
sentiment analysis
physiological profiling

1. Introduction

At present, the vast amount of textual data being generated from myriad sources (e.g., formal or informal reports, interviews, call logs, emails, performance documents, blogs, tweets, comments, or social media entries) is rapidly increasing ^[1]. Although this increase in textual data allows for large repositories to be analyzed, summarized, and deciphered, using these data to make insightful decisions has become much more challenging. Thus, in this study, we sought to explore the current approaches through which the unstructured textual data can be analyzed by extracting valuable information to support decision-making for various purposes. Consequently, we conducted a systematic literature review of the techniques and methods used to identify, understand, and predict human behaviors by mining various textual data sources.

The problem of mining textual data has received substantial attention, owing to the proliferation of social networks that allow the distribution of opinions and the sharing of sentiment on diverse subject matters. This literature review is focused on the methods for understanding human psychological behavior through the use of textual data. Mining textual data can provide deep insights into an individual’s views, attitudes, sentiments, and emotions toward other individuals and help predict future social behaviors ^[2]. Such human behaviors can be identified and understood by extracting textual data with meaningful semantic properties, including metadata such as concepts, events, keywords, and categories, as well as symmetric and asymmetric relationships. Such knowledge can facilitate improved decision-making (e.g., personnel selection and training) or intelligence analyses ^[3]. According to Bornstein et al. ^[4], human behavior is described as “the potential and expressed capacity for physical, mental, and social activity during the phases of human life.” Regarding the identification of behaviors by text mining, Tausczik et al. ^[5] stated that “by drawing on massive amounts of text, researchers can begin to link everyday language use with behavioral and self-reported measures of personality, social behavior, and cognitive styles.” Furthermore, Pennebaker and Stone ^[6] classified the use of language in the following categories: emotional experience, social relationships, time orientation, and cognitive abilities.

2. Human Behavior

This literature review proposes three main categories to classify human behavior in the context of text mining: cognitive, emotional, and social behaviors. Most of the literature conveys how textual data are analyzed to understand the activities, mental skills, and social interactions among people, with the goal of identifying emotional, social, and cognitive behaviors, whose characteristics are depicted in Figure 1 [55]^[7].

Figure 1. Classification of psychological behaviors.

Emotional behavior is correlated with mental health issues (e.g., stress, depression, anger, or violence), and the monitoring and treatment of mental disorders can be achieved by extracting textual data from communication devices. Several studies have explicitly shown that assumptions can be made about a given person’s current mood by analyzing variations in mobile usage patterns, texting, and calling [9]^[8]. Moreover, studies by Tausczik et al. ^[5] and Rutland et al. [70]^[9] described machine-learning methods that analyze the content of messages sent by short message service (SMS), scanning words from the texts and linking them to psychologically meaningful terms that can be used to assess emotions and changes in mood. In addition, audio data from mobile calls can be analyzed and translated to text, and the person’s mood can be extracted to detect emotional signatures [35]^[10].

Social behavior is associated with issues of social interaction, such as empathy or loneliness. Social networks, such as Twitter, LinkedIn, or Facebook, were used to study human social behavior. Textual comments were analyzed to identify sentiments and to extract information about attitudes, social activity, and interactions with other users [27]^[11]. For example, the number of incoming or outgoing comments or conversations can reflect the current mood of a given individual. In addition, people with mental health conditions tend to increase the number of texts sent during manic episodes, whereas low levels of texts sent can be correlated with depression [14]^[12]. At the level of societies, governments, and leaders’ actions, hybrid approaches combining complexity science, symmetry, and information systems (text), presented by Helbing et al. [91]^[13], can contribute to areas such as understanding geopolitical tensions by analyzing an extensive data set of newspaper articles. The text of each published article was searched for mentions of a given country along with a set of keywords typically associated with tensions (for example, crisis, conflict, antagonism, clash, contention, discord, fight, attack, combat) to make predictions about subsequent actions.
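The keyword search over news articles described above can be illustrated with a minimal Python sketch. It is not the procedure used by Helbing et al.; the function name, the single-word country matching, and the toy articles (with fictional country names) are assumptions made only for illustration. Only the keyword list comes from the text.

```python
import re
from collections import Counter

# Keywords taken from the example list in the text above.
TENSION_KEYWORDS = {
    "crisis", "conflict", "antagonism", "clash", "contention",
    "discord", "fight", "attack", "combat",
}

def tension_mentions(articles, countries):
    """Count articles that mention a country together with a tension keyword."""
    counts = Counter()
    for text in articles:
        tokens = set(re.findall(r"[a-z]+", text.lower()))
        if tokens & TENSION_KEYWORDS:
            for country in countries:
                # Assumption: country names are single words.
                if country.lower() in tokens:
                    counts[country] += 1
    return counts

# Hypothetical toy articles with fictional countries, not real data.
articles = [
    "Border clash raises fears of wider conflict in Freedonia",
    "Freedonia and Sylvania sign a trade agreement",
    "Sylvania faces a budget crisis after attack on pipeline",
]
result = tension_mentions(articles, ["Freedonia", "Sylvania"])
```

Counting per article rather than per keyword keeps a single long article from dominating the signal; a real study would also need to handle multi-word country names and word inflections.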

Cognitive behavior is associated with the performance of mental processes such as thinking and causal reasoning [55]^[7]. The information expressed as textual data can be used to monitor an individual’s skills in different activities. Some authors divide cognitive functions into categories (e.g., perception, attention, memory, language skills, and executive functioning), which can be monitored by assessing performance in specific tasks within these categories. For example, textual analytics was applied to SMS text messages from people with schizophrenia to help identify cognitive impairment [104]^[14].

2.1. Emotional Category

In this category, the included papers focused on using smartphones, written information, opinion mining, and customer feedback. Gravenhorst et al. [9]^[8] recognize smartphones as a promising technology for use in the treatment of mental disorders through the implementation of sensor devices to monitor illnesses. By using human–computer interfaces to support therapy and by collecting data from patients’ daily lives, smartphones can be beneficial for treating people with mental disorders. Grünerbl et al. ^[15] explored the use of mobile phones to recognize depressive and manic states in people with bipolar disorder. This sensor-based smartphone system can support the treatment of patients with bipolar disorder as a supplementary tool for health care professionals ^[15]. Muaremi et al. [35]^[10] demonstrated the applicability of phone calls to assess episodes of bipolar disorder in patients. Statistics were extracted from various phone call conversations by using speech cues, and different features, social signals, and emotional properties were identified [35]^[10]. Li and Qian [44]^[16] identified how long-term memory helps classify information by analyzing different emotions in texts. This method helps classify different sentences with a corresponding emotion and may be used to project possible trends in preferences [44]^[16].

Wang et al. [80]^[17] used emotion evolution law for emotion analysis. This method evaluates natural language text from web news by using one-step and limited-step shifts as well as path transfer; it was validated on a data set of titles, bodies, and comments from news articles. This method can identify feelings such as love-anger, sadness-anger, and joy, thus providing insight into applications regarding affective interaction in network public sentiment, social media communication, and human–computer interaction. Swain et al. [83]^[18] proposed a method for detecting suicide ideation by applying sentiment analysis to tweets via supervised learning. Using Python modules and machine learning models for opinion mining, this research suggests that machine sentiment analysis can aid in timely detection and act as an alert system for suicidal tendencies. Similar work was recently presented by Bayram and Benhiba [88]^[19], who used machine learning techniques to identify a person’s suicide risk from the short-term history of their tweets. Fareri et al. [86]^[20], in 2020, focused on the development of a data-driven approach using text mining techniques to analyze job profiles and quantify the readiness of employees of a large firm to adopt the Industry 4.0 paradigm. This approach provides a framework for estimating the Industry 4.0 readiness of enterprises.

Mahendran et al. [10]^[21] proposed classifying written information as positive, negative, or neutral to efficiently study raw data by using traditional approaches such as Bag of Words, Naïve Bayes classifier, and frequency distribution. Tausczik et al. ^[5] used the computerized text analysis program Linguistic Inquiry and Word Count (LIWC) to determine the psychological meaning of textual information. In this program, words are categorized into different psychological classes to assess people’s thought processes, emotional states, intentions, and motivations ^[5]. Turney [21]^[22] categorized data according to an analysis of the meanings of different words by using algorithms. For example, a review is classified as positive (thumbs up) if it contains positive words and as negative (thumbs down) if it contains negative words. Nasukawa and Yi [37]^[23] applied a semantic analysis and achieved 70–95% precision in relating sentiments to positive or negative words in text documents from web pages and articles. Extracting information by using natural language processing (NLP) can help determine sentiments expressed online [37]^[23]. Thakur and Han [105]^[24] presented an approach for analyzing the acceptance of interaction with virtual assistants across different interactive devices, using sentiment analysis based on NLP to explore the views, expressions, and beliefs expressed by older adults.
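As an illustration of the Bag of Words and Naïve Bayes approach mentioned at the start of this paragraph, the following Python sketch trains a small multinomial Naïve Bayes classifier with add-one smoothing. It is a generic textbook formulation, not the implementation of Mahendran et al.; the function names and the six training sentences are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(labelled_docs):
    """Fit a multinomial Naive Bayes model on (text, label) pairs."""
    word_counts = defaultdict(Counter)   # label -> bag of words
    doc_counts = Counter()               # label -> number of documents
    vocabulary = set()
    for text, label in labelled_docs:
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        doc_counts[label] += 1
        vocabulary.update(tokens)
    return word_counts, doc_counts, vocabulary

def classify(text, model):
    """Return the label with the highest log-posterior for the text."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocabulary:
                continue  # ignore words never seen in training
            # Laplace (add-one) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][token] + 1) / (total_words + len(vocabulary))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training sentences, for illustration only.
training = [
    ("great phone love it", "positive"),
    ("excellent service very happy", "positive"),
    ("terrible battery very disappointed", "negative"),
    ("awful support hate it", "negative"),
    ("the package arrived on monday", "neutral"),
    ("the box contains a charger", "neutral"),
]
model = train_naive_bayes(training)
```

With this toy model, `classify("love the excellent phone", model)` selects the positive class because three of its four words appear only in positive training sentences. Real applications need far more training data and proper tokenization.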

Pennebaker et al. [42]^[25] studied the software LIWC, which processes textual information to capture the beliefs, preferences, and sentiments of people expressed in words. This study provides evidence that the words that people use have psychological value [42]^[25]. Emotions play a critical role in the studies of human knowledge and behavior. These emotions can be determined by the environmental events of the individual or by their cognitive abilities and social skills. Knowledge management (KM) research considers them from specific angles, and, to date, a comprehensive understanding of the emotions that dominate KM and their prediction has been lacking. To offer a holistic view, one study [87]^[26] investigated the presence of emotions in knowledge management publications by applying sentiment analysis.
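The dictionary-based word counting that LIWC performs can be sketched in a few lines of Python. The real LIWC dictionary is proprietary and far larger; the four categories, their word lists, and the function name below are hypothetical and serve only to show the principle of reporting the share of words that fall into each psychological category.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary; categories and words are illustrative
# only and are NOT taken from the actual LIWC dictionary.
CATEGORIES = {
    "positive_emotion": {"happy", "love", "good", "hope"},
    "negative_emotion": {"sad", "angry", "hate", "afraid"},
    "social": {"friend", "family", "we", "they"},
    "cognitive": {"think", "because", "know", "reason"},
}

def category_profile(text):
    """Return the percentage of words in the text that fall in each category."""
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = Counter()
    for token in tokens:
        for category, words in CATEGORIES.items():
            if token in words:
                hits[category] += 1
    total = len(tokens) or 1  # avoid division by zero on empty input
    return {category: 100.0 * hits[category] / total for category in CATEGORIES}

profile = category_profile("I think we are happy because my family is here")
```

For the ten-word example sentence, two words are social ("we", "family"), two are cognitive ("think", "because"), and one is a positive emotion word ("happy"), so the profile reports 20%, 20%, and 10% for those categories.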

Liu ^[2] introduced different aspects of sentiment analysis and opinion mining because these two fields have become the most critical approaches in analyzing people’s opinions, sentiments, emotions, and attitudes through the collection of textual language. Miedema [17]^[27] explored how sentiment classification can be used to arrange documents according to sentiments. This method was applied to sentiments gathered from movie reviews by using a long short-term memory (LSTM) network. Bo Pang et al. [18]^[28] indicated that some machine learning techniques have not performed correctly in classifying texts by sentiment. This aspect has become a concern because it makes sentiment analyses more challenging. Othman et al. [24]^[29] explored approaches for opinion mining and sentiment analysis to gather and analyze information about the opinions of the public. Machine learning can help collect the responses posted on different social media platforms so that data can be used for various purposes in the industry. Acheampong et al. [78]^[30] focused on sentiment analysis through emotion detection via text mining. With the ease of sourcing data, text mining research has produced different approaches to the design of text-based emotion detection systems, which the authors compared in terms of contributions, approaches used, datasets used, results obtained, and strengths and weaknesses.

Vinodhini and Chandrasekaran [26]^[31] established that sentiment analysis and opinion mining can help predict future behavioral trends by elucidating the preferences of customers according to what they write. This capability is valuable for economic and marketing studies. Sentiment analysis was also recently applied in logistics and supply chain management to examine customer perceptions of companies’ services. For example, Siby et al. [90]^[32] presented an application in last-mile logistics. The research used customer reviews of the delivery experience regarding quality, service quality, product return, refund policy, and information-sharing issues. This work offered suggestions for redesigning processes related to last-mile logistics by introducing artificial intelligence technology. Pang and Lee [25]^[33] compared traditional analyses and sentiment-aware applications that process information about the sentiments and opinions of people. Different techniques, benchmarking, future work, and resources were also studied. Salloum et al. [20]^[34] proposed a different classification system for the different aspects of opinion mining because the challenges of correctly detecting the meanings and interpretations of different opinions can complicate opinion mining (i.e., an understanding of the domain-specific opinion is required) [20]^[34].

Greco and Polli [77]^[35] focused on the abundance and use of textual data as a source of valuable information regarding opinions and feelings and discussed the use of emotional text mining in brand management. This method is used to profile social media users’ representations and sentiments about a topic by extracting information from a collection of texts such as tweets. Raeesi Vanan [82]^[36] performed a study in which 3 million inbound tweets and outbound brand responses (tweets) were collected for brand sentiment analysis. The steps of CRISP-DM were used as a reference guide for business and data understanding, preparation, text mining, validation, and discussion of its contributions. The analysis concluded that the sentiments of customers toward a brand are significantly correlated with the brand’s proper response to its community over social media and with giving customers a deep feeling that their needs are mutually understood.

Pang and Lee [25]^[33] presented the importance of opinion mining and sentiment analysis, which has led to the development of several techniques and machines to gather and process information about the opinions and moods of people. The challenge is to seek better approaches to sentiment-aware applications. Haddi et al. [43]^[37] used support-vector machines (SVM) to explore the importance of text pre-processing in sentiment analysis because understanding the relevance of product opinion can be very challenging, owing to the diversity and quantity of unstructured data in existence [43]^[37].

Binali et al. [11]^[38] indicated that determining the emotional experiences of e-learning students can be difficult; however, through mining techniques, analyses can detect emotion in online students. In addition, identifying the different emotions of e-learning students can help model more suitable educational programs. Another study [16]^[39] proposed summarizing customer reviews by choosing product features on which they commented, classifying whether the opinion was positive or negative, and summarizing the results. This analysis is important because extremely high numbers of reviews prevent potential customers from reviewing every single opinion. Mate [23]^[40] proposed a ranking of essential product features from the online reviews of consumers. These aspects were identified by the number of times the product features appeared in reports and how these aspects influenced the overall opinions of consumers.

Estrada et al. [79]^[41] compared sentiment analysis classification techniques (machine learning, deep learning, and EvoMSA) for classifying educational opinions in an Intelligent Learning Environment called ILE-Java. Two corpora of expressions from the programming language domain were developed to reflect students’ emotional states regarding teachers, exams, homework, and projects: sentiTEXT, which has positive and negative polarity labels, and eduSERE, which has positive and negative learning-centered emotion labels. EvoMSA produced the best results among the classification techniques, with 93% accuracy for the sentiTEXT corpus and 84% accuracy for the eduSERE corpus. Misuraca et al. [81]^[42], in 2021, discussed opinion mining (OM) as a combination of statistics, linguistics, and computer science that evaluates the sentiments of individual opinions and highlights semantic orientation. The discussion includes the introduction of OM as a statistical text analysis tool in a learning environment to process student feedback expressed in natural language, produce useful analytics, and explore text collections from a quantitative viewpoint.

Wu et al. [28]^[43] studied how information shared on Facebook pages can be beneficial in determining whether a company is correctly reaching its customers or the desired requirements are met. By analyzing the interactions of Facebook users and the reactions to their posts, companies can gather information, apply statistical analyses, and model behavioral trends [28]^[43]. Kaur and Bansal [34]^[44] introduced opinion mining as a powerful tool for e-commerce because it gathers information about how customers feel about different products. This collection of opinions can help companies make better decisions and align their efforts with what customers really want. The classification of e-commerce users represents an appealing area of study for marketers seeking to align their efforts to capture more consumers [34]^[44]. Gamon [41]^[45] used large feature vectors and feature reduction to demonstrate that large, noisy data regarding customer feedback can be analyzed and classified. Feedback received from customers can present many challenges, and classifying these data is necessary to retrieve only the important information [41]^[45].

Bollen et al. [12]^[46] highlighted that many Twitter users express their emotions through this social media platform. With the use of a psychometric instrument, different social events were found to profoundly affect changes in public mood. The identification of these sentiments reflects personality trends, as well as the atmosphere and emotions of Twitter users. Basari et al. [22]^[47] examined how tweets can contain information about users’ preferences regarding movies. SVM can analyze natural language to determine patterns via opinion mining. Online reviews can help predict the possible preferences of the movie audience [22]^[47]. Zengin Alp and Gündüz Öğüdücü [38]^[48] introduced a method called Personalized PageRank, which integrates the information retrieved from network topology and the information of Twitter users regarding their actions and activities. This capability has become appealing for marketers because Twitter is an online platform where users share their preferences.
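To make the idea of combining network topology with user activity concrete, the following Python sketch implements the standard Personalized PageRank power iteration, in which the teleport vector is weighted by a per-user preference. Using activity counts as that preference is an assumption made here for illustration and is not necessarily how Zengin Alp and Gündüz Öğüdücü combined the two sources; the graph and activity values are hypothetical.

```python
def personalized_pagerank(graph, preference, damping=0.85, iterations=50):
    """Power iteration for PageRank with a non-uniform teleport vector.

    graph: dict mapping each node to the list of nodes it links to
           (every linked node must also be a key of the dict).
    preference: dict mapping nodes to non-negative teleport weights.
    """
    nodes = list(graph)
    total_pref = sum(preference.get(n, 0.0) for n in nodes)
    teleport = {n: preference.get(n, 0.0) / total_pref for n in nodes}
    rank = dict(teleport)
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for node in nodes:
            targets = graph[node]
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: return its mass via the teleport vector.
                for n in nodes:
                    new_rank[n] += damping * rank[node] * teleport[n]
        rank = new_rank
    return rank

# Hypothetical follower graph (user -> users they follow) and
# hypothetical activity counts used as teleport weights.
graph = {"ana": ["ben", "cy"], "ben": ["cy"], "cy": ["ana"], "dee": ["cy"]}
activity = {"ana": 5, "ben": 1, "cy": 1, "dee": 1}
scores = personalized_pagerank(graph, activity)
```

The scores sum to one, and a user whom nobody follows (here "dee") receives only the teleport share, so both topology and activity shape the final ranking.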

Saire and Cruz [84]^[49] focused on the use of text mining of data collected from social media and search trends to analyze the effects of COVID-19 on the population of Paris, France, from 23 April 2020 to 18 June 2020. The primary findings revealed a decreasing pattern of publication/interest in the health crisis and the health and economic effects on the population resulting from the effects of COVID-19. Chire-Saire [85]^[50] used analysis of social media through complex network representation and text mining to compare the effects of COVID-19 in other countries. Focusing on South American countries, the analysis of texts via Twitter indicated the existence of patterns similar to those in complex systems and confirmed the idea of system and visualization of adjacency matrices, which may potentially identify posts made by robots as opposed to humans.

Frost et al. [14]^[12] studied the system MONARCA 2.0 to collect relevant information from bipolar patients, with an aim to provide insight into the disease for both patients and clinicians by processing subjective and objective data about patient mood. This system helps identify patterns in behaviors and factors affecting the disease [14]^[12]. Lachmar et al. [27]^[11] gathered information shared by individuals with sentiments of depression on Twitter through the hashtag #MyDepressionLooksLike. These tweets presented dysfunctional thoughts, hopeless feelings, and unlovability characteristics, thus revealing how people with depression talk about their symptoms via social networking. Pijnenborg et al. [40]^[51] discussed the benefits of using SMS to decrease the effects of cognitive impairments in patients who have schizophrenia. Because schizophrenia also involves delusions and hallucinations, improvements in the status of patients using SMS can be very modest.

Bespalov et al. [13]^[52] proposed an approach to modeling higher-order sentences to a lower order to make the classification of data viable. Supervised latent n-gram analyses can help classify sentiments that are extracted from textual information. Davis et al. [29]^[53] determined how analytical models can enhance public safety with the help of probabilistic and parametric methods, as well as different nonlinear algebraic models, by analyzing uncertain data and identifying threats and false alarms, and detecting possible terrorist profiles [29]^[53].

Gill [33]^[54] illustrated the relationship between the language used and the personality projected by word choice. The personality traits of extraversion, neuroticism, and psychoticism can be determined by analyzing text from emails [33]^[54]. Boyd and Pennebaker [39]^[55] studied the language used by people to identify personality patterns. Rather than focusing on responses to self-reported questionnaires, language-based measures represent a new approach to model personality trends. A.S. Cohen et al. ^[3] applied computerized lexical analyses to determine positive or negative affectivity dimensions through natural speech. Measuring personality was possible because people with positive affectivity demonstrate high levels of positive emotions, whereas those with negative affectivity show high levels of negative emotions.

Brynielsson et al. [31]^[56] used different techniques for analyzing data to detect “lone wolf” terrorists with the goal of preventing possible attacks. Analytical models were created by using a platform to harvest and capture online information and trace possible lone wolves [31]^[56]. K. Cohen et al. [32]^[57] established the challenges of detecting lone wolves by using traditional police methods and introduced new tools and technologies that can detect weak signals in the form of linguistic markers that facilitate the identification of lone wolves’ profiles [32]^[57].

Hung et al. [30]^[58] introduced a new framework and technology called INSiGHT (Investigative Search for Graph-Trajectories) that helps detect groups or individuals whose behavior suggests a potential for violence by identifying radicalization trajectories over time [30]^[58]. Paul K. Davis et al. [36]^[59] studied behavioral patterns and their usage to predict possible acts of violence.

2.2. Social Category

In this category, Alexander Semenov et al. [45]^[60] studied the identification of possible school shooters by analyzing the content shared by users on different social media platforms. Future shooters can be identified by analyzing the emails, chats, texts, and social media feeds of prior school shooters sharing similar behaviors [45]^[60]. Bartlett and Reynolds [46]^[61] presented how social media faces legal and ethical responsibilities, yet also can be useful to prevent terrorism and preserve public safety. Privacy can protect the public and prevent the use of social media for terrorism and propagandistic purposes [46]^[61]. Marrese-Taylor et al. [52]^[62] tested the software Opinion Zoom to gather online information about tourism opinions to propose solutions to problems in the industry. A modular tool was used because tourism opinions on the web can help predict possible traveling patterns as well as preferences of travelers.

Kastrati et al. [49]^[63] investigated the activities of users on online social networks to identify crimes by applying the objective metric SEMCON. By retrieving online posts, feeds, or users’ comments, this method can determine whether a user is a suspect [49]^[63].

Bollen et al. [12]^[46] analyzed how OpinionFinder and Google-Profile of Mood States (GPOMS) can help determine the mood patterns presented on social media regarding worldwide events. This analysis can also help companies predict the behavior of customers regarding the stock market and minimize the effects of fluctuations in the stock market. Bucur [47]^[64] established that opinion mining had become a key technique for extracting and collecting relevant information needed for companies to make better decisions and that the opinions of customers are fundamental input. Opinion mining has become an appealing area of study for many businesses [47]^[64].

Dave et al. [48]^[65] extracted textual information and classified online reviews as positive or negative according to different product attributes. Opinions can be classified through semantic analysis of online reviews [48]^[65]. Zha et al. [51]^[66] introduced a ranking system for product aspects by identifying that a) the most important aspects are described by more consumers, and b) these aspects directly affect the overall opinion of consumers. Product aspect ranking has many applications in various industries, and the main use is to gather relevant information to make better decisions.
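The two observations behind product aspect ranking, (a) important aspects are mentioned by more consumers and (b) they shift the overall opinion, can be combined in a rough heuristic. The Python sketch below multiplies the share of reviews mentioning an aspect by how far the mean rating of those reviews departs from the overall mean. This scoring rule, the function name, and the toy reviews are assumptions for illustration; they are not the ranking algorithm of Zha et al.

```python
from collections import defaultdict

def rank_aspects(reviews, aspects):
    """Rank aspects by (share of reviews mentioning them) x (rating shift)."""
    overall_mean = sum(rating for _, rating in reviews) / len(reviews)
    ratings_by_aspect = defaultdict(list)
    for text, rating in reviews:
        tokens = set(text.lower().split())
        for aspect in aspects:
            if aspect in tokens:
                ratings_by_aspect[aspect].append(rating)
    scores = {}
    for aspect in aspects:
        ratings = ratings_by_aspect[aspect]
        if not ratings:
            scores[aspect] = 0.0
            continue
        # (a) how many consumers comment on the aspect
        frequency = len(ratings) / len(reviews)
        # (b) how far those consumers' ratings sit from the overall mean
        influence = abs(sum(ratings) / len(ratings) - overall_mean)
        scores[aspect] = frequency * influence
    return sorted(aspects, key=lambda a: scores[a], reverse=True)

# Hypothetical (review text, star rating) pairs, not real data.
reviews = [
    ("battery dies fast", 1),
    ("battery is weak but screen is fine", 2),
    ("screen is bright", 4),
    ("love the camera and battery", 5),
    ("poor battery", 1),
    ("nice case", 5),
]
ranking = rank_aspects(reviews, ["battery", "screen", "camera"])
```

In this toy data the battery is mentioned most often and by reviewers whose ratings sit well below the average, so it ranks first; the screen is mentioned by reviewers whose mean rating equals the overall mean, so it ranks last.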

Nahm and Mooney [50]^[67] examined how DiscoTEX can help extract data by combining data mining and information extraction. This method can locate data within documents and transform unstructured text into a structured database, as well as predict additional information for extraction from other documents. The integration of data mining and information extraction can help combine data in a more readable structure [50]^[67]. McCallum [56]^[68] investigated how unstructured data present a challenge in interpreting information. Therefore, the aim of information extraction is to create a database by gathering loosely formatted texts in which patterns can be identified by data mining [56]^[68].

Diehl [53]^[69] examined not only the structural but also the cultural aspects of social networks. Relational sociology studies have tended to examine and retrieve information from text data, whereas the importance of the implications of face-to-face interactions when analyzing network information has largely been ignored. A. Semenov et al. [54]^[70] proposed three modules for long-term monitoring of different social networks: the crawler, the repository, and the analyzer. By crawling, storing, and analyzing different sites, longitudinal data from social media sites can be examined.

Pennebaker [55]^[7] analyzed the words that people use in emails, Twitter feeds, and Facebook posts to determine their emotions, thoughts, social relationships, and personalities. The focus was on word use rather than on how people were speaking. Mind mapping can help explore social and psychological trends. Ibrahim and Ahmad [57]^[71] researched how Requirements Analysis and Class Diagram Extraction (RACE) can expedite textual extractions and improve the analysis of the data requirements that are currently performed manually. Many NLP techniques were developed to extract relevant information from textual data.

2.3. Cognition Category

Eichinger et al. [58]^[72] introduced Affinity, a system that can assess similarities among the text message histories of users while preserving private information. A latent format is used, which does not allow for the reconstruction of the comparison words. Chung and Pennebaker [61]^[73] distinguished the adjectives most commonly used by college students by applying computerized text analytic tools. This study has established the strengths of analyzing open-ended texts to extract information from the natural language used by different participants. This method enables the examination of cultural patterns as well as personality characteristics.

Bond and Pennebaker [59]^[74] experimented with changing pronouns to moderate the health benefits of expressive writing by alternating the focus of participants. Expressive writing can therefore affect people’s physical and psychological health. Pennebaker and Stone ^[6] developed two projects showing the relationship between language use and aging: as people age, they tend to use more positive affect words than negative affect words and to use fewer self-references and fewer past-tense verbs.

Rajman and Besançon [62]^[75] established that text mining is a powerful technique to extract important information from a dataset by applying probabilistic associations of keywords because unstructured data can be challenging to interpret.

Fischhoff and Chauvin [106]^[76] investigated how intelligence analysis helps clarify difficult situations and provides valuable information for better decision-making by evaluating and integrating pertinent information. Intelligence analysis can help determine behavioral profiles and social conduct.

2.4. Other Studies

Kosala and Blockeel [65]^[77] explored the use of web mining by dividing it into three categories (web content mining, web structure mining, and web usage mining) and studying representation issues, the process, and learning algorithms. Balazs and Velásquez [63]^[78] studied how information fusion seeks to correctly transform and compress data into a more understandable representation. Fusion processes and the development of surveys to extract relevant data can be helpful as the use of opinion mining steadily increases. Nigam et al. [67]^[79] evaluated maximum entropy techniques to establish how a uniform distribution can benefit the classification of data. More studies must be performed, but this technique appears promising.

Continuous efforts have been undertaken worldwide to propose new classification algorithms such as Tsetlin Machine [107]^[80] or Dendritic Neuron Models [108]^[81]. Rutland et al. [70]^[9] evaluated how the use of SMS can be measured with the SMS Problem Use Diagnostic Questionnaire (SMS-PUDQ) to determine behavioral addiction to SMS use. The time spent using SMS and other measures of mobile phone use were detected during the study. Aggarwal and Zhai [71]^[82] explored the importance of mining text data, an appealing research topic, given that the amount of web-enabled data has increased and facilitates the exploration of vast quantities of textual data. A comparison of the classical and modern aspects of text mining was also described. Berry and Kogan [72]^[83] studied the contributions of text mining, as well as major topics associated with text mining, by categorizing text into three different components to explore keyword extraction, classification, and the clustering of information presented in textual data. Akilan [73]^[84] investigated the field of text mining to extract unstructured data and identify interesting and non-trivial patterns from text documents. An exploration of the current challenges and projected directions of this field was described [73]^[84]. Chakraborty et al. [64]^[85] prepared various case studies and performed text mining and analysis to extract important information from textual data. Different scenarios were created wherein SAS was used to perform comprehensive text analytics to help industries leverage the textual data [64]^[85].

Shahbaz et al. ^[86] addressed the analysis of textual information by developing a system, Sentiment Miner, that processes and classifies text files according to the opinions stated in their sentences by using NLP techniques and opinion mining algorithms. Weiss et al. ^[87] introduced methods to predict and analyze unstructured information in textual data. Methods used for data mining can be adapted for application to text. Chakraborty et al. ^[85][88] collected insightful information from customers by analyzing textual data from various documents to improve business operations and performance. Text analysis and sentiment mining make it possible to analyze unstructured data by extracting the important information it contains. Weerdt et al. ^[89] described the importance of retrieving data to benefit business process management by applying process mining, which uses techniques to analyze and extract knowledge and information from system event logs. Manning and Schutze ^[90] established the value of using statistical NLP to extract and interpret textual data, not only for businesses but also for government agencies and individuals who could benefit from extracting information from large amounts of data. The theory and practice of these techniques are also explored ^[90]. Moraes et al. ^[91] compared SVMs and artificial neural networks for document-level sentiment analysis and reported that artificial neural networks produced superior or at least comparable results to SVMs. Fraley ^[92] presented guidelines on how to construct web-based surveys to conduct behavioral research. The strengths and limitations of online surveys are highlighted, as well as the factors affecting the design of internet-based research.

References

Ahram, T.Z.; McCauley-Bush, P.; Karwowski, W. Estimating Intrinsic Dimensionality Using the Multi-Criteria Decision Weighted Model and the Average Standard Estimator. Inf. Sci. 2010, 180, 2845–2855.
Liu, B. Sentiment Analysis and Opinion Mining. Synth. Lect. Hum. Lang. Technol. 2012, 5, 1–167.
Cohen, A.S.; Minor, K.S.; Baillie, L.E.; Dahir, A.M. Clarifying the Linguistic Signature: Measuring Personality From Natural Speech. J. Pers. Assess. 2008, 90, 559–563.
Bornstein, M.H. Human Behavior | Definition, Theories, Characteristics, Examples, Types, & Facts. Available online: https://www.britannica.com/topic/human-behavior (accessed on 21 March 2021).
Tausczik, Y.R.; Pennebaker, J. The Psychological Meaning of Words: LIWC and Computerized Text Analysis Methods. J. Lang. Soc. Psychol. 2009, 29, 24–54.
Pennebaker, J.W.; Stone, L.D. Words of wisdom: Language use over the life span. J. Pers. Soc. Psychol. 2003, 85, 291–301.
Pennebaker, J.W. Mind Mapping: Using Everyday Language to Explore Social & Psychological Processes. Procedia Comput. Sci. 2017, 118, 100–107.
Gravenhorst, F.; Muaremi, A.; Bardram, J.; Grünerbl, A.; Mayora, O.; Wurzer, G.; Frost, M.; Osmani, V.; Arnrich, B.; Lukowicz, P.; et al. Mobile phones as medical devices in mental disorder treatment: An overview. Pers. Ubiquitous Comput. 2015, 19, 335–353.
Rutland, J.B.; Sheets, T.; Young, T. Development of a Scale to Measure Problem Use of Short Message Service: The SMS Problem Use Diagnostic Questionnaire. Cyberpsychol. Behav. 2007, 10, 841–844.
Muaremi, A.; Gravenhorst, F.; Grünerbl, A.; Arnrich, B.; Tröster, G. Assessing Bipolar Episodes Using Speech Cues Derived from Phone Calls. In Proceedings of the Pervasive Computing Paradigms for Mental Health; Cipresso, P., Matic, A., Grünerbl, A., Lopez, G., Tröster, G., Eds.; Springer International Publishing: Cham, Switzerland, 2014; pp. 103–114.
Lachmar, E.M.; Wittenborn, A.K.; Bogen, K.W.; McCauley, H.L.; Cravens, J.; Berry, N.; Radovic-Stakic, A. #MyDepressionLooksLike: Examining Public Discourse About Depression on Twitter. JMIR Ment. Health. 2017, 4, e43.
Frost, M.; Doryab, A.; Faurholt-Jepsen, M.; Kessing, L.V.; Bardram, J.E. Supporting Disease Insight through Data Analysis: Refinements of the Monarca Self-Assessment System. In Proceedings of the 2013 ACM International Joint Conference on Pervasive and Ubiquitous Computing, Zurich, Switzerland, 8–12 September 2013; Association for Computing Machinery: New York, NY, USA, 2013; pp. 133–142.
Helbing, D.; Brockmann, D.; Chadefaux, T.; Donnay, K.; Blanke, U.; Woolley-Meza, O.; Moussaid, M.; Johansson, A.; Krause, J.; Schutte, S. Saving Human Lives: What Complexity Science and Information Systems Can Contribute. J. Stat. Phys. 2015, 158, 735–781.
Schmidt, C.; Collette, F.; Cajochen, C.; Peigneux, P. A Time to Think: Circadian Rhythms in Human Cognition. Cogn. Neuropsychol. 2007, 24, 755–789.
Grunerbl, A.; Muaremi, A.; Osmani, V.; Bahle, G.; Ohler, S.; Troster, G.; Mayora, O.; Haring, C.; Lukowicz, P. Smartphone-Based Recognition of States and State Changes in Bipolar Disorder Patients. IEEE J. Biomed. Health Inform. 2015, 19, 140–148.
Li, D.; Qian, J. Text Sentiment Analysis Based on Long Short-Term Memory. In Proceedings of the 2016 First IEEE International Conference on Computer Communication and the Internet (ICCCI), Wuhan, China, 13–15 October 2016; pp. 471–475.
Wang, X.; Kou, L.; Sugumaran, V.; Luo, X.; Zhang, H. Emotion Correlation Mining through Deep Learning Models on Natural Language Text. IEEE Trans. Cybern. 2020.
Swain, D.; Khandelwal, A.; Joshi, C.; Gawas, A.; Roy, P.; Zad, V. A Suicide Prediction System Based on Twitter Tweets Using Sentiment Analysis and Machine Learning. In Machine Learning and Information Processing: Proceedings of ICMLIP 2020; Springer: Berlin/Heidelberg, Germany, 2021.
Bayram, U.; Benhiba, L. Determining a Person’s Suicide Risk by Voting on the Short-Term History of Tweets for the CLPsych 2021 Shared Task. In Proceedings of the Seventh Workshop on Computational Linguistics and Clinical Psychology: Improving Access, Mexico City, Mexico, 11 June 2021; pp. 81–86.
Fareri, S.; Fantoni, G.; Chiarello, F.; Coli, E.; Binda, A. Estimating Industry 4.0 Impact on Job Profiles and Skills Using Text Mining. Comput. Ind. 2020, 118, 103222.
Mahendran, A.; Duraiswamy, A.; Reddy, A.; Gonsalves, C. Opinion Mining for Text Classification. Int. J. Sci. Eng. Technol. 2013, 2, 589–594.
Turney, P.D. Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews. arXiv 2002, arXiv:cs/0212032.
Nasukawa, T.; Yi, J. Sentiment Analysis: Capturing Favorability Using Natural Language Processing. In Proceedings of the 2nd International Conference on Knowledge Capture; Association for Computing Machinery: New York, NY, USA, 2003; pp. 70–77.
Thakur, N.; Han, C.Y. An Approach to Analyze the Social Acceptance of Virtual Assistants by Elderly People. In Proceedings of the 8th International Conference on the Internet of Things, Santa Barbara, CA, USA, 15–18 October 2018; pp. 1–6.
Pennebaker, J.W.; Boyd, R.L.; Jordan, K.; Blackburn, K. The Development and Psychometric Properties of LIWC2015; University of Texas at Austin: Austin, TX, USA, 2015.
Fteimi, N.; Hornung, O.; Smolnik, S. When Emotions Rule Knowledge: A Text-Mining Study of Emotions in Knowledge Management Research. Int. J. Knowl. Manag. IJKM 2021, 17, 1–16.
Miedema, F. Sentiment Analysis with Long Short-Term Memory Networks; Vrije Universiteit Amsterdam: Amsterdam, The Netherlands, 2018.
Pang, B.; Lee, L.; Vaithyanathan, S. Thumbs up? Sentiment Classification Using Machine Learning Techniques. arXiv 2002, arXiv:cs/0205070.
Othman, M.; Hassan, H.; Moawad, R.; El-Korany, A. Opinion Mining and Sentimental Analysis Approaches: A Survey. Life Sci. J. 2014, 11, 321–326.
Acheampong, F.A.; Wenyu, C.; Nunoo-Mensah, H. Text-Based Emotion Detection: Advances, Challenges, and Opportunities. Eng. Rep. 2020, 2, e12189.
Vinodhini, G.; Chandrasekaran, R.M. Sentiment Analysis and Opinion Mining: A Survey. Int. J. 2012, 2, 282–292.
Siby, S. An Exploration about the Last Mile Logistic Efficiency in Indian E-Commerce Sector—A Text Mining Approach. In Proceedings of the International Conference on Innovative Computing & Communications (ICICC), New Delhi, India, 21–23 February 2020; Available online: https://ssrn.com/abstract=3563089 (accessed on 21 March 2021).
Pang, B.; Lee, L. Opinion Mining and Sentiment Analysis. Found. Trends® Inf. Retr. 2008, 2, 1–135.
Salloum, S.A.; Al-Emran, M.; Monem, A.A.; Shaalan, K. A Survey of Text Mining in Social Media: Facebook and Twitter Perspectives. Adv. Sci. Technol. Eng. Syst. J. 2017, 2, 127–133.
Greco, F.; Polli, A. Emotional Text Mining: Customer Profiling in Brand Management. Int. J. Inf. Manag. 2020, 51, 101934.
Raeesi Vanani, I. Text Analytics of Customers on Twitter: Brand Sentiments in Customer Support. J. Inf. Technol. Manag. 2019, 11, 43–58.
Haddi, E.; Liu, X.; Shi, Y. The Role of Text Pre-processing in Sentiment Analysis. Procedia Comput. Sci. 2013, 17, 26–32.
Binali, H.H.; Wu, C.; Potdar, V. A new significant area: Emotion detection in E-learning using opinion mining techniques. In Proceedings of the 2009 3rd IEEE International Conference on Digital Ecosystems and Technologies, Lake Ohrid, Macedonia, 16–19 June 2009; pp. 259–264.
Hu, M.; Liu, B. Mining and summarizing customer reviews. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle, WA, USA, 22–25 August 2004; pp. 168–177.
Mate, C. Product Aspect Ranking Using Sentiment Analysis: A Survey. Int. Res. J. Eng. Technol. 2015, 3, 126–127.
Estrada, M.L.B.; Cabada, R.Z.; Bustillos, R.O.; Graff, M. Opinion Mining and Emotion Recognition Applied to Learning Environments. Expert Syst. Appl. 2020, 150, 113265.
Misuraca, M.; Scepi, G.; Spano, M. Using Opinion Mining as an Educational Analytic: An Integrated Strategy for the Analysis of Students’ Feedback. Stud. Educ. Eval. 2021, 68, 100979.
Wu, H.; Liu, K.; Trappey, C. Understanding Customers Using Facebook Pages: Data Mining Users Feedback Using Text Analysis. In Proceedings of the 2014 IEEE 18th International Conference on Computer Supported Cooperative Work in Design (CSCWD); IEEE: Piscataway, NJ, USA, 2014; pp. 346–350.
Kaur, J.; Bansal, M. Hierarchical Sentiment Analysis Model for Automatic Review Classification for E-commerce Users. In Hybrid Intelligence for Social Networks; Banati, H., Bhattacharyya, S., Mani, A., Köppen, M., Eds.; Springer International Publishing: Cham, Switzerland, 2017; pp. 249–267. ISBN 978-3-319-65139-2.
Gamon, M. Sentiment Classification on Customer Feedback Data: Noisy Data, Large Feature Vectors, and the Role of Linguistic Analysis. In Proceedings of COLING 2004: The 20th International Conference on Computational Linguistics, Geneva, Switzerland, 23–27 August 2004; pp. 841–847.
Bollen, J.; Mao, H.; Pepe, A. Modeling Public Mood and Emotion: Twitter Sentiment and Socio-Economic Phenomena. Proc. Int. AAAI Conf. Web Soc. Media 2011, 5, 1.
Basari, A.S.H.; Hussin, B.; Ananta, I.G.P.; Zeniarja, J. Opinion Mining of Movie Review using Hybrid Method of Support Vector Machine and Particle Swarm Optimization. Procedia Eng. 2013, 53, 453–462.
Alp, Z.Z.; Öğüdücü, Ş.G. Identifying topical influencers on twitter based on user behavior and network topology. Knowl. Based Syst. 2018, 141, 211–221.
Saire, J.E.C.; Cruz, J.F.O. Study of Coronavirus Impact on Parisian Population from April to June Using Twitter and Text Mining Approach. In 2020 International Computer Symposium; IEEE: Piscataway, NJ, USA, 2020.
Chire-Saire, J.E. Characterizing Twitter Interaction during COVID-19 Pandemic Using Complex Networks and Text Mining. arXiv Prepr. 2020, arXiv:2009.05619.
Pijnenborg, G.H.M.; Withaar, F.K.; Brouwer, W.H.; Timmerman, M.E.; Bosch, R.J.V.D.; Evans, J.J. The efficacy of SMS text messages to compensate for the effects of cognitive impairments in schizophrenia. Br. J. Clin. Psychol. 2010, 49, 259–274.
Bespalov, D.; Bai, B.; Qi, Y.; Shokoufandeh, A. Sentiment Classification Based on Supervised Latent N-Gram Analysis. In Proceedings of the 20th ACM International Conference on Information and Knowledge Management, Glasgow, Scotland, UK, 24–28 October 2011; Association for Computing Machinery: New York, NY, USA, 2011; pp. 375–382.
Davis, P.K.; Manheim, D.; Perry, W.L.; Hollywood, J. Using causal models in heterogeneous information fusion to detect terrorists. In Proceedings of the 2015 Winter Simulation Conference (WSC); IEEE: Piscataway, NJ, USA, 2015; pp. 2586–2597.
Gill, A.J. Personality and Language: The Projection and Perception of Personality in Computer-Mediated Communication. Ph.D. Thesis, University of Edinburgh, Edinburgh, UK, 2003.
Boyd, R.; Pennebaker, J. Language-based personality: A new approach to personality in a digital world. Curr. Opin. Behav. Sci. 2017, 18, 63–68.
Brynielsson, J.; Horndahl, A.; Johansson, F.; Kaati, L.; Mårtenson, C.; Svenson, P. Harvesting and analysis of weak signals for detecting lone wolf terrorists. Secur. Inform. 2013, 2, 1–15.
Cohen, K.; Johansson, F.; Kaati, L.; Mork, J.C. Detecting Linguistic Markers for Radical Violence in Social Media. Terror. Polit. Violence 2013, 26, 246–256.
Hung, B.W.K.; Jayasumana, A.P.; Bandara, V.W. INSiGHT: A System for Detecting Radicalization Trajectories in Large Heterogeneous Graphs. In Proceedings of the 2017 IEEE International Symposium on Technologies for Homeland Security (HST), Waltham, MA, USA, 25–26 April 2017; pp. 1–7.
Davis, P.K.; Perry, W.L.; Brown, R.A.; Yeung, D.; Roshan, P.; Voorhies, P. Using Behavioral Indicators to Help Detect Potential Violent Acts; RAND Corporation: Santa Monica, CA, USA, 2013.
Semenov, A.; Veijalainen, J.; Kyppo, J. Analysing the presence of school-shooting related communities at social media sites. Int. J. Multimed. Intell. Secur. 2010, 1, 232–268.
Bartlett, J.; Reynolds, L. The State of the Art 2015: A Literature Review of Social Media Intelligence Capabilities for Counter-Terrorism; Demos: London, UK, 2015.
Marrese-Taylor, E.; Velásquez, J.D.; Bravo-Marquez, F. Opinion Zoom: A Modular Tool to Explore Tourism Opinions on the Web. In Proceedings of the 2013 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT); IEEE: Piscataway, NJ, USA, 2013; Volume 3, pp. 261–264.
Kastrati, Z.; Imran, A.S.; Yildirim-Yayilgan, S.; Dalipi, F. Analysis of Online Social Networks Posts to Investigate Suspects Using SEMCON. In Proceedings of the Social Computing and Social Media; Meiselwitz, G., Ed.; Springer International Publishing: Cham, Switzerland, 2015; pp. 148–157.
Bucur, C. Opinion Mining Platform for Intelligence in Business. Econ. Insights Trends Chall. 2014, 3, 99–108.
Dave, K.; Lawrence, S.; Pennock, D.M. Mining the Peanut Gallery: Opinion Extraction and Semantic Classification of Product Reviews. In Proceedings of the 12th International Conference on World Wide Web; Association for Computing Machinery: New York, NY, USA, 2003; pp. 519–528.
Zha, Z.-J.; Yu, J.; Tang, J.; Wang, M.; Chua, T.-S. Product Aspect Ranking and Its Applications. IEEE Trans. Knowl. Data Eng. 2013, 26, 1211–1224.
Nahm, U.Y.; Mooney, R.J. A Mutually Beneficial Integration of Data Mining and Information Extraction. In Proceedings of the AAAI/IAAI, Austin, TX, USA, 1–3 August 2000; pp. 627–632.
McCallum, A. Information Extraction: Distilling Structured Data from Unstructured Text. Queue 2005, 3, 48–57.
Diehl, D.K. Language and Interaction: Applying Sociolinguistics to Social Network Analysis. Qual. Quant. 2019, 53, 757–774.
Semenov, A.; Veijalainen, J.; Boukhanovsky, A. A Generic Architecture for a Social Network Monitoring and Analysis System. In Proceedings of the 2011 14th International Conference on Network-Based Information Systems, Tirana, Albania, 7–9 September 2011; pp. 178–185.
Ibrahim, M.; Ahmad, R. Class Diagram Extraction from Textual Requirements Using Natural Language Processing (NLP) Techniques. In Proceedings of the 2010 Second International Conference on Computer Research and Development, Kuala Lumpur, Malaysia, 7–10 May 2010; pp. 200–204.
Eichinger, T.; Beierle, F.; Khan, S.U.; Middelanis, R. Affinity: A System for Latent User Similarity Comparison on Texting Data. In Proceedings of the ICC 2019—2019 IEEE International Conference on Communications (ICC), Shanghai, China, 20–24 May 2019; pp. 1–7.
Chung, C.K.; Pennebaker, J.W. Revealing Dimensions of Thinking in Open-Ended Self-Descriptions: An Automated Meaning Extraction Method for Natural Language. J. Res. Personal. 2008, 42, 96–132.
Bond, M.; Pennebaker, J.W. Automated Computer-Based Feedback in Expressive Writing. Comput. Hum. Behav. 2012, 28, 1014–1018.
Rajman, M.; Besançon, R. Text Mining-Knowledge Extraction from Unstructured Textual Data. In Proceedings of the Advances in Data Science and Classification; Rizzi, A., Vichi, M., Bock, H.-H., Eds.; Springer: Berlin/Heidelberg, Germany, 1998; pp. 473–480.
Fischhoff, B.; Chauvin, C. Intelligence Analysis. Behav. Soc. 2011. Available online: https://www.nap.edu/read/13062/chapter/1#ii (accessed on 21 March 2021).
Kosala, R.; Blockeel, H. Web Mining Research: A Survey. ACM SIGKDD Explor. Newsl. 2000, 2, 1–15.
Balazs, J.A.; Velásquez, J.D. Opinion Mining and Information Fusion: A Survey. Inf. Fusion 2016, 27, 95–110.
Nigam, K.; Lafferty, J.; McCallum, A. Using Maximum Entropy for Text Classification. In Proceedings of the IJCAI-99 Workshop on Machine Learning for Information Filtering, Stockholm, Sweden, 1 August 1999; Volume 1, pp. 61–67.
Granmo, O.-C. The Tsetlin Machine–A Game Theoretic Bandit Driven Approach to Optimal Pattern Recognition with Propositional Logic. arXiv Prepr. 2018, arXiv:1804.01508.
Gao, S.; Zhou, M.; Wang, Y.; Cheng, J.; Yachi, H.; Wang, J. Dendritic Neuron Model with Effective Learning Algorithms for Classification, Approximation, and Prediction. IEEE Trans. Neural Netw. Learn. Syst. 2018, 30, 601–614.
Aggarwal, C.C.; Zhai, C. An introduction to text mining. In Mining Text Data; Springer: Berlin/Heidelberg, Germany, 2012; pp. 1–10.
Berry, M.W.; Kogan, J. Text Mining: Applications and Theory; John Wiley & Sons: West Sussex, UK, 2010.
Akilan, A. Text Mining: Challenges and Future Directions. In Proceedings of the 2015 2nd International Conference on Electronics and Communication Systems (ICECS), Coimbatore, India, 26–27 February 2015; pp. 1679–1684.
Chakraborty, G.; Pagolu, M.; Garla, S. Text Mining and Analysis: Practical Methods, Examples, and Case Studies Using SAS; SAS Institute: Cary, NC, USA, 2014.
Shahbaz, M.; Guergachi, A.; Rehman, R.T. ur. Sentiment Miner: A Prototype for Sentiment Analysis of Unstructured Data and Text. In Proceedings of the 2014 IEEE 27th Canadian Conference on Electrical and Computer Engineering (CCECE), Toronto, ON, Canada, 4–7 May 2014; pp. 1–7.
Weiss, S.M.; Indurkhya, N.; Zhang, T.; Damerau, F. Text Mining: Predictive Methods for Analyzing Unstructured Information; Springer Science & Business Media: Berlin, Germany, 2010.
Chakraborty, G.; Krishna, M. Analysis of Unstructured Data: Applications of Text Analytics and Sentiment Mining. In Proceedings of the SAS Global Forum, Washington, DC, USA, 23–26 March 2014; pp. 1288–2014.
Weerdt, J.D.; vanden Broucke, S.K.; Vanthienen, J.; Baesens, B. Leveraging Process Discovery with Trace Clustering and Text Mining for Intelligent Analysis of Incident Management Processes. In Proceedings of the 2012 IEEE Congress on Evolutionary Computation, Brisbane, Australia, 10–15 June 2012; pp. 1–8.
Manning, C.; Schutze, H. Foundations of Statistical Natural Language Processing; MIT Press: Cambridge, MA, USA, 1999.
Moraes, R.; Valiati, J.F.; Gavião Neto, W.P. Document-Level Sentiment Classification: An Empirical Comparison between SVM and ANN. Expert Syst. Appl. 2013, 40, 621–633.
Fraley, R.C. How to Conduct Behavioral Research over the Internet: A Beginner’s Guide to HTML and CGI/Perl; Guilford Press: New York, NY, USA, 2004.