Text Mining Applications in the Construction Industry

Please note this is a comparison between Version 2 by Jason Zhu and Version 1 by Na Xu.

With the advent of the Industry 4.0 era, information technology has been widely developed and applied in the construction engineering field. Text mining (TM) refers to the process of extracting interesting and non-trivial information hidden in plain text, in order to obtain useful insights. It has been applied to a wide range of industries, including biology, medicine, and manufacturing. Studies in these areas have shown that TM techniques can offer valuable information for decision making and improving industrial productivity.

construction industry
text mining
bibliometric analysis

1. Introduction

Text mining (TM) refers to the process of extracting interesting and non-trivial information hidden in plain text, in order to obtain useful insights ^[1][2]. It has been applied to a wide range of industries, including biology, medicine, and manufacturing ^[3][4][5]. Studies in these areas have shown that TM techniques can offer valuable information for decision making and improving industrial productivity.

Because construction projects and their organizations are one-off in nature, practitioners and academics tend to rely on experience-based information to make decisions. Experience-based methods, such as brainstorming, the Delphi method, questionnaires, interviews, literature studies, and combinations of these, are the primary means of collecting data and information ^[6][7][8]. This may lead to biased judgements, and the results are limited by the number and knowledge of the experts consulted. Moreover, as the construction industry develops, the density of information is increasing. Most of this information (over 80%) is stored in daily project management documents or standardized texts, which makes traditional methods of information retrieval and management difficult to apply ^[9][10]. Text documents contain a great deal of potentially valuable information, but it is difficult for individuals or researchers to transform that information into knowledge. Therefore, in the context of Industry 4.0, emerging information technologies need to be introduced for mining and studying text-based information in the construction field.

TM has been recently applied in the construction industry with the aim of extracting useful information from unstructured text documents that was not previously known and is not easily revealed. However, it should be noted that studies related to TM technology in the construction industry are still in the early stages, with research being very fragmented. There have been some review studies related to TM techniques based on qualitative analysis ^[11]. However, few studies have examined TM technology in the construction industry from a bibliometric and visualization perspective. Most existing reviews have focused on a certain aspect of the construction field (e.g., construction management or safety management), thereby lacking a review of the overall status of text mining applications in the construction industry.

2. Text-Based Analysis Methods

Text analysis techniques typically involve the following phases when used in construction-related text mining research: corpus acquisition, pre-processing, text representation, and model training. Here, the corpus refers to paper or electronic text (e.g., Word, PDF, Excel) and image files.

2.1. Text Pre-Processing

Text pre-processing is the preliminary work performed on the original text to convert it into a machine-readable form; it includes text cleaning, error correction, formatting, word segmentation, part-of-speech tagging, and stop-word removal. For word segmentation and part-of-speech tagging, mature open-source NLP tools are currently available; see Table 1. In construction engineering, the ICTCLAS Chinese word segmentation system has been used for word segmentation and part-of-speech tagging of construction quality acceptance requirements and of documents on coal mine safety ^[12]. The LTP tool has been used in safety risk management for urban rail transit construction ^[13]. Xue and Zhang ^[14] have pointed out that generic lexicons are limited and that the performance of open-source pre-processing tools may degrade on domain-specific documents. Therefore, future studies will also require the manual building of dictionaries and ontologies relevant to the construction domain ^[15].

Table 1.

NLP open-source tools and implemented functions.

Origin	Tool Name	Function
International	Apache OpenNLP	A Java-based machine learning toolkit that also supports maximum entropy (ME) models
	NLTK	Part-of-speech tagging, chunking, sentiment analysis, classification, semantic reasoning, and so on
China	FudanNLP	Chinese word segmentation, part-of-speech tagging, named entity recognition, text classification, news clustering, and other information retrieval functions
	Pkuseg	Chinese word segmentation (high accuracy)
	LTP	Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, semantic role labeling, and so on
	ICTCLAS	Chinese word segmentation, part-of-speech tagging, text clustering, sentiment analysis, summary entities, code-switching, and so on
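
The cleaning, tokenization, and stop-word removal steps described above can be sketched in a few lines. This is a minimal illustration for English text only, using the Python standard library; the `STOP_WORDS` list and the `preprocess` function are hypothetical names introduced here, and Chinese text would first need a word segmentation tool such as those in Table 1.

```python
import re

# Tiny illustrative stop-word list (an assumption made for this sketch);
# real projects would use a fuller, domain-tuned list.
STOP_WORDS = {"a", "an", "and", "at", "in", "of", "on", "the", "to", "was", "were"}


def preprocess(text: str) -> list[str]:
    """Clean, tokenize, and filter one raw English sentence."""
    lowered = text.lower()                      # normalize case
    tokens = re.findall(r"[a-z0-9]+", lowered)  # keep alphanumeric runs, drop punctuation
    return [tok for tok in tokens if tok not in STOP_WORDS]


if __name__ == "__main__":
    report = "The worker fell from the scaffold at 3 m, and was injured!"
    print(preprocess(report))
```

A domain dictionary, as recommended for construction texts, would slot in at the tokenization step, for example to keep multi-word terms such as "tower crane" together.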

2.2. Text Representation

Text representation (i.e., text feature generation) digitizes text into machine-readable data structures such as vectors or matrices. Table 2 provides a brief overview of current feature generation methods. Traditional NLP techniques extract features from text data by analyzing its syntactic structure. A literature search revealed that the vector space model (VSM) is relatively simple and dominates feature generation methods. While term frequency (TF) has historically been popular as a metric for identifying key features, TF-IDF, which builds on the inverse document frequency first proposed by Sparck Jones ^[16], has become the main method for determining feature weights in documents. More recently, neural-network-based deep learning methods have appeared, including Word2Vec, ELMo, and BERT.

Table 2.

Feature generation methods and introduction.

Method	Description	Characteristics
BOW/VSM	A discrete representation in which text objects are encoded numerically as discrete features	Easy to understand and simple to implement, but it captures only word content, not word order, and leaves a semantic gap
TF/TF-IDF	A word or phrase that appears frequently in one text and rarely in others is considered to discriminate well between categories and is therefore suitable for classification	Supports frequency-based information retrieval and text classification, but ignores the semantic information of words
LSA/LDA	Focuses on the text vocabulary, using statistical relationships between topic words to build the text representation	A continuous representation that models text objects numerically from their context, taking the connections between words into account
Pre-trained language models	Neural network models that capture semantic similarity by considering surrounding terms and embedding them in a high-dimensional vector space (e.g., Word2Vec, ELMo, Transformer, ERNIE, and BERT)	A continuous, neural-network-based representation in which the model learns lexical, syntactic, and semantic features automatically and couples representation learning with downstream natural language processing tasks
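
To make the TF-IDF row concrete, the sketch below computes weights for a toy corpus with the standard library only. It uses the plain, unsmoothed form, weight = (term count / document length) × ln(N / document frequency); this is one common variant chosen for illustration, and the function name `tf_idf` is an assumption of this sketch.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return one {term: weight} mapping per tokenized document."""
    n_docs = len(docs)
    # Document frequency: the number of documents that contain each term.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            # Relative term frequency times unsmoothed inverse document frequency.
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


if __name__ == "__main__":
    corpus = [["crane", "collapse", "crane"], ["crane", "inspection"]]
    for row in tf_idf(corpus):
        print(row)
```

In this toy corpus, "crane" appears in every document and therefore receives a weight of zero, which shows both the appeal of TF-IDF (it suppresses uninformative terms) and its limitation (a frequent term that matters to the domain is suppressed as well).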

2.3. Model Training

The last phase in text mining is model training, which uses the previously generated features to carry out tasks such as document classification, incident analysis, and compliance evaluation. Table 3 lists several models that appeared frequently in the literature analysis. The majority of earlier studies employed conventional machine learning techniques, such as support vector machines (SVM), k-nearest neighbors (KNN), and conditional random fields (CRF), with SVM models generally reported to outperform the others. Convolutional neural networks (CNN), recurrent neural networks (RNN), bidirectional long short-term memory (Bi-LSTM), and other neural network architectures have received significant attention in recent years. Google introduced the BERT model in 2018, and self-attention mechanisms have since been applied in the construction field. The number of publications on self-attention-based approaches to construction text analysis is expected to rise in the coming years ^[17].

Table 3.

Training model and introduction.

Models	Advantages	Disadvantages
SVM	Performs well on small-sample classification, is not very computationally intensive, and generalizes well	Sensitive to parameter tuning and kernel function selection; requires more memory and running time for storage and computation, and scales poorly to large training sets
CRF	Widely regarded as the dominant model for named entity recognition; both internal and contextual feature information can be used during annotation	Slow convergence and long training time
CNN	Extracts features automatically through convolution kernels	Cannot capture long-distance dependencies and ignores the relationship between the local and the whole
RNN	Specializes in sequences with temporal order and can capture the hidden relationships between sequence units	Prone to vanishing or exploding gradients on long input sequences
Bi-LSTM	A network structure with memory that retains information across the full sentence	Less amenable to parallel computation than CNN
Self-Attention	Learns the internal structure of a sentence by calculating dependencies directly, regardless of the distance between words, and is relatively simple to implement	Not stated
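
The self-attention row states that dependencies are computed directly, regardless of the distance between words. The sketch below shows the core scaled dot-product step in plain Python under a strong simplification: queries, keys, and values are all set equal to the input vectors, with no learned projection matrices or multiple heads, so it illustrates the mechanism rather than a trainable model such as BERT.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax over one row of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: list[list[float]]) -> list[list[float]]:
    """Scaled dot-product self-attention with Q = K = V = x (no learned weights)."""
    dim = len(x[0])
    output = []
    for query in x:
        # Score the query against every position, near or far, in a single step.
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in x]
        attn = softmax(scores)
        # Each output vector is the attention-weighted average of all input vectors.
        output.append([sum(a * row[j] for a, row in zip(attn, x)) for j in range(dim)])
    return output


if __name__ == "__main__":
    # Three toy 2-dimensional "word vectors"; the first and third are identical.
    for row in self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]):
        print(row)
```

Because every position attends to every other position at once, the first and third toy vectors influence each other exactly as adjacent ones would, which is the property that separates self-attention from the RNN and CNN rows of the table.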

3. Current Theme and Topic Analysis

The results of the keyword analysis provided in Section 3.4 were slightly modified, based on a thorough reading of the 185 articles. The selected articles were grouped into four application directions: Document Management (DM), Automated Compliance Checking (ACC), Safety Management (SM), and Risk Management (RM). To better structure the analysis, these four main categories were further subdivided according to text mining task, as detailed in Table 4.

Table 4.

Statistical results of articles by category and task.

Construction Domains	Task	Data Source	Reference
Document Management	Knowledge extraction	Construction project documents, contract documents, engineering design plans, construction process documents, post-project review documents, tender documents, statements of work	^[10]^[18]^[19]^[20]^[21]^[22]^[23]^[24]^[25]^[26]^[27]^[28]^[29]^[30]^[31]^[32]^[33]^[34]^[35]^[36]^[37]
	Knowledge search
	Classification/clustering
Automated Compliance Checking	Compliance checking	Building/construction specification documents	^[38]^[39]^[40]^[41]^[42]^[43]
Safety Management	Accident analysis	Accident reports, summary reports of disaster investigation	^[44]^[45]^[46]^[47]^[48]^[49]^[50]^[51]^[52]^[53]^[54]^[55]^[56]^[57]^[58]
Risk Management	Factor analysis	Contractual texts, work reports	^[17]^[59]^[60]^[61]^[62]^[63]^[64]^[65]^[66]^[67]^[68]^[69]
	Risk forecast

3.1. Document Management

The main research objectives of DM fall into three areas: knowledge extraction, knowledge retrieval, and document classification/clustering.

In knowledge extraction, Al Qady and Kandil ^[26] used NLP techniques to parse contract documents into noun phrases (NP), verb phrases (VP), and prepositional phrases (PP). By identifying subject-verb-object triples <subj, VP, obj>, they extracted contextually relevant semantic knowledge to improve functions such as document classification and retrieval, achieving an F-measure of 90%. Ren and Zhang ^[70] proposed a semantic rule-based information extraction method that automatically extracts construction execution steps from construction procedure documents, reducing the manual workload of collecting this information while achieving an accuracy of 97.08% and a recall of 93.23%.

Owing to the availability of data and algorithms, document retrieval was the main research topic, particularly in the early years, and NLP was progressively applied to a growing range of tasks up to 2015. Document knowledge retrieval is usually accomplished by comparing the similarity of two representation vectors. Using TF-IDF and cosine similarity, Li and Ramani ^[71] created an ontology-based design document query system that outperformed keyword-based search methods. By using a Bayesian classifier to retrieve feature documents through similarity matching, Yu and Hsu ^[29] developed a technique that reduces the dimensionality of the VSM, enabling automatic and quick retrieval of CAD documents from 2094 Chinese annotated CAD drawings gathered from two actual building projects.

Document classification generally proceeds through pre-processing, text representation, and classification modelling. In the early literature, Caldas and Soibelman ^[20] implemented an automated system that classifies construction project documents according to project components, with an average classification accuracy of 92.05% across the three levels. More recently, Hassan and Le ^[72] classified contract language into requirement and non-requirement text using Word2Vec and SVM, in order to shorten reading times and enhance comprehension of the contract scope. As a supplement to analytical tools such as CiteSpace, some researchers ^[73] have recently used LDA topic modelling to assign one or more topics to documents automatically; the resulting document tags are used to analyze historical documents and extract clustered subject terms.
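
Retrieval by comparing the similarity of two representation vectors can be sketched as follows. For brevity, this illustration ranks documents by cosine similarity over raw term counts rather than TF-IDF weights, and it is not the system of any study cited above; the function names and the sample document titles are hypothetical.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty query or document matches nothing
    return dot / (norm_a * norm_b)


def rank(query: str, docs: dict[str, str]) -> list[tuple[str, float]]:
    """Rank documents by similarity to the query, best match first."""
    q_vec = Counter(query.lower().split())
    scored = [
        (name, cosine(q_vec, Counter(text.lower().split())))
        for name, text in docs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    library = {
        "spec_a": "concrete slab reinforcement detail",
        "spec_b": "steel beam connection detail",
        "spec_c": "site drainage plan",
    }
    for name, score in rank("steel beam detail", library):
        print(name, round(score, 3))
```

Swapping the raw counts for TF-IDF weights, or expanding the query with ontology terms, changes only how the vectors are built; the ranking step stays the same.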

3.2. Automated Compliance-Checking

Automated compliance checking using NLP techniques is another active topic in text mining applications. It requires understanding and extracting constraints from various building regulation documents and then converting them into a formal format that supports checking and reasoning. Two authors, J. Zhang and El-Gohary, have made significant contributions to this field. In 2015, Zhang and El-Gohary ^[40] proposed extracting rules on the basis of pattern matching and conflict resolution rules. The same year, they ^[38] proposed a bottom-up conversion method based on semantic mapping and conflict resolution rules, which extracts constraints and converts them into first-order logic in Prolog syntax. Building on this research, Zhang and El-Gohary ^[74] extracted regulatory concepts and Industry Foundation Classes (IFC) concepts from compliance documents, then identified the relationships between each pair of regulatory and IFC concepts to create extended IFC schemas. At present, NLP-based compliance checks are mainly used to assess architectural designs ^[41] and work process dependencies ^[75].
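
The two-step idea, extracting a constraint from regulatory text and then converting it into a form that can be checked, can be illustrated with a single pattern. The sketch below is a toy under stated assumptions: the regular expression, the `Constraint` fields, and the sample sentence are all hypothetical, and the cited studies rely on far richer semantic rules and conflict resolution than one pattern can express.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping from requirement phrases to comparison operators.
COMPARATORS = {
    "at least": ">=",
    "not less than": ">=",
    "at most": "<=",
    "not more than": "<=",
}

# Matches sentences shaped like
# "The <attribute> of the <element> shall be <phrase> <value> <unit>".
PATTERN = re.compile(
    r"the (?P<attribute>\w+) of (?:the |an? )?(?P<element>[\w ]+?) shall be "
    r"(?P<phrase>at least|not less than|at most|not more than) "
    r"(?P<value>\d+(?:\.\d+)?) ?(?P<unit>[a-z]+)"
)


@dataclass
class Constraint:
    element: str
    attribute: str
    comparator: str
    value: float
    unit: str


def extract(sentence: str) -> Optional[Constraint]:
    """Turn one requirement sentence into a structured constraint, if it matches."""
    match = PATTERN.search(sentence.lower())
    if match is None:
        return None
    return Constraint(
        element=match["element"],
        attribute=match["attribute"],
        comparator=COMPARATORS[match["phrase"]],
        value=float(match["value"]),
        unit=match["unit"],
    )


def complies(constraint: Constraint, measured: float) -> bool:
    """Check a measured design value (same unit assumed) against the constraint."""
    if constraint.comparator == ">=":
        return measured >= constraint.value
    return measured <= constraint.value


if __name__ == "__main__":
    rule = extract("The width of the exit corridor shall be not less than 1.2 m.")
    print(rule)
    print(complies(rule, 1.5), complies(rule, 1.0))
```

Sentences that do not fit the pattern return `None`, which hints at why hand-written patterns need conflict resolution and broad coverage before they can process a full building code.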

3.3. Safety Management

The scope of safety management includes scheduling, cost, construction process, and so on. Rupasinghe and Panuwatwanich ^[65] used support vector machines (SVM), linear regression (LR), k-nearest neighbors (KNN), decision trees (DT), naive Bayes (NB), and ensemble models to analyze construction accident reports and classify the causes of accidents. Tixier et al. ^[51] developed an NLP program based on hand-crafted rules that automatically extracts attributes and outcomes from injury reports, with an F1 score of 96%. Chi et al. combined TF-IDF, principal component analysis (PCA), and SVM to classify accident categories from documents.
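
Of the classifiers listed above, naive Bayes is the simplest to show end to end. The following is a minimal multinomial naive Bayes with add-one smoothing written against the standard library; the four training narratives and the labels `fall` and `struck_by` are invented for illustration and are not taken from any cited data set.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs: list[list[str]], labels: list[str]) -> "NaiveBayes":
        self.class_counts = Counter(labels)      # documents per class
        self.term_counts = defaultdict(Counter)  # term counts per class
        for tokens, label in zip(docs, labels):
            self.term_counts[label].update(tokens)
        self.vocab = {t for counts in self.term_counts.values() for t in counts}
        return self

    def predict(self, tokens: list[str]) -> str:
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.term_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokens:
                if tok in self.vocab:  # ignore terms never seen in training
                    score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    # Invented, already tokenized accident narratives.
    train_docs = [
        ["worker", "fell", "scaffold"],
        ["fell", "ladder", "roof"],
        ["struck", "falling", "brick"],
        ["struck", "crane", "load"],
    ]
    train_labels = ["fall", "fall", "struck_by", "struck_by"]
    model = NaiveBayes().fit(train_docs, train_labels)
    print(model.predict(["fell", "roof"]))
    print(model.predict(["struck", "load"]))
```

Real accident-report studies feed such classifiers with features from the representation step (for example TF-IDF) and evaluate them on held-out reports, which this toy omits.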

3.4. Risk Management

Risk management is broadly defined as the measurement and assessment of risk and the development of contingency strategies for all aspects of the construction production process. Current text mining research related to risk management focuses on risk factor identification and analysis, as well as risk prediction. Siu et al. ^[62] applied NLP software to identify 16 new risk categories for engineering contracts from unstructured text descriptions of NEC projects in Hong Kong, and used decision trees to analyze risk ratings. Kim and Kim ^[63] identified factors related to building fire accidents from news articles, and then used principal component analysis (PCA) to analyze the main factors causing fire accidents in different seasons. Li et al. ^[76] developed four main safety accident data sets in which documents were represented by doc2vec vectors. When a new incident report arrived, the most similar data set was selected on the basis of doc2vec similarity, so that key factors predictive of injury levels could be shared. The meta-features of the data sets (e.g., the proportion of categorical factors) were then used to recommend deep learning models that maximize prediction performance. Xu et al. ^[77] proposed an information entropy-weighted term frequency (TF-H) for assessing term importance and, in a case study of Chinese metro construction, extracted 37 safety risk factors from 221 metro construction accident reports.
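
The general idea of weighting term frequency by information entropy can be illustrated as below. The exact TF-H formula is not given in this entry, so the scoring used here (total frequency multiplied by the Shannon entropy of the term's distribution across reports) is only one plausible reading, adopted as an assumption for this sketch and not the published method.

```python
import math
from collections import Counter


def entropy_weighted_tf(docs: list[list[str]]) -> dict[str, float]:
    """Score each term by total frequency times its entropy across documents.

    Assumed scoring for illustration only; not the published TF-H formula.
    """
    per_doc = [Counter(doc) for doc in docs]
    totals = Counter()
    for counts in per_doc:
        totals.update(counts)
    scores = {}
    for term, total in totals.items():
        # Share of the term's occurrences found in each document that contains it.
        probs = [counts[term] / total for counts in per_doc if counts[term] > 0]
        # Shannon entropy in bits: high when the term is spread evenly over reports.
        entropy = sum(-p * math.log2(p) for p in probs)
        scores[term] = total * entropy
    return scores


if __name__ == "__main__":
    # Invented, already tokenized accident reports.
    reports = [["collapse", "water"], ["collapse", "gas"], ["collapse", "water"]]
    for term, score in sorted(entropy_weighted_tf(reports).items(), key=lambda kv: -kv[1]):
        print(term, round(score, 3))
```

Under this assumed scoring, a term that recurs evenly across many reports ranks highest, whereas plain TF-IDF would push the same term toward zero; that contrast is the motivation for considering an entropy weight when frequent terms are themselves risk factors.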

References

Cheng, C.W.; Leu, S.S.; Cheng, Y.M.; Wu, T.C.; Lin, C.C. Applying data mining techniques to explore factors contributing to occupational injuries in Taiwan’s construction industry. Accid. Anal. Prev. 2012, 48, 214–222.
Miner, G.D.; Elder, J.; Nisbet, R.A. Practical Text Mining and Statistical Analysis for Non-structured Text Data Applications; Academic Press: Cambridge, MA, USA, 2012.
Cohen, A.M.; Hersh, W.R. A survey of current work in biomedical text mining. Brief. Bioinform. 2005, 6, 57–71.
Van Driel, M.A.; Bruggeman, J.; Vriend, G.; Brunner, H.G.; Leunissen, J.A. A text-mining analysis of the human phenome. Eur. J. Hum. Genet. 2006, 14, 535–542.
Ghose, A.; Ipeirotis, P.G. Estimating the Helpfulness and Economic Impact of Product Reviews: Mining Text and Reviewer Characteristics. IEEE Trans. Knowl. Data Eng. 2011, 23, 1498–1512.
Qazi, A.; Quigley, J.; Dickson, A.; Kirytopoulos, K. Project Complexity and Risk Management (ProCRiM): Towards modelling project complexity driven risk paths in construction projects. Int. J. Proj. Manag. 2016, 34, 1183–1198.
Soliman, E. Risk Identification for Building Maintenance Projects. Int. J. Constr. Manag. 2018, 10, 37–54.
Tembo-Silungwe, C.K.; Khatleli, N. Identification of Enablers and Constraints of Risk Allocation Using Structuration Theory in the Construction Industry. J. Constr. Eng. Manag. 2018, 144, 116722000.
Ghosh, S.; Roy, S.; Bandyopadhyay, S.K. A tutorial review on Text Mining Algorithms. Int. J. Adv. Res. Comput. Commun. Eng. 2012, 1, 16207659.
Ur-Rahman, N.; Harding, J.A. Textual data mining for industrial knowledge management and text classification: A business oriented approach. Expert Syst. Appl. 2012, 39, 4729–4739.
Allahyari, M.; Pouriyeh, S.; Assefi, M.; Safaei, S.; Trippe, E.D.; Gutierrez, J.B.; Kochut, K. A Brief Survey of Text Mining: Classification, Clustering and Extraction Techniques. arXiv 2017, arXiv:1707.02919.
Hu, H.M. Construction Quality Acceptance Knowledge Modeling and Extraction; Huazhong University of Science and Technology: Wuhan, China, 2014.
Wang, Y. Event Ontology in Coal Mining Safety Field and Its Application in Query Expansion; Beijing University of Technology: Beijing, China, 2015.
Xue, X.R.; Zhang, J.S. Building Codes Part-of-Speech Tagging Performance Improvement by Error-Driven Transformational Rules. J. Comput. Civ. Eng. 2020, 34, 2723.
Zhou, P.; El-Gohary, N. Ontology-Based Multilabel Text Classification of Construction Regulatory Documents. J. Comput. Civ. Eng. 2016, 30, 530.
Sparck Jones, K. A statistical interpretation of term specificity and its application in retrieval. J. Doc. 1972, 28, 11–21.
Fang, W.; Luo, H.; Xu, S.; Love, P.E.D.; Lu, Z.; Ye, C. Automated text classification of near-misses from safety reports: An improved deep learning approach. Adv. Eng. Inform. 2020, 44, 101060.
Hammad, M.M. Managing project documents using virtual Web centers. In Proceedings of the Canadian Society for Civil Engineering-30th Annual Conference: 2002 Challenges Ahead, Montreal, QC, Canada, 1 January 2002; pp. 691–699.
Caldas, C.H.; Soibelman, L.; Songer, A.D.; Miles, J.C. Implementing automated methods for document classification in construction management information systems. In Proceedings of the International Workshop on Information Technology in Civil Engineering: Computing in Civil Engineering, Washington, DC, USA, 1 January 2002; pp. 194–210.
Caldas, C.H.; Soibelman, L. Automating hierarchical document classification for construction management information systems. Autom. Constr. 2003, 12, 395–406.
Demian, P.; Fruchter, R. Measuring relevance in support of design reuse from archives of building product models. J. Comput. Civ. Eng. 2005, 19, 119–136.
Lee, T.S.; Lee, D.W.; Jee, S.B.; Tommelein, I.D. Development of Knowledge Document Management System (KDMS) for sharing construction technical documents. In Proceedings of the Construction Research Congress 2005: Broadening Perspectives-Proceedings of the Congress, San Diego, CA, USA, 1 January 2005; pp. 1183–1191.
Rezgui, Y. Ontology-centered knowledge management using information retrieval techniques. J. Comput. Civ. Eng. 2006, 20, 261–270.
Tserng, H.P.; Chang, C.H. Developing a project knowledge management framework for tunnel construction: Lessons learned in Taiwan. Can. J. Civ. Eng. 2008, 35, 333–348.
Nefti, S.; Oussalah, M.; Rezgui, Y. A modified fuzzy clustering for documents retrieval: Application to document categorization. J. Oper. Res. Soc. 2009, 60, 384–394.
Al Qady, M.; Kandil, A. Concept relation extraction from construction documents using natural language processing. J. Constr. Eng. Manag. 2010, 136, 294–302.
Al Qady, M.; Kandil, A. Document discourse for managing construction project documents. J. Comput. Civ. Eng. 2013, 27, 466–475.
Jiang, S.; Zhang, H.; Dalian, J.Z. Research on BIM-based construction domain text information management. J. Netw. 2013, 8, 1455–1464.
Yu, W.; Hsu, J. Content-based text mining technique for retrieval of CAD documents. Autom. Constr. 2013, 31, 65–74.
Williams, T.P.; Gong, J. Construction project cost prediction using text and data mining. In Proceedings of the 14th International Conference on Civil, Structural and Environmental Engineering Computing, CC, Sardinia, Italy, 3–6 September 2013; p. 102.
Chi, N.W.; Lin, K.Y.; Hsieh, S.H. On effective text classification for supporting job hazard analysis. In Proceedings of the 2013 ASCE International Workshop on Computing in Civil Engineering, IWCCE 2013, Los Angeles, CA, USA, 1 January 2013; pp. 613–620.
Williams, T.P.; Katsanis, C.J.; Bedard, C. Using text mining to predict construction project cost overruns. In Proceedings of the Annual Conference of the Canadian Society for Civil Engineering 2013: Know-How-Savoir-Faire, CSCE 2013, Moncton, NB, Canada, 1 January 2013; pp. 1255–1262.
Al Qady, M.; Kandil, A. Automatic classification of project documents on the basis of text content. J. Comput. Civ. Eng. 2015, 29, 63.
Chi, N.W.; Lin, K.Y.; El-Gohary, N.; Hsieh, S.H. Evaluating the strength of text classification categories for supporting construction field inspection. Autom. Constr. 2016, 64, 78–88.
Hou, X.L.; Zeng, Y.; Cheng, C.B.; Zhang, H. Application of text mining in preprocessing of illness representation information of construction project. In Proceedings of the 5th International Symposium on Project Management, ISPM 2017, Wuhan, China, 1 January 2017; Aussino Academic Publishing House: Sydney, Australia, 2017; pp. 991–997.
Moon, S.; Shin, Y.; Hwang, B.G.; Chi, S. Document Management System Using Text Mining for Information Acquisition of International Construction. KSCE J. Civ. Eng. 2018, 22, 4791–4798.
Hassan, F.U.; Le, T. Computer-assisted separation of design-build contract requirements to support subcontract drafting. Autom. Constr. 2021, 122, 103479.
Zhang, J.; El-Gohary, N.M. Automated information transformation for automated regulatory compliance checking in construction. J. Comput. Civ. Eng. 2015, 29, B4015001.
Zhou, P.; El-Gohary, N. Domain-specific hierarchical text classification for supporting automated environmental compliance checking. J. Comput. Civ. Eng. 2016, 30, 2.
Zhang, J.; El-Gohary, N.M. Semantic NLP-Based Information Extraction from Construction Regulatory Documents for Automated Compliance Checking. J. Comput. Civ. Eng. 2016, 30, 04015014.
Zhang, J.; El-Gohary, N.M. Integrating semantic NLP and logic reasoning into a unified system for fully-automated code checking. Autom. Constr. 2017, 73, 45–57.
Xue, X.; Zhang, J. Part-of-speech tagging of building codes empowered by deep learning and transformational rules. Adv. Eng. Inform. 2021, 47, 1235.
Moon, S.; Lee, G.; Chi, S. Automated system for construction specification review using natural language processing. Adv. Eng. Inform. 2022, 51, 2.
Williams, T.P.; Gong, J. Predicting construction cost overruns using text mining, numerical data and ensemble classifiers. Autom. Constr. 2014, 43, 23–29.
Lipscomb, H.J.; Glazner, J.; Bondy, J.; Lezotte, D.; Guarini, K. Analysis of text from injury reports improves understanding of construction falls. J. Occup. Env. Med. 2004, 46, 1166–1173.
Zhu, Y.; Emre Bayraktar, M.; Chen, S.C. Application of metadata modeling to dispute review report management. J. Civ. Eng. Manag. 2010, 16, 491–498.
Elghamrawy, T.; Boukamp, F. Managing construction information using RFID-based semantic contexts. Autom. Constr. 2010, 19, 1056–1066.
Fan, H.; Li, H. Retrieving similar cases for alternative dispute resolution in construction accidents using text mining techniques. Autom. Constr. 2013, 34, 85–91.
Chi, N.W.; Lin, K.Y.; Hsieh, S.H. Using ontology-based text classification to assist Job Hazard Analysis. Adv. Eng. Inform. 2014, 28, 381–394.
Zhao, D.; McCoy, A.P.; Kleiner, B.M.; Smith-Jackson, T.L. Control measures of electrical hazards: An analysis of construction industry. Saf. Sci. 2015, 77, 143–151.
Tixier, A.; Hallowell, M.R.; Rajagopalan, B.; Bowman, D. Application of machine learning to construction injury prediction. Autom. Constr. 2016, 69, 102–114.
Goh, Y.M.; Ubeynarayana, C.U. Construction accident narrative classification: An evaluation of text mining techniques. Accid. Anal. Prev. 2017, 108, 122–130.
Mahfouz, T.; Kandil, A.; Davlyatov, S. Identification of latent legal knowledge in differing site condition (DSC) litigations. Autom. Constr. 2018, 94, 104–111.
Zhang, F.; Fleyeh, H.; Wang, X.; Lu, M. Construction site accident analysis using text mining and natural language processing techniques. Autom. Constr. 2019, 99, 238–248.
Baker, H.; Hallowell, M.R.; Tixier, A.J.P. Automatically learning construction injury precursors from text. Autom. Constr. 2020, 118, 103145.
Cheng, M.Y.; Kusoemo, D.; Gosno, R.A. Text mining-based construction site accident classification using hybrid supervised machine learning. Autom. Constr. 2020, 118, 103265.
Yu, W.D.; Chang, H.K.; Lai, C.H. A knowledge management-based engineering design system for highway design projects. Int. J. Appl. Sci. Eng. 2021, 18, 1–13.
Goldberg, D.M. Characterizing accident narratives with word embeddings: Improving accuracy, richness, and generalizability. J. Saf. Res. 2022, 80, 441–455.
Zhang, F. A hybrid structured deep neural network with Word2Vec for construction accident causes classification. Int. J. Constr. Manag. 2019, 22, 1120–1140.
Jiang, S.; Zhang, J.; Zhang, H. Ontology-based semantic retrieval for risk management of construction project. J. Netw. 2013, 8, 1212–1220.
Lee, J.; Yi, J.S. Predicting project’s uncertainty risk in the bidding process by integrating unstructured text data and structured numerical data using text mining. Appl. Sci. 2017, 7, 1141.
Siu, M.F.F.; Leung, W.Y.J.; Chan, W.M.D. A data-driven approach to identify-quantify-analyse construction risk for Hong Kong NEC projects. J. Civ. Eng. Manag. 2018, 24, 592–606.
Kim, J.S.; Kim, B.S. Analysis of Fire-Accident Factors Using Big-Data Analysis Method for Construction Areas. KSCE J. Civ. Eng. 2018, 22, 1535–1543.
Li, J.; Wang, J.; Xu, N.; Hu, Y.; Cui, C. Importance degree research of safety risk management processes of urban rail transit based on text mining method. Information 2018, 9, 26.
Rupasinghe, N.K.A.H.; Panuwatwanich, K. Understanding construction site safety hazards through open data: Text mining approach. ASEAN Eng. J. 2021, 11, 160–178.
Faraji, A.; Rashidi, M.; Perera, S. Text Mining Risk Assessment-Based Model to Conduct Uncertainty Analysis of the General Conditions of Contract in Housing Construction Projects: Case Study of the NSW GC21. J. Arch. Eng. 2021, 27, 04021025.
Choi, S.J.; Choi, S.W.; Kim, J.H.; Lee, E.B. AI and text-mining applications for analyzing contractor’s risk in invitation to bid (ITB) and contracts for engineering procurement and construction (EPC) projects. Energies 2021, 14, 4632.
Luo, X.; Liu, Q.; Qiu, Z. A Correlation Analysis of Construction Site Fall Accidents Based on Text Mining. Front. Built Environ. 2021, 7.
Chen, S.; Xi, J.; Chen, Y.; Zhao, J. Association Mining of Near Misses in Hydropower Engineering Construction Based on Convolutional Neural Network Text Classification. Comput. Intell. Neurosc. 2022, 2022, 1–16.
Ren, R.; Zhang, J. Semantic Rule-Based Construction Procedural Information Extraction to Guide Jobsite Sensing and Monitoring. J. Comput. Civ. Eng. 2021, 35, 20.
Li, Z.; Ramani, K. Ontology-based design information extraction and retrieval. Artif. Intell. Eng. Des. Anal. Manuf. 2007, 21, 137–154.
Hassan, F.U.; Le, T. Automated Requirements Identification from Construction Contract Documents Using Natural Language Processing. J. Leg. Aff. Disput. Res. 2020, 12, 2.
Bilge, E.Ç.; Yaman, H. Research trends analysis using text mining in construction management: 2000–2020. Eng. Constr. Archit. Manag. 2021, 29, 3210–3233.
Zhang, J.S.; El-Gohary, N.M. Extending Building Information Models Semiautomatically Using Semantic Natural Language Processing Techniques. J. Comput. Civ. Eng. 2016, 30, 44.
Zhong, B.; Xing, X.; Luo, H.; Zhou, Q.; Li, H.; Rose, T.; Fang, W. Deep learning-based extraction of construction procedural constraints from construction regulations. Adv. Eng. Inform. 2020, 43, 101003.
Li, X.; Zhu, R.C.; Ye, H.; Jiang, C.X.; Benslimane, A. MetaInjury: Meta-learning framework for reusing the risk knowledge of different construction accidents. Saf. Sci. 2021, 140, 105315.
Xu, N.; Ma, L.; Liu, Q.; Wang, L.; Deng, Y. An improved text mining approach to extract safety risk factors from construction accident reports. Saf. Sci. 2021, 138, 105216.