2. Data Acquisition (DAQ)
Successful project management and delivery require control over all aspects of a project, e.g., resource usage, including labor hours, material, and equipment. For efficient project control, project management teams need an accurate strategy for collecting data from the worksite and comparing it with as-planned data, so that they stay aware of progress and can deliver the project within the planned cost and time. DAQ is the first sub-process of the CV-based CPM process and refers to the collection of vision datasets as inputs for that process. Construction projects are complex and involve hundreds of activities, which create an unstructured and complex environment. Various activities are performed simultaneously at a construction site, with hundreds of laborers, pieces of equipment, and materials present at all times. The CV-based system requires an accurate vision dataset (image and video datasets are collectively called vision datasets) to identify features or create a point cloud dataset. Owing to the complexity of construction sites, it is challenging to obtain a clear vision dataset of either photos or videos for efficient DAQ. Many studies have proposed DAQ techniques using various digital cameras. The literature shows that the following methods are used to capture site data that can provide input for a CV-based CPM system.
2.1. Unmanned Aerial Vehicles (UAVs)
A UAV is an aircraft with no human pilot onboard. UAVs have recently and rapidly entered the architecture, engineering, and construction industry, and their use is expected to grow. For CV-based applications, UAVs are equipped with an optical sensor or a digital camera. Modern UAVs are also equipped with a communication system to transmit the captured vision dataset in real time. UAVs are quick and cost-effective and allow data collection at places inaccessible to ground-based or manned vehicles. To capture an accurate vision dataset, UAVs require an expert operator and a well-planned flight path with various data-capturing angles. However, modern UAVs can be programmed with a pre-planned flight path, which allows a certain degree of automation in DAQ.
Mahami et al. attempted to reduce the number of photos required to create an accurate point cloud model and experimented in a physical construction environment. A high-quality camera attached to a UAV acquired the vision dataset, from which the measurements of as-built walls were extracted to calculate the volume of work achieved. The proposed method, using the data acquired through UAVs, reported 99% accuracy for the volume of completed work. Similarly, Kielhauser et al. attempted to estimate the cost of UAV deployment for CPM and quality management and selected a mixed-use commercial building as a test project. The UAV was programmed for an automatic flight on a pre-determined path to target the external wall section and concrete slab only. The study acquired the volumetric as-built data and compared it with the as-planned model to estimate the percent completion of the targeted activities. The study successfully demonstrated the use of UAVs for progress monitoring; however, it reported untidiness and clutter on construction sites as a potential hindrance to data acquisition through UAVs. The usefulness of UAVs for acquiring data to enable the CV-based CPM process is evident; however, studies have reported several limitations to their adoption: their use is mostly limited to external construction features, the overall process is time-consuming and requires expert manpower, and the equipment is costly, which makes the approach costlier than traditional practices in the field.
Useful 3D point clouds can be generated if all features of the construction process are visible throughout the UAV’s flight path. UAV-enabled vision DAQ has been compared with crane-mounted and terrestrial handheld digital cameras, showing that the UAV-enabled technique was more efficient and flexible and enabled better coverage. Most studies have explored and reported the benefits of UAVs for outside construction; hence, the use of UAVs in interior CPM is the least explored research area. Overall, UAVs are very useful tools for capturing the vision dataset for an automated CV-based CPM process, provided that they have a good-quality digital camera, a global positioning system, a communication system, and a well-programmed automated flight path that covers all possible elements of a construction project.
2.2. Handheld Devices
A handheld device is any compact and portable device that can be held, carried, or used with one or both hands. The use of handheld imaging devices, such as smartphones and digital cameras, is now common. Smartphone, digital single-lens reflex, mirrorless, film, and 360° cameras are well-known handheld devices for acquiring vision datasets. The whole workflow, from setting out the acquisition geometry and collecting vision datasets to transmitting the data for further processing, is manual. Various studies have explored the potential of handheld devices for vision DAQ to measure construction progress based on the detection of features, e.g., concrete walls, drywalls, and bricks. Daily site photologs captured by handheld devices are useful in generating point clouds, identifying various construction features, and estimating construction progress. For example, Golparvar-Fard identified that construction site staff usually take more than 500 photos a day using several handheld and off-the-shelf cameras and utilized the unordered daily site photologs to extract useful information to be compared with as-planned data. The study successfully demonstrated the extraction of point cloud models for comparison and analysis after automatically ordering and calibrating the photos and removing occlusions. However, the study focused entirely on the technical feasibility of the proposed concept rather than on the progress monitoring and tracking of various construction features. Mahami et al. photographed a real construction environment and extracted the measurements of external construction features. The study used a handheld camera, and the photographer moved around the entire site taking photos at specific intervals, varying angles, and fixed orientations, which made the data acquisition process entirely manual and labor-intensive. Early research in CV-based CPM utilized unordered daily site photologs and other vision datasets captured specifically for extracting point cloud models, but the focus has shifted towards acquiring vision datasets automatically without human intervention. Handheld devices provide some flexibility during vision DAQ in adapting to the site conditions and the types of data required; however, their coverage is limited, and they are not useful for an automated CV-based CPM process.
2.3. Fixed on Mounts
The term fixed on mounts indicates various camera systems mounted on camera stands, poles, formworks, cranes, robots, etc., for collecting the required vision datasets. These systems can be designed to capture vision data on a short- or long-term basis. They are mounted at a specific place to capture the required elements at the desired angles. These systems are sometimes connected to a wired or wireless communication system to transmit data for further processing.
Various studies have employed these systems for element recognition, 3D point cloud generation, and progress calculation. A few studies have also mounted these systems on a crane to cover a large area and obtained less occluded 3D point clouds for construction progress estimation. One study addressed the technical challenge of multi-building extraction and alignment of as-built point clouds. The study utilized data captured through two stereo cameras installed on a tower crane on a mixed-use construction project including a shopping mall, hotel, housing, and offices. The researchers reported the successful acquisition of the vision datasets using crane-mounted cameras and subsequent analyses to estimate construction progress. Tuttas et al. argued that, despite the effort of installing a camera on a crane jib, the associated maintenance, the limited range of motion of the crane, and its fixed position in a single plane of view, data acquisition from crane-mounted cameras can be designed as a fully automated process. A self-navigating robot-mounted camera system has been explored to create 3D point clouds of the interior of a building, and the usefulness of such systems for various construction management purposes has been reported. A fixed on-mount vision DAQ technique can be fully automated by equipping the camera system with a pan–tilt–zoom function and programming a pre-determined coverage area into it. Despite several proofs of concept, cameras fixed on mounts neither provide complete coverage of the construction site nor capture all construction features, which makes the point clouds fragmented. Future research should explore the possibility of acquiring vision datasets using multiple cameras and integrating their output to obtain more detailed point cloud models and hence a more accurate comparison between as-planned and as-built.
2.4. Surveillance Cameras
Surveillance cameras are video cameras installed to observe an area for multiple security and monitoring-related purposes. These camera systems transmit video and audio signals to a digital video recorder where the video data can be viewed, recorded, or processed for the required purpose.
Few studies have attempted CV-based CPM using the video feed or video data from surveillance cameras installed on a construction site, as opposed to most available studies, which explored the viability of image data and retrieved information by image processing using various techniques. For example, Wu et al. recognized the work cycles of an earthmoving excavator by constructing its Stretching–Bending Sequential Patterns (SBSP). The study utilized long video sequences and recognized the complete cycle of an excavator, i.e., digging, hauling, swinging, and dumping. It accurately recognized the work cycle and estimated the progress of the equipment by multiplying the excavator’s capacity by the number of cycles counted. Another study presented a framework encompassing object detection, instance segmentation, and multiple object tracking to collect the location and temporal information of precast concrete wall installation on a construction site. However, the study reported that the movement of the camera and the view range of the surveillance camera on a construction site significantly influence the effectiveness of the vision datasets. Video data from surveillance cameras have been successfully processed to obtain the progress of various prefabricated construction elements and the working of machinery at a construction site. Surveillance cameras can be a potential DAQ technique for the automated CV-based CPM process, provided that a well-planned layout is used and several cameras are installed throughout the vicinity of the construction site.
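The cycle-based progress estimate described above (excavator capacity multiplied by the number of cycles counted) can be illustrated with a minimal sketch. It assumes that an upstream vision model has already labeled each video frame with one of the four activities named in this section; the function names, the bucket capacity, and the planned volume are hypothetical and are not taken from the cited study.

```python
# Ordered activities that make up one complete excavator work cycle.
CYCLE = ("digging", "hauling", "swinging", "dumping")

def count_cycles(frame_labels):
    """Count complete digging -> hauling -> swinging -> dumping sequences.

    Repeated labels (the same activity over many frames) are skipped until
    the next expected activity in the cycle appears.
    """
    count, stage = 0, 0
    for label in frame_labels:
        if label == CYCLE[stage]:
            stage += 1
            if stage == len(CYCLE):
                count += 1
                stage = 0
    return count

def earthwork_progress(cycles, bucket_capacity_m3, planned_volume_m3):
    """Return (volume moved, percent of planned volume), capped at 100%."""
    moved = cycles * bucket_capacity_m3
    return moved, min(100.0, 100.0 * moved / planned_volume_m3)

# Hypothetical per-frame labels: two full cycles and one unfinished cycle.
frame_labels = (["digging"] * 3 + ["hauling"] * 2 + ["swinging"] * 2 + ["dumping"]) * 2
frame_labels += ["digging", "hauling"]

cycles = count_cycles(frame_labels)
moved_m3, percent = earthwork_progress(cycles, bucket_capacity_m3=1.5, planned_volume_m3=300.0)
print(cycles, moved_m3, percent)
```

The estimate is only as good as the per-frame labels and the assumed bucket fill, so in practice it would be treated as an approximation of the earth moved.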
3. Information Retrieval
The acquired vision datasets contain vital as-built information from the construction environment. The data collected from the worksite hold significant importance, as they help in analyzing and reporting the progress of the project and enable project management teams to gain valuable insights into the actual status of the project in terms of physical progress, earned labor hours, material consumed, equipment utilized, etc. Once DAQ has been performed and the data have been transmitted or transferred to a storage medium, the next and most important sub-process is to extract useful information from the vision data. Information retrieval is performed through signal processing or, more precisely, image processing. Images from an image dataset or frames from a video dataset are the inputs, and the outputs are usually some characteristics or features associated with the inputs. For CPM, the information retrieval sub-process usually aims to obtain an as-built model in the form of a 3D model or a 3D dataset, which is then compared with an as-planned model to estimate the progress of various activities of a construction process.
The studies retrieved and selected have proposed various techniques to extract the required information from the acquired data. These can be grouped into four distinct categories, i.e., (1) classification, (2) edge detection, (3) quantification, and (4) object tracking. In addition, each category has several techniques for processing the associated vision datasets. The key techniques are discussed below.
3.1. Structure from Motion (SfM)
SfM is a technique that reconstructs a 3D structure, model, or point cloud from 2D images of a scene or an object. It is a photogrammetric imaging technique and lies in the quantification category along with digital image correlation. The term quantification means a method of obtaining real-life measurements from a 2D image dataset. SfM reconstructs 3D models by matching features across images and estimating the relative position of the camera. The inputs are image data, with a recommended 60% side overlap and 80% forward overlap between images, and the outputs are high-quality and detailed 3D models or 3D point clouds. The technique automatically detects and matches features from an image dataset of varying scales, angles, and orientations. Various studies have demonstrated image-based reconstruction utilizing high-quality images taken from the construction environment for progress monitoring, productivity measurement, quality control, and safety management, providing project management teams with a remarkable opportunity to visualize as-built data.
Unordered image collections from construction sites have been used in various studies to test the effectiveness of SfM, and high accuracy of the generated 3D as-built models has been reported. Moreover, one study utilized high-quality images taken from the interior scene of a construction project to demonstrate image-based 3D reconstruction through SfM and compared it with the output of a laser scanner. The study concluded that the model generated from the image-based reconstruction was less accurate than that of the laser scanner; however, the proposed approach automatically overlays the high-resolution images on the 3D point cloud models, which shows its potential for progress monitoring through as-built visualization. Another study collected several images from proper positions on two real-life construction projects, i.e., one-story and two-story residential buildings. The SfM technique was deployed to generate a 3D point cloud model for the two case study projects, and quantities were calculated using the proposed technique. The study reported 99% accuracy and identified that the system becomes less accurate as the length of the building or element increases. The process of reconstructing a 3D model from an image dataset remains reliant on human intervention at various steps to improve the output quality.
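The recommended 60% side overlap and 80% forward overlap translate directly into the spacing between camera positions. The sketch below is a simplified illustration that assumes a camera pointing perpendicular to a flat surface and a known field of view; the distance and field-of-view values are hypothetical, and the function names are introduced here only for illustration.

```python
import math

def footprint(distance_m, fov_deg):
    """Width of the surface covered by one image at the given distance."""
    return 2.0 * distance_m * math.tan(math.radians(fov_deg) / 2.0)

def shot_spacing(distance_m, fov_deg, overlap):
    """Distance between neighboring camera positions for a target overlap fraction."""
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    return footprint(distance_m, fov_deg) * (1.0 - overlap)

# Hypothetical setup: camera 30 m from the surface with a 90 degree field of view
# in both directions, so that one image covers about 60 m.
forward_spacing = shot_spacing(30.0, 90.0, 0.80)  # between shots along a flight line
side_spacing = shot_spacing(30.0, 90.0, 0.60)     # between adjacent flight lines
print(round(forward_spacing, 2), round(side_spacing, 2))
```

Under these assumptions, consecutive shots would be about 12 m apart and adjacent lines about 24 m apart; a higher overlap shortens the spacing and increases the number of images to process.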
3.2. Convolutional Neural Network (CNN)
CNN is a technique that identifies and differentiates various objects or features in an image by assigning learnable weights and biases to them. CNN is a Deep Learning (DL) algorithm and falls under the feature detection/classification category of CV-based analysis. Long Short-Term Memory (LSTM), which can analyze and obtain information from video frames, belongs to the same category. The term DL refers to Machine Learning (ML) based on multi-layer artificial neural networks, which can learn from unstructured data and, in some settings, from unlabeled data without supervision. A CNN comprises convolutional and pooling layers; usually, a pooling layer is added after a convolutional layer. The input is an image, and the convolutional layer slides a small matrix (kernel) over the image to identify features. The pooling layer then reduces the size of the feature maps, which are summarized versions of the features detected in the input, thereby reducing the number of parameters to learn and the computations required by the network. In recent years, CNN-based techniques have achieved further development in the construction domain. Object detection and tracking have been of interest to many researchers: unsafe behaviors were detected by tracking workers walking on formwork supports, diverse construction activities were recognized to save the valuable time of project management teams, single workers and pieces of equipment were tracked over long periods to calculate productivity, and multiple workers and machines were tracked for productivity estimation.
Few researchers have used CNN to monitor the progress of construction machinery and the installation of various prefabricated components. In a recent study, the installation of precast concrete walls was monitored by detecting and tracking individual wall panels in the video feed from a surveillance camera installed on a construction site. This vision method was designed to obtain two types of information, i.e., time information and location information. The study reported the usability of such algorithms for CPM purposes and directed further research to extend the technique to other construction features. Similarly, another study demonstrated a combined CNN-based technique for identifying the work cycle of an earthmoving excavator from long video sequences. It demonstrated the feasibility of counting the stretching-bending cycles of the excavator to estimate the quantity of earth moved during the overall operation. However, the proposed technique was relatively simple, and further research was directed to explore the viability of such techniques for accurate measurement of work cycles and hence of progress. The CNN process requires pre-training of the algorithm to efficiently identify various features from the input, after which it can automate the entire process.
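The convolution and pooling steps described in this section can be sketched in a few lines. The example below is a hand-written illustration in plain Python, not the architecture of any cited study: the kernel values are fixed by hand to respond to vertical edges, whereas in a trained CNN they would be learned, and the 6 x 6 input is a hypothetical image patch.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def max_pool(feature_map, size=2):
    """Keep the maximum of each non-overlapping size x size window."""
    return [
        [
            max(feature_map[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for r in range(0, len(feature_map) - size + 1, size)
    ]

# Hypothetical 6 x 6 patch: dark on the left (0), bright on the right (1).
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
# Hand-picked vertical-edge kernel (an assumption; real CNN kernels are learned).
kernel = [[-1, 0, 1]] * 3

feature_map = conv2d(image, kernel)  # 4 x 4 map, strong response at the edge
pooled = max_pool(feature_map)       # 2 x 2 summary of the feature map
print(feature_map[0], pooled)
```

Pooling shrinks the 4 x 4 feature map to 2 x 2 while keeping the strongest edge response, which is the size reduction the text refers to.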
3.3. Support Vector Machines (SVM)
SVM is a technique that classifies the features or information in an image by assigning them to the positive or negative side of a hyperplane. SVM is a classifier that lies in the classification/feature detection category. Unlike ANN, SVM is a supervised ML technique highly regarded for its accurate two-group classification; however, multigroup classification can be achieved by dividing a problem into several two-group classification problems. The input is an image, and a pre-trained SVM classifier performs a binary analysis and classifies various features by drawing a hyperplane between two groups.
Various studies have implemented SVM to detect various construction materials and estimate project progress. For example, one study inferred the construction activity of girder launching for a rail project. It utilized the structural responses collected from the girder launching equipment and identified the exact state of a girder, i.e., auto launching, segment lifting, post-tensioning, and span lowering. However, the study was only a demonstration of such techniques; it highlighted the limitations of relying on structural responses alone and directed future studies to integrate more sensors to obtain accurate feedback. Another study investigated the installation of drywall using a video feed from the interior construction environment. The progress of drywall installation was measured by identifying three different states of a drywall panel during installation, i.e., installation, plastering, and painting. The SVM was trained with the extracted features to demonstrate the success of the proposed technique. The learning of SVM can be significantly improved using the k-nearest neighbor algorithm; however, few studies have tested its performance on a real-world construction site with a great degree of uncertainty, occlusion, and variability.
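The two-group classification by a separating hyperplane can be illustrated with a small linear SVM trained by sub-gradient descent on the hinge loss. This is a minimal sketch under stated assumptions rather than the method of any cited study: the two-dimensional feature vectors are hypothetical stand-ins for features extracted from images, the training loop is a simplified variant chosen for readability, and the learning rate, regularization weight, and number of epochs are arbitrary.

```python
def train_linear_svm(samples, labels, epochs=50, lr=0.1, lam=0.01):
    """Fit w and b so that sign(w.x + b) separates labels +1 and -1 (hinge loss)."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1.0:
                # Sample is inside the margin: move the hyperplane towards classifying it.
                w = [wi + lr * (y * xi - lam * wi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Sample is safely classified: only apply the regularization shrinkage.
                w = [wi - lr * lam * wi for wi in w]
    return w, b

def predict(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Hypothetical, linearly separable feature vectors for two groups.
samples = [(2.0, 2.0), (3.0, 3.0), (3.0, 2.0), (-2.0, -2.0), (-3.0, -3.0), (-2.0, -3.0)]
labels = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(samples, labels)
print(w, b, predict(w, b, (4.0, 4.0)), predict(w, b, (-4.0, -4.0)))
```

A multigroup problem would be handled, as noted above, by training several such two-group classifiers, for example one per class against all others.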
3.4. Simultaneous Localization and Mapping (SLAM)
SLAM is a technique for reconstructing or updating a 3D map of an unknown location while navigating through it. SLAM is similar to SfM; however, SLAM maps an environment in real time. Like SfM, SLAM is a photogrammetric technique and lies in the quantification category. SLAM learns by moving around an environment, once or multiple times, and searching for known features. The inputs are images obtained from video frames, and the outputs are feature points.
A preliminary study investigated the effectiveness of SLAM and reported its potential application in tracking construction equipment. For example, one pilot study demonstrated the real-time 3D reconstruction of a construction environment using visual SLAM and a UAV. It discussed the use of the proposed technique on three different projects, i.e., calculating the volume of earthwork between two instances, measuring the progress of pavement compaction by tracking the equipment on a job site, and tracking site assets, e.g., labor, equipment, and material. The study proposed a primitive SLAM algorithm and highlighted various limitations, i.e., limited performance in complex construction environments, the limited sensing range of visual sensors, memory management, and the difficulty of maneuvering UAVs through construction worksites. Despite this demonstration, the technique is less explored for construction progress estimation, and its effectiveness is subject to further research in this domain.
3.5. Cascading Classifiers (CC)
CC is a technique for detecting and tracking an object or a feature in an image. It lies in the classification/detection category. It is an ML-based approach in which the classifier is trained on many positive and negative images. Positive images contain the object that the classifier is intended to recognize; all other images are negative. The inputs are images from a construction environment, in which a pre-trained CC identifies various features and indicates or highlights them. The accuracy of this technique depends on a detailed algorithm and on pre-training with a well-sorted image dataset.
A few studies have attempted to use CC in progress monitoring by detecting construction features such as drywall or concrete walls and reported good performance. For example, one study attempted to automate progress monitoring for the interior construction environment and focused on visualization and computer vision techniques using an object-based approach. In the proposed approach, the study compared as-built images and as-planned BIM in a 3D walkthrough model. A rapid object detection scheme based on the Haar-like cascading classifier was deployed to detect features from the acquired vision dataset. The cascading classifier was first trained to detect specific construction features from the images using a couple of hundred positive and negative samples. However, the proposed algorithm was limited to specific construction features, which suggests that detecting multiple features in a complex construction environment requires further research towards modifying and improving such an algorithm. The supervised training of this technique makes it less desirable for a fully automated CV-based CPM process.
3.6. Histogram of Oriented Gradients (HoG)
HoG is a feature descriptor used for object detection. It is a feature extraction technique and lies in the classification/detection category. Each input image is decomposed into small cells or blocks; the algorithm computes the HoG by counting occurrences of gradient orientations in each cell or block and returns a descriptor for each cell, from which the features present in the image are detected. This technique accurately detects various construction features by focusing on their shapes. Detection methods that rely on visual features, e.g., shape and color, have been proposed and tested in construction scenarios. The HoG feature is among the top two shape-based features used to detect construction workers and equipment.
A few studies have explored the effectiveness of the HoG technique in CPM using a CV-based dataset by identifying and tracking construction workers and equipment. For example, one study attempted to automate the estimation of the progress of earthmoving activity by monitoring the movement of dump trucks on large-scale construction projects. The study evaluated a combination of HoG algorithms to recognize off-highway dump trucks in a noisy video stream. It demonstrated the ability of HoG algorithms to detect the activity of trucks in an effective and timely manner and presented its usefulness in productivity measurement, performance control, and other safety-related applications on large-scale civil construction projects. The combination of HoG with tracking techniques has been reported to achieve very precise detection and tracking of workers and equipment on a construction site for CPM purposes. However, very few studies have explored this technique, and further research is required to ascertain its effectiveness in CPM-related applications.
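The per-cell descriptor at the core of HoG can be sketched as a histogram of gradient orientations weighted by gradient magnitude. The example below is a simplified illustration: it uses central differences, nine unsigned orientation bins over 0 to 180 degrees, and no interpolation between bins or block normalization, all of which are assumptions made for brevity. The 6 x 6 cells are hypothetical grayscale patches.

```python
import math

def cell_histogram(cell, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations for one cell."""
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for r in range(1, len(cell) - 1):
        for c in range(1, len(cell[0]) - 1):
            gx = cell[r][c + 1] - cell[r][c - 1]  # horizontal intensity change
            gy = cell[r + 1][c] - cell[r - 1][c]  # vertical intensity change
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    return hist

# Hypothetical cells: one with a vertical edge, one with a horizontal edge.
vertical_edge = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
horizontal_edge = [[0] * 6 for _ in range(3)] + [[1] * 6 for _ in range(3)]

hist_v = cell_histogram(vertical_edge)    # votes fall in the 0-20 degree bin
hist_h = cell_histogram(horizontal_edge)  # votes fall in the 80-100 degree bin
print(hist_v, hist_h)
```

The two edges produce clearly different histograms, which is why the descriptor is considered shape-based: it encodes where intensity changes and in which direction, not the color of the object.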
3.7. Laplacian of Gaussian (LoG)
LoG is a well-known algorithm for detecting edges and is widespread in image processing and CV applications. The Laplacian operator detects edges but is sensitive to noise; therefore, a Gaussian filter is commonly applied to the image first to remove noise, and the combination of the two yields LoG. LoG lies in the edge detection category. LoG filters are derivative filters that work by detecting rapid changes in an image. They detect objects and boundaries and extract features.
CPM by counting bricks has been attempted using LoG, and relatively high precision values were reported. Hui and Brilakis attempted to automatically count the number of bricks ordered vs. consumed to eliminate manual surveys for this purpose. The study proposed a novel and automated method to count bricks during the facade construction of a building. The method utilized images and videos from the construction site and selected the color thresholds based on the color of the bricks. LoG was deployed to detect the edges of the bricks in the constructed wall, and known features, i.e., shape and size, were compared to accurately count the bricks. However, the implementation of LoG in CPM is one of the least explored areas. This technique requires various manual steps to achieve the desired accuracy, making it a less desirable option for automating the entire CV-based CPM process.
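How a LoG filter marks an edge can be shown on a one-dimensional intensity profile, where the filter reduces to the second derivative of a Gaussian and an edge appears as a zero crossing of the response. The sketch below is a 1D simplification of the 2D image filter, with the kernel defined up to a constant scale factor; the value of sigma, the kernel radius, and the step-shaped signal are hypothetical choices.

```python
import math

def log_kernel_1d(sigma, radius):
    """Sampled second derivative of a Gaussian, shifted so that the weights sum to zero."""
    raw = [
        ((x * x - sigma ** 2) / sigma ** 4) * math.exp(-x * x / (2 * sigma ** 2))
        for x in range(-radius, radius + 1)
    ]
    mean = sum(raw) / len(raw)
    return [v - mean for v in raw]  # zero-sum: flat regions give (almost) no response

def filter_valid(signal, kernel):
    """Apply the kernel wherever it fits entirely inside the signal."""
    radius = len(kernel) // 2
    return {
        i: sum(kernel[j + radius] * signal[i + j] for j in range(-radius, radius + 1))
        for i in range(radius, len(signal) - radius)
    }

def zero_crossings(response):
    """Indices i where the response changes sign between i and i + 1."""
    idx = sorted(response)
    return [i for i, nxt in zip(idx, idx[1:]) if response[i] * response[nxt] < 0]

# Hypothetical intensity profile across a brick joint: dark (0) then bright (1).
signal = [0.0] * 10 + [1.0] * 10
edges = zero_crossings(filter_valid(signal, log_kernel_1d(sigma=1.5, radius=5)))
print(edges)
```

The response is positive on the dark side of the step and negative on the bright side, so the sign change falls between samples 9 and 10, exactly where the intensity jumps. A larger sigma smooths more noise away but localizes closely spaced edges less precisely.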
3.8. Speeded-Up Robust Features (SURF)
SURF is a template matching technique that detects features from an image. It is a feature extraction or detection technique that lies in the classification category. It can be used for object recognition, image registration, and 3D point cloud generation. The SURF technique computes its operators using box filters, which enables fast computation and thereby allows real-time object detection and tracking.
A recent study attempted CPM of prefabricated timber construction using surveillance cameras and reported near real-time monitoring. It proposed an automated installation rate measurement system using inexpensive digital cameras installed on the mast of a tower crane. The time-lapse footage of the construction sequence was processed and analyzed for precise progress information. The study also demonstrated the ability of SURF to align images and remove the visual differences resulting from wind and tower crane vibrations. The study further reported a 95% accuracy rate for the timber panels detected during the observation. It also directed future research efforts towards the proper setup of gear to ensure a minimum level of noise in the footage and towards algorithm improvements for feedback from multiple cameras. The process of point cloud generation and registration can also be enhanced using this technique. The implementation of this technique in CV-based CPM can automate the whole process. However, assessing the benefits of this technique in achieving the intended targets requires further research efforts.
4. Progress Estimation
Progress estimation is the process of determining whether construction execution is proceeding according to a pre-planned or baseline schedule. In CV-based CPM, this process can also be termed the comparison between as-built and as-planned. This comparison provides information on whether the intended construction activities are executed according to the schedule, upon which construction managers can take the necessary actions to keep the project on track and avoid construction delays. The studies retrieved and selected proposed distinct techniques to obtain the necessary information on the progress status of various construction activities, either by comparing as-built and as-planned models or by other means.
4.1. Building Information Models (BIMs) Registration
Building information modeling is a process of creating and managing digital representations of any built entity in a highly collaborative environment. BIMs are highly intelligent, data-rich, and object-oriented models; they not only represent various objects and spaces of buildings but also contain knowledge on how these objects and spaces relate. These qualities make BIMs efficient as-planned models for the comparison between as-built and as-planned to estimate progress. Usually, 4D BIMs, which are 3D models integrated with the fourth dimension, i.e., time, are used as as-planned models to be compared with superimposed as-built models. For automated CV-based CPM, many studies have attempted various techniques of acquiring 3D point clouds or as-built models and reported the intended results by comparing as-built with as-planned BIMs.
The process of superimposing an as-built model over an as-planned model is called registration. The registration process requires post-processing of the acquired CV-based data to remove noise. There are two distinct methods of image model registration: coarse registration and fine registration. Coarse registration along with post-processing allows for rough alignment, whereas fine registration can achieve near-optimal alignment. Coarse registration can be achieved through various approaches, such as plane-based matching, principal component analysis-based alignment, plane patch-based matching, 3D to 2D transformation, and building extraction and alignment for multi-building point clouds. For example, one study proposed a semi-automated plane-based coarse registration approach, compared it with existing general-purpose registration software, and reduced the complexity and time requirements associated with this process. This system addressed the issue of self-similarities at the object and model level through a semi-automated matching stage and demonstrated resilience and robustness in challenging registration cases. Plane-based registration finds the matching planes in the as-built and as-planned datasets and aligns both models. However, the plane patch-based system is the state of the art in current practice, allowing for automatic registration using a 4-plane approach rather than the 3-plane approach of the plane-based process. Moreover, the Iterative Closest Point (ICP) algorithm is most frequently used for fine registration in CV-based CPM studies. For example, one study proposed a fully automated registration of 3D data to a 3D CAD model for CPM purposes. It deployed a two-step global-to-local registration procedure, i.e., principal component analysis-based global registration followed by ICP-based local registration, and demonstrated that the proposed technique not only fully automates the process but also proves beneficial for project progress monitoring. In conclusion, a few studies have proposed semi-automated and fully automated registration techniques and directed further research efforts towards exploring the usability of these techniques for CPM in the real construction environment.
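The idea behind principal component analysis-based coarse registration can be sketched in two dimensions: the as-built points are translated and rotated so that their centroid and principal axis coincide with those of the as-planned points. The example below is a minimal illustration under stated assumptions, not the procedure of any cited study. It works on a hypothetical 2D floor-plan outline rather than a 3D point cloud, it ignores the 180 degree ambiguity of a principal axis, and it omits the ICP refinement that would normally follow.

```python
import math

def centroid(points):
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def principal_angle(points):
    """Orientation of the dominant axis of the point set, from its 2 x 2 covariance."""
    cx, cy = centroid(points)
    sxx = sum((x - cx) ** 2 for x, _ in points)
    syy = sum((y - cy) ** 2 for _, y in points)
    sxy = sum((x - cx) * (y - cy) for x, y in points)
    return 0.5 * math.atan2(2.0 * sxy, sxx - syy)

def coarse_register(as_built, as_planned):
    """Rotate and translate as_built so its centroid and principal axis match as_planned."""
    theta = principal_angle(as_planned) - principal_angle(as_built)
    (bx, by), (px, py) = centroid(as_built), centroid(as_planned)
    c, s = math.cos(theta), math.sin(theta)
    return [
        (px + c * (x - bx) - s * (y - by), py + s * (x - bx) + c * (y - by))
        for x, y in as_built
    ]

# Hypothetical as-planned outline and an as-built copy rotated by 30 degrees and shifted.
as_planned = [(0.0, 0.0), (4.0, 0.0), (8.0, 0.0), (8.0, 2.0), (0.0, 2.0), (2.0, 1.0)]
phi = math.radians(30.0)
as_built = [
    (x * math.cos(phi) - y * math.sin(phi) + 5.0, x * math.sin(phi) + y * math.cos(phi) - 3.0)
    for x, y in as_planned
]

registered = coarse_register(as_built, as_planned)
max_error = max(math.dist(p, q) for p, q in zip(registered, as_planned))
print(max_error)
```

With noise-free, fully overlapping data the alignment is essentially exact; with real point clouds that contain noise, occlusions, and partially built elements, this step only gives the rough alignment from which fine registration starts.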
4.2. Progress Estimation through Object Recognition/Matching
The progress estimation process requires the recognition of various features or objects present in the built environment following the model registration process. Merely superimposing the as-built and as-planned models does not provide useful information. Instead, construction features in the as-built point cloud must be properly recognized, identified, or classified so that they can be compared with the information available in object-based as-planned BIMs.
Many scholars have proposed techniques to detect, classify, and recognize various features of construction, such as walls, panels, tiles, ducts, doors, windows, and furniture 
. A few of these techniques are Mask R-CNN 
, DeepSORT 
, Voxel 
, OpenGL 
, Probabilistic Model 
, PointNet 
, Surface-based recognition 
, timestamp 
, point calculation 
, segmentation by color thresholding 
, color images 
, etc. These techniques retrieve useful information from object-based models, which is then used to estimate progress. Apart from as-built vs. as-planned comparison, a few studies have also estimated progress based on counting equipment cycles 
, material classification 
, material usage 
, and installation speed.
5. Output Visualization
Output visualization corresponds to the presentation of useful information or results obtained from the information retrieval or progress estimation. In the CV-based CPM process, output visualization is as essential as DAQ, information retrieval, and progress estimation. The results of this sub-process are crucial to CMTs, as they must make decisions based on the output extracted from the entire process. Traditionally, CMTs use reports, Gantt charts, or other visual techniques. The literature on the output of the CV-based construction management process suggests a few visualization techniques, which are discussed as follows.
5.1. Color Labels
Color labels are the most frequently used form for representing the information on a vision dataset. The color labels used in CV environments are called bounding boxes. These labels can provide a range of information depending on the purpose of an algorithm or a process. The output shown by these labels can be classification, identification, segmentation, verification, detection, recognition, etc. 
A few studies have also superimposed various forms of color labels on the input images to visualize the current state of the construction activities under consideration 
. For example 
, the researchers utilized color labels for performance monitoring. A color label was given to each construction component to indicate whether that component was built ahead of schedule (green label), on time (semitransparent white label), or behind schedule (red label). Furthermore, other color variations were suggested for annotating additional factors, e.g., a darker blue label to indicate that a component has not been built as planned. Similarly, another study 
color-coded the construction elements to indicate whether the element under consideration was behind or on schedule. Green labels annotated elements on schedule, red labels were assigned to elements behind schedule, and grey labels indicated elements whose progress had not been reported. However, the proposed thresholds were tailored for the specific construction elements and require further research to be validated in other cases 
. The size, shape, and description of these color labels depend on the technique or selected algorithm and can be modified as per the project requirements.
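A minimal sketch of such schedule-based color coding is given below. It combines the label meanings reported above (green for ahead of schedule, white for on time, red for behind schedule, grey for unreported progress) into one hypothetical function. The RGB values, the `tolerance` parameter, and the function name `schedule_label` are assumptions made for illustration, since the cited studies tailored their thresholds to specific construction elements.

```python
from typing import Optional, Tuple

# Assumed RGB values; the reviewed studies do not prescribe exact colors.
COLORS = {
    "ahead": (0, 200, 0),            # green
    "on_time": (255, 255, 255),      # white (rendered semitransparent in an overlay)
    "behind": (220, 0, 0),           # red
    "not_reported": (128, 128, 128)  # grey
}


def schedule_label(
    planned: float, actual: Optional[float], tolerance: float = 0.05
) -> Tuple[str, Tuple[int, int, int]]:
    """Map planned vs. actual progress (fractions in 0..1) to a status and color.

    `tolerance` is a hypothetical band around the planned value within
    which an element is still treated as on time.
    """
    if actual is None:
        status = "not_reported"
    elif actual > planned + tolerance:
        status = "ahead"
    elif actual < planned - tolerance:
        status = "behind"
    else:
        status = "on_time"
    return status, COLORS[status]
```

The returned color could then be used to draw the bounding box or overlay of the corresponding element on the input image.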
5.2. Augmented Reality (AR) and Virtual Reality (VR)
AR is an interactive experience of a physical world with useful information loaded onto the video feed for multiple purposes and operations 
. Output visualization of CV-based CPM can also be enabled by VR after processing vision datasets for extracting construction progress status. Some studies have explored the use of AR by linking it with processed BIMs for monitoring construction progress 
. An object-based interior CPM was proposed by 
utilizing the common as-built construction photographs and displaying the interior construction progress by imposing color and pattern coding based on the actual status. The study reported the difficulty in object detection and classification in the interior construction environment and directed future studies to improve the algorithm to automatically detect various types of interior objects without manual human intervention. Another study 
proposed a real-time AR-based system for modular CPM. The proposed system demonstrated an automatic AR registration method represented by relative coordinates and a fixed camera and successfully presented a live animation of the construction sequence. However, the study was conducted in a controlled lab environment using a simple mockup of a building. In summary, AR-based visualization requires the accurate alignment of BIMs and real-world data. Such accurate positioning requires sophisticated surveying equipment. Another approach is to install fiducial markers to locate and estimate the exact position. Mobile-based AR systems can provide the required accuracy along with the construction status. However, they require further research before they can be implemented for CV-based CPM.
5.3. Earned Value Management (EVM)
EVM is a project monitoring and control technique that integrates cost, time, and scope to calculate project performance 
. It requires as-planned and actual information on all three constraints to calculate the schedule and cost variances and provide the schedule and cost performance indexes as an alternative means of assessing the project performance. EVM is a valued output for CMTs as it enables them to assess the current state of a project and make necessary decisions to keep the project on track.
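The variances and indexes mentioned above follow the standard EVM definitions: planned value PV = BAC × planned fraction complete, earned value EV = BAC × actual fraction complete, SV = EV − PV, CV = EV − AC, SPI = EV / PV, and CPI = EV / AC, where BAC is the budget at completion and AC is the actual cost. The sketch below computes them; the function and field names are hypothetical, and the assumption that the earned fraction would come from a CV-based progress estimate is illustrative only.

```python
from dataclasses import dataclass


@dataclass
class EvmIndicators:
    planned_value: float
    earned_value: float
    schedule_variance: float   # SV = EV - PV (negative: behind schedule)
    cost_variance: float       # CV = EV - AC (negative: over budget)
    spi: float                 # SPI = EV / PV (< 1: behind schedule)
    cpi: float                 # CPI = EV / AC (< 1: over budget)


def evm_indicators(
    budget_at_completion: float,
    planned_fraction: float,
    earned_fraction: float,
    actual_cost: float,
) -> EvmIndicators:
    """Compute standard EVM indicators from fractions complete and actual cost.

    `earned_fraction` is assumed here to be supplied by a progress
    estimate (for example, one derived from a CV-based CPM pipeline).
    """
    pv = budget_at_completion * planned_fraction
    ev = budget_at_completion * earned_fraction
    if pv <= 0 or actual_cost <= 0:
        raise ValueError("PV and AC must be positive to compute SPI and CPI")
    return EvmIndicators(
        planned_value=pv,
        earned_value=ev,
        schedule_variance=ev - pv,
        cost_variance=ev - actual_cost,
        spi=ev / pv,
        cpi=ev / actual_cost,
    )
```

For instance, a project with a budget of 1000 cost units that was planned to be 50% complete, is estimated at 40% complete, and has spent 450 units yields negative schedule and cost variances and SPI and CPI values below 1, flagging both a delay and a cost overrun.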
A recent study has suggested that the output of CV-based monitoring of construction projects can be integrated with EVM systems 
. An automated calculation of EVM indicators can provide necessary project control information to identify potential delays and make useful decisions to control construction delays and cost overruns 
. However, the retrieved literature does not provide practical evidence of EVM-based output from CV-based CPM.