Multiscale-Deep-Learning Applications

In general, most existing convolutional neural network (CNN)-based deep-learning models suffer from spatial-information loss and inadequate feature representation. This is due to their inability to capture multiscale contextual information and to the loss of semantic information during the pooling operations. In the early layers of a CNN, the network encodes simple semantic representations, such as edges and corners, while in the later layers it encodes more complex semantic features, such as complex geometric shapes. Theoretically, it is better for a CNN to extract features from different levels of semantic representation because tasks such as classification and segmentation work better when both simple and complex feature maps are utilized. Hence, it is also crucial to embed multiscale capability throughout the network so that the various scales of the features can be optimally captured to represent the intended task.

Keywords: machine learning; artificial intelligence; deep learning; neural network; convolutional neural network; multiscale features

1. Introduction

Automated systems that utilize advanced artificial-intelligence technology, particularly the deep-learning method, have transformed people's lives by simplifying various everyday tasks. This technology continues to fascinate people with its limitless opportunities in almost every sector, including e-commerce, healthcare, manufacturing, and entertainment, among many others. It has been extremely successful when applied to imaging input, and it often performs better than humans in a variety of use-case scenarios, especially in the three most challenging computer-vision applications: classification, object detection, and segmentation.
For the image-classification task, the main concern is to obtain the primary labels of all the items that are visible in the image, while, for the object-detection task, the algorithm’s complexity is taken a step further by also determining the location of each object in the form of a bounding box, in addition to its class. The segmentation task takes the automation challenge to a whole new level by attempting to determine the exact borders of the items in the image. A major category of this task is semantic segmentation, which divides a picture into parts based on its semantic content; it aims to identify the label for each pixel in the image, or, in general, to segment all the pixels in an image into different object categories. Semantic-segmentation techniques thus split the picture into semantically relevant pieces and categorize each piece into one of the predetermined classes. The same goal can also be pursued pixel by pixel rather than by classifying entire regions, which is what has been defined as pixel-wise classification; it leads to the same result, but with a slightly different approach. Due to the importance of pixel-wise labeling, various computer-vision projects rely on semantic segmentation as their foundation. In summary, the objective of semantic segmentation, given an image, is to assign a category label to each pixel [1]. It is a challenging task with many real-world applications, such as autonomous vehicles, sensing technologies, and scene recognition [2]. The semantic-segmentation problem effectively covers both the classification and localization subproblems. However, these two subtasks are inherently conflicting and have a substantial impact on the segmentation model’s design principles. For classification, the model should be invariant to local image details, such as in-plane transformations and deformations, in order to acquire higher-level representations that reflect the global context [3]. This simplification of the spatial information is unfavorable for the localization task, which requires the model to resolve pixel-level information with good accuracy [4].
It has been proven that CNN-based techniques produce state-of-the-art performances in computer-vision applications for the tasks of classification [5], segmentation [6], object detection [7], and many more. A CNN is a feed-forward network with many hidden layers that is mostly used to extract spatial features for image-processing and object-detection applications [8]. Its hidden layers learn the respective features through a sliding-convolutional-kernel operation, which is usually coupled with an activation function and a pooling layer to produce the best set of feature maps. The last fully connected layer then identifies the object categories. The CNN’s ability to automatically learn complicated features during the training phase, rather than depending on a handcrafted set of features, is one way in which segmentation accuracy can be improved.
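This basic pipeline can be illustrated with a short sketch. The following is a minimal, illustrative PyTorch model (the layer widths, the 32 × 32 input size, and the number of classes are assumptions, not values from any model discussed in this entry) showing the convolution, activation, pooling, and fully connected stages in order.

```python
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    """Minimal sketch: conv + activation + pooling blocks, then a fully connected classifier."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),  # sliding convolutional kernel
            nn.ReLU(inplace=True),                       # activation function
            nn.MaxPool2d(2),                             # pooling layer halves the spatial size
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)  # fully connected layer picks the category

    def forward(self, x):                 # x: (batch, 3, 32, 32)
        x = self.features(x)              # -> (batch, 32, 8, 8)
        return self.classifier(torch.flatten(x, 1))

logits = TinyCNN()(torch.randn(1, 3, 32, 32))  # logits.shape == (1, 10)
```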
CNN feature maps exhibit spatial variance, which directly implies that they retain spatial dimensions; as a result, a feature represented by a specific set of feature maps may only become active in a subset of the map’s spatial regions. Apart from the spatial dimensions, this spatial variance also includes the spatial scales of the respective entries, through which the localization information of the input images is utilized to estimate the spatial mappings that link feature-map entries to input-image regions. Each CNN feature-map activation is only linked to a small number of input units that are all located in the same spatial neighborhood. This spatial-variance characteristic of CNN feature maps arises from the local spatial structure of the convolution filters and their spatially limited receptive fields. Once these spatially varying features are extracted, they can be further processed by multiscale operations, which ultimately improve the learning ability of the CNN model.
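The size of the spatial neighborhood that feeds a single feature-map entry (its receptive field) can be worked out directly from the kernel sizes, strides, and dilations of the preceding layers. The snippet below is a small, self-contained calculation of this kind; the example layer stacks are hypothetical and only illustrate how striding and dilation enlarge the receptive field.

```python
def receptive_field(layers):
    """layers: list of (kernel_size, stride, dilation) tuples, ordered from input to output.
    Returns the receptive-field size (in input pixels) of one output entry."""
    rf, jump = 1, 1  # jump = spacing between adjacent output entries, measured in input pixels
    for k, s, d in layers:
        rf += (k - 1) * d * jump
        jump *= s
    return rf

# Three plain 3x3 convolutions only see a 7x7 input patch ...
print(receptive_field([(3, 1, 1)] * 3))  # -> 7
# ... while interleaving strided pooling and a dilated convolution grows the field much faster.
print(receptive_field([(3, 1, 1), (2, 2, 1), (3, 1, 1), (2, 2, 1), (3, 1, 2)]))  # -> 26
```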
For the segmentation task, there is also a compromise between the depth of the network and the spatial resolution of its features. The reduction in the feature-map size does not have too many negative effects on the classification task, as the goal is to discriminatively identify the class without concern for spatial coherency. However, if only these deep features are processed, adequate localization will not be achieved because of their poor resolution.
The most straightforward and efficient way to improve a CNN’s learning capability is to make the network layer deeper. However, this method has several drawbacks, as discussed in [9]:
  • There is a tradeoff between the network complexity and processing speed. Typically, a very deep network may produce great accuracy, but it will not be nearly as fast as a lightweight network. This tradeoff applies to both classification and segmentation models;
  • If the amount of training data is limited, then increasing the network complexity, which directly increases the number of parameters that need to be fitted, will likely result in an overfitting problem;
  • The backpropagated gradient will dissipate as the network becomes deeper, leading to the gradient’s diffusion. This circumstance makes it harder to optimize the deep model.
Learning multiscale features, for example through multiscale training or by changing the receptive field, has resulted in significant performance improvements in image scene classification and segmentation, where objects appear at different scales due to changes in imaging distance and intrinsic object-size properties. This method mitigates the drawbacks of simply deepening the network discussed above [9].
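Multiscale training itself is simple to realize: the inputs are randomly rescaled during training so that the same objects are presented to the network at several scales. The sketch below is one illustrative way to do this in PyTorch; the scale range is an assumption, not a recommendation from the cited works.

```python
import random
import torch
import torch.nn.functional as F

def random_rescale(batch, scales=(0.5, 0.75, 1.0, 1.25)):
    """batch: (N, C, H, W) image tensor; returns the batch resized to one randomly chosen scale."""
    s = random.choice(scales)
    h, w = batch.shape[-2:]
    return F.interpolate(batch, size=(int(h * s), int(w * s)),
                         mode="bilinear", align_corners=False)

x = torch.randn(4, 3, 256, 256)
print(random_rescale(x).shape)  # e.g. torch.Size([4, 3, 192, 192]) when s == 0.75
```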
The lack of sufficient data to construct a strong training model is one of the challenges faced by the existing deep-learning methods. Multiscale modeling is highly beneficial because it leverages discriminative-feature learning to maximize the information gain and optimize the efficiency by integrating low- and high-resolution data, and by merging multiple sources of data. This nature of multiscale learning has opened up a new paradigm for explaining phenomena at a higher scale as the result of collective action at lower scales. The research presented in [10] suggests a novel method for the early diagnosis of Alzheimer’s disease (AD) by combining multimodal information from MRI and FDG-PET images at multiple scales within a deep neural network. The proposed multiscale approach preserves the structural and metabolic information across multiple scales, extracts features of coarse to fine structures, and increases the diagnostic-classification accuracy.
Multiscale modeling has great flexibility to be combined with other advanced networks (such as LSTM, GAN, or another well-known reference network model) to produce a better performance in classification or segmentation. Parallel configurations of multiscale-deep-learning models and generative networks can be established to provide independent confirmation of the parameter sensitivity. An untapped possibility lies in employing generative models to disentangle the high dimensionality of the parameter variation from the low dimensionality of the dynamics. In one study [11], generative adversarial networks were used to develop a new method for generalized multiscale feature extraction. The authors used a U-Net architecture with multiscale layers in the generator of the adversarial network to obtain the sequence feature from a one-dimensional vibration signal. The proposed feature-extraction method can accurately predict the remaining useful life (RUL) of bearings, and it outperforms conventional RUL prediction approaches that are based on deep neural networks.

2. Multiscale-Deep-Learning Taxonomy

In this section, this entry introduces advanced multiscale-deep-learning methods, primarily for classification and semantic-segmentation tasks. A comprehensive taxonomy of the multiscale approaches used in the majority of previous works is introduced. Various multiscale-deep-learning architectures are also discussed, which are broadly classified into two main categories: multiscale feature learning and multiscale feature fusion. This entry will help readers to develop a theoretical insight into the design principles of multiscale-deep-learning networks. The main division in the taxonomy of the multiscale-deep-learning architecture is represented in Figure 1.
Figure 1. The primary taxonomy of multiscale-deep-learning architecture used in classification and segmentation tasks.

2.1. Multiscale Feature Learning

Recent studies have shown the substantial potential of multiscale learning in various applications, such as scene parsing, self-driving cars, medical diagnosis, and many more [12][13][14]. The underlying idea of multiscale feature learning is to construct several CNN models with various contextual input sizes concurrently, whereby the features from multiple models are combined at the fully connected layer [15]. Multiscale feature learning can be defined as the process of inferring feature maps by analyzing kernels at a variety of scales to capture a wider range of relevant features and estimate the spatial mapping that links to the input images.
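A minimal sketch of this idea is given below, assuming a PyTorch implementation: the same image is fed to several small branches at different contextual scales, each branch pools its features to a fixed length, and the branch outputs are concatenated and combined at a fully connected layer. Branch widths, scales, and the number of classes are illustrative assumptions rather than values from the cited works.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def branch():
    # One small CNN branch; global average pooling makes its output size independent of the input scale.
    return nn.Sequential(
        nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(inplace=True),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),   # -> (N, 16) for any input resolution
    )

class MultiScaleNet(nn.Module):
    def __init__(self, scales=(1.0, 0.5, 0.25), num_classes=10):
        super().__init__()
        self.scales = scales
        self.branches = nn.ModuleList([branch() for _ in scales])
        self.fc = nn.Linear(16 * len(scales), num_classes)  # features from all branches meet at the FC layer

    def forward(self, x):
        feats = [b(F.interpolate(x, scale_factor=s, mode="bilinear", align_corners=False))
                 for b, s in zip(self.branches, self.scales)]
        return self.fc(torch.cat(feats, dim=1))

print(MultiScaleNet()(torch.randn(2, 3, 128, 128)).shape)  # torch.Size([2, 10])
```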
For a given CNN feature map, the spatial scale with respect to an input image is the size, in pixels, of the rectangle in the input image that affects the value of the respective feature-map entry. Multiscale receptive fields on deep-feature maps aim to capture the semantic and contextual object information, as shown in Figure 2. In the cardiac-segmentation example, the red, yellow, and green convolutional filters indicate three different filter sizes that are used to capture the latent features. The red area tends to be sensitive primarily to the left ventricle, shown in the middle region, while the yellow area covers the endocardium and epicardium regions, and the green area covers the right-ventricle region. The figure also shows that the green area has the largest activation range, which is able to differentiate the left ventricle, endocardium, and right ventricle from the background information.
Figure 2. Multiscale receptive fields of deep-feature maps that are used to activate the visual semantics and their contexts. Multiscale representations help in better segmenting the objects by combining low-level and high-level representations.
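The multi-kernel idea illustrated in Figure 2 can be sketched as a block that applies small, medium, and large kernels (analogous to the red, yellow, and green filters) to the same feature map in parallel and concatenates the results. The following PyTorch sketch is illustrative only; the kernel sizes and channel counts are assumptions.

```python
import torch
import torch.nn as nn

class MultiKernelBlock(nn.Module):
    """Parallel convolutions with increasingly large receptive fields on one feature map."""
    def __init__(self, in_ch=32, out_ch=16):
        super().__init__()
        self.k3 = nn.Conv2d(in_ch, out_ch, 3, padding=1)  # small receptive field
        self.k5 = nn.Conv2d(in_ch, out_ch, 5, padding=2)  # medium receptive field
        self.k7 = nn.Conv2d(in_ch, out_ch, 7, padding=3)  # large receptive field

    def forward(self, x):
        # All branches keep the spatial size, so their outputs can be concatenated channel-wise.
        return torch.cat([self.k3(x), self.k5(x), self.k7(x)], dim=1)

y = MultiKernelBlock()(torch.randn(1, 32, 64, 64))
print(y.shape)  # torch.Size([1, 48, 64, 64])
```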

2.2. Multiscale Feature Fusion

The goal of a semantic-segmentation model is to predict the semantic class of each individual pixel. Due to this dense prediction requirement, it is typical to maintain high-resolution features so that a better pixel-wise classification can be obtained. However, it is difficult to obtain large receptive fields in high-resolution features by using standard convolution. Multiscale feature fusion is based on the utilization of several features with various resolutions to capture both short- and long-distance patterns, without the need for a very deep network. The multiscale-feature-fusion method is an effective way to obtain high-quality features, which can be divided into image-level fusion and feature-level fusion.
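Feature-level fusion, in particular, is commonly implemented by upsampling a deep, low-resolution feature map and concatenating it with an earlier high-resolution one before the dense prediction head. The sketch below illustrates this pattern in PyTorch; the channel counts and number of classes are assumptions for the example only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FuseAndPredict(nn.Module):
    """Fuse high-resolution (short-range) and low-resolution (long-range) features for dense prediction."""
    def __init__(self, high_ch=64, low_ch=256, num_classes=5):
        super().__init__()
        self.head = nn.Conv2d(high_ch + low_ch, num_classes, kernel_size=1)  # per-pixel classifier

    def forward(self, high_res, low_res):
        # Bring the coarse, large-receptive-field features up to the fine resolution, then fuse.
        low_up = F.interpolate(low_res, size=high_res.shape[-2:],
                               mode="bilinear", align_corners=False)
        return self.head(torch.cat([high_res, low_up], dim=1))

high = torch.randn(1, 64, 128, 128)  # early, high-resolution features
low = torch.randn(1, 256, 16, 16)    # deep, low-resolution features
print(FuseAndPredict()(high, low).shape)  # torch.Size([1, 5, 128, 128])
```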

3. Application of Multiscale Deep Learning

This entry focuses on just a few key areas where multiscale deep learning has been most widely used in recent studies. This section primarily highlights the application of multiscale deep learning in four different areas: satellite imagery, medical imaging, agricultural imaging, and industrial and manufacturing systems.

3.1. Satellite Imagery

In recent years, a variety of methods for automating the process of extracting data from satellite images have been developed. These efforts have been applied to a variety of applications with the help of computer-vision algorithms to comprehend the content of satellite images. Some of the applications that have successfully applied intelligent remote-sensing systems are agriculture, forestry, urban planning, and climate-change research. Historically, satellite imagery is acquired from a bird’s-eye perspective, seen from the top down, and captured in a variety of spectral bands. It is represented in multiple channels on a flat 2D plane, often at a low resolution, in which each pixel has its own semantic meaning. Generally, semantic-segmentation models designed for remote-sensing applications aim to extract roads, identify buildings, and classify land cover. The segmentation of satellite images, which is used to find and locate objects and boundaries (straight lines, curves, etc.) of interest, is the process of dividing a digital image into several pixel sets. Furthermore, segmentation based on deep-learning techniques has evolved in recent years and has improved significantly with the emergence of fully convolutional neural networks [9].
In some satellite-imagery applications in which the annotated satellite-image datasets for semantic segmentation are small, it is useful to initialize the encoder through transfer learning, as in [16][17][18][19], to improve the network performance. Because most satellite images contain objects of varying sizes and shapes, standard DL algorithms with a single input scale may fail to capture critical scale-dependent characteristics throughout the focal plane. As a result, it is hard to choose the right parameters to form spatial characteristics for various types of objects. Multiscale contextual and compositional elements in the spatial domain should be considered to facilitate the learning process. Zhao et al. [20] proposed a multiscale CNN model that employs dimension-reduction, feature-extraction, and classification components to construct a pyramid of image models for each component, whereby these models were trained for spatial-analysis purposes. The obtained spatial attributes were then concatenated for the final classification task. Li et al. [21] combined multiscale CNNs with a bidirectional long short-term memory (Bi-LSTM) network to create spectral-, spatial-, and scale-dependent hyperspectral-image (HSI) attributes. With this Bi-LSTM, they can exploit the correlation among multiscale properties without sacrificing scale-dependent details. Several previous works have focused on multiscale CNNs in satellite imagery, and they are summarized in Table 1 below.
Table 1. The Application of Multiscale Deep Learning in Satellite Imagery.
Literature Target Task Network Structure Method Strength Weakness
Gong et al., 2019 [22] Hyperspectral Image Spatial Pyramid Pooling CNN with multiscale convolutional layers, using multiscale filter banks with different metrics to represent the features for HSI classification. The accuracy is comparable to or even better than other classification methods in both the spectral and spectral-spatial classification of HSI images. Extracts only the spatial features in limited-size filtering or convolutional windows.
Hu et al., 2018 [23] Small Objects Multiscale-Feature CNN Identifying small objects by extracting features at different object convolution levels and applying multiscale features. When compared with Faster RCNN, the accuracy of the small-object detection is significantly higher. The performance is restricted by the computational costs and image representations.
Cui et al., 2019 [24] Hyperspectral Image Atrous Spatial Pyramid Pooling Integrating both fused features from multiple receptive fields and multiscale spatial features based on the structure of the feature pyramid at various levels. Better accuracy compared with other classification methods on the Indian Pine, Pavia University, and Salinas datasets. The classification significantly depends on the quality and quantity of the labeled samples, which are costly and time-consuming to obtain.
Li et al., 2019 [25] Aerial Image Multiscale U-Net The main structure is U-Net with cascaded dilated convolution at the bottom with varying dilation rates. Achieved the best overall accuracy compared with four well-known methods on the Inria Aerial Image Dataset, and the best IoU on the Chicago and Vienna images in the same dataset. The average IoU performance is still very weak, especially on the Inria dataset.
Gong et al., 2021 [17] Hyperspectral Image Multiscale Fusion + Spatial Pyramid Pooling The main structure includes a 3D CNN module, a squeeze-and-excitation module, and a 2D CNN pyramid-pooling module. The method was evaluated on three public hyperspectral classification datasets: Indian Pine, Salinas, and Pavia University. The classification accuracies were 96.09%, 97%, and 96.56%, respectively. The method still misclassifies bricks and gravel, and the classification performance remains weak, especially on the Indian Pine dataset.
Liu et al., 2021 [16] Hyperspectral Image Multiscale Fusion Multiscale feature learning uses three simultaneous pretrained ResNet sub-CNNs, a fusion operation, and a U-shaped deconvolution network. A region proposal network (RPN) with an attention mechanism is used to extract building-instance locations, which are used to eliminate building occlusion. When compared with a mask R-CNN, the proposed method improved the performance by 2.4% on the self-annotated building dataset of the instance-segmentation task, and by 0.17% on the ISPRS Vaihingen semantic-labeling-contest dataset. The use of fusion strategies invariably results in increased computational and memory overhead.
Liu et al., 2018 [26] UC Merced Dataset, SIRI-WHU Dataset, Aerial Image Dataset (AID) Multiscale CNN + SPP The proposed method trains the network on multiscale images by developing a dual-branch CNN network: F-net (given that training is performed at a fixed scale), and V-net (given that training is performed with varied input scales per n-iterations). The MCNN reached a classification accuracy of 96.66 ± 0.90 for the UC Merced Dataset, 93.75 ± 1.13 for the SIRI-WHU Dataset, and 91.80 ± 0.22 for the AID Dataset. This method reduces the possibility of feature discrimination by focusing solely on the feature map from the last CNN layer and ignoring the feature data from additional layers.
Gao et al., 2022 [19] Hyperspectral Image Multiscale Fusion This method employs cross-spectral spatial-feature extraction (SSCEM). This module sent previous CNN layer information into the spatial and spectral extraction branches independently, and so changes in the other domain after each convolution could be fully exploited. The proposed network excels in many deep-learning-based networks on three HSI datasets. It also cuts down on the number of training parameters for the network, which helps, to a certain extent, to prevent overfitting problems. The performance is restricted by the complexity of the network structure, which implies a greater computational cost.
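Several of the works in Table 1 rely on spatial pyramid pooling (SPP) or its atrous variant. A minimal sketch of plain SPP, assuming a PyTorch implementation, is shown below: the same feature map is pooled over several grid sizes and the flattened results are concatenated into a fixed-length descriptor, so inputs of different resolutions yield vectors of the same length. The pyramid levels and channel count are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def spatial_pyramid_pool(x, levels=(1, 2, 4)):
    """x: (N, C, H, W) feature map -> (N, C * sum(l * l for l in levels)) fixed-length vector."""
    pooled = [F.adaptive_max_pool2d(x, output_size=l).flatten(1) for l in levels]
    return torch.cat(pooled, dim=1)

feats = torch.randn(2, 64, 13, 13)
print(spatial_pyramid_pool(feats).shape)  # torch.Size([2, 1344]) == 64 * (1 + 4 + 16)
```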

3.2. Medical Imaging

Deep-learning initiatives are gaining traction in the healthcare domain by bringing about new efficiencies and possibilities that support physicians, clinicians, and researchers who are passionate about improving the lives of others. Generally, medical imaging is used to diagnose various disorders, including cancer, growth problems, and diabetic retinopathy. Evidently, the use of medical imaging has a huge impact in terms of providing an accurate clinical screening and diagnosis [27][28][29][30][31][32]. One of the subdomains of this application is medical-image segmentation, which uses advanced automated-segmentation algorithms to provide segmentation results that are as similar as possible to the region’s original structure [33][34][35][36]. Deep-learning methods for medical-image applications, either for classification or segmentation purposes, often encounter the following three profound issues:
  • The range of the annotated medical images required for optimally training the model is often limited;
  • The regions of interest (ROIs) are generally small in size, and they have imprecise edges that make them appear in unpredictable x, y, and z positions. Furthermore, sometimes only an image-level label is available, without annotations of the targeted ROIs;
  • The ROIs in medical images often contain visual information with similar patterns and that vary in size (scale).
Thus, the multiscale semantic feature plays a crucial role in improving the automation performances of medical-image-classification and segmentation networks. In order to obtain multiscale representations in vision tasks, feature extractors must utilize a wide range of receptive fields to capture the contexts of objects at different scales. Several previous works have focused on observing multiscale CNNs in medical images, and they are summarized in Table 2 below.
Table 2. The Application of Multiscale CNNs in Medical Imaging.
Literature Target Task Network Structure Method Strength Weakness
Wolterink et al., 2017 [37] Vessel Segmentation CNN + Stacked Dilation Convolution CNN with ten-layer network. The first eight layers are the feature-extraction levels, whereas Layers 9 and 10 are fully connected classification layers. Each feature-extraction layer uses 32 kernels. The level of the dilation rate increases between Layers 2 and 7. The myocardium and blood pool had Dice indices of 0.80 ± 0.06 and 0.93 ± 0.02, respectively, average distances to boundaries of 0.96 ± 0.31 and 0.89 ± 0.24 mm, respectively, and Hausdorff distances of 6.13 ± 3.76 and 7.07 ± 3.01 mm, respectively. Due to hardware limitations, the work still used a large receptive field and led to a less precise prediction.
Du et al., 2020 [38] Vessel Segmentation Dilated Residual Network + Modified SPP The network’s inception module initializes a multilevel feature representation of cardiovascular images. The dilated-residual-network (DRN) component extracts features, classifies the pixels, and anticipates the segmentation zones. A hybrid pyramid-pooling network (HPPN) then aggregates the local and global DRN information. Best result in quantitative segmentation compared with four well-known methods in all five substructures (left ventricle (LV), right ventricle (RV), left atrium (LA), right atrium (RA), and LV myocardium (LV_Myo)). The HD value of this method is higher than that of U-Net, which shows that it still has some issues with segmenting small targets.
Kim et al., 2018 [18] Lung Cancer Multiscale Fusion CNN Multiscale-convolution inputs with varying levels of inherent contextual abstract information in multiple scales with progressive integration and multistream feature integration in an end-to-end approach. On two parts of the LUNA16 Dataset (V1 and V2), the method did much better than other approaches by a wide margin. The average CPMs were 0.908 for V1, and 0.942 for V2. The anchor scheme used by the nodule detectors introduces an excessive number of hyperparameters that must be fine-tuned for each unique problem.
Muralidharan et al., 2022 [39] Chest X-ray Multiscale Fusion The input image is divided into seven modes, which are then fed into a multiscale deep CNN with 14 layers (blocks) and an additional four extra layers. Each block has an input layer, convolution layer, batch-normalization layer, dropout layer, and max-pooling layer, whereby the block is stacked three successive times. The proposed model successfully differentiates COVID-19 from viral pneumonia and normal classes with accuracy, precision, recall, and F1-score values of 0.96, 0.97, 0.99, and 0.98, respectively. The obtained results are still based on random combinations of the extracted modes, and so they need to run the model with every possible combination of the hyperparameters to obtain the desired result.
Amer et al., 2021 [40] Echocardiography Multiscale Fusion + Cascaded Dilated Convolution The network uses residual blocks and cascaded-dilated-convolution modules to pull both coarse and fine multiscale features from the input image. Dice-similarity-performance measure of 95.1% compared with expert’s annotation and surpasses Deeplabv3 and U-Net performances by 8.4% and 1.2%, respectively. The work only measures the image-segmentation performance, without including the LV-ejection-fraction (ED and ES) clinical cardiac indicators.
Yang et al., 2021 [41] Cardiac MRI Dilated Convolution The dilated block of the segmentation network captures and aggregates multiscale information to create segmentation probability maps. The discriminator part differentiates the segmentation probability map and the ground truth at the pixel level to provide confidence probability maps. The Dice coefficients on the ACDC 2017 for both ED and ES are 0.94 and 0.89, respectively. The Hausdorff distances for both the ED and ES are 10.6 and 12.6 mm, respectively. The model still produces weak Dice coefficients in both the ED and ES of the left-ventricle-myocardium part.
Wang et al., 2021 [42] Cardiac MRI Multiscale Fusion/Dilated Convolution The encoder part uses dilated convolution. The decoding part reconstructs the full-size skip-connection structure for contextual-semantic-information fusion. The Dice coefficients on the ACDC 2017, MICCAI 2009, and MICCAI 2018 datasets reached 96.2%, 98.0%, and 96.8%, respectively. Overall, Jaccard indices of 0.897, 0.964, and 0.937 were observed, with Hausdorff distances of 7.0, 5.2, and 7.5 mm, respectively. The work only measures the image-segmentation performance, without including the LV-ejection-fraction (ED and ES) clinical cardiac indicators.
Amer et al., 2022 [43] Echocardiography, Lung Computed Tomography (CT) Images U-Net + Multiscale Spatial Attention + Dilated Convolution The model uses a U-Net architecture with channel attention and multiscale spatial attention to learn multiscale feature representations with diverse modalities, as well as shape and size variability. The proposed model outperformed the basic U-Net, ResDUnet, Attention U-Net, and U-Net3+ models by 4.1%, 2.5%, 1.8%, and 0.4%, respectively, on lung CT images. It also outperformed the basic U-Net, ResDUnet, Attention U-Net, and U-Net3+ models by 2.8%, 1.6%, 1.1%, and 0.6%, respectively, on the left-ventricle images. The approach still struggles to capture edge details accurately, and it loses segmentation detail at complicated edges.
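Many of the entries in Table 2 build on dilated (atrous) convolution, often arranged as atrous spatial pyramid pooling (ASPP): parallel 3 × 3 convolutions with different dilation rates enlarge the receptive field without reducing the feature-map resolution, which suits the small, variably sized ROIs described above. The sketch below is illustrative only; the dilation rates and channel counts are assumptions rather than values from the cited works.

```python
import torch
import torch.nn as nn

class SimpleASPP(nn.Module):
    """Parallel dilated convolutions over one feature map, fused with a 1x1 projection."""
    def __init__(self, in_ch=64, out_ch=32, rates=(1, 6, 12)):
        super().__init__()
        # For a 3x3 kernel, padding == dilation keeps the output the same size as the input.
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r) for r in rates])
        self.project = nn.Conv2d(out_ch * len(rates), out_ch, kernel_size=1)

    def forward(self, x):
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))

y = SimpleASPP()(torch.randn(1, 64, 32, 32))
print(y.shape)  # torch.Size([1, 32, 32, 32]) -- resolution is preserved
```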

3.3. Agriculture

Global agricultural production is under increasing amounts of pressure because of several factors, such as population growth, climate change, ecological environment deterioration, the COVID-19 epidemic, and the war in Ukraine. Declining agriculture production could result in severely negative consequences, and especially on food availability, whereby a price hike is to be expected. Therefore, innovative strategies that utilize automated agricultural technology are required to improve the production rates while ensuring sustainable and environmentally friendly farming. The advancements in deep learning, sensor technology, and mechanical automation provide enormous potential to address these challenges.
The adoption of high-performance imaging sensors (RGB, hyperspectral, thermal, and SAR) and unmanned mobile platforms (satellites, drones, and terrestrial robots) is creating huge potential for accurate automated systems. The current use of these imaging data is geared toward transitioning conventional agriculture into data-driven precision agriculture (PA), with the primary objective of reducing the dependency on manual laborious tasks through automation approaches. More importantly, these data contain a large amount of valuable information that will be able to assist farmers in predicting yields, scheduling sowing, tracking the growth states of their crops, monitoring pests and diseases, as well as controlling weeds.
The deep-learning network has been used extensively as part of the automated decision-making tool by extracting hierarchical features from input data for various agricultural tasks. As a result, this wide adoption of DL has opened new possibilities for interpreting massive amounts of data accurately for agriculture analytic systems, such as:
  • Crop surveillance systems through remote sensing to map the land cover and crop discrimination [44][45][46];
  • Plant-stress-monitoring systems by implementing classification and segmentation networks to better understand the interactions between pathogens, insects, and plants, as well as to determine the causes of plant stress [47][48][49][50][51];
  • Disease and pest identification and quantification systems that will assist in monitoring the health condition of plants, including the nutritional status, development phase, and yield prediction [48][52][53][54][55].
Several previous works that have utilized multiscale CNN algorithms in automated agricultural applications are summarized in Table 3 below.
Table 3. The Application of Multiscale Deep Learning in Agriculture Sensing.
Literature Target Task Network Structure Method Strength Weakness
Hu et al., 2018 [50] Plant Leaf Multiscale Fusion CNN Using a series of bilinear-interpolation operations, the input image is converted into several low-resolution images. These images are then fed into the network so that it can learn different features at different depths. Produced a better accuracy rate on most of the MalayaKew Leaf Dataset and LeafSnap Plant Leaf Dataset. The training process required a more complex sample set that needed to provide both whole and segmented images.
Li et al., 2018 [56] Chinese Herbal Medicines Multiscale Fusion CNN Near and far multiscale input images are fused together into a six-channel image using a CNN of three convolutional and three pooling layers. The requirements of Chinese-herbal-medicine classification were met by the model, with a classification accuracy of more than 90%. The method still has several limitations, such as limited training data, lower classification accuracy, and limited robustness to interference.
Turkoglu et al., 2021 [46] ZueriCrop Dataset Early Fusion + CNN The model consists of layered CNN networks. In a hierarchical tree, different network levels are indicative of increasingly finer label resolutions. At the refining stage, the three-dimensional probability regions from three different stages are passed to the CNN. The achieved precision, recall, F1 score, and accuracy are 0.601, 0.498, 0.524, and 0.88, respectively, which outperforms the advanced benchmarked methods. It is unclear how to adapt the model layout to standard CNNs without affecting the feature-extraction backbone for recurrent networks.
Li et al., 2021 [57] Crop Image (UAVSAR and RapidEye) Multiscale Fusion CNN A sequence of object scales is gradually fed into the CNN, which transforms the acquired features from smaller scales into larger scales by adopting gradually larger convolutional windows. This technique provides a novel method for solving the issue of image classification for a variety of terrain types. The model still generates blurred boundaries between crop fields due to the requirement for an input patch.
Wang et al., 2021 [58] Tomato Gray Mold Dataset Feature Fusion + MobileNetv2 + Channel Attention Module MobileNetv2 was used as the base network, whereby multiscale feature fusions provide the fused feature maps. The efficient channel-attention module then enhances these feature maps, and the relevant feature paths are weighted. The resultant features were used to predict mold on tomatoes. Precision and F1 score reached 0.934 and 0.956, respectively, and it outperformed the Tiny-YOLOv3, MobileNetv2-YOLOv3, MobileNetv2-SSD, and Faster R-CNN performances. Missed detections persist, especially at extreme shooting angles, which leads to inaccurate early diagnosis of different plant parts under different shooting conditions.
Zhou et al., 2022 [59] Fish Dataset ASPP + GAN A generative adversarial network (GAN) is introduced before applying CNN to augment the existing dataset. Then, the ASPP module fuses the input and output of a dilated convolutional layer with a short sample rate to acquire rich multiscale contextual information. On the validation dataset, the obtained F1 score, GA, and mIoU reached 0.961, 0.981, and 0.973, respectively. The model still loses a lot of segmentation detail at the complicated edges.
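Several entries in Table 3 rely on image-level fusion: the input image is repeatedly downsampled into a pyramid, and each resolution is processed before the resulting features or predictions are merged. A minimal sketch of building such a pyramid is shown below, assuming PyTorch; the number of levels and the 1/2 downsampling factor are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def image_pyramid(img, num_levels=3):
    """img: (N, C, H, W) -> list of tensors at full, 1/2, and 1/4 resolution (for num_levels=3)."""
    pyramid = [img]
    for _ in range(num_levels - 1):
        pyramid.append(F.interpolate(pyramid[-1], scale_factor=0.5,
                                     mode="bilinear", align_corners=False))
    return pyramid

levels = image_pyramid(torch.randn(1, 3, 256, 256))
print([tuple(t.shape[-2:]) for t in levels])  # [(256, 256), (128, 128), (64, 64)]
```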

3.4. Industrial and Manufacturing Systems

Smart manufacturing makes use of wireless networks, sensors, and intelligent systems to increase the production efficiency, improve the system performance, and reduce wastage while lowering costs [60]. The increasing adoption rate of intelligent sensors and the Internet of things (IoT) has revolutionized the manufacturing sector by allowing computer networks to collect and transform enormous amounts of data from linked machines into information that can be utilized to make automated decisions [61][62]. This has led to an increasing demand for efficient systems to handle high-volume, high-velocity, and high-diversity production data. The deep-learning approach is able to deliver state-of-the-art analytic capabilities for processing and evaluating massive volumes of production data. It bridges the gap between huge volumes of machinery data and intelligent machine monitoring, which enables it to facilitate the extraction of useful knowledge and to make appropriate decisions from vast volumes of data. This approach also promotes high-performance systems through smart manufacturing by reducing the maintenance and operational costs, adapting to customer expectations, boosting productivity, enhancing visibility, and adding more value to the overall operations. Therefore, the deep-learning methodology has contributed to data-driven manufacturing, including the following applications: machine-fault diagnosis, predictive analytics and defect prognosis, and surface-integration inspection.

References

  1. Gao, K.; Niu, S.; Ji, Z.; Wu, M.; Chen, Q.; Xu, R.; Yuan, S.; Fan, W.; Chen, Y.; Dong, J. Double-Branched and Area-Constraint Fully Convolutional Networks for Automated Serous Retinal Detachment Segmentation in SD-OCT Images. Comput. Methods Programs Biomed. 2019, 176, 69–80.
  2. Teng, L.; Li, H.; Karim, S. DMCNN: A Deep Multiscale Convolutional Neural Network Model for Medical Image Segmentation. J. Healthc. Eng. 2019, 2019, 8597606.
  3. Sermanet, P.; Lecun, Y. Traffic Sign Recognition with Multi-Scale Convolutional Networks. In Proceedings of the International Joint Conference on Neural Networks, San Jose, CA, USA, 31 July–5 August 2011; pp. 2809–2813.
  4. Buyssens, P.; Elmoataz, A.; Lézoray, O. Multiscale Convolutional Neural Networks for Vision–Based Classification of Cells. In Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer: Berlin/Heidelberg, Germany, 2013; Volume 7725 LNCS, pp. 342–352.
  5. Zamri, N.F.M.; Tahir, N.M.; Ali, M.S.A.M.; Ashar, N.D.K.; Al-misreb, A.A. Mini-Review of Street Crime Prediction and Classification Methods. J. Kejuruter. 2021, 33, 391.
  6. Abdani, S.R.; Zulkifley, M.A.; Zulkifley, N.H. Analysis of Spatial Pyramid Pooling Variations in Semantic Segmentation for Satellite Image Applications. In Proceedings of the 2021 International Conference on Decision Aid Sciences and Application, DASA 2021, Online, 7–8 December 2021; pp. 397–401.
  7. Mohamed, N.A.; Zulkifley, M.A.; Kamari, N.A.M.; Kadim, Z.; Mohamed, N.A.; Zulkifley, M.A.; Azwan, N.; Kamari, M.; Kadim, Z.; My, N.A.M.K. Symmetrically Stacked Long Short-Term Memory Networks for Fall Event Recognition Using Compact Convolutional Neural Networks-Based Tracker. Symmetry 2022, 14, 293.
  8. LeCun, Y.; Kavukcuoglu, K.; Farabet, C. Convolutional Networks and Applications in Vision. In Proceedings of the ISCAS 2010–2010 IEEE International Symposium on Circuits and Systems: Nano-Bio Circuit Fabrics and Systems, Paris, France, 30 May–2 June 2010; pp. 253–256.
  9. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 10 December 2015; pp. 770–778.
  10. Lu, D.; Popuri, K.; Ding, G.W.; Balachandar, R.; Beg, M.F.; Weiner, M.; Aisen, P.; Petersen, R.; Jack, C.; Jagust, W.; et al. Multimodal and Multiscale Deep Neural Networks for the Early Diagnosis of Alzheimer’s Disease Using Structural MR and FDG-PET Images. Sci. Rep. 2018, 8, 5697.
  11. Suh, S.; Lukowicz, P.; Lee, Y.O. Generalized Multiscale Feature Extraction for Remaining Useful Life Prediction of Bearings with Generative Adversarial Networks. Knowl. Based Syst. 2022, 237, 107866.
  12. Farabet, C.; Couprie, C.; Najman, L.; LeCun, Y. Scene Parsing with Multiscale Feature Learning, Purity Trees, and Optimal Covers. In Proceedings of the 29th International Conference on Machine Learning, ICML 2012, Edinburgh, UK, 26 June–1 July 2012; Volume 1, pp. 575–582.
  13. Zhou, W.; Lin, X.; Lei, J.; Yu, L.; Hwang, J.N. MFFENet: Multiscale Feature Fusion and Enhancement Network For RGB-Thermal Urban Road Scene Parsing. IEEE Trans. Multimed. 2022, 24, 2526–2538.
  14. Zhang, R.; Chen, J.; Feng, L.; Li, S.; Yang, W.; Guo, D. A Refined Pyramid Scene Parsing Network for Polarimetric SAR Image Semantic Segmentation in Agricultural Areas. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5.
  15. Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. In Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer: Cham, Switzerland, 2018; Volume 11211 LNCS, pp. 833–851.
  16. Liu, Y.; Chen, D.; Ma, A.; Zhong, Y.; Fang, F.; Xu, K. Multiscale U-Shaped CNN Building Instance Extraction Framework with Edge Constraint for High-Spatial-Resolution Remote Sensing Imagery. IEEE Trans. Geosci. Remote Sens. 2021, 59, 6106–6120.
  17. Gong, H.; Li, Q.; Li, C.; Dai, H.; He, Z.; Wang, W.; Li, H.; Han, F.; Tuniyazi, A.; Mu, T. Multiscale Information Fusion for Hyperspectral Image Classification Based on Hybrid 2D-3D CNN. Remote Sens. 2021, 13, 2268.
  18. Kim, B.C.; Yoon, J.S.; Choi, J.S.; Suk, H.I. Multi-Scale Gradual Integration CNN for False Positive Reduction in Pulmonary Nodule Detection. Neural Netw. 2018, 115, 1–10.
  19. Gao, H.; Wu, H.; Chen, Z.; Zhang, Y.; Zhang, Y.; Li, C. Multiscale Spectral-Spatial Cross-Extraction Network for Hyperspectral Image Classification. IET Image Process. 2022, 16, 755–771.
  20. Zhao, W.; Du, S. Learning Multiscale and Deep Representations for Classifying Remotely Sensed Imagery. ISPRS J. Photogramm. Remote Sens. 2016, 113, 155–165.
  21. Li, S.; Zhu, X.; Bao, J. Hierarchical Multi-Scale Convolutional Neural Networks for Hyperspectral Image Classification. Sensors 2019, 19, 1714.
  22. Gong, Z.; Zhong, P.; Yu, Y.; Hu, W.; Li, S. A CNN with Multiscale Convolution and Diversified Metric for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2019, 57, 3599–3618.
  23. Hu, G.X.; Yang, Z.; Hu, L.; Huang, L.; Han, J.M. Small Object Detection with Multiscale Features. Int. J. Digit. Multimed. Broadcasting 2018, 2018, 4546896.
  24. Cui, X.; Zheng, K.; Gao, L.; Zhang, B.; Yang, D.; Ren, J. Multiscale Spatial-Spectral Convolutional Network with Image-Based Framework for Hyperspectral Imagery Classification. Remote Sens. 2019, 11, 2220.
  25. Li, X.; Jiang, Y.; Peng, H.; Yin, S. An Aerial Image Segmentation Approach Based on Enhanced Multi-Scale Convolutional Neural Network. In Proceedings of the 2019 IEEE International Conference on Industrial Cyber Physical Systems, ICPS 2019, Taipei, Taiwan, 6–9 May 2019; pp. 47–52.
  26. Liu, Y.; Zhong, Y.; Qin, Q. Scene Classification Based on Multiscale Convolutional Neural Network. IEEE Trans. Geosci. Remote Sens. 2018, 56, 7109–7121.
  27. Wang, G.; Li, W.; Zuluaga, M.A.; Pratt, R.; Patel, P.A.; Aertsen, M.; Doel, T.; David, A.L.; Deprest, J.; Ourselin, S.; et al. Interactive Medical Image Segmentation Using Deep Learning with Image-Specific Fine Tuning. IEEE Trans. Med. Imaging 2018, 37, 1562–1573.
  28. Yin, S.; Bi, J. Medical Image Annotation Based on Deep Transfer Learning. J. Appl. Sci. Eng. 2019, 22, 385–390.
  29. Li, P.; Chen, Z.; Yang, L.T.; Zhang, Q.; Deen, M.J. Deep Convolutional Computation Model for Feature Learning on Big Data in Internet of Things. IEEE Trans. Ind. Inform. 2018, 14, 790–798.
  30. Zhao, L.; Chen, Z.; Yang, Y.; Zou, L.; Wang, Z.J. ICFS Clustering with Multiple Representatives for Large Data. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 728–738.
  31. Zulkifley, M.A.; Abdani, S.R.; Zulkifley, N.H.; Shahrimin, M.I. Residual-Shuffle Network with Spatial Pyramid Pooling Module for COVID-19 Screening. Diagnostics 2021, 11, 1497.
  32. Roslidar, R.; Syaryadhi, M.; Saddami, K.; Pradhan, B.; Arnia, F.; Syukri, M.; Munadi, K.; Roslidar, R.; Syaryadhi, M.; Saddami, K.; et al. BreaCNet: A High-Accuracy Breast Thermogram Classifier Based on Mobile Convolutional Neural Network. Math. Biosci. Eng. 2022, 19, 1304–1331.
  33. Shelhamer, E.; Long, J.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2014, 39, 640–651.
  34. Havaei, M.; Davy, A.; Warde-Farley, D.; Biard, A.; Courville, A.; Bengio, Y.; Pal, C.; Jodoin, P.M.; Larochelle, H. Brain Tumor Segmentation with Deep Neural Networks. Med. Image Anal. 2017, 35, 18–31.
  35. Pace, D.F.; van Dalca, A.; Geva, T.; Powell, A.J.; Moghari, M.H.; Golland, P. Interactive Whole-Heart Segmentation in Congenital Heart Disease. In Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer: Berlin/Heidelberg, Germany, 2015; Volume 9351, pp. 80–88.
  36. Zotti, C.; Luo, Z.; Lalande, A.; Jodoin, P.M. Convolutional Neural Network with Shape Prior Applied to Cardiac MRI Segmentation. IEEE J. Biomed. Health Inform. 2019, 23, 1119–1128.
  37. Wolterink, J.M.; Leiner, T.; Viergever, M.A.; Išgum, I. Dilated Convolutional Neural Networks for Cardiovascular MR Segmentation in Congenital Heart Disease. In Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer: Cham, Switzerland, 2017; Volume 10129 LNCS, pp. 95–102.
  38. Du, X.; Song, Y.; Liu, Y.; Zhang, Y.; Liu, H.; Chen, B.; Li, S. An Integrated Deep Learning Framework for Joint Segmentation of Blood Pool and Myocardium. Med. Image Anal. 2020, 62, 101685.
  39. Muralidharan, N.; Gupta, S.; Prusty, M.R.; Tripathy, R.K. Detection of COVID19 from X-Ray Images Using Multiscale Deep Convolutional Neural Network. Appl. Soft Comput. 2022, 119, 108610.
  40. Amer, A.; Ye, X.; Janan, F. ResDUnet: A Deep Learning-Based Left Ventricle Segmentation Method for Echocardiography. IEEE Access 2021, 9, 159755–159763.
  41. Yang, X.; Zhang, Y.; Lo, B.; Wu, D.; Liao, H.; Zhang, Y.T. DBAN: Adversarial Network with Multi-Scale Features for Cardiac MRI Segmentation. IEEE J. Biomed. Health Inform. 2021, 25, 2018–2028.
  42. Wang, Z.; Peng, Y.; Li, D.; Guo, Y.; Zhang, B. MMNet: A Multi-Scale Deep Learning Network for the Left Ventricular Segmentation of Cardiac MRI Images. Appl. Intell. 2021, 52, 5225–5240.
  43. Amer, A.; Lambrou, T.; Ye, X. MDA-Unet: A Multi-Scale Dilated Attention U-Net for Medical Image Segmentation. Appl. Sci. 2022, 12, 3676.
  44. Wang, L.; Wang, J.; Liu, Z.; Zhu, J.; Qin, F. Evaluation of a Deep-Learning Model for Multispectral Remote Sensing of Land Use and Crop Classification. Crop J. 2022.
  45. Xu, J.; Zhu, Y.; Zhong, R.; Lin, Z.; Xu, J.; Jiang, H.; Huang, J.; Li, H.; Lin, T. DeepCropMapping: A Multi-Temporal Deep Learning Approach with Improved Spatial Generalizability for Dynamic Corn and Soybean Mapping. Remote Sens. Environ. 2020, 247, 111946.
  46. Turkoglu, M.O.; D’Aronco, S.; Perich, G.; Liebisch, F.; Streit, C.; Schindler, K.; Wegner, J.D. Crop Mapping from Image Time Series: Deep Learning with Multi-Scale Label Hierarchies. Remote Sens. Environ. 2021, 264, 112603.
  47. Ubbens, J.R.; Stavness, I. Deep Plant Phenomics: A Deep Learning Platform for Complex Plant Phenotyping Tasks. Front. Plant Sci. 2017, 8, 1190.
  48. Mohanty, S.P.; Hughes, D.P.; Salathé, M. Using Deep Learning for Image-Based Plant Disease Detection. Front. Plant Sci. 2016, 7, 1419.
  49. Boulent, J.; Foucher, S.; Théau, J.; St-Charles, P.L. Convolutional Neural Networks for the Automatic Identification of Plant Diseases. Front. Plant Sci. 2019, 10, 941.
  50. Hu, J.; Chen, Z.; Yang, M.; Zhang, R.; Cui, Y. A Multiscale Fusion Convolutional Neural Network for Plant Leaf Recognition. IEEE Signal Process. Lett. 2018, 25, 853–857.
  51. Zulkifley, M.A.; Moubark, A.M.; Saputro, A.H.; Abdani, S.R. Automated Apple Recognition System Using Semantic Segmentation Networks with Group and Shuffle Operators. Agriculture 2022, 12, 756.
  52. Ferentinos, K.P. Deep Learning Models for Plant Disease Detection and Diagnosis. Comput. Electron. Agric. 2018, 145, 311–318.
  53. Genaev, M.A.; Skolotneva, E.S.; Gultyaeva, E.I.; Orlova, E.A.; Bechtold, N.P.; Afonnikov, D.A. Image-Based Wheat Fungi Diseases Identification by Deep Learning. Plants 2021, 10, 1500.
  54. Rangarajan Aravind, K.; Maheswari, P.; Raja, P.; Szczepański, C. Crop Disease Classification Using Deep Learning Approach: An Overview and a Case Study. Deep. Learn. Data Anal. 2020, 173–195.
  55. Rahman, C.R.; Arko, P.S.; Ali, M.E.; Iqbal Khan, M.A.; Apon, S.H.; Nowrin, F.; Wasif, A. Identification and Recognition of Rice Diseases and Pests Using Convolutional Neural Networks. Biosyst. Eng. 2020, 194, 112–120.
  56. Li, T.; Sun, F.; Sun, R.; Wang, L.; Li, M.; Yang, H. Chinese Herbal Medicine Classification Using Convolutional Neural Network with Multiscale Images and Data Augmentation. In Proceedings of the 2018 International Conference on Security, Pattern Analysis, and Cybernetics, SPAC 2018, Jinan, China, 14–17 December 2018; pp. 109–113.
  57. Li, H.; Zhang, C.; Zhang, Y.; Zhang, S.; Ding, X.; Atkinson, P.M. A Scale Sequence Object-Based Convolutional Neural Network (SS-OCNN) for Crop Classification from Fine Spatial Resolution Remotely Sensed Imagery. Int. J. Digit. Earth 2021, 14, 1528–1546.
  58. Wang, X.; Liu, J. Multiscale Parallel Algorithm for Early Detection of Tomato Gray Mold in a Complex Natural Environment. Front. Plant Sci. 2021, 12, 719.
  59. Zhou, X.; Chen, S.; Ren, Y.; Zhang, Y.; Fu, J.; Fan, D.; Lin, J.; Wang, Q. Atrous Pyramid GAN Segmentation Network for Fish Images with High Performance. Electronics 2022, 11, 911.
  60. Wang, J.; Ma, Y.; Zhang, L.; Gao, R.X.; Wu, D. Deep Learning for Smart Manufacturing: Methods and Applications. J. Manuf. Syst. 2018, 48, 144–156.
  61. Jeschke, S.; Brecher, C.; Meisen, T.; Özdemir, D.; Eschert, T. Industrial Internet of Things and Cyber Manufacturing Systems. In Industrial Internet of Things; Springer: Cham, Switzerland, 2017; pp. 3–19.
  62. Yin, S.; Li, X.; Gao, H.; Kaynak, O. Data-Based Techniques Focused on Modern Industry: An Overview. IEEE Trans. Ind. Electron. 2015, 62, 657–667.