Construction Method of Knowledge Graph for Image Recognition: History

With the continuous development of artificial intelligence technology and the exponential growth in the number of images, image detection and recognition technology is becoming ever more widely used, and managing image knowledge has become an urgent need. The data sources of a knowledge graph are not only text and structured data but also visual or auditory data such as images, video, and audio.

  • knowledge graph
  • image recognition

1. Introduction

With the continuous development of artificial intelligence technology and the exponential growth in the number of images, image detection and recognition technology is becoming more widely used. Computer image recognition is usually divided into several main steps: information acquisition, preprocessing, feature extraction and selection, classifier design, and classification decision. In recent years, the strongest results in image recognition have come from instance segmentation, which builds on convolutional-neural-network-based target detection and semantic segmentation. Inspired by the R-CNN [1] two-stage target detection framework, Hariharan et al. proposed the SDS [2] model in 2014, the earliest instance segmentation algorithm, which laid the foundation for subsequent research.
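The classical pipeline named above (acquisition, preprocessing, feature extraction and selection, classifier design, classification decision) can be illustrated with a minimal sketch. The choice of HOG features and a linear SVM, as well as the file names, are illustrative assumptions, not the specific method of this entry.

```python
# Minimal sketch of the classical recognition pipeline:
# acquisition -> preprocessing -> feature extraction -> classifier -> decision.
import numpy as np
from skimage import io, color, transform
from skimage.feature import hog
from sklearn.svm import LinearSVC

def extract_features(path):
    image = io.imread(path)                      # 1. information acquisition
    gray = color.rgb2gray(image)                 # 2. preprocessing: grayscale,
    gray = transform.resize(gray, (128, 64))     #    resize to a fixed window
    return hog(gray, orientations=9,             # 3. feature extraction (HOG)
               pixels_per_cell=(8, 8),
               cells_per_block=(2, 2))

# "train_paths" and "train_labels" are hypothetical placeholders for a labeled dataset.
train_paths = ["img_0001.jpg", "img_0002.jpg"]
train_labels = [0, 1]

X = np.stack([extract_features(p) for p in train_paths])
clf = LinearSVC().fit(X, train_labels)           # 4. classifier design (linear SVM)

prediction = clf.predict([extract_features("query.jpg")])  # 5. classification decision
print(prediction)
```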
In 2017, He et al. proposed the instance segmentation model Mask R-CNN, which adds a semantic segmentation (mask) branch to Faster R-CNN [3] and has become the basic framework for many instance segmentation tasks. In 2020, Chen et al. proposed BlendMask [4]. Addressing the shortcomings of instance segmentation in one-stage target detection, the method takes FCOS [5] as its framework and combines top–down and bottom–up approaches [6]: a blender module is designed to fuse high-level features with lower-level features. However, because of the convolution kernels used in the feature fusion process, the receptive field in the feature layers with large spatial resolution is too small [7], which can lead to inaccurate feature extraction and fusion.
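A common way to fuse a semantically strong but low-resolution feature map with a higher-resolution one, as in FPN-style top–down pathways and, loosely, in the blender idea described above, is to project both maps to a common channel width, upsample the coarse map, and combine them. The PyTorch sketch below shows only this generic pattern; it is not the actual BlendMask implementation, and the channel and spatial sizes are arbitrary.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FuseTopDown(nn.Module):
    """Generic top-down fusion: upsample the coarse (high-level) map,
    add it to a projected fine (low-level) map, then smooth with a 3x3 conv."""
    def __init__(self, coarse_ch, fine_ch, out_ch=256):
        super().__init__()
        self.proj_coarse = nn.Conv2d(coarse_ch, out_ch, kernel_size=1)
        self.proj_fine = nn.Conv2d(fine_ch, out_ch, kernel_size=1)
        self.smooth = nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, coarse, fine):
        coarse = self.proj_coarse(coarse)
        # Upsample the high-level map to the spatial size of the low-level map.
        coarse = F.interpolate(coarse, size=fine.shape[-2:], mode="bilinear",
                               align_corners=False)
        return self.smooth(self.proj_fine(fine) + coarse)

# Toy feature maps standing in for two pyramid levels (batch size 1).
p5 = torch.randn(1, 512, 16, 16)   # high-level, low-resolution
p3 = torch.randn(1, 256, 64, 64)   # low-level, high-resolution
fused = FuseTopDown(512, 256)(p5, p3)
print(fused.shape)                  # torch.Size([1, 256, 64, 64])
```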

2. Construction Method of Knowledge Graph for Image Recognition

2.1. Target Detection

In 2001, Viola et al. proposed the Haar feature extraction method and combined it with the AdaBoost [8] classification algorithm to achieve face detection. In 2005, Dalal et al. proposed the histogram of oriented gradients (HOG) algorithm, which describes objects through their edge features and realizes pedestrian detection in combination with an SVM classifier. In 2008, Bay et al. proposed the SURF [9] algorithm as an improvement on SIFT, which greatly reduced running time and increased the robustness of the algorithm. In 2015, Redmon et al. [10] proposed YOLO, a new approach that frames object detection as a regression problem from the image to spatially separated bounding boxes and their associated class probabilities. From 2016 to now, the YOLO series has evolved from YOLOv1 to YOLOv7. In object detection, anchor-based and anchor-free regression methods are dominant, but they also have certain shortcomings. In 2020, Zhu et al. [11] proposed CPM R-CNN, which contains three efficient modules to optimize the anchor-based, point-guided method. In recent years, unsupervised pretraining methods have been designed for target detection, but they usually have defects in image classification; Xie et al. [12] proposed a simple and effective self-supervised target detection method, DetCo.
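As a concrete illustration of the HOG-plus-SVM pedestrian detection approach mentioned above, OpenCV ships a HOG descriptor with a pre-trained linear SVM people detector. The sketch below uses that stock detector; the file names and detection parameters are illustrative assumptions.

```python
import cv2

# HOG descriptor with OpenCV's pre-trained linear SVM pedestrian model.
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

image = cv2.imread("street.jpg")          # hypothetical input image
# Slide the 64x128 HOG window over the image at multiple scales.
boxes, weights = hog.detectMultiScale(image, winStride=(8, 8),
                                      padding=(8, 8), scale=1.05)

# Draw each detected pedestrian as a rectangle and save the result.
for (x, y, w, h) in boxes:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("street_detections.jpg", image)
```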

2.2. Classical Model

In 2014, Szegedy et al. designed the GoogLeNet [13] network model and proposed the Inception structure with its parallel branches; the model achieved a new record-low error rate on the ImageNet dataset. At the same time, Simonyan et al. proposed the VGG-Net [14] model. In 2015, He et al. proposed the ResNet [15] network model, which achieved a top-5 error rate of only 3.6% on ImageNet. Convolutional neural networks (CNNs) have achieved remarkable success in many image classification tasks in recent years. Sun et al. [16] proposed an automatic CNN architecture design method that uses genetic algorithms to effectively address image classification tasks. To suppress uncertainties in large-scale facial expression recognition, in 2020, Wang et al. [17] proposed a simple yet efficient Self-Cure Network (SCN) based on ResNet-18, which prevents deep networks from over-fitting uncertain facial images. In 2022, Xu et al. [18] introduced a regulator module as a memory mechanism that extracts complementary features from the intermediate layers and feeds them back into ResNet. Recently, the size of the convolution kernel has also changed: in 2022, Ding et al. [19] proposed RepLKNet, a pure CNN architecture whose kernel size is as large as 31 × 31, in contrast to the commonly used 3 × 3. In 2023, combining MobileNet and the ResNet-18 model, Lee et al. [20] proposed a block processing strategy that effectively improved the efficiency of facial expression processing.
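The key idea behind the ResNet models mentioned above is the identity shortcut: a block learns a residual F(x) and outputs F(x) + x, which eases the optimization of very deep networks. Below is a minimal PyTorch sketch of a basic residual block; it follows the commonly published BasicBlock design rather than any particular code release.

```python
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    """A basic residual block: out = relu(F(x) + shortcut(x))."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)
        # 1x1 projection when the shape changes, identity otherwise.
        self.shortcut = (nn.Identity() if stride == 1 and in_ch == out_ch else
                         nn.Sequential(nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                                       nn.BatchNorm2d(out_ch)))

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + self.shortcut(x))  # the residual (skip) connection

block = BasicBlock(64, 128, stride=2)
print(block(torch.randn(1, 64, 56, 56)).shape)  # torch.Size([1, 128, 28, 28])
```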

2.3. Instance Segmentation

In 2016, Dai et al. proposed the Instance-sensitive Fully Convolutional Network [21] to make up for the translation-invariance limitation of the Fully Convolutional Network (FCN) [22] and complete the task of instance segmentation. Faster R-CNN introduces a region proposal network (RPN) on top of the R-CNN series of algorithms to obtain accurate candidate regions; it is an end-to-end detection model for multi-object classification and localization.
In 2019, Bolya et al. proposed the YOLACT model, which adds a mask branch to an existing one-stage detector in a manner analogous to how Mask R-CNN extends Faster R-CNN, but without an explicit feature localization step. Formulating instance segmentation as instance-center classification and dense distance regression in a polar coordinate system, Xie et al. proposed the PolarMask [23] model. In 2019, Wang et al. proposed the SOLO model [24], which introduces the notion of an "instance category" based on the quantized center position and size of an object, so that each pixel is assigned not merely an output category but a category carrying location information. In the same year, based on the principle that detection and segmentation should promote each other, Wang et al. proposed RDSNet [25], which improves the performance of instance segmentation by making full and reasonable use of the information interaction between target detection and instance segmentation. In 2022, Lu et al. proposed Segmenting Objects from Relational Visual Data [26], which promoted the development of image segmentation. In 2023, Ke et al. [27] modeled image formation as the composition of two overlapping layers and used this bilayer structure to model occlusion relationships, which naturally decouples the boundaries between occluding and occluded instances and effectively addresses image segmentation under occlusion.
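To show how a Mask R-CNN-style instance segmentation model is typically used in practice, the sketch below runs the pre-trained Mask R-CNN provided by torchvision (assuming torchvision ≥ 0.13 for the `weights` argument); the score threshold and input file name are illustrative choices, not values from the underlying paper.

```python
import torch
import torchvision
from torchvision.io import read_image
from torchvision.transforms.functional import convert_image_dtype

# Pre-trained Mask R-CNN with a ResNet-50 FPN backbone (torchvision >= 0.13 API).
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = convert_image_dtype(read_image("example.jpg"), torch.float)  # hypothetical image
with torch.no_grad():
    outputs = model([image])[0]   # dict with 'boxes', 'labels', 'scores', 'masks'

keep = outputs["scores"] > 0.5                      # keep confident detections
boxes = outputs["boxes"][keep]
masks = outputs["masks"][keep] > 0.5                # soft masks -> binary masks
print(f"{len(boxes)} instances detected")
```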

2.4. Knowledge Graph

As early as the 1960s, semantic networks were proposed as a method of knowledge representation, mainly used in the field of natural language understanding. In 2006, Tim Berners-Lee introduced linked data to highlight the essence of the semantic web: establishing links between open data. In recent years, the application of knowledge graph technology in various industries has become an important trend [28], such as the Baidu knowledge graph and the Google knowledge graph in the search field, the knowledge graph of traditional Chinese medicine [29] in the medical field, and JD.com's knowledge graph in the e-commerce field. These examples fully illustrate the universality of the knowledge graph.
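As a minimal illustration of how image recognition results could be organized as a knowledge graph, the sketch below stores image–object–attribute facts as RDF triples with the rdflib library. The namespace, entity names, and relations are made up for the example and are not the construction method of the underlying paper.

```python
from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/imgkg/")   # hypothetical namespace
g = Graph()
g.bind("ex", EX)

# Facts a recognizer might emit: an image contains a detected object
# of a certain class, with a confidence score.
g.add((EX.image_001, RDF.type, EX.Image))
g.add((EX.obj_1, RDF.type, EX.DetectedObject))
g.add((EX.obj_1, EX.hasClass, Literal("person")))
g.add((EX.obj_1, EX.confidence, Literal(0.97)))
g.add((EX.image_001, EX.contains, EX.obj_1))

print(g.serialize(format="turtle"))
```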

This entry is adapted from the peer-reviewed paper 10.3390/math11194174

References

  1. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
  2. Hariharan, B.; Arbeláez, P.; Girshick, R.; Malik, J. Simultaneous Detection and Segmentation; Springer: Cham, Switzerland, 2014.
  3. Ullah, A.; Xie, H.; Farooq, M.O.; Sun, Z. Pedestrian Detection in Infrared Images Using Fast RCNN. In Proceedings of the 2018 Eighth International Conference on Image Processing Theory, Tools and Applications (IPTA), Xi’an, China, 7–10 November 2018; pp. 1–6.
  4. Chen, H.; Sun, K.; Tian, Z.; Shen, C.; Yan, Y. Blendmask: Top-down meets bottom-up for instance segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020.
  5. Bolya, D.; Zhou, C.; Xiao, F.; Lee, Y.J. YOLACT++: Better Real-Time Instance Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 1108–1121.
  6. Viola, P.A.; Jones, M.J. Rapid object detection using a boosted cascade of simple features. In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2001, Kauai, HI, USA, 8–14 December 2001.
  7. Zhang, J.; Ye, G.; Tu, Z.; Qin, Y.; Qin, Q.; Zhang, J.; Liu, J. A spatial attentive and temporal dilated (SATD) GCN for skeleton-based action recognition. CAAI Trans. Intell. Technol. 2022, 7, 46–55.
  8. Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 20–25 June 2005.
  9. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015.
  10. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
  11. Zhu, B.; Song, Q.; Yang, L.; Wang, Z.; Liu, C.; Hu, M. CPM R-CNN: Calibrating point-guided misalignment in object detection. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Virtual, 5–9 January 2021; pp. 3248–3257.
  12. Xie, E.; Ding, J.; Wang, W.; Zhan, X.; Xu, H.; Sun, P.; Li, Z.; Luo, P. DetCo: Unsupervised contrastive learning for object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual, 5–9 January 2021; pp. 8392–8401.
  13. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
  14. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016.
  15. Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 39, 640–651.
  16. Sun, Y.; Xue, B.; Zhang, M.; Yen, G.G.; Lv, J. Automatically designing CNN architectures using the genetic algorithm for image classification. IEEE Trans. Cybern. 2020, 50, 3840–3854.
  17. Wang, K.; Peng, X.; Yang, J.; Lu, S.; Qiao, Y. Suppressing uncertainties for large-scale facial expression recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 6897–6906.
  18. Xu, J.; Pan, Y.; Pan, X.; Hoi, S.; Yi, Z.; Xu, Z. RegNet: Self-regulated network for image classification. IEEE Trans. Neural Netw. Learn. Syst. 2022, 1–6.
  19. Ding, X.; Zhang, X.; Han, J.; Ding, G. Scaling up your kernels to 31x31: Revisiting large kernel design in cnns. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 11963–11975.
  20. Lee, D.H.; Yoo, J.H. CNN Learning Strategy for Recognizing Facial Expressions. IEEE Access 2023, 11, 70865–70872.
  21. Dai, J.; He, K.; Li, Y.; Ren, S.; Sun, J. Instance-sensitive fully convolutional networks. In Proceedings of the Computer Vision—ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016.
  22. Huang, Z.; Huang, L.; Gong, Y.; Huang, C.; Wang, X. Mask scoring r-cnn. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019.
  23. Zhou, X.; Yang, X.; Ma, J.; Wang, I.K. Energy efficient smart routing based on link correlation mining for wireless edge computing in iot. IEEE Internet Things J. 2021, 9, 14988–14997.
  24. Zhou, X.; Xu, X.; Liang, W.; Zeng, Z.; Yan, Z. Deep-learning-enhanced multitarget detection for end–edge–cloud surveillance in smart iot. IEEE Internet Things J. 2021, 8, 12588–12596.
  25. Zhou, X.; Liang, W.; Li, W.; Yan, K.; Wang, I.K. Hierarchical adversarial attacks against graph neural network based iot network intrusion detection system. IEEE Internet Things J. 2021, 9, 9310–9319.
  26. Lu, X.; Wang, W.; Shen, J.; Crandall, D.J.; Van Gool, L. Segmenting Objects from Relational Visual Data. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 7885–7897.
  27. Ke, L.; Tai, Y.W.; Tang, C.K. Occlusion-Aware Instance Segmentation Via BiLayer Network Architectures. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 10197–10211.
  28. Wang, E.; Yu, Q.; Chen, Y.; Slamu, W.; Luo, X. Multi-modal knowledge graphs representation learning via multi-headed self-attention. Inf. Fusion 2022, 88, 78–85.
  29. Guo, Q.; Cao, S.; Yi, Z. A medical question answering system using large language models and knowledge graphs. Int. J. Intell. Syst. 2022, 37, 8548–8564.