Optical Convolutional Neural Networks

As a leading branch of deep learning, the convolutional neural network (CNN) is inspired by the natural visual perception mechanism of living things and has shown great success in image recognition, language processing, and other fields. Photonics technology provides a new route for intelligent signal processing, with the dramatic potential of ultralarge bandwidth and ultralow power consumption: in an analog computing architecture, the computation is completed automatically as the signal propagates through the processor.

  • convolutional neural networks
  • optical computing
  • photonics signal processing

1. Introduction

Convolutional neural networks (CNNs), as an important category of deep neural networks, are inspired by the natural visual perception mechanism of living things [1]. Since the first CNN framework in the modern sense, known as LeNet-1 [2], emerged in 1989, numerous representative CNN frameworks have been developed, including LeNet-5 (1998) [3], AlexNet (2012) [4], ZFNet (2014) [5], VGGNet (2015) [6], GoogLeNet (2015) [7], and ResNet (2016) [8]. Meanwhile, considerable progress has been made in deepening CNN architectures while reducing the number of parameters [9,10,11]. Owing to the continuous optimization of network architectures, CNNs have been widely used in image recognition [2,3,4,5,6,7,8,9,11,12,13], speech recognition [14,15,16], gaming [17,18], medicine [19,20], autonomous driving [21,22], and other fields.
The explosive year-by-year growth of Internet data calls for more intelligent and effective data processing [23]. As is well known, the accuracy of a CNN is positively correlated with its number of parameters [24]. Massive, high-precision data processing therefore places increasingly stringent requirements on computing hardware. For electrical processors, performance improvements have followed Moore's Law over the past few decades [25,26]. As the chip manufacturing process has gradually approached its physical limits in recent years, the growth rate of single-chip computing power has slowed [27,28], and semiconductor technology has entered the post-Moore era. Additionally, traditional computing hardware such as CPUs, GPUs, FPGAs, and ASICs follows the von Neumann computing paradigm, in which the separation of processor and memory forces an inevitable trade-off between bandwidth and power consumption [29,30,31,32]. Hence, there is a sharp conflict between the ever-increasing demand for high-performance processing and the slowing growth of computing power [28].
Optical devices, as an alternative, have been regarded as competitive candidates in the “more than Moore” era [33], offering ultralarge bandwidth and ultralow power consumption. Compared with electrical vector-matrix multiplication (VMM), optical devices can achieve better performance, with computing speed increased by three orders of magnitude and power consumption decreased by two orders of magnitude [34]. In recent years, optical computing solutions, owing to their intrinsic high computing speed [35], high computational density [36,37], and low power consumption [38,39], have been extensively demonstrated in both discrete systems and integrated chips. In CNNs, convolution operations account for more than 80% of the total computation [57]. Accelerating the convolution process in the optical domain therefore provides a disruptive way to improve computing speed and decrease power consumption.

2. Development of Optical Convolutional Neural Networks

Generally, a CNN is composed of convolutional layers, pooling layers, fully connected layers, and nonlinear activations. In the convolutional layer, convolution operations are conducted to extract the features of input images; a convolutional layer with multiple kernels performs these operations concurrently to extract multiple feature maps. The pooling layer following the convolutional layer subsamples and compresses the features, reducing the amount of computation and alleviating overfitting. The fully connected layers then connect all parameters and generate the final classification results. Additionally, nonlinear activation functions are applied after the convolutional layers and the fully connected layers to introduce nonlinearity into the network.
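To make this layer structure concrete, the following is a minimal sketch of such a CNN in PyTorch; the input size (28×28, single channel), the number of kernels, and the number of classes are illustrative assumptions rather than values taken from this entry.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallCNN(nn.Module):
    """Minimal CNN: convolution -> pooling -> fully connected, with nonlinear activations."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        # Convolutional layer with 8 kernels (3x3) extracts 8 feature maps in parallel.
        self.conv = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3, padding=1)
        # Pooling layer subsamples the feature maps (28x28 -> 14x14).
        self.pool = nn.MaxPool2d(kernel_size=2)
        # Fully connected layer maps the flattened features to class scores.
        self.fc = nn.Linear(8 * 14 * 14, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = F.relu(self.conv(x))   # nonlinear activation after the convolutional layer
        x = self.pool(x)           # feature compression
        x = torch.flatten(x, 1)    # flatten for the fully connected layer
        return self.fc(x)          # class scores

scores = SmallCNN()(torch.randn(1, 1, 28, 28))  # one 28x28 grayscale image
print(scores.shape)  # torch.Size([1, 10])
```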
One convolution operation can be divided into two processes: (1) element-wise multiplication between the kernel matrix and the corresponding patch of the data matrix and (2) summation of all multiplication results, which can be expressed as follows:
 
$y = \sum_{i=1}^{m} \sum_{j=1}^{n} w_{ij} x_{ij}$,  (1)
where $w_{ij}$ is an element of the convolution kernel, $x_{ij}$ is the corresponding element of the input data, and $m$ and $n$ are the numbers of rows and columns of the convolution kernel, respectively. From Equation (1), one convolution operation is converted into a vector-vector multiplication (VVM), and the parallel convolution of multiple kernels is represented as a VMM [58]. At the same time, optical matrix operations have been widely investigated [59], which makes it convenient to accelerate the convolution process with optical methods.
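As an illustration of Equation (1), the NumPy sketch below (with an arbitrary 3×3 patch and kernels chosen purely for demonstration) flattens a kernel and an input patch into vectors so that one convolution operation becomes a VVM, and stacks several flattened kernels into a matrix so that their parallel convolution becomes a VMM.

```python
import numpy as np

rng = np.random.default_rng(0)
patch = rng.random((3, 3))        # 3x3 input data patch, elements x_ij
kernels = rng.random((4, 3, 3))   # four 3x3 convolution kernels, elements w_ij

# (1) element-wise multiplication and (2) summation: one convolution output
y_direct = np.sum(kernels[0] * patch)

# The same operation as a vector-vector multiplication (VVM)
y_vvm = kernels[0].ravel() @ patch.ravel()

# Parallel convolution with all four kernels as a vector-matrix multiplication (VMM)
W = kernels.reshape(4, -1)        # each row is one flattened kernel
y_vmm = W @ patch.ravel()         # four convolution outputs at once

assert np.isclose(y_direct, y_vvm) and np.isclose(y_vmm[0], y_direct)
print(y_vmm)
```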
Optical architectures enabling the convolution operation have been blooming. In most reported optical CNNs, the convolution operations are accelerated in the optical domain while the rest remain in the electrical domain, combining the respective advantages of the two: the ultralarge bandwidth and low loss of light and the high-precision processing of electronics. According to the implementation principle, optical CNNs are generally divided into four categories: diffraction-based optical CNNs, interference-based optical CNNs, wavelength-division-multiplexing-based (WDM-based) optical CNNs, and optical CNNs based on tunable optical attenuation.
As an emerging research direction, optical CNNs have garnered significant attention since their inception. Table 1 presents a comparison of four optical CNN schemes, highlighting their parallelism, computing speed, integration density, and reconfigurability.
Table 1. Comparison of optical CNN schemes.
Diffraction-based optical CNNs exhibit advantages in parallelism, computing speed, and scalability due to their use of spatial light. Spatial light allows for a large number of neurons in each layer and facilitates the expansion of multiple channels and kernels in the spatial domain, thus enabling high parallelism. The abundance of neurons and the high parallelism contribute to high computing speeds. However, diffraction-based optical CNNs also suffer from notable disadvantages. The discrete components used make the system bulky, and attempts at integration result in performance degradation. Moreover, kernel-loading devices such as diffractive optical elements (DOEs) and metamaterials are nearly impossible to reconfigure, while spatial light modulators (SLMs) and digital micromirror devices (DMDs) have low refresh rates (typically ~kHz).
Interference-based optical CNNs excel in reconfigurability. This scheme often utilizes Mach-Zehnder interferometers (MZIs) for kernel matrix loading, enabling refresh rates in the range of tens of GHz. Despite the high-speed reconfiguration offered by MZIs, their relatively large footprint limits the integration density of interference-based schemes. Furthermore, the use of coherent light means that only one optical signal can be transmitted in a waveguide at a time, constraining parallelism and computing speed.
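As background on how a single MZI can act as a tunable weight, the sketch below composes two balanced couplers with internal and external phase shifters and shows that the internal phase sets the power splitting ratio. The 50:50 coupler convention and phase placement here are one common choice made for illustration, not a description taken from this entry.

```python
import numpy as np

def mzi(theta: float, phi: float) -> np.ndarray:
    """2x2 transfer matrix of an MZI: external phase shifter phi on one input,
    a 50:50 coupler, an internal phase shifter theta, and a second 50:50 coupler."""
    coupler = np.array([[1, 1j], [1j, 1]]) / np.sqrt(2)    # balanced directional coupler
    internal = np.diag([np.exp(1j * theta), 1.0])           # internal phase shift
    external = np.diag([np.exp(1j * phi), 1.0])             # external phase shift
    return coupler @ internal @ coupler @ external

U = mzi(theta=0.7, phi=0.3)
print(np.allclose(U.conj().T @ U, np.eye(2)))   # True: the MZI is lossless (unitary)

# Under this convention the internal phase controls how much of an input reaches each
# output, i.e., the magnitude of the implemented weight: |U[0, 0]| = sin(theta / 2).
print(abs(U[0, 0]), np.sin(0.7 / 2))
```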
WDM-based optical CNNs represent a promising and extensively researched solution. This scheme fully exploits the wavelength dimension of light, leading to high parallelism. The use of microring resonators (MRRs) as wavelength-sensitive optical attenuators, with radii as small as several micrometers and modulation rates reaching tens of GHz, further enhances computing speed, integration density, and reconfigurability.
The performance of optical CNNs based on tunable optical attenuation depends on the specific characteristics of the attenuator used; in the absence of a unified conclusion, this scheme is not listed in the table. Additionally, apart from the four types of optical CNNs discussed above, there are other noteworthy solutions, such as photon frequency synthesis [115] and photodetectors with adjustable responsivity [116], which warrant further research.
Presently, although optical CNNs exhibit advantages in bandwidth, latency, and computing speed compared with electrical architectures, they still face the limitations of small realizable matrix sizes, a limited range of realizable functions, and low computing precision. Consequently, extensive efforts are required before optical CNNs gain widespread use.
First, large-scale on-chip integration needs to be achieved. The reported on-chip integrated optical computing architectures currently integrate only tens of thousands of devices, far fewer than their electrical counterparts. In optical computing, the power required for auxiliary operations such as electro-optical conversion, photoelectric conversion, and analog-to-digital conversion remains essentially fixed. By integrating more photonic devices, this fixed power overhead is amortized over more computations, thereby improving the energy efficiency of optical computing and giving full play to its advantages.
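This amortization argument can be made concrete with a back-of-the-envelope calculation. The power and rate figures below are purely hypothetical placeholders, chosen only to show how the energy per operation scales with the number of integrated devices.

```python
# Hypothetical figures (not from this entry): a fixed conversion overhead shared by the
# whole chip, plus a per-device power, at a fixed operation rate per device.
P_CONVERSION_W = 1.0      # E/O, O/E, and A/D conversion power (assumed fixed)
P_PER_DEVICE_W = 1e-4     # power per photonic multiply unit (assumed)
RATE_PER_DEVICE = 10e9    # operations per second per device (assumed)

for n_devices in (1_000, 10_000, 100_000, 1_000_000):
    total_power = P_CONVERSION_W + P_PER_DEVICE_W * n_devices
    ops_per_second = RATE_PER_DEVICE * n_devices
    energy_per_op_fj = total_power / ops_per_second * 1e15
    print(f"{n_devices:>9} devices: {energy_per_op_fj:7.2f} fJ per operation")

# The fixed conversion power dominates at small scale; integrating more devices
# amortizes it, so the energy per operation falls toward the per-device floor.
```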
Second, more functions should be implemented with optical methods. Most optical CNN solutions focus primarily on the optical implementation of convolutions. Although there are studies on the optical implementation of nonlinear activation and full connection [35,38,91,113,117,118,119,120], they remain relatively limited and warrant further exploration. In particular, limited by the small realizable matrix size, achieving large-scale full connections with optical methods is still a challenge. Implementing more functions optically will help expand the application field of optical computing and promote its practical use.
Finally, the development of in situ trainable, arbitrarily reconfigurable computing architectures is essential. At present, most optical CNN implementations adopt offline training, in which the weight matrix is pretrained in a neural network simulation model. As a result, a deviation between the simulation model and the experimental system inevitably appears. In situ training, which updates the weights directly and performs the computation in place, offers a new way to accelerate the reconfiguration of the neural network and improve its precision. Several recently proposed in situ training schemes, such as physics-aware training [121], adaptive training [122], and other related methods [123,124], have been successfully introduced to optical computing and can be effectively incorporated into the optical CNN framework. With in situ training, it is more convenient to reconfigure the network structure (for example, changing the size and number of convolution kernels or the number of convolutional layers), and the influence of hardware errors can be accounted for during training. In this way, retargeting the same hardware to different and more complex classification tasks will become a reality, rather than the fixed, simple tasks of today (such as classifying handwritten digits). A conceptual sketch of such a hybrid training loop is given below.
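The following is a minimal conceptual sketch in the spirit of physics-aware training [121]: the forward pass runs on the (here, simulated) imperfect hardware, while gradients are computed with an idealized digital model, so the learned weights absorb the hardware error. The functions, noise levels, and miscalibration model are illustrative assumptions, not the actual schemes of Refs. [121,122,123,124].

```python
import numpy as np

rng = np.random.default_rng(1)

def hardware_forward(w: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Stand-in for the physical optical layer: the ideal VMM plus a deterministic
    gain error and shot-to-shot noise (both hypothetical)."""
    gain_error = 1.0 + 0.05 * np.sin(np.arange(w.shape[0]))   # assumed miscalibration
    return gain_error * (w @ x) + 0.01 * rng.standard_normal(w.shape[0])

def digital_forward(w: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Idealized digital model, used only to compute gradients."""
    return w @ x

# Toy regression task: learn weights that reproduce a target linear response.
w_true = rng.standard_normal((4, 9))
w = np.zeros((4, 9))
lr = 0.05

for step in range(500):
    x = rng.standard_normal(9)
    y_target = w_true @ x
    y_hw = hardware_forward(w, x)     # forward pass on the "hardware"
    error = y_hw - y_target           # error measured at the hardware output
    grad = np.outer(error, x)         # gradient from the digital model (d(Wx)/dW)
    w -= lr * grad                    # in situ weight update

x_test = rng.standard_normal(9)
print(np.mean((hardware_forward(w, x_test) - w_true @ x_test) ** 2))  # small residual
```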

This entry is adapted from the peer-reviewed paper 10.3390/app13137523
