Neuromorphic photonics represents a cutting-edge, multidisciplinary realm at the confluence of artificial intelligence (AI), photonics, and neuroscience ^{[1]}. Its overarching goal is nothing short of a transformative evolution in computing, seamlessly uniting the foundational principles of neuromorphic computing with the swiftness and efficiency inherent in photonics ^{[2]}. This inventive paradigm employs light-based neurons and optical synapses to closely emulate the intricate behaviors of human brain cells, resulting in specialized hardware uniquely tailored for the domains of AI and machine learning ^{[3]}. The standout feature of this field is its remarkable energy efficiency, enabling lightning-fast, parallel data processing while conserving power resources. By harnessing the velocity of light and mirroring the intricate neural networks (NNs) of the human brain, neuromorphic photonics has the potential to unlock entirely novel horizons in high-performance computing, poised to dramatically elevate applications in pattern recognition, data manipulation, and intricate problem-solving ^{[4]}^{[5]}. While still in its infancy, this field holds the promise of more capable and efficient AI systems, with the potential to fundamentally reshape the computing landscape ^{[6]}.
AI technologies, encompassing facial recognition, machine learning, and autonomous driving, are reshaping our daily lives ^{[7]}^{[8]}. Deploying task-specific AI systems demands training NNs with extensive datasets on conventional computers. However, limitations in throughput and efficiency due to prevailing computer architectures currently hinder this process ^{[9]}. Drawing inspiration from the intricate architecture of the human brain, researchers are pioneering the development of next-generation intelligent computing systems designed to emulate synapses and neurons. These systems encode information using spatiotemporal pulse patterns generated by presynaptic neurons, with postsynaptic neurons accumulating inputs and generating new neuronal pulses upon reaching stimulation thresholds. By integrating myriad neurons, these systems give rise to nonlinear spiking NNs, enabling information processing through spatiotemporally encoded neuron pulses. IBM's TrueNorth chips, for instance, have achieved a remarkable level of energy efficiency, surpassing conventional microelectronic chips for specific AI tasks and rivaling the computational capabilities of the human brain ^{[10]}. Nevertheless, the scalability of integrated neurons remains hampered by challenges such as electrical interconnect bandwidth, pulse loss, and communication delays. Optical interconnects, offering substantial bandwidth, minimal loss, and negligible latency, have the potential to address these electrical interconnect limitations ^{[11]}.
The demands of real-time, data-intensive, intelligent information processing tasks underscore the need for innovative and smart optimization hardware. Convolutional neural networks (CNNs) excel at extracting hierarchical feature maps to enhance recognition accuracy, and there is growing interest in employing photonics for their implementation. In this context, a large-scale and adaptable photonic convolutional neural network (PCNN) that leverages a hardware-friendly distributed feedback laser diode (DFB-LD) is proposed ^{[12]}. This approach involves applying a biological time-to-first-spike coding method to a DFB-LD neuron to execute temporal convolutional operations (TCO) for image processing. In practical experiments, the PCNN successfully employs TCO to extract image features using 11 × 11 convolutional kernels. Additionally, the temporal pulse shaping of a DFB-LD neuron is explored to construct densely connected and fully connected layers, enabling rapid adjustments of synaptic weights at a remarkable rate of 5 GHz and providing high classification accuracy in benchmark image classification tasks, with 98.56% for MNIST and 87.48% for Fashion-MNIST. These findings underscore the potential of optical analog computing platforms resembling neurons for real-time and intricate intelligent processing networks ^{[13]}.
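As a rough illustration of the time-to-first-spike idea referenced above (not the optical DFB-LD implementation itself), stronger inputs can be mapped to earlier spike times. The sketch below assumes a simple linear intensity-to-latency mapping:

```python
import numpy as np

def time_to_first_spike(pixels, t_max=1.0):
    """Encode pixel intensities in [0, 1] as spike times: brighter pixels
    spike earlier. Illustrative sketch only; the cited work realizes the
    encoding optically, not with this linear mapping."""
    pixels = np.clip(np.asarray(pixels, dtype=float), 0.0, 1.0)
    return t_max * (1.0 - pixels)  # intensity 1 -> t = 0, intensity 0 -> t = t_max

# A bright pixel fires first, a dark pixel last.
times = time_to_first_spike([1.0, 0.5, 0.0])
```

Under this convention, downstream neurons can read off feature strength directly from spike latency instead of spike rate.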
2. Neuromorphic Photonic Integrated Circuits
With the recent emergence of Photonic Integrated Circuit (PIC) technology platforms, the timing is perfect for developing scalable, fully reconfigurable systems capable of executing vastly more complex operations than ever before ^{[14]}. While numerous fields, such as microwave photonics and physical layer security, stand to benefit significantly from this rapid increase in complexity, the community has yet to establish a universal processing standard for programming intricate multi-stage operations within the photonic domain. Neuromorphic photonics is an exciting and emerging field at the intersection of neuroscience and photonics. This groundbreaking discipline harnesses the efficiency of NNs and the lightning-fast capabilities of photonics to create processing systems that can outperform microelectronics by orders of magnitude. Thanks to their partially analog nature, neuromorphic circuits can leverage the vast bandwidth and energy efficiency of optical signals. Additionally, they set the stage for a comprehensive processing standard for reconfigurable circuits capable of theoretically executing any task that an artificial NN can compute. Integrating these systems with low-power microelectronic control promises processing efficiencies that surpass current digital standards by a considerable margin. In essence, the emergence of PIC technology, coupled with the advent of neuromorphic photonics, heralds a new era of computing where the potential for innovation and efficiency is boundless.
To transcend the constraints imposed by traditional microelectronic computing, it is imperative to incorporate unconventional techniques that leverage new processing methodologies. PICs offer a promising avenue to address these limitations, and several factors underscore their suitability. Firstly, photonic interconnects present a direct solution to the data transport quandary: a substantial portion of energy consumption on modern microelectronic chips is attributed to the constant charging and discharging of metal wires. This energy overhead can be circumvented by using on-chip photonic links, especially as optical devices advance in efficiency ^{[15]}. Secondly, photonic systems can harness optical multiplexing and high-speed signals to achieve an impressive bandwidth density. This translates into a remarkable computational density (operations per second per square millimeter, ops/s/mm^{2}) for closely spaced waveguides or filters that perform densely packed operations ^{[16]}.
Furthermore, implementing linear operations like Multiply-Accumulate operations (MACs) in the photonic realm inherently consumes minimal energy, yielding a highly advantageous, sublinear scaling of energy consumption with the number of operations conducted ^{[17]}. The combination of these three properties can deliver substantial enhancements in performance, encompassing energy efficiency and computational density, as illustrated in Figure 1.
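To make the computational-density argument concrete, a back-of-envelope estimate can be sketched for an N-channel WDM weight bank, assuming each symbol period performs an N × N matrix-vector product (N^{2} MACs, at 2 operations per MAC). The numbers plugged in mirror the 100-channel, 20 GHz setup assumed in Figure 1; this is an idealized upper bound of my own construction, not a measured result:

```python
def photonic_throughput_ops(channels, line_rate_hz):
    """Idealized upper-bound throughput (ops/s) of an N-channel WDM weight
    bank: each symbol period performs an N x N matrix-vector product, i.e.
    N^2 MACs, counted as 2 ops (one multiply, one add) per MAC.
    Assumption-laden estimate for illustration only."""
    macs_per_symbol = channels ** 2
    return 2 * macs_per_symbol * line_rate_hz

# 100 channels at 20 GHz, as in the Figure 1 silicon photonic estimate.
ops = photonic_throughput_ops(100, 20e9)
```

Dividing such a figure by the occupied chip area would give the ops/s/mm^{2} density quoted in the text.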
Figure 1. A comparison between specialized deep-learning digital electronic architectures and silicon photonic and nanophotonic platforms. In this context, photonic systems can support high on-chip bandwidth densities while maintaining low energy consumption during data transmission and computational tasks. The metrics for electronic architectures have been sourced from various references ^{[18]}^{[19]}^{[20]}^{[21]}. The metrics for silicon photonic platforms are estimated based on a contemporary silicon photonic setup operating at 20 GHz, comprising 100 channels with tightly packed microrings. Meanwhile, the nanophotonic metrics are derived from the assumption of closely packed athermal microdisks ^{[22]}, each occupying an area of approximately 20 µm, running at 100 GHz and operating close to the shot noise limit. Inspired by ^{[14]}.
Neuromorphic photonic systems have demonstrated processing speeds 6–8 orders of magnitude higher than their electronic counterparts ^{[23]}. Silicon photonics, an optoelectronic integration technology compatible with well-established microelectronics, harmonizes the ultra-large-scale logic and precision manufacturing attributes of CMOS technology with the high-speed and low-power consumption benefits of photonic technology, effectively reconciling the conflict between technological advancement and cost constraints. In recent years, on-chip NNs based on silicon photonic technology have made significant strides ^{[24]}. In 2017, Shen et al. showcased an on-chip NN employing a silicon-based Mach–Zehnder interferometer (MZI) structure capable of recognizing fundamental vowels ^{[17]}. In this architecture, an external subsystem configures the matrix element values for vector-matrix multiplication using MZI structures. To modify these values during optimization, signals must be relayed from the NN to the control system. Tait et al. introduced on-chip variable-weight synapses based on silicon electro-optical modulators in 2016 ^{[25]}, as well as on-chip neurons relying on silicon electro-optical modulators in conjunction with off-chip multi-wavelength lasers, wavelength division multiplexers/demultiplexers, and on-chip photodetectors in 2019 ^{[26]}. This innovative structure facilitates weight adjustments by modulating the silicon microring with electrical signals and regulates the silicon microring modulator to achieve neuron functionality through electrical signals derived from on-chip detector optoelectrical conversion.
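MZI-mesh processors of the kind demonstrated by Shen et al. program an arbitrary weight matrix via its singular value decomposition, W = UΣV†, realized as two unitary interferometer meshes around a diagonal attenuation stage. A purely numerical sketch of that factorization (no photonic device modeling) is:

```python
import numpy as np

def mzi_style_matmul(W, x):
    """Apply weight matrix W as the SVD product U @ diag(s) @ Vh, the
    factorization used to program two MZI meshes (the unitaries U and Vh)
    around an attenuator stage (the singular values s). Numerical sketch
    only; real hardware further decomposes U and Vh into 2x2 MZI stages."""
    U, s, Vh = np.linalg.svd(W)
    return U @ (s * (Vh @ x))  # "mesh" -> "attenuators" -> "mesh"

W = np.array([[0.5, 0.2],
              [0.1, 0.9]])
x = np.array([1.0, -1.0])
y = mzi_style_matmul(W, x)  # agrees with W @ x
```

The key point is that any real matrix splits into lossless (unitary) pieces plus per-channel attenuation, which is exactly what interferometer meshes and amplitude modulators can implement.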
Neuromorphic PICs on silicon platforms have witnessed remarkable advancements in recent times ^{[23]}^{[27]}^{[28]}^{[29]}. These photonic NNs (PNNs), even in their early stages with a limited number of neurons, have showcased their prowess in high-bandwidth, low-latency machine-learning signal processing applications. The next frontier in this domain involves the quest for large-scale PNNs endowed with flexibility and scalability, positioning them to tackle data-intensive machine learning (ML) applications with high-speed requirements. In ^{[30]}, architectural foundations are proposed, focusing on microring resonator (MRR)-based photonic neurons, both non-spiking and spiking, and the orchestration of PNNs through a broadcast-and-weight approach. A novel expansion of NN topologies by cascading photonic broadcast loops is discussed, culminating in a scalable NN structure with consistent wavelengths. Moreover, incorporating wavelength-selective switches (WSS) within these broadcasting loops is proposed, delivering the concept of a wavelength-switched photonic NN (WS-PNN). This innovative architecture opens new doors for integrating off-chip WSS switches, enabling the interconnection of photonic neurons in versatile combinations, delivering unmatched scalability for PNNs, and accommodating an array of feedforward and recurrent NN topologies.
2.1. Deep Neural Networks (DNNs)
Deep neural networks (DNNs) have gained prominence due to advancements in processing power and the ubiquity of data. Faster and more affordable computing resources have facilitated rapid convergence, making deep learning (DL) more accessible. The widespread availability of data, along with improved algorithms, enhances the value of these networks, especially in applications like chatbots for businesses ^{[31]}^{[32]}. These networks, however, demand substantial computational power and extensive data sets. They excel in scenarios where ample data is available and where it is feasible to categorize or rank preferred outcomes ^{[4]}.
A DNN represents a sophisticated machine learning (ML) technique that empowers computers, through training, to accomplish tasks that would be exceedingly challenging with traditional programming methods ^{[33]}. The inspiration for NN algorithms is drawn from the human brain and its intricate functions. Like the human mind, DNNs are designed not to rely solely on predetermined rules but to predict solutions and draw conclusions based on previous iterations and experiences. An NN consists of multiple layers of interconnected nodes that receive input from previous layers and generate an output, ultimately reaching a final result. NNs can encompass various hidden layers, and the complexity increases as more layers are added. Two distinct neural network architectures can be contrasted (Figure 2):
Figure 2. Traditional NN (left) versus DNN (right).
 (A) Traditional NNs: typically composed of 2 or 3 hidden layers.
 (B) DL Networks: can contain up to 150 hidden layers, making them significantly more complex.
A DNN is considerably more intricate than a “simple” NN. A standard NN operates akin to a chess game, adhering to predefined algorithms. It offers different tactics based on inputs from the programmer, such as how chess pieces move, the size of the chessboard, and strategies for various situations. However, an NN can transcend this input-bound behavior and learn from past experiences, evolving into a DNN. For instance, on the same computer, you can train an NN, have it play games against other individuals, and enable it to learn as it engages in these matches. As it learns from various players, even chess masters may find defeating the resulting DNN exceedingly challenging or even insurmountable. DNNs can recognize voice commands, identify voices, recognize sounds and graphics, and accomplish a wide array of tasks beyond the capacity of traditional NNs. They leverage “big data” along with sophisticated algorithms to tackle complex problems, often requiring minimal to no human intervention.
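The layered structure described above reduces, mathematically, to alternating linear maps and nonlinearities. A minimal forward-pass sketch, with assumed random weights and a ReLU nonlinearity, is:

```python
import numpy as np

def relu(z):
    """Rectified linear unit: the nonlinearity applied between layers."""
    return np.maximum(0.0, z)

def forward(x, layers):
    """Propagate an input through a stack of (weights, bias) layers,
    applying ReLU between them and leaving the final layer linear.
    Minimal illustrative sketch of a feedforward pass."""
    for W, b in layers[:-1]:
        x = relu(W @ x + b)
    W, b = layers[-1]
    return W @ x + b

# Two layers: 3 inputs -> 4 hidden units -> 2 outputs (assumed shapes).
rng = np.random.default_rng(0)
layers = [(rng.standard_normal((4, 3)), np.zeros(4)),
          (rng.standard_normal((2, 4)), np.zeros(2))]
out = forward(np.ones(3), layers)
```

A "traditional" NN in the sense of Figure 2 would stack 2–3 such hidden layers; a DL network simply extends the same loop to many more.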
Understanding the process of a DNN is best illustrated through a practical example. Imagine you have an extensive collection of hundreds of thousands of images, some of which feature dogs, and you aim to create a computer program to identify dogs in these pictures. At this point, you face a crucial decision. You can either write a program explicitly designed to identify dogs or opt for a more intelligent approach: a program that “learns” how to recognize dogs. Initially, you might choose the former option, but this turns out to be a less-than-ideal choice. Conventional programming techniques require a laborious and intricate process, and the outcomes often lack the desired accuracy. To explicitly identify dog pictures, you must create a software program filled with conditional “if” and “then” statements. This program would elevate the probability of a dog’s presence whenever it detects a dog-like attribute, such as fur, floppy ears, or a tail.
Convolutional neural networks (CNNs) represent a subset of AI explicitly designed to handle and learn from vast datasets. These networks are aptly named due to their distinctive architecture and purpose. CNNs excel in image recognition and perform not only generative but also descriptive tasks. Generative tasks encompass various activities such as auto-cropping, caption generation, video processing, mimeographing, and image overlays. A vital component of a CNN is the convolutional layer, where each neuron processes information from a small portion of the visual field, with overlapping receptive fields combining to create feature maps.
Artificial neural networks (ANNs) are interconnected perceptrons organized into various layers. ANNs are often called Feedforward Neural Networks, as they process inputs linearly, forwarding the results through the network layers. These networks are known as universal function approximators, capable of learning any function, and their versatility is attributed, in part, to activation functions. These functions introduce nonlinearity into the network, enabling it to learn intricate relationships between inputs and outputs and promoting cooperative learning among network parts. It is important to note that the logic behind neural networks is often incomprehensible to humans. Deep learning models operate as black boxes, with hidden layers of nodes creating complex, interconnected logic. Some attempts have been made to visualize the logic behind NNs for image recognition, but this is not always possible, especially for demanding tasks.
2.2. NNs with Complex Arithmetic Calculations
While computers excel at performing complex calculations, the realm of solving mathematical problems continues to present a significant challenge for artificial intelligence ^{[34]}. This challenge can be viewed from two distinct angles. On the one hand, grounding structured mathematical knowledge into a framework of intrinsic meaning has persisted as a long-standing issue in symbolic AI ^{[35]}. On the other hand, NNs have traditionally struggled to acquire mathematical proficiency, as their nature primarily hinges on statistical pattern recognition abilities rather than the explicit application of syntactic rules ^{[36]}. The process of mathematical reasoning poses well-documented hurdles for connectionist models. Mathematical formulas employ symbols that often appear as arbitrary tokens, necessitating manipulation under well-defined rules that involve compositionality and systematicity. Furthermore, extracting mathematical knowledge from examples should extend beyond the observed data distribution, facilitating the ability to extrapolate by discovering fundamental ‘first principles’.
Notwithstanding these formidable challenges, recent breakthroughs in DL have sparked a renewed enthusiasm for the notion that NNs may attain advanced reasoning capabilities, consequently displaying symbolic behavior ^{[37]}. Although deep networks have historically grappled with fundamental concepts such as the understanding of ‘integer numbers’ ^{[38]}, the last few years have witnessed the emergence of several models that showcase remarkable proficiency in tackling intricate mathematical tasks.
For instance, sequence-to-sequence architectures have demonstrated their ability to learn the intricacies of function integration and the resolution of ordinary differential equations, occasionally outperforming even widely used mathematical software packages in terms of accuracy ^{[39]}. DL models have further made notable inroads in the realm of automated theorem proving ^{[40]} and have actively supported expert mathematicians in the formulation of conjectures and the establishment of pioneering results in the realm of pure mathematics ^{[41]}.
In recent remarkable developments, deep reinforcement learning uncovered a more efficient algorithm for performing matrix multiplication ^{[42]}, while fine-tuning a pre-trained language model on computer code enabled the resolution of university-level mathematical problems at a level comparable to human expertise ^{[43]}. These achievements herald a promising new era in which neural networks may bridge the gap between mathematical reasoning and machine learning, potentially unlocking new frontiers in artificial intelligence.
These outstanding accomplishments owe much to the advent of meticulously curated, expansive datasets encompassing mathematical problems and their corresponding solutions. They also owe their success to the invention of novel, sometimes ad hoc, architectures tailored to process numerical symbols and mathematical notations more effectively. In addition, strides in many tasks have been propelled by the creation of large-scale language models, which exhibit astonishing innate numerical capabilities ‘out of the box’ that can be further honed through fine-tuning and strategic prompting techniques.
However, it is imperative to emphasize that these achievements do not necessarily equate to a full grasp of the semantics underlying numbers and basic arithmetic by these models. Their performance on relatively straightforward numerical tasks often reveals fragility, signaling a need to enhance their foundational mathematical skills to establish a more dependable foundation for mathematical capabilities. This notion finds support in a wealth of literature on child development and education, which underscores the significance of fundamental numeracy skills such as counting, quantity comparison, comprehension of number order, and mastery of the base-ten positional numeral system as robust predictors of later mathematical achievement ^{[44]}.
The quest for solutions to matrix eigenvalues has perpetually been a focal point of contemporary numerical analysis, with profound implications for the practical application of engineering technology and scientific research. While extant algorithms for matrix eigenvalue computation have made considerable progress in computational accuracy and efficiency, they have struggled to find a foothold on photonic platforms. Enter the PNN, a remarkable fusion of potent problem-solving capabilities and the inherent advantages of photonic computing, characterized by its astonishing speed and minimal energy consumption. In ^{[45]}, an innovative approach introduces an eigenvalue solver tailored for real-valued symmetric matrices, leveraging reconfigurable PNNs. This strategy demonstrates the practicality of solving eigenvalues for n × n real-valued symmetric matrices using locally connected networks. In a groundbreaking series of experiments, the capacity to solve eigenvalues for 2 × 2, 3 × 3, and 4 × 4 real-valued symmetric matrices was showcased through the deployment of graphene/Si thermo-optically modulated reconfigurable PNNs featuring a saturated-absorption nonlinear activation layer. Theoretical predictions indicate a remarkable test-set accuracy of 93.6% for 2 × 2 matrices, with experimental results achieving a measured accuracy of 78.8%, aligning with standardized metrics for easy comparison. This work not only charts a course for on-chip integrated photonic solutions to eigenvalue computation for real-valued symmetric matrices but also forms the bedrock for a new era of intelligent on-chip integrated all-optical computing. This breakthrough promises to transform the landscape of computational methodologies, ushering in a future where photonic platforms play a pivotal role in numerical problem-solving across various domains ^{[45]}.
The objective of the proposed PNN is to address the challenge of computing eigenvalues for symmetric matrices, a problem that frequently arises in various physical scenarios (as shown in Figure 3a). The initial focus centers on solving the eigenvalue problem for 2 × 2 symmetric matrices characterized by non-negative real-valued elements and eigenvalues. Furthermore, the matrix elements were confined within the range of 0 to 10. This limitation does not constrain the network’s performance, as any other matrix can be derived through linear scaling from a matrix within this constrained domain. Crucially, this network is adaptable and designed to handle the eigenvalue problem for n × n matrices under similar conditions. This versatility allows it to be employed in diverse scenarios, offering a powerful tool for eigenvalue computation in various applications.
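The linear-scaling argument works because eigenvalues scale linearly with the matrix: eig(cA) = c·eig(A) for c > 0. A numerical sketch of that argument, using NumPy as a stand-in for the photonic solver, is:

```python
import numpy as np

def eigvals_via_scaling(A, bound=10.0):
    """Eigenvalues of a symmetric matrix via rescaling its elements into
    the [0, bound] range the network was trained on, exploiting
    eig(c*A) = c * eig(A). Sketch of the scaling argument only; numpy's
    eigvalsh stands in for the trained PNN."""
    c = max(np.abs(A).max() / bound, 1e-12)
    scaled = A / c                     # now within the trained domain
    return c * np.linalg.eigvalsh(scaled)

A = np.array([[30.0, 10.0],
              [10.0, 30.0]])          # elements outside [0, 10]
lam = eigvals_via_scaling(A)          # eigenvalues of A itself: 20 and 40
```

This is why restricting training data to elements in [0, 10] costs no generality: any admissible matrix is a scaled copy of one inside the trained domain.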
Figure 3. Conceptual framework of the newly proposed photonic neural network: (a) the essential steps involved in achieving the desired task; (b) an optical micrograph showcasing the distinctive structure of the proposed network, featuring nine input ports (i_{1}–i_{9}) and four output ports (o_{1}–o_{4}) ^{[45]}; (c) an optical micrograph zooming in on a single cell within the network, housing two phase shifters and a merging structure ^{[45]}.
The structure of the PNN is characterized by an architectural design that includes a single linear fully connected layer, complemented by a sophisticated five-layer locally connected arrangement. This network boasts nine input and four output ports, as Figure 3b depicts. The five-layer structure is a critical component of the described architecture, characterized by an intricate arrangement of neurons. The first layer features eight neurons, each sharing a phase shifter with its neighboring unit (as illustrated in Figure 3c).
The next layer comprises seven neurons, with each successive layer reducing the count by one, resulting in 35 tunable weights. Additionally, the researchers introduced two extra weights for training. The first weight pertains to the input light’s intensity, denoting the intensity ratio. This factor is crucial as the nonlinear activation function behaves differently under varying intensities. The second weight governs the output ratio, linearly adjusting the relationship between output intensity and the corresponding eigenvalue, effectively establishing the output ratio. This adjustment is essential because, unlike electronic neural networks, optical layers cannot manipulate light intensity freely and directly. Consequently, the absolute value of the output signal may not align with the scale provided in the dataset.
Photonic circuits have also found applicability in complex-valued neural networks ^{[46]}^{[47]}^{[48]}. The articles ^{[46]}^{[47]} presented neural network architectures that use complex arithmetic computations and an MZI to encode information in both phase and amplitude (Figure 4a,b). This approach allows complex arithmetic to be performed using the properties of interference. The resulting complex-valued optical neural chips (ONCs) perform better on several tasks than their counterparts in single-neuron and deployed-network implementations. A single complex-valued neuron can solve some nonlinear problems that a real-valued analog cannot compute. The studies include extensive comparative analyses, tests, and training of the NN on various datasets. The data obtained suggest that this architecture uses double the number of trainable free parameters and can classify nonlinear patterns with simple architectures (fewer layers). Research results have shown that these architectures significantly improve the speed and accuracy of computation compared to traditional real-valued circuits.
Figure 4. The developed architectures of complex-valued optical neural networks using: (a,b) MZIs, where the circuits realize a multiport interferometer with phase shifter (PS) inserts used for phase tuning ^{[46]}^{[47]}; (c) MRRs for matrix-vector multiplication (MVM) applications using WDM ^{[48]}.
The application of MRR arrays in complex-valued neural networks is also possible, as demonstrated in ^{[48]}. To realize the transition from real values to complex-valued data, this work uses an approach with a pre-decomposition of the input matrix (whose values are supplied to beams of different wavelengths by means of optical intensity modulators) and the transmission matrix (controlled by selecting values via heaters on the resonator rings) (Figure 4c). A balanced photodetector registers the result of the multiplication of the two matrices. This approach allowed the realization of other mathematical transformations, including the discrete Fourier transform (DFT) and convolutional image processing. The results of the experiments in both signal and image processing unequivocally show that the newly proposed system can expand matrix computation to include real numbers, full complex numbers, higher processing dimensions, and convolution. Consequently, the processor can function as a versatile matrix arithmetic processor capable of handling intricate tasks in different scenarios. The researchers note that improved system performance can be obtained by adding parallel computation with WDM and increasing the degree of integration of the circuit components.
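The DFT case mentioned above is a natural fit for such hardware because the DFT is exactly a complex-valued matrix-vector multiplication. A purely numerical sketch of that mapping (no device modeling) is:

```python
import numpy as np

def dft_by_complex_mvm(x):
    """Discrete Fourier transform expressed as a complex-valued
    matrix-vector multiplication, the primitive the MRR processor in the
    cited work maps to hardware. Numerical sketch only."""
    n = len(x)
    k = np.arange(n)
    F = np.exp(-2j * np.pi * np.outer(k, k) / n)  # n x n DFT matrix
    return F @ x

x = np.array([1.0, 2.0, 3.0, 4.0])
X = dft_by_complex_mvm(x)  # agrees with np.fft.fft(x)
```

Once a processor can apply an arbitrary complex matrix, the DFT, convolution (which is diagonalized by the DFT), and general complex MVM all reduce to the same hardware operation.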
2.3. Spiking NNs
Over the past decade, ANNs have made remarkable strides, progressing from the initial multilayer perceptron (MLP) of the first generation to the cutting-edge techniques of second-generation DNNs ^{[49]}^{[50]}. This advancement has been significantly fueled by abundant annotated data and the widespread availability of high-performance computing devices, including versatile Graphics Processing Units (GPUs). However, even with these achievements, ANNs still fall short of matching the energy efficiency and online learning capabilities of biological neural networks (BNNs). Many endeavors have been undertaken to diminish the power consumption of conventional deep-learning models. These efforts aim to uncover more streamlined networks that deliver similar performance with reduced complexity and fewer parameters than their original counterparts. Several techniques have been developed for this purpose, including quantization ^{[51]}, pruning ^{[52]}, and knowledge distillation ^{[53]}. Quantization involves converting the network’s weights and inputs into integer types, thereby lightening the overall computational load. Pruning entails the iterative removal of connections within a network during or after training to compress the network without compromising performance. Knowledge distillation transfers the intricate knowledge acquired by a high-complexity network, the teacher, to a lightweight network known as the student.
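Of the compression techniques listed, pruning is the simplest to sketch: in its magnitude-based form, the smallest-magnitude weights are zeroed out. The function below is an illustrative sketch of one pruning step, not a reference implementation of any particular framework's API:

```python
import numpy as np

def magnitude_prune(W, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights
    in W, the simplest (one-shot, magnitude-based) form of the pruning
    technique described above. Illustrative sketch only."""
    flat = np.abs(W).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return W.copy()
    thresh = np.partition(flat, k - 1)[k - 1]   # k-th smallest magnitude
    return np.where(np.abs(W) > thresh, W, 0.0)

W = np.array([[0.10, -2.0],
              [0.05,  1.5]])
Wp = magnitude_prune(W, 0.5)  # keeps only the two largest-magnitude weights
```

In practice, pruning is applied iteratively during or after training, with fine-tuning between rounds to recover any lost accuracy.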
While ANNs and DNNs have traditionally been inspired by the brain, they fundamentally differ in structure, neural computations, and learning rules compared to BNNs. This realization has led to the emergence of spiking neural networks (SNNs), often regarded as the third generation of NNs, offering the potential to surmount the limitations of ANNs. The utilization of SNNs on neuromorphic hardware like TrueNorth ^{[54]}, Loihi ^{[55]}, SpiNNaker ^{[56]}, NeuroGrid ^{[57]}, and others presents a promising solution to the energy consumption predicament. In SNNs, similar to BNNs, neurons communicate via discrete electrical signals known as spikes and operate continuously in time. Due to their functional resemblance to BNNs, SNNs can exploit the sparsity inherent in biological systems and are highly amenable to temporal coding ^{[58]}. While SNNs may still trail behind DNNs regarding overall performance, this gap is narrowing for specific tasks. Notably, SNNs typically demand considerably less energy for their operations. Nevertheless, training SNNs remains challenging due to the intricate dynamics of neurons and the non-differentiable nature of spike operations.
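The spiking behavior described above is commonly modeled with a leaky integrate-and-fire (LIF) neuron: the membrane potential leaks each step, accumulates input current, and emits a spike (then resets) when it crosses a threshold. A minimal discrete-time sketch with assumed leak and threshold values is:

```python
def lif_spikes(inputs, threshold=1.0, leak=0.9):
    """Leaky integrate-and-fire neuron in discrete time: the membrane
    potential decays by `leak` each step, adds the input current, and
    fires (emitting 1, then resetting to 0) on crossing `threshold`.
    Minimal sketch with assumed parameters."""
    v, spikes = 0.0, []
    for i in inputs:
        v = leak * v + i
        if v >= threshold:
            spikes.append(1)
            v = 0.0  # reset after firing
        else:
            spikes.append(0)
    return spikes

# Weak inputs must accumulate before a spike; a strong input fires at once.
out = lif_spikes([0.4, 0.4, 0.4, 0.0, 1.2])  # -> [0, 0, 1, 0, 1]
```

The hard threshold is exactly the non-differentiable operation noted above; gradient-based training of SNNs typically substitutes a smooth surrogate for it during the backward pass.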
2.4. Convolutional Neural Networks (CNNs)
CNNs are inherently feedforward networks, exhibiting unidirectional information flow, transmitting data exclusively from inputs to outputs. As ANNs draw inspiration from biological systems, CNNs share a similar motivation. Their architecture is heavily influenced by the structure of the brain’s visual cortex, characterized by layers of simple and complex cells ^{[59]}^{[60]}. CNN architectures offer a range of variations yet generally comprise convolutional and pooling (subsampling) layers organized into distinct modules. These modules are subsequently followed by one or more fully connected layers, resembling a conventional feedforward NN. Often, these modules are stacked to create deep models. Figure 5 illustrates a typical CNN architecture for a simplified image classification task, where an image is initially fed into the network and undergoes several convolution and pooling stages. The representations obtained from these operations are then channeled into one or more fully connected layers. Finally, the last fully connected layer provides the output as a class label. While this architecture remains the most prevalent in the literature, various changes have been proposed in recent years to enhance image classification accuracy or economize on computation costs.
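The convolution-then-pooling pipeline just described can be sketched numerically. The functions below implement a "valid" 2-D convolution (cross-correlation, as most CNN frameworks compute it) and non-overlapping 2 × 2 max pooling, purely as an illustration:

```python
import numpy as np

def conv2d_valid(img, kernel):
    """'Valid' 2-D convolution (cross-correlation, as in most CNN
    frameworks): slide the kernel over the image with no padding."""
    kh, kw = kernel.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2(x):
    """Non-overlapping 2x2 max pooling (assumes even dimensions)."""
    h, w = x.shape[0] // 2, x.shape[1] // 2
    return x[:2 * h, :2 * w].reshape(h, 2, w, 2).max(axis=(1, 3))

img = np.arange(16.0).reshape(4, 4)              # toy 4x4 "image"
feat = conv2d_valid(img, np.ones((3, 3)) / 9.0)  # 2x2 feature map
pooled = max_pool2(np.maximum(feat, 0.0))        # ReLU then pool -> 1x1
```

A full CNN simply repeats this convolve/activate/pool module several times before flattening the result into fully connected layers.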
Figure 5. CNN image classification pipeline.
CNNs represent a revolutionary paradigm shift in image recognition, enabling the detection and interpretation of intricate patterns within visual data ^{[61]}. Their effectiveness is unrivaled, positioning them as the preeminent architecture for image classification, retrieval, and detection tasks, delivering results characterized by exceptional accuracy. The versatility of CNNs extends to real-world scenarios, where they consistently yield high-quality results. They excel in localizing and identifying objects, be it a person, a car, a bird, or any other entity within an image. This adaptability has made CNNs the default choice for predictive image input tasks. A fundamental attribute of CNNs is their capacity to attain ‘spatial invariance’. This signifies their ability to autonomously learn and extract image features from any location within the image, obviating the need for manual feature extraction. CNNs draw these features directly from the image or data, underscoring their potency within the realm of DL and their remarkable precision. As elucidated in ^{[62]}, the purpose of pooling layers is to reduce the spatial resolution of feature maps, thereby achieving spatial invariance to input distortions and translations. Pooling layers streamline image processing and enhance computational efficiency by reducing the number of required parameters, resulting in expedited data processing. This reduction in memory demands and computational costs bolsters the appeal of CNNs. While CNNs have prominently left their mark on image analysis, their scope extends well beyond this domain. They can be applied to diverse data analysis and classification challenges. This adaptability spans various sectors, yielding precise outcomes in face recognition, video classification, street and traffic sign recognition, galaxy classification, and the interpretation and diagnosis of medical images, among others ^{[63]}^{[64]}^{[65]}.
2.5. Methods for Implementing the Activation Functions in Optical Neural Networks
AI has become instrumental across diverse applications. Nevertheless, AI systems traditionally demand substantial computational resources and memory. The diminishing returns of Moore’s law have signaled a shift away from conventional architectures for AI algorithms, as referenced in ^{[66]}. Furthermore, the pressing need for power-efficient implementations of ANNs has surfaced, particularly in scenarios like image recognition, where processing a single image may entail billions of operations ^{[67]}. There is an active exploration into replacing or supplementing traditional integrated electronic circuits with photonic circuits. A pivotal facet of silicon photonics is Wavelength-Division Multiplexing (WDM), which empowers the simultaneous transmission of multiple signals over a shared medium without interference. In Optical Neural Networks (ONNs), WDM facilitates the parallel processing of multiple data streams. ONNs promise to surpass their electronic counterparts in both speed and energy efficiency. For instance, common operations like matrix multiplications are resource-intensive on conventional computers, but they can be executed at ultra-high speeds using specialized configurations of photonic networks ^{[68]}. All-optical ANNs, devoid of electro-optical conversion other than at the interface, enable matrix multiplications to occur at the speed of light as optical signals propagate through waveguides. Silicon photonics further allows the integration of photonic and electronic devices on the same platform ^{[69]}.
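The parallelism that WDM affords can be illustrated numerically: in an idealized, interference-free model, K wavelength channels traversing the same weighting hardware compute K matrix–vector products in a single pass. The sketch below is a purely mathematical analogy, with all sizes chosen arbitrarily.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: a 4-input, 3-output linear photonic layer
# processing K = 8 wavelength channels in parallel over one waveguide bus.
W = rng.normal(size=(3, 4))   # one weight matrix, shared hardware
X = rng.normal(size=(4, 8))   # column k = signal carried on wavelength k

# Ideal WDM: channels do not interfere, so one pass computes all K products.
Y_parallel = W @ X

# Equivalent sequential (single-channel) computation for comparison.
Y_serial = np.column_stack([W @ X[:, k] for k in range(X.shape[1])])
assert np.allclose(Y_parallel, Y_serial)
```

The physical speedup comes from all K products propagating through the weights simultaneously, which the single matrix product here only mimics algebraically.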
In this context, two prominent optical modulators, Mach–Zehnder interferometers (MZIs) and microring resonators (MRRs), are commonly employed ^{[70]}^{[71]}. MZIs, although bulkier, exhibit resilience to process and temperature variations due to their signal processing method, which involves signal delay within one of the two branches. On the other hand, MRRs are more compact and rely on slight detuning of the resonant wavelength from the input signal to perform dot products. This approach enables WDM but introduces challenges related to the accurate calibration of the resonant rings, as their resonance can drift with temperature variations, leading to increased complexity and power overhead.
Replicating an ANN with an ONN presents a significant challenge, primarily revolving around the comprehensive optical implementation of every core module of a conventional ANN. While optical matrix multiplication has been successfully realized ^{[72]}, the activation function (AF), a pivotal element of ANNs, remains a complex issue. The matrix multiplication stage corresponds to the linear transformation data undergo in an ANN; to achieve optimal results, an equally essential nonlinear transformation is also required, typically performed by the AF. Existing contributions in this domain have taken different approaches. Some ONN implementations incorporate the AF through computer-based or partially electrical components, whereas others strive for full optical integration by utilizing optical nonlinearities at either the material or device level. In the former approach, the optical circuit’s information is converted into electrical format for AF processing on a computer, and the output is then converted back into the optical domain. However, this method limits the network’s speed due to electronic circuit constraints and introduces noise that degrades accuracy. Moreover, this dual conversion introduces considerable latency and higher power consumption, ultimately undermining the advantages of the optical implementation.
Despite the network delays and significant increases in power consumption and chip size it introduces, optical-electrical-optical (OEO) conversion remains the most common way to implement the activation function on a photonic chip. Since achieving a nonlinear characteristic with photonic elements alone is challenging, many researchers are developing combinations of photonic elements whose characteristics can be adjusted as required by electronic components. Solutions include hybrid structures (a Ge/Si hybrid structure in a microring resonator) ^{[73]}, structures exploiting the free-carrier dispersion effect (a scheme with a Mach–Zehnder interferometer loaded with an MRR, heating elements, and a Mach–Zehnder coupler) ^{[73]}^{[74]}^{[75]}, and another popular direction, phase-change material (PCM) coatings ^{[76]}^{[77]}. These examples can realize not one but several variants of activation function: radial basis, sigmoid, softplus, ReLU, and ELU. This increases the flexibility of these structures because, depending on the task solved by the neural network, different characteristics and threshold values of the activation functions may be required.
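As a rough numerical illustration of the OEO approach (a toy model, not a description of any cited device), the stage below detects the optical field as an intensity-proportional photocurrent, applies one of the electronically configurable nonlinearities listed above, and re-modulates the result onto the optical amplitude; the bias value and scaling are arbitrary.

```python
import numpy as np

# Electronically selectable nonlinearities, as in reconfigurable OEO stages.
ACTIVATIONS = {
    "relu":     lambda p: np.maximum(p, 0.0),
    "elu":      lambda p: np.where(p > 0, p, np.expm1(p)),
    "sigmoid":  lambda p: 1.0 / (1.0 + np.exp(-p)),
    "softplus": lambda p: np.log1p(np.exp(p)),
}

def oeo_activation(field, kind="relu", bias=-0.5):
    """field: complex optical amplitude; returns the re-modulated amplitude."""
    photocurrent = np.abs(field) ** 2 + bias  # detection + electronic bias
    drive = ACTIVATIONS[kind](photocurrent)   # electronic nonlinearity
    return np.sqrt(np.maximum(drive, 0.0))    # amplitude re-modulation

E = np.array([0.2 + 0.1j, 0.8, 1.2j])
out = oeo_activation(E, kind="softplus")
```

Note that detection discards the optical phase, which is one reason coherent architectures seek all-optical alternatives.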
Consequently, despite the promising results achieved by works that implement the AF electrically ^{[78]}^{[79]}, it is believed that an optical AF is imperative to unlock the full potential of ONNs. Such an approach can mitigate the bottlenecks associated with electronic conversions and offer the speed, precision, and efficiency required to fully harness the capabilities of ONNs.
Nevertheless, the implementation of AFs in optical networks can diverge due to the inherent nature of optical computing. Several standard optical activation functions are employed in these systems. One approach involves nonlinear optical activation, whereby optical components are deliberately engineered to demonstrate nonlinear behavior; notable examples include the Kerr effect and cross-phase modulation, both of which create nonlinear optical activations by nonlinearly modulating the intensity of the light field ^{[80]}. Optical bistability is another avenue, employing optically bistable devices as activation functions; these devices exhibit two stable states and can be manipulated by adjusting input power or other optical parameters ^{[81]}. All-optical switches offer a further option ^{[82]}: they can execute binary-like activation functions by altering the optical signal’s path or state based on input intensity, rendering them well-suited for binary activations within optical neural networks. MZIs represent yet another option, capable of generating optical interference patterns sensitive to input intensity ^{[83]}; through controlled phase shifts in the interferometer, they can be harnessed to perform activation functions. Nonlinear crystals enable the creation of optical parametric amplifiers and oscillators, introducing nonlinear activation functions within photonic neural networks ^{[84]}. Lastly, resonators such as ring resonators can serve as activation functions, capitalizing on their resonance properties and input power levels ^{[85]}.
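The Kerr-based route can be sketched in a few lines: self-phase modulation alone only rotates the field’s phase in proportion to its intensity, but placing the Kerr medium in one arm of a balanced interferometer converts that phase shift into an intensity-dependent transmission. The model below is a simplified illustration with an arbitrary nonlinear coefficient, not a calibrated device model.

```python
import numpy as np

def kerr_phase(field, gamma=2.0):
    """Self-phase modulation via the Kerr effect: phase shift proportional to |E|^2."""
    return field * np.exp(1j * gamma * np.abs(field) ** 2)

def kerr_mzi_activation(field, gamma=2.0):
    """Kerr medium in one arm of a balanced interferometer: the
    intensity-dependent phase becomes an intensity-dependent transmission."""
    return 0.5 * (kerr_phase(field, gamma) + field)

# Power transmission versus input amplitude: near unity at low power,
# dropping towards zero as the Kerr phase approaches pi.
E = np.linspace(0, 1.5, 7).astype(complex)
T = np.abs(kerr_mzi_activation(E)) ** 2 / np.maximum(np.abs(E) ** 2, 1e-12)
```

The resulting transfer curve is smooth and saturating, qualitatively similar to the sigmoid-like responses used in electronic networks.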
The choice of an optical activation function in photonic neural networks hinges on the specific architectural design, hardware components, and the intended network characteristics. These optical activation functions are engineered to carry out nonlinear operations on optical signals, mirroring the behavior of digital activation functions found in conventional neural networks. Optical neural networks remain an active arena of research, continually producing novel techniques for implementing optical activation functions.
2.6. Programmable PNNs
The rapid and explosive growth of AI and Deep Learning (DL), coupled with the maturation of photonic integration, has opened a new realm of possibilities for optics in computational tasks ^{[86]}^{[87]}. Applying photons and advanced optical technologies in Neural Network (NN) hardware holds immense promise: it is projected to substantially increase Multiply-Accumulate (MAC) operations per second compared to traditional electronic NN platforms. Computational energy efficiency is estimated to drop below the femtojoule (fJ) per MAC mark, while area efficiency is anticipated to soar beyond millions of MAC operations per square millimeter ^{[88]}^{[89]}. This paradigm shift in NN hardware seeks to leverage the high data transmission rates enabled by integrated photonic technologies while also harnessing the compact size and low power consumption inherent to chip-scale designs. Until now, the predominant focus in photonic devices designed for weight calculations has centered on elements that can be slowly reconfigured, such as Thermo-Optic (T/O) phase shifters ^{[47]} and Phase-Change Material (PCM)-based nonvolatile memory structures ^{[86]}. This emphasis on slow reconfiguration implies that inference applications currently take precedence in neuromorphic photonics ^{[23]}.
Extending reconfiguration capabilities to Photonic NN (PNN) implementations demands a platform that can accommodate various functional layouts within the same neural hardware. Over the past few years, the realm of photonics has made significant strides in programmability ^{[90]}, and programmable PICs ^{[91]} have emerged as a pivotal resource for fostering cost-effective, versatile, and multifunctional photonic platforms, akin to the concept of electronic Field-Programmable Gate Arrays (FPGAs) ^{[90]}. Furthermore, it has been demonstrated that merely incorporating slowly reconfigurable Mach–Zehnder Interferometric (MZI) switches within a suitable architectural framework can provide a plethora of circuit connectivity and functional possibilities ^{[90]}. Nonetheless, the unique characteristics of NN architectures necessitate the exploration of alternative functionalities not yet covered by programmable photonic implementations. While contemporary photonic weighting technology can indeed facilitate weight value reconfiguration ^{[17]}, there is a growing shift towards considering programmable activation functions ^{[92]}. Nevertheless, existing neuromorphic photonic architectures lack reconfiguration mechanisms for their linear neuron stages. Photonic Neural Networks (PNNs) have mainly advanced within two primary architectural categories for implementing linear neural layers. The first involves incoherent or Wavelength-Division-Multiplexed (WDM) layouts, where each axon within the same neuron is assigned a distinct wavelength ^{[93]}. The second centers on coherent interferometric schemes, in which a single wavelength is utilized throughout the entire neuron, harnessing interference between coherent electrical fields to perform weighted-sum operations.
An innovative architecture proposed in ^{[94]} seamlessly integrates WDM and coherent photonics to empower Programmable Photonic Neural Networks (PPNNs) with four distinct operational modes for linear neural layers. Building upon their previously proposed dual-IQ coherent linear neuron architecture ^{[95]}, which recently demonstrated remarkable computational performance as a PIC with groundbreaking compute rates per axon ^{[96]}, the authors advance the single-neuron architecture by harnessing multiple wavelength channels and corresponding WDM De/Multiplexing (DE/MUX) structures to create multi-element and single-element fan-in (input) and weight stages for each axon. Programmability is achieved by integrating Mach–Zehnder Interferometer (MZI) switches, which can dynamically configure the connections between the fan-in and weighting stages, offering the flexibility to define neural layer topologies through software.
The authors established a comprehensive mathematical framework for this programmable neuromorphic architecture and thoroughly analyzed the potential performance limitations associated with using multiple wavelengths within the same interferometric arrangement. These findings led to a straightforward mechanism for mitigating wavelength-dependent behaviors in the modulators and phase shifters at the fan-in and weighting stages. As a result, the programmable layout consistently delivers exceptional performance across all four operational modes, ensuring that the supported neurons maintain a relative error rate below a specified threshold, provided that inter-channel crosstalk remains within the typical range of values.
Figure 6a ^{[94]} depicts the fundamental structure of the neural layer. Instead of a single Continuous Wave (CW) input optical signal, M multiplexed CW signals, each centered at λ_{m}, are dedicated to independent virtual neurons. The input and weight modulators have been replaced by more intricate modulator banks, as illustrated in Figure 6c,e, enclosed by software-controlled switches. In the initial stage, the multichannel input signal is divided into two portions: one is directed to the bias branch, while the remainder enters the Optical Linear Algebraic Unit (OLAU). Within the OLAU, the signal undergoes further splitting, with equal power distribution achieved by a 1-to-N splitter, an example of which is provided in Figure 6b. After being modulated by the inputs (x_{n,m}) and weighted by (w_{n,m}), the signal is routed to the N-to-1 combiner depicted in Figure 6d ^{[94]}. At this juncture, the output signal interferes with the bias signal within a 3 dB X-coupler and is then directed to the DEMUX to generate the outputs (y_{m}). In the final step, each channel (m) performs the algebraic addition of the weighted inputs with a designated bias, yielding a total of M independent N-fan-in neurons.
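Numerically, the layer of Figure 6a reduces to M independent weighted sums computed through shared hardware. The sketch below is an algebraic abstraction of this signal flow, not a physical model of ^{[94]}; sizes and signal values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 4, 3  # N axons (fan-in), M wavelength channels (virtual neurons)

x = rng.uniform(0, 1, size=(N, M))      # inputs x_{n,m} per axon/channel
w = rng.uniform(-1, 1, size=(N, M))     # weights w_{n,m}
bias = rng.uniform(-0.5, 0.5, size=M)   # bias branch, one value per channel

# Each channel m accumulates its own weighted sum through the shared
# splitter/combiner hardware, then interferes with its bias signal:
y = (w * x).sum(axis=0) + bias          # M independent N-fan-in neurons

# The same computation for a single channel, written as a dot product.
y0 = w[:, 0] @ x[:, 0] + bias[0]
assert np.isclose(y[0], y0)
```

Switching between the four operational modes of ^{[94]} amounts to rerouting which modulator banks feed which fan-in and weight stages, which this abstraction does not capture.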
Many cutting-edge programmable photonic circuits leverage the remarkable capabilities of Mach–Zehnder interferometers (MZIs). MZIs offer precise control over power splitting ratios and relative phase shifts between input and output ports, achieved by adjusting the phase-shifting control elements via either thermo-optic or electro-optic effects. Through the strategic combination of multiple directional couplers and phase shifters within specific mesh configurations ^{[97]}^{[98]}, MZI-based architectures can perform a diverse array of linear transformations across various ports. When complemented by optical-electrical-optical nonlinearity ^{[17]} or optical-modulator-based reprogrammable nonlinearity ^{[99]}, MZI-based architectures have proven their mettle in tackling intricate machine learning tasks at superior processing speeds. Nevertheless, to achieve significant phase tuning ranges, MZIs demand relatively high driving voltages ^{[100]}, and the devices can extend up to around 100 μm in length. In large-scale on-chip integrated circuits designed for complex applications, two vital factors emerge as primary concerns: the device’s footprint and its power consumption. A natural and promising avenue is the adoption of resonant structures that enhance light–matter interactions, thereby reducing device footprint, driving voltages, and overall power consumption ^{[100]}.
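The MZI building block can be summarized as a 2 × 2 transfer matrix: two 3 dB couplers sandwich an internal phase shifter θ that sets the splitting ratio, followed by an external phase shifter φ. The sketch below uses a standard lossless idealization (the coupler sign convention is one common choice) and verifies that the resulting matrix is unitary, which is what allows meshes of MZIs to realize arbitrary linear transformations.

```python
import numpy as np

def mzi(theta, phi):
    """2x2 transfer matrix of an idealized lossless MZI: two 50:50
    directional couplers around an internal phase shift theta, with an
    external phase shifter phi on one output port."""
    bs = np.array([[1, 1j], [1j, 1]]) / np.sqrt(2)  # 3 dB coupler
    inner = np.diag([np.exp(1j * theta), 1.0])      # arm phase shifter
    outer = np.diag([np.exp(1j * phi), 1.0])        # output phase shifter
    return outer @ bs @ inner @ bs

U = mzi(theta=0.7, phi=0.3)
# Lossless device: the transfer matrix is unitary, and theta sets the
# power splitting ratio between the two output ports.
assert np.allclose(U.conj().T @ U, np.eye(2))
```

Mesh architectures such as those of ^{[97]}^{[98]} compose many such matrices to factor larger unitaries.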
Among resonant structures, MRRs have garnered attention for their ability to program real-valued weights through a ‘broadcast-and-weight’ protocol ^{[101]}, resembling a continuous-time recurrent neural network ^{[27]}. A notable advancement involves programming weights at the interconnecting waveguides between two MRRs using phase-change materials; this innovation has led to a photonic tensor core serving as a robust dot-product engine ^{[102]}. It is worth noting that most prior proposals employing MRRs relied primarily on wavelength-division multiplexing of the input signals and incoherently aggregated the signals at the photodetectors. Coherence networks, which harness the wave nature of electromagnetic fields, hold promise for novel advances in the design of optical neural networks ^{[47]}.
A coherent optical neural network architecture built upon MRRs is proposed in ^{[103]}. This approach offers notable advantages in device footprint and energy efficiency compared to conventional optical neural networks based on Mach–Zehnder interferometer (MZI) architectures. The architecture’s linear matrix multiplication layer is fashioned by linking multiple linear units, each comprising a serially coupled double RR ^{[104]} for mixing signals from different ports and a single RR for precise phase adjustments. The nonlinear unit, which applies element-wise activation at each port, is crafted using microring modulators and electrical signal processing, granting the flexibility to program diverse nonlinear activation functions. Notably, the linear and nonlinear components maintain the coherency of the input signals, thus constituting a complex-valued neural network ^{[47]}. Moreover, the inherent flexibility of this design enables the direct cascading of layers on the same chip without intermediate digital-to-analogue conversions, reducing latency and minimizing the energy wasted on signal conversions. The input–output relationship of the designed architecture was described by a transfer function, and automatic differentiation was employed ^{[24]}^{[25]} to train the tunable parameters directly. The design and training algorithms are not confined to this ring-based design and can be adapted to various tunable systems. The network’s proficiency in information processing tasks was showcased through concrete examples, such as functioning as an Exclusive OR (XOR) gate and performing handwritten digit recognition on the MNIST dataset ^{[105]}.
In ^{[103]}, the ring-based programmable coherent optical neural network configuration is presented as illustrated in Figure 6. Figure 6f,g are dedicated to the fundamental elements responsible for executing the linear transformation described by the matrix W_{l}, while Figure 6h represents the component implementing the nonlinear activation functions. These components are constructed from waveguides intricately coupled to RRs. Notably, all RRs in this design have a uniform diameter, while the separation distances between the rings and waveguides are adjusted according to the specific functionality they serve.
Furthermore, this design operates under continuous-wave conditions at a single operating frequency, denoted ω_{0}. This enables precise control over the phase and amplitude of the transmitted signals by adjusting the refractive index of each component, allowing fine-tuning of the neural network’s performance ^{[103]}. Figure 6i displays the transmission and phase responses of a waveguide side-coupled with a ring as a function of the phase detuning, Δϕ, for both the critically coupled and over-coupled scenarios. In the case of over-coupling, indicated by the components colored green, these responses are utilized for phase-tuning purposes. The nonlinear activation ring, highlighted in blue, instead requires critical coupling to achieve a more extensive amplitude tuning range. Figure 6j presents an illustrative example of the transmission and phase response of the coupled double ring employed as a signal-mixing component; the key parameters are the ring–waveguide coupling coefficient (r_{rw}) at 0.85, the ring–ring coupling coefficient (r_{rr}) at 0.987, and the single-round-trip amplitude transmission (a) at 1 ^{[103]}.
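The two coupling regimes of Figure 6i follow from the textbook all-pass ring transfer function t(Δϕ) = (r − a·e^{iΔϕ})/(1 − r·a·e^{iΔϕ}), where r is the self-coupling coefficient and a the round-trip amplitude transmission. The sketch below (a standard model with arbitrarily chosen parameters, not the values of ^{[103]}) contrasts critical coupling (r = a, transmission null on resonance, wide amplitude range) with lossless over-coupling (near-unit transmission with a full 2π phase sweep, suited to phase tuning).

```python
import numpy as np

def ring_response(dphi, r, a):
    """Field transmission of a bus waveguide side-coupled to a single ring.
    dphi: round-trip phase detuning; r: self-coupling coefficient;
    a: single-round-trip amplitude transmission (a < 1 means loss)."""
    return (r - a * np.exp(1j * dphi)) / (1 - r * a * np.exp(1j * dphi))

dphi = np.linspace(-np.pi, np.pi, 201)

# Critical coupling (r == a < 1): transmission dips to zero on resonance,
# giving the wide amplitude tuning range needed by the activation ring.
t_crit = ring_response(dphi, r=0.9, a=0.9)

# Over-coupling (r < a, here lossless a = 1): near-unit transmission with
# a full 2*pi phase sweep, suitable for the phase-tuning rings.
t_over = ring_response(dphi, r=0.9, a=1.0)

assert np.abs(t_crit[100]) < 1e-9                  # null at dphi = 0
assert np.all(np.abs(np.abs(t_over) - 1) < 1e-9)   # all-pass behavior
```

Plotting |t|² and the argument of t against Δϕ for these two parameter sets reproduces the qualitative shapes of the green and blue curves described for Figure 6i.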
Figure 6. (a) An illustration of the PPNN, consisting of M laser diodes (LDs), a MUX, a 3 dB X-splitter, a bias branch denoted W_{b}, and a reconfigurable Optical Linear Algebra Unit (OLAU) ^{[94]}. The OLAU comprises a 1-to-N splitting stage, input (X_{n}) and weight (W_{n}) modulator banks, and an N-to-1 combiner stage. The output from the combiner stage interferes with the bias signal within a 3 dB X-coupler and is then sent to a DEMUX. A closer examination reveals details of (b) the 1-to-N splitting stage and (d) its N-to-1 coupling stage ^{[94]}; (c) view of the bias branch, which includes wavelength-selective weights and phase modulators ^{[94]}; (e) a closer look at an axon of the OLAU, which consists of switches for signal routing and modulators for the inputs (x_{n,m}) and weights (w_{n,m}) ^{[94]}. Layout of a single-layer coherent optical neural network ^{[103]}: (f) a tunable all-pass single RR functions as a phase-tuning component; (g) tunable serially coupled double RRs are employed as signal-mixing components between the ports; (h) the nonlinear activation unit transforms the input signal x_{n} into f(x_{n}), where f represents a nonlinear function (with n = 3 in this example). The black ring within the nonlinear activation unit acts as a directional coupler, directing a portion of the optical energy (α) to electrical signal processing; the diode is a photodetector, and the blue ring modulates the signal. An electronic circuit (M) processes the electronic output from the photodetector to generate a modulation signal for the right ring ^{[103]}. (i) The transmission and phase responses of a bus waveguide side-coupled with a ring, shown as a function of the phase detuning, Δϕ; over-coupling, indicated in green, is employed for the phase-tuning components, while critical coupling, highlighted in blue, is crucial for achieving a larger amplitude tuning range in the nonlinear activation ring ^{[103]}. (j) An example transmission and phase response of the coupled double ring, used as a signal-mixing component ^{[103]}.