Because technology scaling cannot improve the performance of digital Chiplets (CPU compute dies) and analog Chiplets (IO and memory Chiplets) in the same proportion without increasing cost, Chiplet-based computing architectures optimize performance and cost by fabricating each Chiplet in the technology best suited to it. Furthermore, the demand for small form factors and light weight in wearables (sports watches, body-function monitors, etc.) and portable electronics (mobile phones, laptops, etc.) requires smaller electronics; therefore, more and more computing systems are designed with 3D architectures. System performance can be improved by co-designing the 3D architecture and advanced packaging technology.
This approach is widely used by AMD in high-performance computing (HPC) system design: combining different numbers of Chiplets enabled the rapid development of two products, Rome and Matisse [9], as shown in Figure 4a. The most obvious advantages are a simpler system design and a shorter time to market. In addition, the digital Chiplet is backward compatible with complex interfaces and with the memory Chiplet; that is, the optimal combination of computing and memory Chiplets can be selected according to the required computing capability, which gives higher scalability and reconfigurability than traditional multi-core and SoC computing system architectures.

To improve energy efficiency, Kadomoto et al. [27] proposed Chiplet communication based on the mutual coupling of on-chip inductor coils and fabricated a communication network in a 0.18 µm process. The maximum bandwidth reaches 1.6 Gb/s with a timing variation of 3%, and the total power consumption is 14.5 mW. This computing architecture has potential in medical microrobots. Although inter-chip communication based on mutual inductance simplifies the routing design, electromagnetic coupling in a small volume degrades signal timing; the method therefore requires sufficient shielding, which increases the design difficulty. Burd et al. [28] proposed the Infinity Fabric (IF) technology to connect Chiplets for higher scalability and configurability in a computing system. It combines a scalable data fabric (SDF) and a scalable control fabric (SCF) as the critical enablers and uses 3D package routing layers to support more complex connections. The in-package bandwidth reaches 256 GB/s with 534 IFs, and the energy efficiency is 1.2 pJ/bit (2 pJ/bit for EMIB). CEA-LETI [29] developed a 96-core processor by stacking 28 nm computing Chiplets on a 65 nm interposer with a power management module. The Chiplets are interconnected with µbumps (20 µm pitch), TSVs (depth-to-width ratio of 10:1 and 40 µm pitch), and RDL (10 µm width and 20 µm pitch). Chiplet communication is achieved by an extendable network on chip (NoC), with a bandwidth above 3 Tbit/s/mm² and a delay below 0.6 ns/mm [30], as shown in Figure 4b. The Lakefield mobile processor also adopts a multi-Chiplet design, consisting of computing and memory Chiplets each fabricated in its optimal technology (10 nm and 22 FFL). All Chiplets are bonded face to face with micro-bumps at a 50 µm pitch (Foveros technology) [31]. The parasitic capacitance and resistance are below 250 fF and 70 mΩ, respectively. The data rate is up to 500 Mb/s with an energy efficiency of 0.2 pJ/b. Foveros is compatible with EMIB, and the two can be combined in the same system for more flexible high-density interconnection [32].

IF, NoC, and Foveros are all based on 3D electrical interconnection, whose fabrication technology is relatively mature, so the performance of the computing system is highly predictable. The computing system can obtain high bandwidth and energy efficiency at a given working frequency (the typical value is 1.15 GHz, as shown in Table 1); however, as the operating frequency increases, the parasitic resistance, capacitance, and inductance of the TSVs and RDL degrade signal integrity. In addition, the Joule heat produced in the TSVs and RDL reduces system reliability; therefore, better-optimized interconnect technologies are needed.
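The bandwidth and energy-per-bit figures quoted above can be related to power with simple arithmetic. The following is a minimal sketch, not taken from the cited works: the function names are hypothetical, and it assumes decimal units (1 GB = 10⁹ bytes), a link running continuously at its quoted peak bandwidth, and, for the inductive link of [27], that the entire 14.5 mW is attributable to communication at 1.6 Gb/s.

```python
def link_power_watts(bandwidth_gbytes_per_s: float, energy_pj_per_bit: float) -> float:
    """Power of a link that runs continuously at the given bandwidth."""
    bits_per_second = bandwidth_gbytes_per_s * 1e9 * 8  # GB/s -> bit/s (decimal GB)
    return bits_per_second * energy_pj_per_bit * 1e-12  # pJ/bit -> J/bit


def energy_per_bit_pj(power_mw: float, bandwidth_gbits_per_s: float) -> float:
    """Energy per transferred bit, assuming all power is spent on the link."""
    joules_per_bit = (power_mw * 1e-3) / (bandwidth_gbits_per_s * 1e9)
    return joules_per_bit * 1e12  # J/bit -> pJ/bit


if __name__ == "__main__":
    # Infinity Fabric figures quoted in the text: 256 GB/s at 1.2 pJ/bit.
    print(f"IF link power: {link_power_watts(256, 1.2):.2f} W")
    # Inductive-coupling link figures quoted in the text: 14.5 mW at 1.6 Gb/s.
    print(f"Inductive link: {energy_per_bit_pj(14.5, 1.6):.2f} pJ/bit")
```

Under these assumptions the IF figures correspond to roughly 2.46 W of interconnect power, and the inductive link to roughly 9.06 pJ/bit, which gives a rough sense of scale when comparing the approaches.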
Figure 4. (a) AMD processor design technology based on Chiplets. (Reprinted from [9], Copyright 2020, with permission from IEEE); (b) INTACT computing architecture based on Chiplets. (Reprinted from [29], Copyright 2019, with permission from IEEE); (c) Hybrid optical–electrical interconnection. (Reprinted from [31], Copyright 2020, with permission from IEEE); (d) POPSTAR interconnection architecture. (Reprinted from [19], Copyright 2019, with permission from IEEE).
Table 1. Comparison of computing architectures based on Chiplet.
| | Intel [24] | TSMC [22] | AMD [9] | CEA-Leti [30] | Intel [25] | Bologna [26] |
|---|---|---|---|---|---|---|
| Product name | Agilex | - | Ryzen | INTACT | Lakefield | Manticore |
| Launch date | 2019-04 | 2019-08 | 2019-08 | 2020-02 | 2020-06 | 2020-12 |
| Chiplet technology (nm) | 10 | 7 | 7 + 12 | FDSOI 28 | 10 + 22 FFL | GF 22 FDX |
| Chiplet number | scalable | 2 | >2 | 6 | 1 | 4 |
| Number of cores/Chiplet | Cortex-A53 | 4 Cortex-A72 | 64 (Server), 16 (Client) | 16 | 1 Core + 4 Atom | 1024 RISC-V |
| Area (mm²) | - | 4.4 × 6.2 | - | 4 × 5.6 | - | 9 |
| Bandwidth (max) | 32 Gb/s | 320 GB/s | ~55 GB/s | 527 GB/s | ~34 GB/s | 1 TB/s |
| Bandwidth density | | 1.6 Tb/s/mm² | - | 3 Tbit/s/mm² | - | - |
| Frequency (GHz) | 1.5 | 4 | ~1 | 1.15 | ~1 | 1 |
| Integration type | 2.5D | 2.5D | 3D | 3D | 3D | 2.5D |
| Interposer type | Passive | Passive | N/A | Active | Active | Yes |
| Interconnect pitch (µm) | 55 | 40 | - | 20 | 50 | 20 |
| Delay | ~60 ps | - | <9 ns | 0.6 ns/mm | - | - |
| Integration technology | EMIB | CoWoS | | F2F | Foveros | - |
| Yield | High | High | High | High | High | High |
| Scalability | High | | High | High | | - |
| Configurability | Good | Yes | Yes | Yes | alternative | High efficiency/performance |
| Reusability | High | High | High | High | High | High |
| Testability | | | Good | Good | Good | |
| Power efficiency | - | 0.56 pJ/b | 2 pJ/b | 0.59 pJ/b | 0.2 pJ/b | 50 GDPflop/s/W |
| Application | Data Center, Networking, Edge Computing | HPC | Server and Desktop Products | Cloud Computing Accelerators | Mobile, PC | Data Center, Networking, Edge Computing |
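The bandwidth row of Table 1 mixes bit-based and byte-based units, which makes direct comparison awkward. The sketch below converts the entries to a common unit of Gb/s; it is an illustration only, assuming decimal prefixes and 8 bits per byte, and the dictionary and function names are hypothetical. Entries marked "~" in the table are approximate, and the table does not state whether each figure is per link or aggregate, so the resulting ordering is only indicative.

```python
# Multipliers that convert each unit to Gb/s (decimal prefixes, 8 bits per byte).
TO_GBIT_PER_S = {"Gb/s": 1.0, "GB/s": 8.0, "Tb/s": 1000.0, "TB/s": 8000.0}

# Peak bandwidth per column of Table 1 ("~" entries are approximate).
TABLE1_BANDWIDTH = {
    "Agilex": (32, "Gb/s"),
    "TSMC [22]": (320, "GB/s"),
    "Ryzen": (55, "GB/s"),
    "INTACT": (527, "GB/s"),
    "Lakefield": (34, "GB/s"),
    "Manticore": (1, "TB/s"),
}


def to_gbit_per_s(value: float, unit: str) -> float:
    """Convert a bandwidth value in one of the known units to Gb/s."""
    return value * TO_GBIT_PER_S[unit]


NORMALIZED = {
    name: to_gbit_per_s(value, unit)
    for name, (value, unit) in TABLE1_BANDWIDTH.items()
}

if __name__ == "__main__":
    # Print the entries in ascending order of normalized bandwidth.
    for name, gbps in sorted(NORMALIZED.items(), key=lambda kv: kv[1]):
        print(f"{name:<10} {gbps:>7.0f} Gb/s")
```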
Fotouhi et al. [33] proposed a 3D integration architecture that uses hybrid Chiplet interconnect technology, as shown in Figure 4c. A silicon bridge is used for short-distance electrical interconnection of the transceiver (TRX) Chiplets, and an arrayed waveguide grating router (AWGR) is used for long-distance interconnection with wavelength division multiplexing (WDM). The computing performance is improved by 23%, while the power is reduced by 30%. Narayan et al. [34] designed an optical communication structure for data-parallel transmission between Chiplets by wavelength selection, which saves 38% of the energy with a 1% performance degradation and offers a peak bandwidth of 1750 Gb/s, as shown in Figure 4d. The AWGR in [34] and the interconnection technology in [35] are based on silicon photonic technology, which realizes selective routing of optical signals by adjusting wavelengths. Compared with electrical interconnection, it achieves higher data bandwidth, smaller signal delay, less heat, and higher energy efficiency; however, silicon photonic communication requires a high-power laser source, which is difficult to integrate on the chip. In addition, the performance of optical devices is strongly affected by process variation, so the reliability is lower than that of electrical interconnection. Because silicon photonic devices are difficult to fabricate and integrate, optical interconnection is not yet widely used; however, its advantages will drive the development of the integration technology, and it will be more widely used in future computing systems.
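The wavelength-selective routing mentioned above can be illustrated with the cyclic routing rule commonly used to describe an N × N AWGR. The sketch below is a simplified model, not the design of the cited works: it assumes that a signal entering input port i on wavelength index w leaves on output port (i + w) mod N. Indexing conventions differ between devices, and the function names are hypothetical.

```python
def awgr_output_port(input_port: int, wavelength_index: int, n_ports: int) -> int:
    """Output port reached from input_port on the given wavelength (assumed cyclic rule)."""
    return (input_port + wavelength_index) % n_ports


def wavelength_for(input_port: int, output_port: int, n_ports: int) -> int:
    """Wavelength index a sender must select to reach output_port from input_port."""
    return (output_port - input_port) % n_ports


def routing_table(n_ports: int) -> list[list[int]]:
    """table[i][w] is the output port for input i and wavelength index w."""
    return [
        [awgr_output_port(i, w, n_ports) for w in range(n_ports)]
        for i in range(n_ports)
    ]


if __name__ == "__main__":
    n = 4
    for i, row in enumerate(routing_table(n)):
        print(f"input {i}: wavelengths 0..{n - 1} -> outputs {row}")
```

In this model every input reaches every output by choosing a wavelength, and signals that arrive at the same output from different inputs always use different wavelengths, which is why such a router can offer all-to-all connectivity without electrical switching.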