Revisiting the High-Performance Reconfigurable Computing

Revisiting the High-Performance Reconfigurable Computing: Comparison

Please note this is a comparison between Version 2 by Camila Xu and Version 1 by Ali Kashif Bashir.

Modern datacenters are reinforcing the computational power and energy efficiency by assimilating field programmable gate arrays (FPGAs). The sustainability of this large-scale integration depends on enabling multi-tenant FPGAs. This requisite amplifies the importance of communication architecture and virtualization method with the required features in order to meet the high-end objective.

FPGA virtualization
datacenters
network on chip

1. Introduction

Today, datacenters are equipped with the heterogeneous computing resources that range from Central Processing Units (CPUs), Graphical Processing Units (GPUs), Networks on Chip (NoCs) to Field Programmable Gate Arrays (FPGAs), each suited for a certain type of operation, as concluded by Escobar et al. in ^[1]. They all purvey the scalability and parallelism; hence, unfold new fronts for the existing body of knowledge in algorithmic optimization, computer architecture, micro-architecture, and platform-based design methods ^[2]. FPGAs are considered as a competitive computational resource for two reasons, added performance and lower power consumption. The cost of electrical power in datacenters is far-reaching, as it contributes roughly half of lifetime cost, as concluded in ^[3]. This factor alone motivates the companies to deploy FPGAs in datacenters, hence urging the scientific community to exploit High-Performance Reconfigurable Computing (HRC).

Industrial and academic works both incorporated the FPGAs to accelerate large-scale datacenter services; Microsoft’s Catapult is one such example ^[4]. Putnam et al. chose FPGA over GPU on the question of power demand. The flagship project accelerated Bing search engine by 95% as compared to a software-only solution, at the cost of 10% additional power.

The deployment of FPGAs in datacenters will neither be sustainable nor economical, without realizing the multi-tenancy feature of virtualization across multiple FPGAs. To achieve this ambitious goal, the scientific community needs to master two crafts, an interconnect solution preferably Network on Chip (NoC) as a communication architecture and an improved virtualization method with all the features of an operating system. Accumulating the state of the art in a survey can foster the development in this area and direct the researchers into more focused and challenging problems. Despite of the two excellent surveys, ^[5] in 2004 and ^[6] in 2018, former one categorized the FPGA virtualization as temporal partitioning, virtualized execution, and virtual machines, while, after fourteen years, the later one classified based on abstraction levels to accommodate the future changes, but the communication architecture or interconnect possibilities are not fully explored. To address this gap, an improved survey on FPGA virtualization is presented with the coverage of network-on-chip evaluation choices as a mean to explore the communication architecture, and commentary on nomenclature of existing body of knowledge. WThe researchers revisited the network-on-chip evaluation platforms in order to highlight its importance as compared to bus-based architectures. We stretched our reviewThe researchers stretched from acceleration of standalone FPGA to FPGAs connected as a computational resource in heterogeneous environment. WeThe researchers attempted to create a synergy through combining three domains to assist the designers to choose right communication architecture for the right virtualization technique and, finally, share the work in the right language, only then, multi-tenant FPGAs in datacenters can be realized.

2. Revisiting the Nomenclature

The applications of FPGAs as computing resource are diverse that includes data analytics, financial computing and cloud computing. This broad range of applications in different areas requires efficient applications and resource management. This lays the foundation for the need of virtualizing the FPGA as a potential resource. Nomenclature is much varying due to the different backgrounds of the researchers contributing to this area. There are many such examples in literature where similar concepts or architecture is described using a different name or term. There is also an abundance of jargon terms and acronyms, which confuse the researchers rather enhancing their understanding. Table 1 identifies and lists non-standard terms in literature from the last decade.

Table 1.

Non-Standard Nomenclature Present in Literature.

Year	Non-Standard Term(s) in Published Literature
2010	RAMPSoC in ^[7]
2011	Lightweight IP (LwIP) in ^[8]
2012	ASIF (Application Specific FPGA) in ^[9]
2013	sAES (FPGA based data protection system) in ^[10]
2014	PFC (FPGA cloud for privacy preserving computation in ^[11]
2015	CPU-Cache-FPGA in ^[12]
2016	HwAcc (Hardware accelerators), RIPaaS and RRaaS in ^[13]
2017	FPGA as a Service (FaaS) and Secure FaaS in ^[14]
2018	ACCLOUD (Accelerated CLOUD) in ^[15], FPGAVirt in ^[16]
2019	vFPGA-based CCMs (Custom Computing Machines) in ^[17]

This area is stagnated for a lack of a standard nomenclature. WThe recsearchers recommend that the scientific community should use a unified nomenclature to present the viewpoint in order to improve the clarity and precision of communication for advancing the knowledge base. WeThe researchers also recommend that this area must be referred as High-Performance Reconfigurable Computing (HRC) in literature. Moreover, it has been observed that the use of computer science language is more conveying as virtualization in FPGAs is comparable to an operating system in CPUs.

3. Revisiting the Network on Chip Evaluation Tools

Data transfers in most of the high-performance architectures are limited by memory hierarchy and communication architecture, as summarized in [19,20]^[18][19]. Exploiting communication architecture suggests the use of NoC, an effective replacement for buses or dedicated links in a system with large number of processing cores [21,22]^[20][21]. NoC is composed of several tunable parameters like network architecture, algorithm, network topology and flow control. No System on Chip (SoC) is outright without NoC, today, due to promised high communication bandwidth with low latency as compared to the alternate communication architectures. Researchers heavily rely on automated evaluation tools, where performance and power evaluation can be viewed early in design, given the complexity of NoC. Figure 1 describes a typical cycle of NoC evaluation, with FPGA being connected to a Central Processing Unit (CPU). Traffic scenarios are generated through traffic generator, sent to NoC that resides in FPGA, and the evaluation results are received through traffic receptors. Tools for FPGA based NoC prototyping are diverse architecture-wise. De Lima et al. in [23]^[22] identified an architectural model comprising of three layers: network, traffic, and management.

Figure 1. Generic Architecture of Networks on Chip (NoC) Evaluation on Field Programmable Gate Arrays (FPGA)(s).

Generic Architecture of Networks on Chip (NoC) Evaluation on Field Programmable Gate Arrays (FPGA)(s).

There are four different types of network: Direct Mapping on Single or Multi FPGA(s), Fast Prototyping and Virtualization. The choice of the network affects the accuracy and resource utilization. Traffic on network can be generated in two different ways: synthetic and application-specific. Synthetic traffic is a kind of load testing to evaluate the overall performance, but it fails to forecast the performance under real traffic flow. Application-specific traffic, on the other hand, is based on the behavior of real traffic flow that is difficult to acquire but gives more accurate results. These patterns can be acquired either through trace, statistical method or executing application cores. As traces comprises of millions of packets so the size becomes a limiting factor. Running application cores to generate traffic is also resource-expensive method. Table 2 lists some FPGA based NoC evaluation tools, describing every architecture with network type, traffic type, number of routers, target board, and execution frequency, while hiding the complexity of NoC designs. The number of routers in NoC depends on the network type, architecture with relatively more routers, are based on second group type of network, fast prototyping and virtualization. WThe researchers have used the direct mapping network type in ourthe previous works due to relatively high execution frequency [24,25]^[23][24].

Table 2.

NoC Evaluation Tools based on FPGA(s)

Year	Network Type	Traffic Type	No. of Routers	Target FPGA	Freq. (M.Hz.)	Work
2010	Multiple FPGA	Real: App. Cores	16	Virtex 5 × 5	-	[26]	^[25]
2010	Direct Mapping	Real: Traces based	64	Virtex 2	45	[27]	^[

[42]^[41] is suitable for the designers working on Xilinx tools. Maxeler Technologies, however, offers specific solutions, like an algorithmic contribution for memory mapping [43]^[42] and an area optimization technique [44]^[43]. Virtualization plays a relatable role to an operating system in a computer, but the term is being used in different meanings in this area, due to non-uniform nomenclature discussed earlier. Yet, the universal concept of an abstraction layer remains unchanged, a layer for the user to hide the underlying complexity of the computing machine, where the computing machine is not a traditional one, but FPGA. Many virtualization architectures have been proposed as per the requirements of the diverse applications. In 2004, a survey in this regard categorized the virtualization architectures into three broad categories, temporal partitioning, virtualized execution, and overlays ^[5]. Since then, no serious effort has been recorded on the classification of virtualization, until Vaishnav et al. ^[6] in 2018 classified the virtualization architectures based on abstraction levels. This much-needed classification contributed by Vaishnav et al. has been adopted as is, to discuss the works in this survey. WThe researchers reiterated them with some of the representative work examples in Table 3. The works have been discussed under the same abstract classification.

Table 3.

Classification of FPGA Virtualization adopted from ^[6].

Abstract Classification	Sub-Class	Work Examples References
Resource Level	Overlays	[45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74]	^[44]^[45]^[46]^[47]^[48]^[49]^[50]^[51]^[52
Resource Level	²⁶	^]
^]	^[	⁵³^]^[⁵⁴^]^[⁵⁵^]^[⁵⁶^]^[⁵⁷^]^[58]^[59]^[60]^[61]^[62]^[63]^[64]^[65]^[66]^[67]^[68]^[69]^[70]^[71]	2010	Fast Prototyping	Synthetic	49	Virtex 6	50	[28]	^[27]
^[	⁷²	^]^[⁷³^]
Input Output (I/O) Virtualization	[4,75,	^[	76,77,	⁷⁴^]^[	78,	⁷⁵^]^[	79,80]	^[4]⁷⁶^]^[^77]^[78]^[79]	2011	Direct Mapping	Real: Traces based	25	Virtex 5	-	[29]	^[28]
2011	Virtualization	Real: Traces based	256	Virtex 6
Node Level	Virtual Machine Monitors	[81,82,83,84]	^[80]^[81]^[82]^[83]
	Shells	[4,41,75,76,77,78,85,86,87,88,89,93,94,95,96,97]	^[4]^[40]^[74]^[75]^[76]^[77]^[84]^[85	90,	^]^[	91,	⁸⁶	92,	^]^[⁸⁷^]^[⁸⁸^]^[⁸⁹^]^[⁹⁰^]^[91]^[92]^[93]^[94]^[95]^[96]
	Scheduling	[82,	152	89,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112]	^[81]^[88]^[97]^[98]^[99]^[100	[	30]	^[29]
^]	^[¹⁰¹^]^[¹⁰²^]^[¹⁰³^]^[104]^[105]^[106]^[107]^[108]^[109]^[110]^[111]	2011	Fast Prototyping	Synthetic	576	Virtex 6	300	[31]	^[30]
Multi Node Level	Custom Clusters	[113,114,115,116,117]	^[112]^[113]^[114]^[115]^[116]	2011	Virtualization	Real: App. Cores	64	Virtex 2
	Frameworks	[77,103	-	,118,119,120,121,122,123]	^[76]^[102]^[117]^[118]^[119]^[120]^[121]^[122]	[	32]	^[31]
	2011	Virtualization	Real: App. Cores
Cloud Services	[4,40,124,125,	16	]	Virtex 5	3, 15	[33]	^[32]
2011	Direct Mapping	Synthetic	36	Virtex 5	-	[34]	^[33]
2012	Direct Mapping	Real: App. Cores	9	Virtex 5	-	[35]	^[34]
2013	Multiple FPGA	Synthetic	18	Virtex 5 × 2	-	[36]	^[35]
2014	Virtualization	Synthetic	1024	Virtex 7	42	[37]	^[36]
^[	^4]	126,	^[^39]	127,	^[	128,129,	¹²³^]^[	2015	Direct Mapping	Synthetic	64	Virtex 6	50	[38]	^[37]
¹²⁴	^]^[¹²⁵^]^[^126]^[127]^[128]	130	^[¹²⁹^]	2016	Direct Mapping	Synthetic	16	Virtex 6	250	^[13]
2017	Direct Mapping	Synthetic	16	Virtex 6	250	[24]	^[23]
2018	Direct Mapping	Synthetic	16	Virtex 6	250	[25]	^[24]

These evaluation platforms assist the designers to reach the design-specific communication architecture, meeting most of the requirement specifications, for a certain application. These evaluation platforms take comparatively more time to synthesize the change, while on the other hand, a simulator can accommodate the same change in much lesser time. Designers offer dynamic reconfiguration, as a peroration to this limitation, but simulators are still the first choice of many entry-level researchers. However, the choice of NoC to realize the future datacenters with multi-tenant multi-FPGAs is yet to explore. The linking of several computational nodes becomes complicated and affects the performance of the overall system. Although NoC is not the only choice for communication within an FPGA as well as among multiple FPGAs but offer a competitive and promising solution. Other solutions include traditional bus, bus combined with a soft shell, different types of soft NoC and hard NoC. Many comparative studies evaluated these choices based on parameters like useable bandwidth, area consumption, latency, wire requirement and routing congestion. The way NoC is generated, also affects the performance so designers must be careful while choosing the NoC or an alternate for their design.

4. Revisiting the FPGA Virtualization

Resources are time multiplexed in a cloud services provider datacenter, referred as Infrastructure as a Service (IaaS). The sharing of resources is achieved through virtualization, an abstraction layer for hiding the physical resources from users. The process of virtualization raises issues like ease-of-use, privacy and performance but yet IaaS provide individual users and small organizations with an economic choice of renting over spending on infrastructure. Other than an academic example, such as SAVI testbed [39]^[38], industry offers plenty of solutions that are equally popular among designers. Amazon Web Services EC2 [40]^[39], IBM Zurich [41]^[40], and Intel are important competitors. Alveo on the Nimbix Cloud

Although there are many features of virtualization like management, scheduling, adoptability, segregation, scalability, performance-overhead, availability, programmability, time-to-market, security, but the most important feature in the context of scope of this research is the multi-tenancy because it is essential for a sustainable and economically viable deployment in datacenters. FPGA has two types of fabric: reconfigurable and non-reconfigurable. The virtualization for the non-reconfigurable fabric is the same as of CPU, but there are several variations when it comes to the virtualization of the reconfigurable fabric.

References

Escobar, F.A.; Chang, X.; Valderrama, C. Suitability analysis of FPGAs for heterogeneous platforms in HPC. IEEE Trans. Parallel Distrib. Syst. 2016, 27, 600–612.
De Bernardinisa, L.P.; Pinelloa, C.; Sgroia, A.L. Platform-based design for embedded systems. In Embedded Systems Handbook, 1st ed.; CRC Press: San Francisco, CA, USA, 2005.
Inta, R.; Bowman, D.J.; Scott, S.M. The chimera: An off-theshelf CPU/GPGPU/FPGA hybrid computing platform. Int. J. Reconfigurable Comput. 2012, 2012, 241439.
Putnam, A.; Caulfield, A.M.; Chung, E.S.; Chiou, D.; Constantinides, K.; Demme, J.; Esmaeilzadeh, H.; Fowers, J.; Gopal, G.P.; Gray, J.; et al. A reconfigurable fabric for accelerating large-scale datacenter services. In Proceedings of the 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA), Minneapolis, MN, USA, 14–18 June 2014.
Plessl, C.; Platzner, M. Virtualization of hardware-introduction and survey. In ERSA; CSREA Press: Las Vegas, NV, USA, 2004.
Vaishnav, A.; Pham, K.D.; Koch, D. A survey on FPGA virtualization. In Proceedings of the 2018 28th International Conference on Field Programmable Logic and Applications (FPL), Dublin, Ireland, 27–31 August 2018.
Göhringer, D.; Hübner, M.; Hugot-Derville, L.; Becker, J. Message passing interface support for the runtime adaptive multi-processor system-on-Chip RAMPSoC. In Proceedings of the 2010 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation, Samos, Greece, 19–22 July 2010; pp. 357–364.
Shen, X.; Wang, X.; Zhu, Y.; Huang, T.; Kong, X. Implementing dynamic web page interactions with a Java processor core on FPGA. In Proceedings of the Engineering and Industries (ICEI), 2nd International Conference on IEEE, Jeju, Korea, 29 November–1 December 2011.
Farooq, U.; Parveza, H.; Mehrez, H.; Marrakchi, Z. A new heterogeneous tree-based application specific FPGA and its comparison with mesh-based application specific FPGA. Microprocess. Microsyst. 2012, 36, 588–605.
Chen, Y.; Wang, Y.; Ha, Y.; Felipe, M.R.; Ren, S.; Aung, K.M.M. sAES: A high throughput and low latency secure cloud storage with pipelined DMA based PCIe interface. In Proceedings of the 2013 International Conference on Field-Programmable Technology (FPT), Kyoto, Japan, 9–11 December 2013; pp. 374–377.
Xu, L.; Shi, W.; Suh, T. PFC: Privacy preserving FPGA cloud—A case study of mapreduce. In Proceedings of the IEEE International Conference on Cloud Computing, Anchorage, AK, USA, 27 June–2 July 2014; pp. 280–287.
Yang, H.; Yan, X. Memory coherency based CPU-Cache-FPGA acceleration architecture for cloud computing. In Proceedings of the Information Science and Control Engineering (ICISCE), 2nd International Conference, Shanghai, China, 24–26 April 2015; pp. 304–307.
Kidane, H.L.; Bourennane, E.B.; Ochoa-Ruiz, G. Noc based virtualized accelerators for cloud computing. In Proceedings of the IEEE 2016, 10th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSOC), Lyon, France, 21–23 September 2016; pp. 133–137.
Will, M.A.; Ko, R.K. Secure FPGA as a service—Towards secure data processing by physicalizing the cloud. In Proceedings of the IEEE Trustcom/BigDataSE/ICESS 2017, Sydney, NSW, Australia, 1–4 August 2017; pp. 449–455.
Yazar, A.; Erol, A.; Schmidt, E.G. ACCLOUD (Accelerated CLOUD): A novel FPGA-Accelerated cloud architecture. In Proceedings of the 26th Signal Processing and Communications Applications (SIU), Izmir, Turkey, 2–5 May 2018; IEEE: Piscataway, NJ, USA, 2018.
Mbongue, J.; Hategekimana, F.; Tchuinkou Kwadjo, D.; Andrews, D.; Bobda, C. FPGAVirt: A Novel Virtualization Framework for FPGAs in the Cloud. In Proceedings of the IEEE 11th International Conference on Cloud Computing, San Francisco, CA, USA, 2–7 July 2018.
Al-Aghbari, A.A.; Elrabaa, M.E.S. Cloud-based FPGA custom computing machines for streaming applications. IEEE Access 2019, 7, 38009–38019.
Bittner, R.; Ruf, E.; Forin, A. Direct GPU/FPGA communication via PCI express. Cluster Comput. 2013, 17, 339–348.
Mueller, R.; Teubner, J.; Alonso, G. Streams on wires: A query compiler for FPGAS. Proc. VLDB Endow. 2009, 2, 229–240.
Dally, W.J.; Towles, B. Route packets, not wires: On-chip interconnection networks. In Proceedings of the Design Automation Conference, Las Vegas, NV, USA, 18–22 June 2001; pp. 684–689.
Yazdanshenas, S.; Betz, V. Interconnect solutions for virtualized field-programmable gate arrays. IEEE Access 2018, 6, 10497–10507.
de Lima, O.A.; Costa, W.N.; Fresse, V.; Rousseau, F. A survey of NoC evaluation platforms on FPGAs. In Proceedings of the International Conference on Field-Programmable Technology (FPT), Xi’an, China, 7–9 December 2016.
Kidane, H.L.; Bourennane, E.B.; Ochoa-Ruiz, G. Run-time scalable noc for fpga based virtualized ips. In Proceedings of the IEEE 11th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC), Seoul, Korea, 18–20 September 2017; pp. 91–97.
Kidane, H.L.; Bourennane, E.B. MARTE and IP-XACT based approach for run-time scalable NoC. In Proceedings of the IEEE 12th International Symposium on Embedded Multicore/Many-Core Systems-on-Chip (MCSoC), Hanoi, Vietnam, 12–14 September 2018; pp. 162–167.
Liu, Y.; Liu, P.; Jiang, Y.; Yang, M.; Wu, K.; Wang, W.; Yao, Q. Building a multi-FPGA-based emulation framework to support networks-on-chip design and verification. Int. J. Electron. 2010, 97, 1241–1262.
Krishnaiah, G.; Silpa, B.V.; Panda, P.R.; Kumar, A. Fastfwd: An efficient hardware acceleration technique for trace-driven network-on-chip simulation. In Proceedings of the eighth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, Scottsdale, AZ, USA, 24–29 October 2010; ACM: New York, NY, USA, 2010; pp. 247–256.
Wang, C.; Hu, W.H.; Lee, S.E.; Bagherzadeh, N. Area and power-efficient innovative network on-chip architecture. In Proceedings of the 2010 18th Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP), Pisa, Italy, 17–19 February 2010; IEEE: Piscataway, NJ, USA, 2010; pp. 533–539.
Lotlikar, S.; Pai, V.; Gratz, P. AcENoCs: A configurable HW/SW platform for FPGA accelerated NoC emulation. In Proceedings of the 24th Internatioal Conference on VLSI Design (VLSI Design), Chennai, India, 2–7 January 2011; pp. 147–152.
Papamichael, M.K. Fast scalable FPGA-based network-on-chip simulation models. In Proceedings of the 2011 9th IEEE/ACM Internatioal Conference on Formal Methods and Models for Codesign (MEMOCODE), Cambridge, UK, 11–13 July 2011; IEEE: Piscataway, NJ, USA, 2011; pp. 77–82.
Papamichael, M.K.; Hoe, J.C.; Mutlu, O. Fist: A fast, lightweight, FPGA-friendly packet latency estimator for noc modeling in full-system simulations. In Proceedings of the 2011 Fifth IEEE/ACM International Symposium on Networks on Chip (NoCS), Pittsburgh, PA, USA, 1–4 May 2011; IEEE: Piscataway, NJ, USA, 2011; pp. 137–144.
Ku, W.-C.; Chen, T.-F. Accelerating manycore simulation by efficient noc interconnection partition on FPGA simulation platform. In Proceedings of the 2011 International Symposium on VLSI Design, Automation and Test (VLSI-DAT), Hsinchu, Taiwan, 25–28 April 2011; IEEE: Piscataway, NJ, USA, 2011; pp. 1–4.
Pellauer, M.; Adler, M.; Kinsy, M.; Parashar, A.; Emer, J. Hasim: FPGA-based high-detail multicore simulation using time-division multiplexing. In Proceedings of the 2011 IEEE 17th International Symposium on High Performance Computer Architecture (HPCA), San Antonio, TX, USA, 12–16 February 2011; IEEE: Piscataway, NJ, USA, 2011; pp. 406–417.
Tan, J.; Fresse, V.; Rousseau, F. Generation of emulation platforms for NoC exploration on FPGA. In Proceedings of the 22nd IEEE International Symposium on Rapid System Prototyping (RSP), Karlsruhe, Germany, 24–27 May 2011; pp. 186–192.
Heck, G.; Guazzelli, R.; Moraes, F.; Calazans, N.; Soares, R. HardNoC: A platform to validate networks on chip through FPGA prototyping. In Proceedings of the VIII Southern Conference on Programmable Logic (SPL), Bento Goncalves, Spain, 20–23 March 2012; pp. 1–6.
Fresse, V.; Ge, Z.; Tan, J.; Rousseau, F. Case study: Deployment of the 2d noc on 3d for the generation of large emulation platforms. In Proceedings of the 2012 23rd IEEE International Symposium on Rapid System Prototyping (RSP), Tampere, Finland, 11–12 October 2012; pp. 23–29.
van Chu, T.; Sato, S.; Kise, K. Knocemu: High speed fpga emulator for kilo-node scale nocs. Embedded Multicore/Manycore SoCs (MCSoc). In Proceedings of the 2014 IEEE 8th International Symposium on Embedded Multicore/Manycore SoCs, Aizu-Wakamatsu, Japan, 23–25 September 2014; IEEE: Piscataway, NJ, USA, 2014; pp. 215–222.
de Lima, O.A.; Fresse, V.; Rousseau, F. Evaluation of snmp-like protocol to manage a NOC emulation platform. In Proceedings of the 2014 International Conference on Field-Programmable Technology (FPT), Shanghai, China, 10–12 December2014; pp. 199–206.
Kang, J.M.; Bannazadeh, H.; Leon-Garcia, A. Savi testbed: Control and management of converged virtual ict resources. In Proceedings of the IFIP/IEEE International Symposium on Integrated Network Management, Ghent, Belgium, 27–31 May 2013.
Amazon Web Services EC2. FPGA Hardware and Software Development Kit. Available online: https://github.com/aws/aws-fpga (accessed on 2 December 2019).
Weerasinghe, J.; Abel, F.; Hagleitner, C.; Herkersdorf, A. Enabling FPGAs in hyperscale data centers. In Proceedings of the 15th IEEE UIC-ATC-ScalCom, Beijing, China, 10–14 August 2015.
Alveo Nimbix Cloud. Available online: https://www.nimbix.net/alveotrial (accessed on 23 March 2020).
Voss, N.; Quintana, P.; Mencer, O.; Luk, W.; Gaydadjiev, G. Memory mapping for multi-die FPGAs. In Proceedings of the IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), San Diego, CA, USA, 28 April–1 May 2019.
Voss, N.; Girdlestone, S.; Becker, T.; Mencer, O.; Luk, W.; Gaydadjiev, G. Low area overhead custom buffering for FFT. In Proceedings of the International Conference on ReConFigurable Computing and FPGAs (ReConFig), Cancun, Mexico, 9–11 December 2019.
Li, X.; Maskell, D.L. Time-multiplexed FPGA overlay architectures: A survey. ACM Trans. Des. Autom. Electron. Syst. 2019, 24, 54.
Jain, A.K. Architecture Centric Coarse-Grained FPGA Overlays. Ph.D. Thesis, Nanyang Technological University, Singapore, 2017.
Cheah, H.Y.; Fahmy, S.A.; Maskell, D.L. iDEA: A DSP block-based FPGA Soft Processor. In Proceedings of the 2012 International Conference on Field-Programmable Technology (FPT), Seoul, Korea, 10–12 December 2012.
Xilinx MicroBlaze Soft Processor Core. Available online: https://www.xilinx.com/products/design-tools/microblaze.html (accessed on 3 December 2019).
Altera Nios II Processor. Available online: https://www.altera.com/products/processors/overview.html (accessed on 4 December 2019).
Severance, A.; Lemieux, G.G.F. Embedded supercomputing in FPGAs with the VectorBlox MXP matrix processor. In Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis, Montreal, QC, Canada, 29 September–4 October 2013.
Severance, A.; Lemieux, G. VENICE: A compact vector processor for FPGA Applications. In Proceedings of the 2011 IEEE Hot Chips 23 Symposium (HCS), Stanford, CA, USA, 17–19 August 2011; IEEE: Piscataway, NJ, USA, 2011; pp. 1–5.
Chou, C.H.; Severance, A.; Brant, A.D.; Liu, Z.; Sant, S.; Lemieux, G.G. VEGAS: Soft vector processor with scratchpad memory. In Proceedings of the 19th ACM/SIGDA international symposium on Field programmable gate arrays 2011, Monterey, CA, USA, 27 February–1 March 2011.
Yiannacouras, P.; Steffan, J.G.; Rose, J. VESPA: Portable, scalable, and flexible FPGA-based vector processors. In Proceedings of the 2008 international conference on Compilers, architectures and synthesis for embedded systems, Atlanta, GA, USA, 19–24 October 2008.
Yu, J.; Lemieux, G.; Eagleston, C. Vector Processing as a Soft-core CPU Accelerator. In Proceedings of the 16th International ACM/SIGDA Symposium on Field Programmable Gate Arrays, Monterey, CA, USA, 24–26 February 2008.
Cong, J.; Huang, H.; Ma, C.; Xiao, B.; Zhou, P. A fully pipelined and dynamically composable architecture of CGRA. In Proceedings of the 2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines, Boston, MA, USA, 11–13 May 2014.
Shukla, S.; Bergmann, N.W.; Becker, J. QUKU: A FPGA based flexible coarse grain architecture design paradigm using process networks. In Proceedings of the 2007 IEEE International Parallel and Distributed Processing Symposium, Rome, Italy, 26–30 March 2007.
Landy, A.; Stitt, G. A low-overhead interconnect architecture for virtual reconfigurable fabrics. In Proceedings of the 2012 International Conference on Compilers, Architectures and Synthesis for Embedded Systems, Tampere, Finland, 7–12 October 2012.
Coole, J.; Stitt, G. Fast, flexible high-level synthesis from OpenCL using reconfiguration contexts. IEEE Micro 2014, 34, 42–53.
Govindaraju, V.; Ho, C.H.; Nowatzki, T.; Chhugani, J.; Satish, N.; Sankaralingam, K.; Kim, C. DySER: Unifying functionality and parallelism specialization for energy-efficient computing. IEEE Micro 2012, 32, 38–51.
So, H.K.-H.; Liu, C. FPGA Overlays. In FPGAs for Software Programmers; Springer: Cham, Germany, 2016.
Brant, A.D. Coarse and Fine Grain Programmable Overlay Architectures for FPGAs. MSc Thesis, University of British Columbia, Vancouver, BC, Canada, 2012.
Rashid, R.; Steffan, J.G.; Betz, V. Comparing performance, productivity and scalability of the TILT overlay processor to OpenCL HLS. In Proceedings of the 2014 International Conference on Field-Programmable Technology (FPT), Shanghai, China, 10–12 December 2014.
Paul, K.; Dash, C.; Moghaddam, M.S. reMORPH: A runtime reconfigurable architecture. In Proceedings of the 2012 15th Euromicro Conference on Digital System Design, Cesme, Izmir, Turkey, 5–8 September 2012.
Kapre, N.; Mehta, N.; Delorimier, M.; Rubin, R.; Barnor, H.; Wilson, M.J.; Wrighton, M.; DeHon, A. Packet switched vs. time multiplexed FPGA overlay networks. In Proceedings of the 2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, Napa, CA, USA, 24–26 April 2006.
Papamichael, M.K.; Hoe, J.C. CONNECT: Re-examining conventional wisdom for designing nocs in the context of FPGAs. In Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays 2012, Monterey, CA, USA, 22–24 February 2012.
Huan, Y.; DeHon, A. FPGA optimized packet-switched NoC using split and merge primitives. In Proceedings of the 2012 International Conference on Field-Programmable Technology, Seoul, Korea, 10–12 December 2012.
Kapre, N.; Gray, J. Hoplite: Building austere overlay NoCs for FPGAs. In Proceedings of the 2015 25th International Conference on Field Programmable Logic and Applications (FPL), London, UK, 2–4 September 2015.
Brant, A.; Lemieux, G.G.F. ZUMA: An open FPGA overlay architecture. In Proceedings of the 2012 IEEE 20th international symposium on field-programmable custom computing machines, Toronto, ON, Canada, 29 April–1 May 2012.
Ferreira, R.; Vendramini, J.G.; Mucida, L.; Pereira, M.M.; Carro, L. An FPGA-based heterogeneous coarse-grained dynamically reconfigurable architecture. In Proceedings of the 14th International Conference on Compilers, Architectures and Synthesis for Embedded Systems, Taipei, Taiwan, 9–14 October 2011.
Kinsy, M.A.; Pellauer, M.; Devadas, S. Heracles: Fully synthesizable parameterized mips based multicore system. In Proceedings of the 21st International Conference on Field Programmable Logic and Applications, Chania, Greece, 5–7 September 2011.
Liu, C.; Yu, C.L.; So, H.K. A soft coarse-grained reconfigurable array based high-level synthesis methodology: Promoting design productivity and exploring extreme FPGA frequency. In Proceedings of the IEEE 21st Annual International Symposium on Field-Programmable Custom Computing Machines, Seattle, WA, USA, 28–30 April 2013.
Gray, J. GRVI-phalanx: A massively parallel RISC-V FPGA accelerator. In Proceedings of the IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines, Washington, DC, USA, 1–3 May 2016.
Li, X.; Jain, A.; Maskell, D.; Fahmy, S.A. An area-efficient FPGA overlay using DSP block based time-multiplexed functional units. In Proceedings of the 2nd International Workshop on Overlay Architectures for FPGAs, Monterey, CA, USA, 21–23 February 2016.
Kumar, H.B.C.; Ravi, P.; Modi, G.; Kapre, N. 120-core microAptiv MIPS overlay for the Terasic DE5-NET FPGA board. In Proceedings of the ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, CA, USA, 22–24 February 2017.
Byma, S.; Steffan, J.G.; Bannazadeh, H.; Garcia, A.L.; Chow, P. FPGAs in the cloud: Booting virtualized hardware accelerators with OpenStack. In Proceedings of the 2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines, Boston, MA, USA, 11–13 May 2014.
Fahmy, S.A.; Vipin, K.; Shreejith, S. Virtualized FPGA accelerators for efficient cloud computing. In Proceedings of the 2015 IEEE 7th International Conference on Cloud Computing Technology and Science (CloudCom), Vancouver, BC, Canada, 30 November–3 December 2015.
Tarafdar, N.; Lin, T.; Fukuda, E.; Bannazadeh, H.; Leon-Garcia, A.; Chow, P. Enabling flexible network FPGA clusters in a heterogeneous cloud data center. In Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA ’17), Monterey, CA, USA, 22–24 February 2017; ACM: New York, NY, USA, 2017.
Knodel, O.; Genssler, P.R.; Spallek, R.G. Virtualizing reconfigurable hardware to provide scalability in cloud architectures. In Proceedings of the Tenth International Conference on Advances in Circuits, Electronics and Micro-electronics (CENICS 2017), Rome, Italy, 10–14 September 2017; IARIA: Wilmington, DE, USA, 2017.
Zhang, J.; Xiong, Y.; Xu, N.; Shu, R.; Li, B.; Cheng, P.; Chen, G.; Moscibroda, T. The feniks FPGA operating system for cloud computing. In Proceedings of the 8th Asia-Pacific Workshop on Systems, Mumbai, India, 2 September 2017.
Abbani, N.; Ali, A.; Doa’A, A.O.; Jomaa, M.; Sharafeddine, M.; Artail, H.; Akkary, H.; Saghir, M.A.; Awad, M.; Hajj, H. A distributed reconfigurable active SSD platform for data intensive applications. In Proceedings of the 2011 IEEE International Conference on High Performance Computing and Communications, Banff, AB, Canada, 2–4 September 2011.
Wang, W. pvFPGA: Accessing an FPGA-based hardware accelerator in a paravirtualized environment. In Proceedings of the 2013 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ ISSS), Montreal, QC, Canada, 29 September–4 October 2013; IEEE: Piscataway, NJ, USA, 2013.
Xia, T.; Prévotet, J.C.; Nouvel, F. Hypervisor mechanisms to manage FPGA reconfigurable accelerators. In Proceedings of the 2016 International Conference on Field-Programmable Technology (FPT), Xi’an, China, 7–9 December 2016.
Jain, A.K.; Pham, K.D.; Cui, J.; Fahmy, S.A.; Maskell, D.L. Virtualized execution and management of hardware tasks on a hybrid ARM-FPGA platform. J. Signal Process. Syst. 2014, 77, 61–76.
Chen, F.; Shan, Y.; Zhang, Y.; Wang, Y.; Franke, H.; Chang, X.; Wang, K. Enabling FPGAs in the cloud. In Proceedings of the 11th ACM Conference on Computing Frontiers, Cagliari, Italy, 20–22 May 2014.
Bobda, C.; Majer, A.; Ahmadinia, A.; Haller, T.; Linarth, A.; Teich, J. The erlangen slot machine: Increasing flexibility in FPGA-based reconfigurable platforms. In Proceedings of the 2005 IEEE International Conference on Field-Programmable Technology, Singapore, 11–14 December 2005.
Zhao, Q. Enabling FPGA-as-a-service in the cloud with hCODE platform. IEICE Trans. Inf. Syst. 2018, 101, 335–343.
Weerasinghe, J.; Polig, R.; Abel, F.; Hagleitner, C. Network-attached FPGAs for data center applications. In Proceedings of the 2016 International Conference on Field-Programmable Technology (FPT), Xi’an, China, 7–9 December 2016.
Tarafdar, N.; Eskandari, N.; Lin, T.; Chow, P. Designing for FPGAs in the Cloud. IEEE Des. Test 2017, 35, 23–29.
Asiatici, M.; George, N.; Vipin, K.; Fahmy, S.A.; Ienne, P. Virtualized execution runtime for FPGA accelerators in the cloud. IEEE Access 2017, 5, 1900–1910.
Pahl, C. Containerization and the PaaS cloud. IEEE Cloud Comput. 2015, 2, 24–31.
Kirchgessner, R.; Stitt, G.; George, A.; Lam, H. VirtualRC: A virtual FPGA platform for applications and tools portability. In Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays (FPGA’12), ACM, Monterey, CA, USA, 22–24 February 2012; pp. 205–208.
Asghari, M.; Rajabzadeh, A.; Dashtbani, M. HFIaaS: A proposed FPGA infrastructure as a service framework using high-level synthesis. In Proceedings of the 6th International Conference on Computer and Knowledge Engineering (ICCKE), Mashhad, Iran, 20–21 October 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 72–77.
Najem, M.; Bollengier, T.; le Lann, J.C.; Lagadec, L. Extended overlay architectures for heterogeneous FPGA cluster management. J. Syst. Archit. 2017, 78, 1–14.
Eskandari, N.; Tarafdar, N.; Ly-Ma, D.; Chow, P. A modular heterogeneous stack for deploying FPGAs and CPUs in the data center. In Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Seaside, CA, USA, 24–26 February 2019; ACM: New York, NY, USA, 2019.
Vaishnav, A.; Pham, K.D.; Manev, K.; Koch, D. The FOS (FPGA Operating System) Demo. In Proceedings of the 29th International Conference on Field Programmable Logic and Application (FPL), Barcelona, Spain, 8–12 September 2019.
Intel OPAE. Available online: http://01.org/OPAE (accessed on 24 March 2020).
Yazdanshenas, S.; Betz, V. Quantifying and mitigating the costs of FPGA virtualization. In Proceedings of the 27th International Conference on Field Programmable Logic and Applications (FPL), Ghent, Belgium, 4–8 September 2017.
Happe, M.; Traber, A.; Keller, A. Preemptive hardware multitasking in ReconOS. In Applied Reconfigurable Computing; Springer: Cham, Switzerland, 2015.
Rupnow, K.; Fu, W.; Compton, K. Block, drop or roll(back): Alternative preemption methods for RH multi-tasking. In Proceedings of the 17th IEEE Symposium on Field Programmable Custom Computing Machines, Napa, CA, USA, 5–7 April 2009.
Bourge, A.; Muller, O.; Rousseau, F. Generating efficient context-switch capable circuits through autonomous design flow. ACM TRETS 2016, 10, 1–23.
Lubbers, E.; Platzner, M. ReconOS: An RTOS supporting hard-and software threads. In Proceedings of the 2007 International Conference on Field Programmable Logic and Applications, Amsterdam, Netherlands, 27–29 August 2007.
Peck, W.; Anderson, E.; Agron, J.; Stevens, J.; Baijot, F.; Andrews, D. Hthreads: A computational model for reconfigurable devices. In Proceedings of the 2006 International Conference on Field Programmable Logic and Applications, Madrid, Spain, 28–30 August 2006.
Shan, Y.; Wang, B.; Yan, J.; Wang, Y.; Xu, N.; Yang, H. FPMR: MapReduce framework on FPGA. In Proceedings of the 18th annual ACM/SIGDA international symposium on Field programmable gate arrays 2010, Monterey, CA, USA, 21–23 February 2010.
Chen, Y.T.; Cong, J.; Fang, Z.; Lei, J.; Wei, P. When spark meets FPGAs: A case study for next generation DNA sequencing acceleration. In Proceedings of the 24th FCCM, Washington, DC, USA, 1–3 May 2016.
Mavridis, S.; Pavlidakis, M.; Stamoulias, I.; Kozanitis, C.; Chrysos, N.; Kachris, C.; Soudris, D.; Bilas, A. VineTalk: Simplifying software access and sharing of FPGAs in datacenters. In Proceedings of the 27th International Conference on Field Programmable Logic and Applications (FPL), Ghent, Belgium, 4–8 September 2017.
Eguro, K. SIRC: An extensible reconfigurable computing communication API. In Proceedings of the 18th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, FCCM, Charlotte, NC, USA, 2–4 May 2010.
Intel FPGA. SDK for OpenCL. Programming Guide. UG-OCL002. 2016. Available online: https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/hb/opencl-sdk/archives/ug-aocl-programming-guide-16.1.pdf (accessed on 23 March 2020).
Segal, O.; Colangelo, P.; Nasiri, N.; Qian, Z.; Margala, M. SparkCL: A unified programming framework for accelerators on heterogeneous clusters. arXiv 2015, arXiv:1505.01120.
Kulp, J.; Siegel, S.; Miller, J. Open Component Portability Infrastructure (OPENCPI); Technial Report; Mercury Federal Systems Inc.: Arlington, VA, USA, 2013.
Intel HLS Compiler: Fast Design, Coding and Hardware. Available online: https://www.intel.com/content/www/us/en/software/programmable/quartus-prime/hls-compiler.html (accessed on 24 March 2020).
Vivado High-Level Synthesis: Accelerates IP Creation by Enabling C, C++ and System C Specifications. Available online: https://www.xilinx.com/products/design-tools/vivado/integration/esl-design.html (accessed on 24 March 2020).
Open Spatial Programming Language (OpenSPL), Maxeler Technologies. Available online: https://www.maxeler.com/openspl-announced/ (accessed on 24 March 2020).
Pell, O.; Mencer, O.; Tsoi, K.H.; Luk, W. Maximum performance computing with dataflow engines. In Computing in Science & Engineering; IEEE: Piscataway, NJ, USA, 2012; pp. 98–103.
Fleming, K.; Adler, M. The LEAP FPGA Operating System. In FPGAs for Software Programmers; Springer: Cham, Switzerland, 2016.
Vesper, M.; Koch, D.; Vipin, K.; Fahmy, S.A. JetStream: An open-source high-performance PCI express 3 streaming library for FPGA-to-Host and FPGA-to-FPGA communication. In Proceedings of the 2016 26th international conference on field programmable logic and applications (FPL), Lausanne, Switzerland, 29 August–2 September 2016.
Jacobsen, M.; Richmond, D.; Hogains, M.; Kastner, R. RIFFA 2.1: A reusable integration framework for FPGA accelerators. ACM TRETS 2015, 8, 1–23.
Yoshimi, M.; Nishikawa, Y.; Miki, M.; Hiroyasu, T.; Amano, H.; Mencer, O. A performance evaluation of CUBE: One Dimensional 512 FPGA cluster. In Proceedings of the International Symposium on Applied Reconfigurable Computing, ARC 2010, Bangkok, Thailand, 17–19 March 2010; Springer: Heidelberg, Germany, 2010.
Wang, Z.; Zhang, S.; He, B.; Zhang, W. Melia: A MapReduce framework on OpenCL-Based FPGAs. IEEE Trans. Parallel Distrib. Syst. 2016, 27, 3547–3560.
Yeung, J.H.C. Map-reduce as a programming model for custom computing machines. In Proceedings of the 2008 16th International Symposium on Field-Programmable Custom Computing Machines, Palo Alto, CA, USA, 14–15 April 2008.
Tsoi, K.H.; Luk, W. Axel: A Heterogeneous Cluster with FPGAs and GPUs. In Proceedings of the 18th Annual ACM/SIGDA International Symposium on Field Programmable Gate Arrays 2010, Monterey, CA, USA, 21–23 February 2010.
Iordache, A.; Pierre, G.; Sanders, P.; de FCoutinho, J.G.; Stillwell, M. High performance in the cloud with FPGA groups. In Proceedings of the 9th International Conference on Utility and Cloud Computing, Shanghai, China, 6–9 December 2016.
Huang, M. Programming and runtime support to blaze FPGA accelerator deployment at datacenter scale. In Proceedings of the Seventh ACM Symposium on Cloud Computing, SoCC ’16, Santa Clara, CA, USA, 5–7 October 2016.
Theodoropoulos, D.; Alachiotis, N.; Pnevmatikatos, D. Multi-FPGA evaluation platform for disaggregated computing. In Proceedings of the 2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), Napa, CA, USA, 30 April–2 May 2017.
Ouyang, J.; Lin, S.; Qi, W.; Wang, Y.; Yu, B.; Jiang, S. SDA: Software-defined accelerator for large-scale DNN systems. In Proceedings of the IEEE Hot Chips 26 Symposium (HCS), Cupertino, CA, USA, 10–12 August 2014.
El-Araby, E.; Gonzalez, I.; El-Ghazawi, T. Virtualizing and sharing reconfigurable resources in high-performance reconfigurable computing systems. In Proceedings of the 2008 Second International Workshop on High-Performance Reconfigurable Computing Technology and Applications, Austin, TX, USA, 17 November 2008.
Zheng, Z.; Wang, T.; Weng, J.; Mumtaz, S.; Bashir, A.K.; Hussain, C.S. Differentially private high-dimensional data publication in internet of things. IEEE Internet Things J. 2019.
Bashir, A.K.; Ohsita, Y.; Murata, M. A distributed virtual data center network architecture for the future internet. IEICE Tech. Rep. (IN2014–165) 2015, 114, 261–266.
Bashir, A.K.; Ohsita, Y.; Murata, M. Abstraction layer based virtual data center architecture for network function chaining. In Proceedings of the International Conference on Distributed Computing Systems Workshops (ICDCSW)—ICDCS, Nara, Japan, 27–30 June 2016.
Bashir, A.K.; Ohsita, Y.; Murata, M. Abstraction layer based distributed architecture for virtualized data centers. In Proceedings of the Cloud Computing 2015: 6th International Conference on Cloud Computing, Grids, and Virtualization, Nice, France, 22–27 March 2015; pp. 46–51.
Flynn, M.J. Flynn’s taxonomy. Encycl. Parallel Comput. 2011, 689–697.