Performance Comparison of Lattice Boltzmann Fluid Flow Simulation

Please note this is a comparison between Version 2 by Catherine Yang and Version 1 by Predrag M. Tekic.

This paper presents a performance comparison between the CUDA and OpenCL parallel programming frameworks, using a lattice Boltzmann simulation of lid-driven cavity flow as the example. CUDA is a parallel programming model developed by NVIDIA for leveraging the computing capabilities of its products. OpenCL is an open, royalty-free standard developed by the Khronos Group for parallel programming of heterogeneous devices (CPUs, GPUs, ...) from different vendors. OpenCL promises portability of code between heterogeneous devices, but portability carries a performance penalty. We investigate the performance cost of portable OpenCL code compared with similar CUDA code run on NVIDIA graphics cards. The lid-driven cavity benchmark code for both frameworks is written in Java and uses open-source libraries to communicate with OpenCL and CUDA.

CUDA
OpenCL
Lattice Boltzmann
Java
GPU

1. Introduction

In recent years, multi-core and many-core processors have been replacing single-core processors. Graphics Processing Units (GPUs) in particular have greatly outperformed CPUs in memory bandwidth (Figure 1) and in the number of arithmetic operations per second. GPUs play an important role in today’s high-performance computing applications: they have brought high-performance computing, once the privilege of a small group of scientists and reserved for large computer clusters, to every commodity desktop computer.

Because of the large processing power of GPUs, researchers and developers are increasingly interested in exploiting them for general-purpose computing. Specific scientific fields, such as computational fluid dynamics (CFD), have benefited from this trend. Algorithms that can be parallelized relatively easily, such as the lattice Boltzmann method, are gaining popularity.

In this paper we investigate the performance differences between CUDA and OpenCL implementations of a well-known CFD benchmark problem, the one-sided lid-driven cavity flow. The code was developed in Java using the open-source Java bindings for OpenCL and CUDA, JOCL and JCUDA.

1.1. CUDA

Compute Unified Device Architecture (CUDA) [1] was introduced by NVIDIA in 2006 as a proprietary, vendor-specific API and set of language extensions for programming NVIDIA products. Because CUDA is developed by the same company that produces the hardware, CUDA code can be expected to perform better on that hardware. CUDA was also yet another device-specific API and language that developers had to learn in order to use NVIDIA products, which increased demand for a single language and API capable of targeting any device architecture.

CUDA provides two different APIs, the Runtime API and the Driver API. Both are very similar for basic tasks such as memory handling, and starting with CUDA 3.0 they are interoperable and can be mixed to some degree. The most important difference between the two APIs is how kernels are managed and executed.

1.2. OpenCL

Open Computing Language (OpenCL) [2] is an open, royalty-free standard developed by the Khronos Group [3] for parallel programming of heterogeneous devices (CPUs, GPUs, DSPs) from different vendors. It was introduced in late 2008 and has attracted vendor support, with implementations available from NVIDIA, AMD, Apple and IBM. Because the standard was designed to reflect contemporary hardware, it has many similarities with the CUDA programming model.

The OpenCL execution model consists of a controlling host program and kernels that execute on OpenCL devices. To scientific programmers, OpenCL may be an attractive alternative to CUDA, as it offers a similar programming model with the prospect of hardware and vendor independence.

OpenCL kernel code can be compiled at runtime, which is not the case in the CUDA compile model, and this compilation adds to OpenCL execution time. On the other hand, this just-in-time compile model allows the compiler to generate code for the specific device (GPU), taking advantage of that device’s architecture.

1.3. Similarities of CUDA and OpenCL

CUDA and OpenCL are both parallel computing frameworks. CUDA is supported only on NVIDIA products, whereas OpenCL takes a more general approach: it is cross-platform and supported on heterogeneous devices from different vendors. Because the OpenCL standard was designed to reflect contemporary hardware, it shares a set of core ideas with CUDA. The two frameworks have similar platform, memory, execution and programming models, so it is possible to translate CUDA programs into OpenCL programs and vice versa. Mapping between CUDA and OpenCL terminology, regarding memory and execution model, is presented

Almost every aspect of the two frameworks is similar, and almost every CUDA term can be mapped to an OpenCL term. This has led to the creation of tools [8] for porting CUDA to OpenCL. In this work, existing OpenCL code [6,7] was ported to CUDA manually, following the syntax and other mappings presented here. Mapping between CUDA and OpenCL thread/work-item indexing is given in
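The thread/work-item indexing correspondence can be illustrated with a small CPU-side sketch. The class and method names below are hypothetical and only emulate the arithmetic in plain Java; the built-ins named in the comments (`threadIdx`, `blockIdx`, `blockDim`, `get_local_id`, `get_group_id`, `get_local_size`, `get_global_id`) are the standard CUDA and OpenCL ones. The sketch assumes a one-dimensional launch with a zero global offset.

```java
/**
 * CPU-side emulation of 1D thread / work-item indexing.
 * CUDA built-ins and their OpenCL counterparts:
 *   threadIdx.x  <->  get_local_id(0)
 *   blockIdx.x   <->  get_group_id(0)
 *   blockDim.x   <->  get_local_size(0)
 *   gridDim.x    <->  get_num_groups(0)
 */
final class IndexMapping {

    /** CUDA style: blockIdx.x * blockDim.x + threadIdx.x. */
    static int cudaGlobalIndex(int blockIdx, int blockDim, int threadIdx) {
        return blockIdx * blockDim + threadIdx;
    }

    /** OpenCL style: the value get_global_id(0) returns (zero global offset assumed). */
    static int openclGlobalId(int groupId, int localSize, int localId) {
        return groupId * localSize + localId;
    }

    /** Number of blocks / work-groups needed to cover n lattice nodes (rounded up). */
    static int groupsFor(int n, int localSize) {
        return (n + localSize - 1) / localSize;
    }

    public static void main(String[] args) {
        int nodes = 1000;
        int blockDim = 64;
        int blocks = groupsFor(nodes, blockDim);
        System.out.println("blocks/work-groups: " + blocks);
        System.out.println("threads/work-items launched: " + blocks * blockDim);
        System.out.println("global index of thread 5 in block 2: "
                + cudaGlobalIndex(2, blockDim, 5));
    }
}
```

Because the launch is rounded up to whole blocks or work-groups, more threads than lattice nodes may be started, so a kernel written along these lines would normally guard its body with a bounds check on the global index.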

2. Implementation Details

To a CUDA/OpenCL programmer, the computing system consists of a host (often a CPU) and one or more devices (often GPUs), which are massively parallel processors equipped with a large number of arithmetic execution units. The programs developed here use Java for the host part of the computing system, and CUDA and OpenCL kernels to program the NVIDIA device used.

2.1. Java and CUDA

To use Java as the host programming language, we used the open-source Java CUDA library (JCUDA ver. 0.5.5) [5]. This library provides a level of abstraction between host and device calls. The Eclipse IDE was used to create the Java project and add the JCUDA .jar files to it; the .dll files then have to be copied to a location on the environment “path”. Installation of the CUDA toolkit (5.5) also required an installation of MS Visual Studio (because of its bundled C compiler).

Kernel source code has to be compiled with the NVCC compiler. The result is a file that can be loaded and executed using the Driver API. The kernel can be compiled in two ways: as a PTX file or as a CUBIN file. We compiled our kernels as PTX files, which are human-readable (CUBIN files are not).
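The PTX compilation step can be scripted from the Java host. The sketch below only builds, and optionally runs, an `nvcc -ptx` command line; the class name and the `.cu` file names are hypothetical, since the source names the three kernels but not their files. It assumes `nvcc` is on the system path. With JCUDA, the resulting `.ptx` file would then typically be loaded through the Driver API bindings (`cuModuleLoad`, `cuModuleGetFunction`), which are not shown here.

```java
/** Builds (and optionally runs) the nvcc command that compiles a .cu file to PTX. */
final class PtxCompile {

    /** Returns the command line, e.g. nvcc -ptx collision.cu -o collision.ptx. */
    static String[] ptxCommand(String cuFile) {
        if (!cuFile.endsWith(".cu")) {
            throw new IllegalArgumentException("expected a .cu file: " + cuFile);
        }
        // Replace the ".cu" suffix with ".ptx" for the output file.
        String ptxFile = cuFile.substring(0, cuFile.length() - 3) + ".ptx";
        return new String[] {"nvcc", "-ptx", cuFile, "-o", ptxFile};
    }

    /** Runs nvcc (which must be on the path) and returns its exit code. */
    static int compile(String cuFile) throws java.io.IOException, InterruptedException {
        Process process = new ProcessBuilder(ptxCommand(cuFile)).inheritIO().start();
        return process.waitFor();
    }

    public static void main(String[] args) {
        // Hypothetical file names for the three kernels; only the commands are printed.
        for (String kernel : new String[] {"streaming.cu", "collision.cu", "boundaries.cu"}) {
            System.out.println(String.join(" ", ptxCommand(kernel)));
        }
    }
}
```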

2.2. Java and OpenCL

To use Java as the host programming language with OpenCL, we used the open-source Java OpenCL library (JOCL ver. 0.1.3) [4]. As with the CUDA code, we used the Eclipse IDE to create the Java project. The JOCL Java archive files were added to the project path, and the JOCL.dll file was placed on the environment “path”. OpenCL is able to compile kernels at runtime.

In both cases (CUDA and OpenCL), three kernel files were created: “streaming”, “collision” and “boundaries”. In both frameworks, these kernels are launched from the host program written in Java. Performance results of these simulations are presented in the next section.
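To make the roles of the three kernels concrete, the following is a sequential CPU sketch of one lattice Boltzmann time step, written in plain Java rather than as GPU kernels. It is not the authors' code. The D2Q9 lattice, the BGK collision operator, half-way bounce-back walls, the moving-lid correction with a nominal density of 1, and every class, field and method name are assumptions made for illustration, since the source does not specify them.

```java
/** CPU reference sketch: D2Q9 BGK lattice Boltzmann step for a lid-driven cavity. */
final class LbmCavity {
    static final int Q = 9;
    static final int[] CX = {0, 1, 0, -1, 0, 1, -1, -1, 1};
    static final int[] CY = {0, 0, 1, 0, -1, 1, 1, -1, -1};
    static final int[] OPP = {0, 3, 4, 1, 2, 7, 8, 5, 6};   // opposite directions
    static final double[] W = {4.0 / 9, 1.0 / 9, 1.0 / 9, 1.0 / 9, 1.0 / 9,
                               1.0 / 36, 1.0 / 36, 1.0 / 36, 1.0 / 36};
    static final double RHO0 = 1.0;                          // assumed wall density

    final int nx, ny;
    final double omega, uLid;
    double[][] f, tmp;                                       // f[q][y * nx + x]

    LbmCavity(int nx, int ny, double omega, double uLid) {
        this.nx = nx; this.ny = ny; this.omega = omega; this.uLid = uLid;
        f = new double[Q][nx * ny];
        tmp = new double[Q][nx * ny];
        for (int q = 0; q < Q; q++)                          // fluid at rest, density 1
            for (int i = 0; i < nx * ny; i++) f[q][i] = W[q];
    }

    /** Second-order equilibrium distribution. */
    static double feq(int q, double rho, double ux, double uy) {
        double cu = CX[q] * ux + CY[q] * uy;
        return W[q] * rho * (1.0 + 3.0 * cu + 4.5 * cu * cu - 1.5 * (ux * ux + uy * uy));
    }

    boolean inside(int x, int y) { return x >= 0 && x < nx && y >= 0 && y < ny; }

    /** "collision": relax every node towards its local equilibrium. */
    void collision() {
        for (int i = 0; i < nx * ny; i++) {
            double rho = 0, mx = 0, my = 0;
            for (int q = 0; q < Q; q++) {
                rho += f[q][i]; mx += CX[q] * f[q][i]; my += CY[q] * f[q][i];
            }
            for (int q = 0; q < Q; q++)
                f[q][i] += omega * (feq(q, rho, mx / rho, my / rho) - f[q][i]);
        }
    }

    /** "streaming": pull each population from its upstream neighbour, if it exists. */
    void streaming() {
        for (int y = 0; y < ny; y++)
            for (int x = 0; x < nx; x++)
                for (int q = 0; q < Q; q++) {
                    int sx = x - CX[q], sy = y - CY[q];
                    if (inside(sx, sy)) tmp[q][y * nx + x] = f[q][sy * nx + sx];
                }
    }

    /** "boundaries": bounce-back for missing populations; the top wall moves with uLid. */
    void boundaries() {
        for (int y = 0; y < ny; y++)
            for (int x = 0; x < nx; x++)
                for (int q = 0; q < Q; q++) {
                    int sx = x - CX[q], sy = y - CY[q];
                    if (inside(sx, sy)) continue;
                    double v = f[OPP[q]][y * nx + x];        // reflected post-collision value
                    if (sy >= ny) v += 6.0 * W[q] * RHO0 * CX[q] * uLid;  // lid momentum
                    tmp[q][y * nx + x] = v;
                }
    }

    void step() {
        collision(); streaming(); boundaries();
        double[][] t = f; f = tmp; tmp = t;                  // swap buffers
    }

    double totalMass() {
        double m = 0;
        for (int q = 0; q < Q; q++) for (int i = 0; i < nx * ny; i++) m += f[q][i];
        return m;
    }

    double velocityX(int x, int y) {
        double rho = 0, mx = 0;
        for (int q = 0; q < Q; q++) { rho += f[q][y * nx + x]; mx += CX[q] * f[q][y * nx + x]; }
        return mx / rho;
    }
}
```

A run might construct `new LbmCavity(16, 16, 1.0, 0.1)` and call `step()` in a loop. In a GPU version, each of the three methods would correspond to one kernel launched from the Java host per time step, with the loops over nodes replaced by one thread or work-item per node.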

3. Discussion

We tested the CUDA and OpenCL versions of the lid-driven cavity simulation on an NVIDIA GeForce GT 220. In testing device (GPU) details have been listed. The latest CUDA drivers (320.57) and CUDA toolkit (5.5) were used.