Porting CUDA to HIP

Give your proprietary CUDA code new life with an open platform for HPC.

You’ve invested money and time in writing GPU-optimized software with CUDA, and you’re wondering if your efforts will have a life beyond the narrow, proprietary hardware environment supported by the CUDA language.

Welcome to the world of HIP, the HPC-ready universal language at the core of AMD’s all-open ROCm platform [1]. You can use HIP to write code once and compile it for either the Nvidia or AMD hardware environment. HIP is the native format for AMD’s ROCm platform, and you can compile it seamlessly using the open source HIP/Clang compiler. Just add CUDA header files, and you can also build the program with CUDA and the NVCC compiler stack (Figure 1).

Figure 1: HIP to device flowchart.

HIP is a permanent antidote to the cost penalties and inefficiency associated with vendor lock-in in the HPC space. Many HPC programmers are already making the switch to HIP as the best and most efficient language for GPU-optimized HPC code. But what about the programs you have already written?

Hundreds of CUDA programs exist in the world today, representing thousands of hours of programming time. How will this legacy CUDA code make the transition to the new era of open HPC programming?

The ROCm developers were well aware of the need for an easy solution to the problem of porting CUDA code, and the ROCm environment offers two tools for automatically converting CUDA projects to HIP:

• Hipify-perl – a Perl script you can run on the CUDA source code to convert it to HIP format

• Hipify-clang – a preprocessor that operates from within the HIP/Clang compiler tool chain, converting the code as a preliminary step within the compiler process

These porting tools allow you to convert a complex CUDA program with only a few hours of programming time.

Hipify-perl

The hipify-perl script acts directly on the CUDA source code using a series of simple string replacements. The replacement routines automatically convert CUDA statements to HIP format. Hipify-perl is easy to use and is the preferred solution for smaller and less complicated programs. Around 90-99 percent of the code converts automatically using the script, and the programmer can use the available HIP debugging tools to troubleshoot the remaining bits.
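The string-replacement idea can be illustrated with a minimal C++ sketch. This is not the hipify-perl script itself: the function name hipify_sketch and the five-entry rename table are assumptions made for illustration, although the CUDA and HIP identifiers in the table are real counterparts. Each rule matches a whole identifier only, so names that merely contain a CUDA identifier are left alone.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Illustrative subset of CUDA-to-HIP renames (hypothetical table for this sketch;
// the real tool covers far more identifiers). \b anchors each pattern at word
// boundaries so only whole identifiers are replaced.
static const std::vector<std::pair<std::string, std::string>> kRenames = {
    {R"(\bcuda_runtime\.h\b)", "hip/hip_runtime.h"},
    {R"(\bcudaMalloc\b)", "hipMalloc"},
    {R"(\bcudaMemcpyHostToDevice\b)", "hipMemcpyHostToDevice"},
    {R"(\bcudaMemcpy\b)", "hipMemcpy"},
    {R"(\bcudaFree\b)", "hipFree"},
};

// Apply every rename rule in turn to the source text and return the result.
std::string hipify_sketch(std::string source) {
    for (const auto& rule : kRenames) {
        source = std::regex_replace(source, std::regex(rule.first), rule.second);
    }
    return source;
}
```

Passing the line cudaFree(d_x); to hipify_sketch returns hipFree(d_x);, while a user-defined name such as my_cudaFree_wrapper is left untouched because the match must start and end at a word boundary.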

The HIP installation files are available from the GitHub website [2]. After you install HIP and the ROCm environment, you will find the hipify-perl script in the HIP/Bin directory. You can also generate the hipify-perl script with hipify-clang, using the following command:

hipify-clang --perl

To convert the CUDA file foo.cu to HIP format, enter the following:

hipify-perl foo.cu > new_foo.cpp

You can then compile the HIP-ready .cpp file for the ROCm environment using the HIP/Clang compiler:

hipcc new_foo.cpp

A recommended workflow for using hipify-perl is as follows:

1. Run the hipify-perl script on the CUDA source code.

2. Check the resulting HIP code into your preferred version control system.

3. Build the code using hipcc.

4. Correct any compiler errors or warnings and compile again.

After a few build cycles, the ROCm executable will be ready to run.

Hipify-clang

Hipify-clang is a preprocessor that uses the Clang compiler to parse the CUDA code and perform semantic translation. Because it works from parsed code rather than raw text, hipify-clang performs a more robust translation and can generate warnings that assist the user. The result is a high-quality translation; however, hipify-clang is harder to use than hipify-perl and is therefore intended primarily for larger and more complex projects.
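To see why working from parsed code can be more robust than plain text replacement, consider the toy C++ sketch below. It is not how hipify-clang is implemented (hipify-clang relies on Clang's parser); the function name rename_outside_strings is hypothetical, and the scan understands only identifiers and double-quoted string literals. The point is only that a tool aware of token boundaries can rename an identifier in code while leaving the same text inside a string literal alone.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// True for characters that can appear in a C/C++ identifier.
static bool is_ident_char(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

// Rename whole identifiers equal to `from` into `to`, but copy the contents of
// double-quoted string literals unchanged. Toy scanner for illustration only:
// it does not handle comments, character literals, or raw strings.
std::string rename_outside_strings(const std::string& src,
                                   const std::string& from,
                                   const std::string& to) {
    std::string out;
    bool in_string = false;
    std::size_t i = 0;
    while (i < src.size()) {
        const char c = src[i];
        if (in_string) {
            out += c;
            if (c == '\\' && i + 1 < src.size()) {
                out += src[i + 1];  // keep the escaped character, e.g. \"
                i += 2;
                continue;
            }
            if (c == '"') in_string = false;  // closing quote
            ++i;
        } else if (c == '"') {
            in_string = true;  // opening quote
            out += c;
            ++i;
        } else if (is_ident_char(c)) {
            std::size_t j = i;
            while (j < src.size() && is_ident_char(src[j])) ++j;
            const std::string word = src.substr(i, j - i);
            out += (word == from) ? to : word;  // rename exact matches only
            i = j;
        } else {
            out += c;
            ++i;
        }
    }
    return out;
}
```

Given the input printf("cudaFree failed"); cudaFree(p); with cudaFree renamed to hipFree, the call is rewritten to hipFree(p); while the message inside the string literal stays as written. A real parser extends the same idea to types, macros, and templates.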

Hipify-clang needs access to the same headers that would be needed to compile the file with Clang. For example:

./hipify-clang square.cu --cuda-path=/usr/local/cuda-10.1 -I /usr/local/cuda-10.1/samples/common/inc

Hipify-clang arguments are given first, followed by a double-hyphen separator (--), and then the arguments you’d pass to Clang if you were compiling the input file.

To simplify troubleshooting, the HIP developers provide several unit tests through the LLVM lit/FileCheck tool. For a list of hipify-clang options, type:

hipify-clang --help

For more on using hipify-clang, see “The Clang Manual for Compiling CUDA Code” at the Clang website [3].

Examples

The ROCm project has already completed the task of porting a number of HPC libraries, frameworks, and applications to HIP format. Several of these tools were written in CUDA, and the ROCm developers have the goal of making these tools available to ROCm users. These porting projects serve as useful case studies for understanding the process of porting CUDA code to HIP.

One recent port that demonstrates the power and versatility of the HIP conversion tools is the Hardware/Hybrid Accelerated Cosmology Code (HACC) simulation tool [4]. The HACC codebase consists of about 15,000 lines of .h and .c files. When porting the HACC codebase, 95 percent of the code passed without modification or was converted automatically using the hipify-perl script. The remaining 5 percent was mostly converted through custom substitutions using grep and other text-replacement tools. In all, the entire HACC codebase was ported to HIP in a single afternoon.

The ROCm developers have also ported the LAMMPS molecular dynamics simulator and the HPGMG benchmark to HIP with similar success. Another porting project was the Caffe deep learning framework [5], where 99.6 percent of the original code was either auto-converted or left unmodified. Figure 2 compares the complexity of the Caffe HIP port with the port to OpenCL.

Figure 2: Converting the CUDA-based Caffe deep learning framework to OpenCL versus HIP.

Getting Help

Simple CUDA programs port easily to HIP with only minor cleanup. If you’re looking for assistance with a more complex project, the ROCm porting center is ready to help with examples, experts, and other resources. For help with porting your CUDA code, contact the porting center at:

cuda-to-hip@amd.com

Conclusion

If you’re worried about your CUDA code getting trapped in the dead end of vendor lock-in, you’re in luck. HIP is a versatile replacement for CUDA, and it comes with an all-open platform that supports other GPU-based hardware environments. The tools described in this paper provide an automated path for porting your CUDA programs to the open ROCm environment.

Once you have successfully ported your CUDA program to ROCm, you’ll have access to AMD’s powerful family of GPU-based products, including the Radeon Instinct™ accelerators. If you also plan to continue to use your program with Nvidia hardware, you can maintain the source code in HIP and compile it for either AMD or Nvidia hardware from a single codebase.

Info

[1] ROCm platform

[2] HIP installation

[3] The Clang Manual for Compiling CUDA Code

[4] HACC

[5] Caffe deep learning framework
