Freeing the GPU

Exploring AMD’s ambitious Radeon Open Compute Ecosystem with ROCm Senior Director Greg Stoner.

AMD released the Radeon Open Compute Ecosystem (ROCm) for GPU-based parallel computing about a year ago. The ambitious ROCm project builds a complete open source ecosystem around the once-very-proprietary world of GPU-accelerated high-performance computing. We sat down with ROCm Senior Director Greg Stoner to find out why ROCm could bring big changes to the HPC space.

ADMIN Magazine: How about if you start with a brief introduction to ROCm. What is it and why should the world be excited?

Greg Stoner: ROCm is an open source, HPC-class platform for GPU-based computing that is language independent. There are other GPU-accelerated platforms, of course; what’s different about ROCm is that we didn’t stop at the base driver but actually opened up all the tooling and libraries on top of it. The application work we do, we actually deliver in source form, through GitHub.

ROCm offers several different languages and paths to code for the GPU. We even have this thing called HIP [Heterogeneous-compute Interface for Portability], which provides an easy way to port code from CUDA. As you might know, CUDA is GPU aware, but it only supports the GPUs of one vendor. ROCm is open source, so any vendor can work with it and port it to their platform. Code written in CUDA can port easily to the vendor-neutral HIP format, and from there, you can compile the code for either the CUDA or the ROCm platform. So, CUDA programmers have a comfortable environment to be in, and they can bring their code across to HIP using our porting tools.
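
What a port looks like in practice: here is a minimal sketch of a CUDA-style vector add on the HIP side, assuming the standard HIP runtime entry points (hipMalloc, hipMemcpy, hipLaunchKernelGGL); the kernel syntax itself is unchanged from CUDA:

    #include <hip/hip_runtime.h>
    #include <vector>

    // Kernel syntax is the same as CUDA; the porting tools rewrite the API calls.
    __global__ void vecAdd(const float *a, const float *b, float *c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) c[i] = a[i] + b[i];
    }

    int main() {
        const int n = 1 << 20;
        std::vector<float> ha(n, 1.0f), hb(n, 2.0f), hc(n);
        float *da, *db, *dc;
        hipMalloc(&da, n * sizeof(float));   // cudaMalloc -> hipMalloc
        hipMalloc(&db, n * sizeof(float));
        hipMalloc(&dc, n * sizeof(float));
        hipMemcpy(da, ha.data(), n * sizeof(float), hipMemcpyHostToDevice);
        hipMemcpy(db, hb.data(), n * sizeof(float), hipMemcpyHostToDevice);
        // kernel<<<blocks, threads>>> in CUDA becomes hipLaunchKernelGGL
        hipLaunchKernelGGL(vecAdd, dim3((n + 255) / 256), dim3(256), 0, 0,
                           da, db, dc, n);
        hipMemcpy(hc.data(), dc, n * sizeof(float), hipMemcpyDeviceToHost);
        hipFree(da); hipFree(db); hipFree(dc);
        return 0;
    }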

We also built a solution for C++ programmers we call HCC. HCC is a single-source C++ compiler that essentially lets you integrate the CPU code and the GPU code into one file. We also wanted to make the system as simple as possible to install. Historically, many drivers have been installed with shell scripts, but we wanted closer integration with Linux, so we are actually working with conventional package installers. To install on Ubuntu, you just add the repo, then it’s apt-get install rocm, and you’re ready to go.
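
A minimal sketch of that single-source style, assuming HCC’s hc C++ API (parallel_for_each, array_view, and the [[hc]] lambda attribute) as it was documented at the time; host and device code share one file:

    #include <hc.hpp>
    #include <vector>

    int main() {
        const int n = 1024;
        std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n);
        hc::array_view<const float, 1> av_a(n, a);
        hc::array_view<const float, 1> av_b(n, b);
        hc::array_view<float, 1> av_c(n, c);

        // The lambda below is the GPU code; everything else runs on the CPU.
        hc::parallel_for_each(av_c.get_extent(), [=](hc::index<1> i) [[hc]] {
            av_c[i] = av_a[i] + av_b[i];
        });
        av_c.synchronize();  // copy the result back into the host vector
        return 0;
    }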

I’ve been working with Linux for a long time. I ran a sysop team before I came to AMD. I used to run big clusters and storage arrays. We were all CentOS-based back then. I brought along that Linux sysop sensibility when I joined the ROCm project.

AM: Isn’t OpenCL an open source solution that supports GPU acceleration?

GS: OpenCL is a solid solution; it solves a set of critical needs, but it’s basically only C99, and you know that doesn’t work in the enterprise space.

So, what we wanted to do is make sure we have a foundation at the core that lets us adapt different languages, so we put in things like standardized loader interfaces, standardized APIs, and a system run-time interface, which actually makes the driver act more like an OS service. So now you can dynamically load the set of languages you want to use on the platform and then build up the userland from there.
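
That run-time interface follows the HSA runtime API. As a minimal sketch of what talking to it directly looks like – assuming the standard hsa_init/hsa_iterate_agents entry points from the HSA specification – this enumerates the compute agents the driver exposes:

    #include <hsa/hsa.h>
    #include <cstdio>

    // Callback invoked once per agent (CPU or GPU) the runtime knows about.
    static hsa_status_t print_agent(hsa_agent_t agent, void *) {
        char name[64] = {0};
        hsa_agent_get_info(agent, HSA_AGENT_INFO_NAME, name);
        hsa_device_type_t type;
        hsa_agent_get_info(agent, HSA_AGENT_INFO_DEVICE, &type);
        std::printf("agent: %s (%s)\n", name,
                    type == HSA_DEVICE_TYPE_GPU ? "GPU" : "CPU");
        return HSA_STATUS_SUCCESS;
    }

    int main() {
        hsa_init();                               // bring up the runtime "service"
        hsa_iterate_agents(print_agent, nullptr); // walk the compute agents
        hsa_shut_down();
        return 0;
    }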

We supply the standard set of languages, and yes, we do support OpenCL.

AM: So the programmer gets to work in a familiar language and still benefit from GPU acceleration?

GS: OpenMP has something very similar – you basically tell it when you want to execute against the GPU. We’re doing the same thing with a set of lab operators, which are like directives for GPU computing, but you write your code more like you historically did – very similar to how you would in Fortran or C++ – so it’s more natural.
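
For comparison, this is the directive style he is referring to in OpenMP itself (standard OpenMP 4.x target-offload syntax, not a ROCm-specific operator): a single pragma marks the loop for GPU execution:

    #include <cstdio>

    int main() {
        const int n = 1 << 20;
        static float a[1 << 20], b[1 << 20], c[1 << 20];
        for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

        // The pragma is the only change: it tells an offloading-capable
        // compiler to run the loop on the GPU and manage data movement.
        #pragma omp target teams distribute parallel for map(to: a, b) map(from: c)
        for (int i = 0; i < n; ++i)
            c[i] = a[i] + b[i];

        std::printf("c[0] = %f\n", c[0]);
        return 0;
    }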

ROCm also supports Python. We’ve been working with Anaconda [a Python distribution for large-scale and scientific use cases]. The other thing is, ROCm supports inline assembly, so you can performance-tune the hot loops just like you do on a CPU. We have an assembler/disassembler as well, which you can use when performance-tuning your application. All our compiler technologies sit on LLVM; it generates native ISA, and we upstream all of that, so there’s a standardized LLVM compiler for the platform.
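
As an illustrative, hypothetical sketch of that inline-assembly path – using one GCN vector instruction and LLVM’s "v" (VGPR) register constraint, with names chosen purely for illustration – a hot spot in a HIP kernel could be hand-tuned like this:

    #include <hip/hip_runtime.h>

    // Each thread adds two floats with a hand-written GCN instruction
    // instead of letting the compiler pick one (illustrative only).
    __global__ void asmAdd(const float *a, const float *b, float *c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            float out;
            asm volatile("v_add_f32 %0, %1, %2"
                         : "=v"(out)                 // "v" = VGPR constraint
                         : "v"(a[i]), "v"(b[i]));
            c[i] = out;
        }
    }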

There are two pieces to ROCm: the core driver, which is becoming more like an OS service, and the programming stack, which works more like it does within an operating system, where you can add in modules and composable software like you have in Unix. That’s the direction we’re headed: making everything composable. And it’s all open.

AM: The HPC space has a big focus on machine learning and AI. What kind of support do you provide in ROCm?

GS: We have open sourced the MIOpen machine-learning library, which supports GPU acceleration for deep convolutional networks. We’re in the process of porting several machine-learning libraries and tools. Many of these tools were written with CUDA, and we can use HIP to bring them into the ROCm stack. People don’t realize that something like the TensorFlow machine-learning library is over a million lines of code. We’re bringing that across, and then we’ll hand-tune it for our platform, release it back to the market, and maintain our ROCm-ready version. We’re in the process of releasing ROCm versions of these AI tools now. We started with Caffe, and now we’re working on MXNet, TensorFlow, and PyTorch.
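
A minimal sketch of what code against MIOpen looks like, assuming its cuDNN-style descriptor API (miopenCreate plus tensor and convolution descriptors); error checking is omitted for brevity:

    #include <miopen/miopen.h>

    int main() {
        miopenHandle_t handle;
        miopenCreate(&handle);  // per-context library handle

        // Describe an NCHW float tensor: batch 1, 3 channels, 224x224.
        miopenTensorDescriptor_t input;
        miopenCreateTensorDescriptor(&input);
        miopenSet4dTensorDescriptor(input, miopenFloat, 1, 3, 224, 224);

        // Describe a convolution with padding 1, stride 1, dilation 1.
        miopenConvolutionDescriptor_t conv;
        miopenCreateConvolutionDescriptor(&conv);
        miopenInitConvolutionDescriptor(conv, miopenConvolution,
                                        1, 1,   // pad h, w
                                        1, 1,   // stride h, w
                                        1, 1);  // dilation h, w
        // ... pick an algorithm and run the forward convolution here ...

        miopenDestroyConvolutionDescriptor(conv);
        miopenDestroyTensorDescriptor(input);
        miopenDestroy(handle);
        return 0;
    }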

AM: Do you have teams of HPC programmers out there who have been working with ROCm and are giving you feedback?

GS: Yes, we do. We have also been working with the AMD Research group, which works with the US Department of Energy, and we’ve been working with the deep learning community as well, getting feedback from them.

AM: Sounds like a massive effort.

GS: It is a massive effort; I remind my team how much we’ve achieved. But yes, it’s been Herculean.

AM: Describe a typical ROCm user. Who is going to be using this technology?

GS: We found a real niche in the oil and gas business; they were some of our earliest customers. We’ve also been working with the CHIME project, a radio telescope project out of Canada. We’ve had a couple of bigger customers doing some interesting things in the manufacturing space who need to do hand-tuning in assembly. We’ve also got people using ROCm in the automotive industry.

AM: Where does it go from here? How do you see ROCm evolving over the next few years?

GS: One area where we see ourselves investing more and more is system management. We’re going to continue to focus on deep learning and network-based programming models. We’re just at the beginning of what is possible.

If you really look at the last year, we received lots of input. In the first year, we did over nine releases. We were moving. That’s how you drive features and functionality; then you do more of that and work on performance and stability. Once you get the core functionality in place, it’s about feature tuning, stability tuning, and performance tuning. And there are some new things coming online – migration technologies that we’re extending into. We’ve got the foundation now; the biggest part is behind us.

AM: Since it’s all open, is it possible that another chip vendor could then employ this technology for their own hardware?

GS: Sure. Actually, it’s built on the HSA [Heterogeneous System Architecture] model, and there are people in the HSA Foundation who are using some of the technologies from ROCm already.

AM: Is that part of your dream – that this becomes a standard part of the open source infrastructure?

GS: Yes, it is. In addition to AMD processors and Intel Xeon, we’ve been bringing up POWER8 support in the core – also ARM AArch64 support. Over time, I would love to see what we’ve done – the core of ROCm – become a natural extension of the Linux driver and Linux distros, with all of the programming model on top of it. We’re trying to get to a point where it’s just standard, basic infrastructure that people can use for heterogeneous computing. That’s really what this is about: how we build heterogeneous computing solutions that are standards-based and broadly distributed, so it’s easier for developers to do the work they care about and use all the compute power in the system. That’s what’s missing today. Everyone’s built these monolithic stacks that just work for them, and now we need to figure out how to get the entire industry working together on this class of programming.

Basically, I look at this as: How do we get this class of software to live for a millennium? And we’re harnessing the power of free software. GCC was created back in the late 80s, and it lacked some of the performance of commercial compilers at the time. But now, the commercial compilers can barely keep up with it. It keeps getting better and better. That’s how I see what we’re doing with ROCm. We planted a flag, and now it’s an evolution.
