OpenCL
Compiling and Calling the Kernel
The kernel source code must be compiled at run time by the OpenCL library; this is the only way to ensure platform independence. I create a cl::Program object from the source code and call program.build() to compile it (line 87).
In addition to a unique global index, each thread also has a local index that reflects the hardware layout. All the threads in a work group share a small but very fast memory area. The non-optimized example described here does not use this memory; it simply uses a generic, two-dimensional work group size of 16x16 (lines 90 and 91).
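The relationship between the two indices can be sketched in plain C++. This is a host-side illustration rather than OpenCL code, and the names globalIndex, localPosition, and LocalPosition are hypothetical. It assumes a global offset of zero, in which case the global index in one dimension equals the work group index times the work group size plus the local index.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: mirrors, for one dimension, how the global index
// relates to the work group index and the local index when the global
// offset is zero.
inline std::size_t globalIndex(std::size_t groupIndex,
                               std::size_t localSize,
                               std::size_t localIndex)
{
    assert(localIndex < localSize);  // a local index never exceeds its group
    return groupIndex * localSize + localIndex;
}

// Hypothetical inverse: the work group a global index falls into and the
// position inside that group.
struct LocalPosition {
    std::size_t groupIndex;
    std::size_t localIndex;
};

inline LocalPosition localPosition(std::size_t globalIdx, std::size_t localSize)
{
    assert(localSize > 0);
    return LocalPosition{globalIdx / localSize, globalIdx % localSize};
}
```

With a work group size of 16, for example, the thread with global index 53 in one dimension sits at local index 5 of work group 3.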
I need the total number of threads in each dimension to call the kernel. This number must be at least as large as the output image; however, it also must be divisible by the work group size in every dimension.
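One common way to satisfy both conditions is to round each image dimension up to the next multiple of the work group size. The following is a minimal sketch of that calculation, not necessarily the exact code in the listing; roundUpToMultiple, GlobalSize, and paddedGlobalSize are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: round `value` up to the next multiple of `multiple`.
inline std::size_t roundUpToMultiple(std::size_t value, std::size_t multiple)
{
    assert(multiple > 0);
    return ((value + multiple - 1) / multiple) * multiple;
}

// Total number of threads per dimension for a two-dimensional launch.
struct GlobalSize {
    std::size_t x;
    std::size_t y;
};

// Pad the image size so that it is divisible by the work group size
// in both dimensions.
inline GlobalSize paddedGlobalSize(std::size_t imageWidth,
                                   std::size_t imageHeight,
                                   std::size_t localX,
                                   std::size_t localY)
{
    return GlobalSize{roundUpToMultiple(imageWidth, localX),
                      roundUpToMultiple(imageHeight, localY)};
}
```

For a 1000x750 image and 16x16 work groups, this sketch yields 1008x752 threads. The surplus threads fall outside the image, so a kernel launched this way would typically compare its global index with the image size and return early for those positions.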
I can now load the previously compiled kernel from cl::Program and pass in the numbers I just calculated. Finally, I need to start the kernel by passing the parameters to the cl::KernelFunctor. The parameters are the input image buffer, output image buffer, convolution kernel buffer, and corresponding metadata (size) (line 100).
Although it is necessary to create and populate the buffers in advance, simple data types like the image sizes can be passed in directly (i.e., without using enqueueWriteBuffer()). A kernel call is always non-blocking. You can use a cl::Event with a .wait() to wait for the call to complete.
The threads from the kernel call write their results to the outGPU buffer. The results are copied from this buffer to host memory using enqueueReadBuffer().
Sample Code
To keep things simple, link the code against the free libpng library [15]. You can use this library to read and write grayscale PNG files.
After entering make all to build, you can use the convolution kernels to process any grayscale PNG. The command syntax is convolucl <input.png> <output.png> [kernel index]. If you call convolucl without any parameters, it gives you a list of available kernels:
0 Sobel
1 Gauss 5x5
2 Gauss 12x12
3 Mean 3x3
4 Mean 5x5
5 Emboss
6 Sharpen
7 Motion blur
Figure 2 shows some of the results of these convolutions.