High-Performance Python – Distributed Python

Scale Python GPU code from your laptop to distributed systems.

High-Performance Python – GPUs

Python finally has interoperable tools for programming the GPU – with or without CUDA.

High-Performance Python – Compiled Code and Fortran Interface

Fortran functions called from Python bring complex computations to a scriptable language.

High-Performance Python – Compiled Code and C Interface

Although Python is a popular language, in the high-performance world, it is not known for being fast. A number of tactics have been employed to make Python faster. We look at three: Numba, Cython, and ctypes.

OpenMP – Coding Habits and GPUs

In this third and last article on OpenMP, we look at good OpenMP coding habits and present a short introduction to employing OpenMP with GPUs.

In the Loop

Diving deeper into OpenMP loop directives for parallel code.

OpenMP

The powerful OpenMP parallel do directive creates parallel code for your loops.

Porting Code to OpenACC

OpenACC directives can improve performance if you know how to find where parallel code will make the greatest difference.

OpenACC Directives for Data Movement

The OpenACC data directive and its associated clauses allow you to control the movement of data between the host CPU and the accelerator GPU.

Parallelizing Code – Loops

OpenACC is a great tool for parallelizing applications for a variety of processors. In this article, I look at one of the most powerful directives, loop.