In the Loop

Loop Schedule

By default, the compiler divides the loop iterations as equally as possible among the threads in a team. You can control how the work is divided with a schedule clause on the loop work-sharing directive (omp do in Fortran, omp for in C) of the form

!$omp do schedule(kind[, <chunksize>])

The integer expression chunksize must be positive, and the possible values for kind are:

  • static
  • dynamic
  • guided
  • auto
  • runtime

The static schedule breaks the iteration space into chunks of chunksize iterations, and the chunks are assigned cyclically to the threads in order (chunk 0 goes to thread 0, chunk 1 goes to thread 1, etc.). If chunksize is not specified, the iteration space is divided into chunks of approximately equal size, with one chunk assigned to each thread in the team.

The dynamic schedule divides the iteration space into chunks of size chunksize and assigns them to threads on a first-come, first-served basis; that is, when a thread finishes a chunk, it is assigned the next chunk in the list. When no chunksize is specified, the default is 1. Dynamic scheduling lets you create many more chunks than threads, which helps balance the load when iterations take different amounts of time.

The guided schedule is somewhat similar to dynamic, but the chunks start off large and get exponentially smaller (again, with a default chunksize of 1). The size of the next chunk is proportional to the number of remaining iterations divided by the number of threads. If you specify chunksize with this schedule, it becomes the minimum size of the chunks.

The auto schedule leaves the assignment of iterations to threads entirely to the compiler and runtime. For example, if the parallel loop is executed many times, the runtime can evolve a schedule with good load balancing characteristics and low overheads.

The last schedule, runtime, defers the scheduling decision until run time: the schedule is taken from the OMP_SCHEDULE environment variable, which lets you vary the schedule simply by changing OMP_SCHEDULE. You cannot specify a chunksize with this clause, although you can include one in the value of OMP_SCHEDULE.
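In practice, selecting the schedule through the environment might look like the following (the binary name a.out is a placeholder for your compiled OpenMP program):

```shell
# Run with a dynamic schedule and chunks of 4 iterations
export OMP_SCHEDULE="dynamic,4"
./a.out

# Same binary, different schedule -- no recompilation needed
export OMP_SCHEDULE="guided"
./a.out
```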

Fortran and C scheduling clause examples that use the dynamic scheduler with a chunk size of 4 are shown in Listing 11. Which schedule is best really depends on your code, the compiler, the data, and the system. To make things easier when you port to OpenMP, I would recommend leaving the schedule at the default (which is implementation dependent) or changing it to runtime and then experimenting with the OMP_SCHEDULE environment variable. The auto option is also worth trying, because it leaves the decision to the implementation, but there is no guarantee it will be the fastest choice.

Listing 11: schedule()  Clause

Fortran:
!$omp do schedule(dynamic, 4)

C:
#pragma omp for schedule(dynamic, 4)

Summary

In this article I expanded on the previous article about OpenMP directives with some additional directives and some clauses to these directives. Remember, the goal of these articles is to present simple directives that you can use to get started porting your code to OpenMP. In this article, I covered the following topics:

  • Data and control parallelism
  • Teams and loops revisited
  • omp parallel options, including the default, firstprivate, lastprivate, nowait, and reduction clauses, and loop scheduling

In the next article, I will present some best practices in general use by myself and others. I’ll also take a stab at discussing how to use the OpenMP accelerator directive to run code on GPUs.
