Granularity in Parallel Algorithms

Nathan Hottle
granularity - a relative measure of the ratio of computation to communication within a parallel algorithm implementation

What Is Granularity?

Parallel algorithms are defined as such because multiple sections of the algorithm are designed to run simultaneously on more than one processor. These sections make up the parallel part of the algorithm. Within the parallel sections, every active processor is assigned a specific task. Each processor may have its own task, or all may be given identical tasks to perform on their own data. A task may be as simple as incrementing a counter, or it may be a subroutine that involves many operations. The size of these tasks is expressed as the granularity of the parallelism. To emphasize the relation between granularity and size, an alternate definition is offered: the grain size of a parallel instruction is a measure of how much work each processor does compared to an elementary instruction execution time. It is equal to the number of serial instructions executed within a task by one processor.
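The contrast between a one-instruction task and a subroutine-sized task can be sketched as follows. This is a minimal illustration, not a real parallel program: the function names are hypothetical, and each list element stands in for the work one processor would be handed.

```python
# Toy illustration of grain size: how much serial work each
# processor performs per parallel task.

def fine_grained_tasks(data):
    # One element per task: each task is a single operation,
    # so the grain size is roughly one instruction.
    return [x + 1 for x in data]          # task = increment one counter

def coarse_grained_tasks(data, n_workers):
    # One chunk per worker: each task loops over many elements,
    # so the grain size grows with the chunk length.
    chunk = len(data) // n_workers
    chunks = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    return [sum(c) for c in chunks]       # task = a whole subroutine
```

With eight elements and two workers, the fine-grained version creates eight one-instruction tasks, while the coarse-grained version creates two tasks of four additions each.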

How Is It Measured?

The granularity in a parallel section of an algorithm is generally classified by one of three relative values: fine, medium, or coarse. Notice that I refer to a parallel section of an algorithm instead of the algorithm itself when determining granularity. An algorithm may contain many different grain sizes; in fact, even a single section of an algorithm may have one grain size nested within another. Granularity is determined by characteristics of both the algorithm and the hardware used to run it.
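Following the definition at the top of this article, the classification can be sketched as a computation-to-communication ratio. The thresholds below are illustrative assumptions, not standard values; the ratio is relative, so in practice the cutoffs depend on the hardware.

```python
# Sketch: granularity as a computation-to-communication ratio.
# t_comp and t_comm are per-task times, measured or estimated.

def granularity(t_comp, t_comm):
    """Ratio of computation time to communication time per task."""
    return t_comp / t_comm

def classify(g, fine=1.0, coarse=10.0):
    # Threshold values are illustrative, not standard.
    if g < fine:
        return "fine"
    if g < coarse:
        return "medium"
    return "coarse"
```

A task that spends half as much time computing as communicating classifies as fine; one that computes fifty times longer than it communicates classifies as coarse.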

Why Is It Important?

A study of granularity is important for choosing the most efficient parallel hardware paradigm for the algorithm at hand. SIMD machines are the best bet for very fine-grained algorithms; these machines are built for efficient communication, usually with neighboring PEs. MIMD machines are less effective on fine-grained algorithms because the message-passing system characteristic of these machines causes much time to be wasted in communication; they perform best with larger-grained algorithms. Another parallel paradigm is a network of workstations, characterized by very slow communication and recommended for coarse-grained algorithms only. In fact, it is often more efficient to utilize fewer workstations than are available, thereby reducing the amount of communication. Being able to recognize the parallelism within an algorithm and analyze its granularity will guide a programmer to the best parallel paradigm for the task at hand.
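The point about using fewer workstations can be made with a back-of-the-envelope model: total time is the parallel compute time plus a per-worker communication cost. All of the numbers below are illustrative assumptions, not measurements, but they show how a slow network can make the smaller processor count win.

```python
# Simple cost model: total time = compute time / workers
# plus one fixed-cost message exchanged per worker.
# All constants are illustrative assumptions.

def run_time(n_ops, workers, comm_cost, op_cost=1e-6):
    compute = (n_ops / workers) * op_cost   # perfectly divided work
    communicate = workers * comm_cost       # one message per worker
    return compute + communicate

# On a slow network (0.1 s per message), 4 workstations
# finish a million operations sooner than 16 do.
slow = 0.1
t4 = run_time(1_000_000, 4, slow)
t16 = run_time(1_000_000, 16, slow)
```

Here t4 comes to about 0.65 s against roughly 1.66 s for t16: the extra compute power of twelve more workstations is swamped by their added communication cost, exactly the coarse-grained trade-off described above.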
