LibJAD Multi-processing Routines

#include "jadmulti.h"

Overview

LibJAD provides a framework for multi-processing, built around the function process_list. The process_list function spawns one thread per logical processor and then uses those threads to call a specified worker function once for each item on the list in a parallel fashion. The function number_of_processors provides a portable way to determine the number of available processors available.

Functions

int number_of_processors(void)

problemA specification of the list to be processed.
returnA list of pointers to processed results.

Returns the number of logical processors available on the system. Multi-threaded systems usually report two-processors per core if multi-threading is enabled, although this behavior is somewhat dependant on the underlying operating system. If the environmental variable NUMBER_OF_PROCESSORS is set that value will be returned instead of the number reported by the operating system. Supported systems are Linux, FreeBSD, MAC OSX, SGI, and Solaris. On unsupported systems this function defaults to returning one if the variable NUMBER_OF_PROCESSORS is not set.

void** process_list(list_proc_obj problem)

problemA specification of the list to be processed.
returnA list of pointers to processed results.

This function takes a list of data items and a worker function, and calls that worker function once with each data item on the list. This task is spread across a a number of threads, equal to the number returned by the function number_of_processors. The functions attempts to load balance between the threads while also minimizing the number of times a mutex must be locked. The internal algorithm assumes that items close to each on the the list will require similar amounts of time to process. The input to this routine is a list_proc_obj data structure, which contains the list, the size of the list, a shared data block, and pointers to the worker function and functions to manage any needed per thread memory or setup. See the sections below for details on these data structures and functions.

list_proc_obj get_list_proc_obj(void** list, unsigned long list size, void* shared, function process_item)

listThe list to process
sizeThe size of the list
sharedA pointer to the shared memory block, may be NULL
process_itemA pointer to the worker function
returnA list_proc_obj filled out with the provided information.

A convenience routine for setting up a list_proc_obj.

Data Structures

list_proc_obj

void** listA list of pointers to data items to be processed.
unsigned long list_sizeThe number of items on list.
void* sharedA pointer to problem specific instance data.
function process_itemA pointer to the user supplied process_item function.
function allocate_scratch_spaceA function to allocate per thread process space and resources.
function free_scratch_spaceA function to free per thread process space and resources.

list_proc_obj is a typedefed structure used to specify the parameters and data for a problem to be solved in a multi-threaded fashion by the process_list function. If the function allocate_scratch_space is set to NULL, the process_item function will receive a NULL pointer for its per process scratch space. If free_scratch_space is set to NULL allocated scratch spaces will be passed to the standard free function. If there is no allocated scratch space because allocate_scratch_space was set to NULL, no cleanup routines will be called.

Developer Supplied Functions

To make use of process_list, a number of routine must be supplied. A process_item routine does the main computational work of processing each item on the list, and additional routines handle allocating and freeing the per thread scratch space, if such is needed.

void* process_item(void* shared_block, void* scratch, void* item)

shared_blockThe shared data block.
scratchPer thread scratch space.
itemA data item to process.
returnA pointer to the results of the processing.

The process item function will be called once for each item on the list to be processed. The first parameter will point to a block of data shared by all processes; many threads may be accessing this data at any given point and it is not protected in any way. The second parameter is a block of per-thread scratch space allocated by the provided allocate_scratch_space function. It may be used for per-thread house keeping, and is retained between calls to process_item within the same thread. Item is one of the items on the list to be processed. The return value of this function will be placed into the return list at the same relative offset as the item sent to this this particular invocation of the process_item function.

void* allocate_scratch_space(void* shared_block)

shared_blockThe shared data block.
returnA pointer newly allocated space.

This method is called once per thread to setup the per-thread scratch space. It is called with a pointer to the data block that is shared between the threads. Many instances will want to consult the shared_block and allocate a block of memory based on the problem instance data found there, but this routine may also be used to do more complicated setup. Unlike process_item, allocate_search_space will only be invoked serially although it will be invoked once for each thread to be used by the processor.

void free_scratch_space(void* shared_block, void* scratch)

shared_blockThe shared data block.
scratchPer thread scratch space to be freed
sharedThe shared data block.

One all processing is done process_list will invoke the provided free_scratch_space routine once for each scratch space allocated at the start of processing. This is the place to do any required cleanup as well as free the memory used.

void qsort_2t(void* list, size_t numel, size_t szof, int (*cmpfunc)(const void*,const void*))

listThe list to be sorted.
numelThe number of elements on the list.
szofThe size of each element.
cmpfuncThe comparator function.

This list sorts an array in a manner similar to qsort. When number_of_processors reports only a single processor, or when the list is less than 50 elements, this function will simply pass its parameters to qsort. On a multi-processor machines longer lists will be divided and each half sent to qsort in a separate thread. The resulting lists will then be merge sorted. Wall clock runtime in this case is N/2 + log(N/2) + N.