LibJAD Multi-processing Routines
#include "jadmulti.h"
Overview
LibJAD provides a framework for multi-processing, built around the function process_list. The process_list function spawns one thread per logical processor and then uses those threads to call a specified worker function once for each item on the list in a parallel fashion. The function number_of_processors provides a portable way to determine the number of available processors available.
Functions
int number_of_processors(void)
problem | A specification of the list to be processed. |
return | A list of pointers to processed results. |
Returns the number of logical processors available on the system. Multi-threaded systems usually report two-processors per core if multi-threading is enabled, although this behavior is somewhat dependant on the underlying operating system. If the environmental variable NUMBER_OF_PROCESSORS is set that value will be returned instead of the number reported by the operating system. Supported systems are Linux, FreeBSD, MAC OSX, SGI, and Solaris. On unsupported systems this function defaults to returning one if the variable NUMBER_OF_PROCESSORS is not set.
void** process_list(list_proc_obj problem)
problem | A specification of the list to be processed. |
return | A list of pointers to processed results. |
This function takes a list of data items and a worker function, and calls that worker function once with each data item on the list. This task is spread across a a number of threads, equal to the number returned by the function number_of_processors. The functions attempts to load balance between the threads while also minimizing the number of times a mutex must be locked. The internal algorithm assumes that items close to each on the the list will require similar amounts of time to process. The input to this routine is a list_proc_obj data structure, which contains the list, the size of the list, a shared data block, and pointers to the worker function and functions to manage any needed per thread memory or setup. See the sections below for details on these data structures and functions.
list_proc_obj get_list_proc_obj(void** list, unsigned long list size, void* shared, function process_item)
list | The list to process |
size | The size of the list |
shared | A pointer to the shared memory block, may be NULL |
process_item | A pointer to the worker function |
return | A list_proc_obj filled out with the provided information. |
A convenience routine for setting up a list_proc_obj.
Data Structures
list_proc_obj
void** list | A list of pointers to data items to be processed. |
unsigned long list_size | The number of items on list. |
void* shared | A pointer to problem specific instance data. |
function process_item | A pointer to the user supplied process_item function. |
function allocate_scratch_space | A function to allocate per thread process space and resources. |
function free_scratch_space | A function to free per thread process space and resources. |
list_proc_obj is a typedefed structure used to specify the parameters and data for a problem to be solved in a multi-threaded fashion by the process_list function. If the function allocate_scratch_space is set to NULL, the process_item function will receive a NULL pointer for its per process scratch space. If free_scratch_space is set to NULL allocated scratch spaces will be passed to the standard free function. If there is no allocated scratch space because allocate_scratch_space was set to NULL, no cleanup routines will be called.
Developer Supplied Functions
To make use of process_list, a number of routine must be supplied. A process_item routine does the main computational work of processing each item on the list, and additional routines handle allocating and freeing the per thread scratch space, if such is needed.
void* process_item(void* shared_block, void* scratch, void* item)
shared_block | The shared data block. |
scratch | Per thread scratch space. |
item | A data item to process. |
return | A pointer to the results of the processing. |
The process item function will be called once for each item on the list to be processed. The first parameter will point to a block of data shared by all processes; many threads may be accessing this data at any given point and it is not protected in any way. The second parameter is a block of per-thread scratch space allocated by the provided allocate_scratch_space function. It may be used for per-thread house keeping, and is retained between calls to process_item within the same thread. Item is one of the items on the list to be processed. The return value of this function will be placed into the return list at the same relative offset as the item sent to this this particular invocation of the process_item function.
void* allocate_scratch_space(void* shared_block)
shared_block | The shared data block. |
return | A pointer newly allocated space. |
This method is called once per thread to setup the per-thread scratch space. It is called with a pointer to the data block that is shared between the threads. Many instances will want to consult the shared_block and allocate a block of memory based on the problem instance data found there, but this routine may also be used to do more complicated setup. Unlike process_item, allocate_search_space will only be invoked serially although it will be invoked once for each thread to be used by the processor.
void free_scratch_space(void* shared_block, void* scratch)
shared_block | The shared data block. |
scratch | Per thread scratch space to be freed |
shared | The shared data block. |
One all processing is done process_list will invoke the provided free_scratch_space routine once for each scratch space allocated at the start of processing. This is the place to do any required cleanup as well as free the memory used.
void qsort_2t(void* list, size_t numel, size_t szof, int (*cmpfunc)(const void*,const void*))
list | The list to be sorted. |
numel | The number of elements on the list. |
szof | The size of each element. |
cmpfunc | The comparator function. |
This list sorts an array in a manner similar to qsort. When number_of_processors reports only a single processor, or when the list is less than 50 elements, this function will simply pass its parameters to qsort. On a multi-processor machines longer lists will be divided and each half sent to qsort in a separate thread. The resulting lists will then be merge sorted. Wall clock runtime in this case is N/2 + log(N/2) + N.