High Performance Computing

GridLAB-D was designed to support high performance computers such as shared-memory machines, clusters, and other specialized architectures. Many of the problems observed with previous implementations appear to be resolved in the version targeted for the 3.0 branch, although extensive testing remains to be completed. Contact Shuangshuang Jin at PNNL (shuangshuang.jin@pnl.gov) for more information.

GridLAB-D HPC History
The allocation of objects in the GridLAB-D 1.x core did not take into account the processor to which an object was assigned. Consequently, processor migration caused frequent cache invalidation and reduced performance far more than expected. This has been mitigated by controlling processor affinity. See for details.

GridLAB-D 2.x includes an object property that identifies the affinity of an object for a particular thread. However, for this to be effective, the memory allocation routine had to be aware of and comply with processor affinity when objects were created. This was never successfully implemented. In addition, the 2.x thread pool implementation was largely ineffective because of its high overhead and its inability to control the CPU affinity of threads.

As of 3.0, many internal loops have been successfully parallelized and performance in multi-threaded environments is significantly improved. See for details.

Single-threaded CPU affinity
When running in single-thread mode, the CPU affinity of a GridLAB-D run can be controlled from the shell. In Windows, the start command can be used as follows:

C:\> start /affinity cpuset /b /wait gridlabd options glmfile

On Linux you can use the taskset command (originally from the schedutils package, now part of util-linux) to control CPU affinity as follows:

host# taskset -c cpuid gridlabd options glmfile

The user can enable single-threaded CPU affinity by setting the global variable  to a non-zero value. The value is interpreted as a CPU mask and applied to the run using the appropriate affinity API for the platform. See for information on applying CPU affinity in the core.
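As a minimal sketch of what applying a non-zero mask means on Linux, the C fragment below converts the value into a cpu_set_t and applies it with sched_setaffinity(). It assumes that bit i of the value selects CPU i and that the mask fits in 64 bits. The helper names mask_to_cpuset and apply_process_affinity are hypothetical and do not come from the GridLAB-D core.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* needed for cpu_set_t and the CPU_* macros */
#endif
#include <assert.h>
#include <sched.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: bit i of mask selects CPU i (CPUs 0..63 only). */
static void mask_to_cpuset(uint64_t mask, cpu_set_t *set)
{
    assert(set != NULL);
    CPU_ZERO(set);
    for (int cpu = 0; cpu < 64; cpu++)
        if (mask & (UINT64_C(1) << cpu))
            CPU_SET(cpu, set);
}

/* Hypothetical helper: restrict a single-threaded run to the CPUs in mask.
 * Returns 0 on success, -1 if mask is zero (affinity not requested) or the
 * kernel rejects the set (e.g. none of the listed CPUs is permitted). */
static int apply_process_affinity(uint64_t mask)
{
    cpu_set_t set;
    if (mask == 0)
        return -1;
    mask_to_cpuset(mask, &set);
    /* pid 0 = calling thread; in a single-threaded run that is the whole run */
    if (sched_setaffinity(0, sizeof set, &set) != 0) {
        perror("sched_setaffinity");
        return -1;
    }
    return 0;
}
```

For example, a value of 5 (binary 101) selects CPUs 0 and 2. Threads created later inherit this affinity.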

As of revision 2700, the trunk Windows and Linux builds automatically support single-threaded processor affinity internally. This is accomplished using the processor affinity API in code/module.c.

The processor affinity API uses a global map of the processor affinities for all instances of GridLAB-D running on a machine. To display the global process map, use the --pstatus command line option.

host% gridlabd --randtest & gridlabd --pstatus
[1] 16799
PROC  PID STATE                      CLOCK COMMAND
   0 16807 Running                    INIT /usr/lib/gridlabd/gridlabd.bin --pstatus
   1 16808 Running                    INIT /usr/lib/gridlabd/gridlabd.bin --randtest
host%

This global map can become corrupted if an instance of GridLAB-D fails catastrophically, leaving a zombie entry in the process map. In such circumstances, the --clearmap option can be used to purge the map:

host% gridlabd --clearmap
host%

A run associated with a processor can be killed using the --pkill option:

host% gridlabd --pkill 1
host%

A number of important considerations for both users and developers should be noted.


 * 1) The processor affinity API only tracks runs that have been assigned to processors. When more runs are active than there are available processors, the API does not track the additional runs. When a run is started that overloads the affinity API, a message such as "WARNING [INIT] : no processor available to avoid overloading" is displayed.
 * 2) On Linux/POSIX systems the process map is associated with a file in the /tmp folder that must be accessible to all gridlabd users. If the file does not have permissions 0666 or cannot be created, the API will not work and processor affinity will not be controlled automatically.
 * 3) If the API fails on a map access error, the map may have become corrupted. Usually the only remedy is to delete the /tmp file and/or reboot the machine. It would be nice if somebody implemented a --repairmap command line option that scanned the map and cleared out any zombies and/or deleted the map when it was empty, so that it could be recreated from scratch without rebooting.
 * 4) At this point it appears that the processor affinity API on Mac OS X will have to wait until thread affinity is implemented, because the Mac affinity API seems to support only threads, not processes.
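The bookkeeping behind the process map can be illustrated with a small in-memory model. This is a sketch only. The real map is shared by every GridLAB-D instance through a file in /tmp, whereas here it is an ordinary struct. The type and function names (pmap, pmap_reserve, pmap_release, pmap_clear, pmap_print) are hypothetical. The model shows one slot per processor, the warning issued when every processor is taken, and what --clearmap and --pstatus conceptually do.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

#define PMAP_MAX 64                 /* assumption: fixed-size map */

/* Hypothetical entry layout; index in the map = processor number. */
typedef struct {
    pid_t pid;                      /* 0 means the processor is free */
    char  command[64];              /* shown by a --pstatus style listing */
} pmap_entry;

typedef struct {
    int        nprocs;              /* processors available on this host */
    pmap_entry slot[PMAP_MAX];
} pmap;

/* Like --clearmap: forget every entry, including zombies. */
static void pmap_clear(pmap *m, int nprocs)
{
    assert(nprocs > 0 && nprocs <= PMAP_MAX);
    memset(m, 0, sizeof *m);
    m->nprocs = nprocs;
}

/* Assign the first free processor to pid. Returns its number, or -1 when
 * all processors are taken (the run is then simply not tracked). */
static int pmap_reserve(pmap *m, pid_t pid, const char *command)
{
    for (int p = 0; p < m->nprocs; p++) {
        if (m->slot[p].pid == 0) {
            m->slot[p].pid = pid;
            snprintf(m->slot[p].command, sizeof m->slot[p].command, "%s", command);
            return p;
        }
    }
    fprintf(stderr, "WARNING [INIT] : no processor available to avoid overloading\n");
    return -1;
}

/* Called when a tracked run exits (or is killed, as with --pkill). */
static void pmap_release(pmap *m, int proc)
{
    if (proc >= 0 && proc < m->nprocs)
        m->slot[proc].pid = 0;
}

/* Like --pstatus: list the processors that have a run assigned. */
static void pmap_print(const pmap *m)
{
    printf("PROC   PID COMMAND\n");
    for (int p = 0; p < m->nprocs; p++)
        if (m->slot[p].pid != 0)
            printf("%4d %5d %s\n", p, (int)m->slot[p].pid, m->slot[p].command);
}
```

A run that cannot reserve a slot still executes, but nothing in the map refers to it. This matches the "stops tracking" behavior described in item 1.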

Multi-threaded CPU affinity
GridLAB-D assigns the various synchronization loops to threads from a pool created specifically for each sync task. Locks and mutexes are then used to start and signal completion of each loop.
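The following is a minimal sketch of a per-task pool, assuming POSIX threads. Each worker waits on a condition variable for a start signal and processes its own contiguous slice of the items. The last worker to finish signals completion back to the caller. The names (sync_pool, pool_init, pool_run, pool_destroy, scale_body) and the fixed MAX_THREADS bound are illustrative assumptions, not the core's actual data structures.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 16   /* assumption: fixed upper bound on threads per pool */

typedef void (*loop_body)(size_t begin, size_t end, void *arg);
typedef struct sync_pool sync_pool;
typedef struct { sync_pool *pool; int id; } worker_arg;

/* Hypothetical pool owned by one sync task (e.g. enduse synchronization). */
struct sync_pool {
    pthread_mutex_t lock;
    pthread_cond_t  start, done;     /* "go" to workers, "finished" to caller */
    pthread_t       thread[MAX_THREADS];
    worker_arg      warg[MAX_THREADS];
    int             nthreads, pending, shutdown;
    unsigned long   pass;            /* incremented once per loop invocation */
    loop_body       body;
    void           *arg;
    size_t          nitems;
};

static void *worker_main(void *p)
{
    worker_arg *w = p;
    sync_pool *sp = w->pool;
    unsigned long seen = 0;          /* last pass this worker has run */
    pthread_mutex_lock(&sp->lock);
    for (;;) {
        while (sp->pass == seen && !sp->shutdown)
            pthread_cond_wait(&sp->start, &sp->lock);
        if (sp->shutdown)
            break;
        seen = sp->pass;
        size_t n = sp->nitems, k = (size_t)sp->nthreads, id = (size_t)w->id;
        loop_body body = sp->body;
        void *arg = sp->arg;
        pthread_mutex_unlock(&sp->lock);
        body(n * id / k, n * (id + 1) / k, arg);     /* this worker's slice */
        pthread_mutex_lock(&sp->lock);
        if (--sp->pending == 0)
            pthread_cond_signal(&sp->done);          /* last one out signals */
    }
    pthread_mutex_unlock(&sp->lock);
    return NULL;
}

static int pool_init(sync_pool *sp, int nthreads)
{
    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    pthread_mutex_init(&sp->lock, NULL);
    pthread_cond_init(&sp->start, NULL);
    pthread_cond_init(&sp->done, NULL);
    sp->nthreads = nthreads;
    sp->pending = sp->shutdown = 0;
    sp->pass = 0;
    for (int i = 0; i < nthreads; i++) {
        sp->warg[i] = (worker_arg){ sp, i };
        if (pthread_create(&sp->thread[i], NULL, worker_main, &sp->warg[i]) != 0) {
            sp->nthreads = i;                        /* join only these later */
            return -1;
        }
    }
    return 0;
}

/* Start one pass of the loop on every worker and block until all finish. */
static void pool_run(sync_pool *sp, loop_body body, void *arg, size_t nitems)
{
    pthread_mutex_lock(&sp->lock);
    sp->body = body; sp->arg = arg; sp->nitems = nitems;
    sp->pending = sp->nthreads;
    sp->pass++;
    pthread_cond_broadcast(&sp->start);
    while (sp->pending > 0)
        pthread_cond_wait(&sp->done, &sp->lock);
    pthread_mutex_unlock(&sp->lock);
}

static void pool_destroy(sync_pool *sp)
{
    pthread_mutex_lock(&sp->lock);
    sp->shutdown = 1;
    pthread_cond_broadcast(&sp->start);
    pthread_mutex_unlock(&sp->lock);
    for (int i = 0; i < sp->nthreads; i++)
        pthread_join(sp->thread[i], NULL);
    pthread_mutex_destroy(&sp->lock);
    pthread_cond_destroy(&sp->start);
    pthread_cond_destroy(&sp->done);
}

/* Example loop body standing in for per-object sync calls. */
static void scale_body(size_t begin, size_t end, void *arg)
{
    double *v = arg;
    for (size_t i = begin; i < end; i++)
        v[i] *= 2.0;
}
```

Under this sketch, each sync task (schedules, loadshapes, and so on) would own one pool and call pool_run once per pass of its loop. The generation counter (pass) is used instead of a boolean flag. A late worker therefore still sees a new pass, and a fast worker does not run the same pass twice.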

The data needed to implement parallelization of a component is as follows:

The parallel synchronization control for any sync loop is implemented as follows:

The parallel synchronization code itself is implemented as follows:

When running GridLAB-D in multithreaded mode, performance can degrade significantly because of the granularity of the core parallelization. GridLAB-D can adversely affect the normal OS scheduling algorithm in symmetric multiprocessing operating systems. CPU migration can be a very large drain on resources because of cache invalidation for objects that make significant memory accesses (which most do). The way to control this problem is to establish the CPU affinity of each thread in advance. This requires the core to decide very early which threads are assigned to which CPUs, so that optimal CPU balance is maintained throughout the run.

The CPU affinity plan implemented by the core shall account for both the sequence of events and the clustering of events during parallel sections. The core's CPU affinity plan shall apply to the following event sequence:


 * 1) schedules synchronization
 * 2) loadshape synchronization
 * 3) transform synchronization
 * 4) enduse synchronization
 * 5) object synchronization
 * 6) object commit

For each of these event sequences, parallelization is implemented separately and CPU affinity is planned separately using all the available CPUs. Exactly one thread shall be assigned to each available CPU.
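The per-phase planning rule can be sketched as follows: at most one thread per available CPU, planned independently for each phase. One extra rule is an assumption added for illustration: a phase never gets more threads than it has items. The names plan_phase and print_plan are also assumptions, and the clustering of events mentioned above is not modelled.

```c
#include <assert.h>
#include <stdio.h>

/* The six phases listed above, in the order the core runs them. */
typedef enum {
    PH_SCHEDULE, PH_LOADSHAPE, PH_TRANSFORM, PH_ENDUSE, PH_OBJECT, PH_COMMIT, PH_COUNT
} phase;

static const char *phase_name[PH_COUNT] = {
    "schedules", "loadshapes", "transforms", "enduses", "objects", "commit"
};

/* Plan one phase: thread t is pinned to avail[t], so no two threads share a
 * CPU. Assumption: never plan more threads than the phase has items.
 * Returns the number of threads planned. */
static int plan_phase(int nitems, const int *avail, int navail, int *cpu_of_thread)
{
    assert(nitems >= 0 && navail > 0);
    int nthreads = nitems < navail ? nitems : navail;
    for (int t = 0; t < nthreads; t++)
        cpu_of_thread[t] = avail[t];
    return nthreads;
}

/* Each phase is planned from scratch against the full set of available CPUs. */
static void print_plan(const int nitems[PH_COUNT], const int *avail, int navail)
{
    int plan[64];
    assert(navail <= 64);
    for (int p = 0; p < PH_COUNT; p++) {
        int n = plan_phase(nitems[p], avail, navail, plan);
        printf("%-10s %d thread(s):", phase_name[p], n);
        for (int t = 0; t < n; t++)
            printf(" T%d->CPU%d", t, plan[t]);
        printf("\n");
    }
}
```

Because each phase is planned independently, a thread's CPU can differ from one phase to the next. Within a single phase, no thread ever migrates.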

The CPU affinity of a thread shall be controlled using the platform-native thread management functions. For Linux systems, the function used shall be sched_setaffinity. On Windows the function used shall be. On Mac OS X, the thread affinity API shall be used.
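On Linux, a worker thread can pin itself by calling sched_setaffinity() with a pid of 0, which applies the mask to the calling thread only. The sketch below pins a thread before it starts its share of the work, so the thread does not migrate afterward. It assumes glibc (for sched_getcpu() and the CPU_* macros) and uses hypothetical helper names. It does not cover the Windows or Mac OS X APIs.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* needed for sched_getcpu() and CPU_* macros */
#endif
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Pin the calling thread to a single CPU (pid 0 = calling thread on Linux). */
static int pin_self_to_cpu(int cpu)
{
    cpu_set_t set;
    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof set, &set);
}

typedef struct { int cpu; int ok; int ran_on; } pin_job;   /* hypothetical */

/* Hypothetical worker entry point: pin first, then do the work. */
static void *pinned_worker(void *p)
{
    pin_job *job = p;
    job->ok = (pin_self_to_cpu(job->cpu) == 0);
    job->ran_on = sched_getcpu();   /* on Linux, equals job->cpu once pinned */
    /* ... this thread's share of the sync loop would run here ... */
    return NULL;
}

/* Lowest CPU the process is currently allowed to use, or -1 on error. */
static int first_allowed_cpu(void)
{
    cpu_set_t set;
    if (sched_getaffinity(0, sizeof set, &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}
```

Pinning from inside the thread avoids a window in which the new thread runs unpinned. The CPU chosen for each thread would come from the core's affinity plan.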

If the global variable cpu_affinity is non-zero, CPU affinity will be limited to the CPU mask described by the variable. The global variable cpu_affinity shall support up to 65536 processors. When cpu_affinity is zero, the number of processors specified by the num_threads global variable shall be selected automatically from a group of unused CPUs. GridLAB-D shall implement a shared-memory variable to coordinate CPU reservation and prevent oversubscription of CPUs.
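A bitset of 65536 bits (1024 64-bit words, or 8 KiB) is one way to meet the 65536-processor requirement. The sketch below pairs such a mask with a reservation bitmap and uses hypothetical names throughout. In the core, the reservation bitmap would live in the shared-memory variable described above; here it is an ordinary struct. The sketch also assumes that reserved CPUs are skipped even when cpu_affinity restricts the candidates, which the text does not state.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_CPUS   65536                /* processors cpu_affinity must support */
#define MASK_WORDS (MAX_CPUS / 64)      /* 1024 words = 8 KiB */

typedef struct { uint64_t w[MASK_WORDS]; } cpu_mask;    /* hypothetical type */

static void mask_zero(cpu_mask *m)                { memset(m, 0, sizeof *m); }
static void mask_set(cpu_mask *m, int cpu)        { m->w[cpu / 64] |= UINT64_C(1) << (cpu % 64); }
static void mask_clr(cpu_mask *m, int cpu)        { m->w[cpu / 64] &= ~(UINT64_C(1) << (cpu % 64)); }
static int  mask_test(const cpu_mask *m, int cpu) { return (int)((m->w[cpu / 64] >> (cpu % 64)) & 1); }

static int mask_is_zero(const cpu_mask *m)
{
    for (int i = 0; i < MASK_WORDS; i++)
        if (m->w[i] != 0)
            return 0;
    return 1;
}

/* Choose up to num_threads CPUs below ncpus.
 * - cpu_affinity non-zero: only CPUs in that mask are candidates.
 * - cpu_affinity zero: every CPU is a candidate.
 * CPUs already set in `reserved` are skipped (assumption: in both cases), and
 * chosen CPUs are marked reserved so other runs do not oversubscribe them.
 * Returns how many CPUs were chosen; this may be fewer than num_threads. */
static int reserve_cpus(const cpu_mask *cpu_affinity, cpu_mask *reserved,
                        int ncpus, int num_threads, int *chosen)
{
    int restricted = !mask_is_zero(cpu_affinity);
    int n = 0;
    assert(ncpus > 0 && ncpus <= MAX_CPUS);
    for (int cpu = 0; cpu < ncpus && n < num_threads; cpu++) {
        if (restricted && !mask_test(cpu_affinity, cpu))
            continue;
        if (mask_test(reserved, cpu))
            continue;
        mask_set(reserved, cpu);
        chosen[n++] = cpu;
    }
    return n;
}

/* Give the CPUs back when the run ends. */
static void release_cpus(cpu_mask *reserved, const int *chosen, int n)
{
    for (int i = 0; i < n; i++)
        mask_clr(reserved, chosen[i]);
}
```

For example, suppose cpu_affinity is zero and CPU 1 is already reserved by another run. Asking for three threads on an eight-CPU host then yields CPUs 0, 2 and 3.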