TASCEL: Task Scheduling Library
TASCEL (pronounced "tassel") is a framework for studying the design of algorithms that address the challenges of programming abstractions for finer-grained concurrency. It uses an active message framework built on MPI; MPI allows quick prototyping and evaluation on various platforms with minimal porting effort. The active message framework enables the design of supporting algorithms that execute concurrently with the ongoing computation (e.g., load balancing or fault recovery concurrent with application execution). TASCEL supports various threading modes and progress semantics, together with both SPMD and non-SPMD execution, so as to be representative of a variety of useful execution environments.
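To make that structure concrete, below is a minimal sketch of how an active message layer can be built on MPI two-sided primitives. It is illustrative only: the names (register_handler, am_send, am_poll) and the tag-based dispatch are assumptions for exposition, not TASCEL's actual API.

```cpp
// Sketch: active messages layered on MPI two-sided primitives.
// A message's tag names its handler; a progress loop probes for and
// dispatches incoming messages, so supporting algorithms (e.g., load
// balancing) can make progress concurrently with application execution.
#include <mpi.h>
#include <functional>
#include <vector>

using Handler = std::function<void(const std::vector<char>&, int /*src*/)>;
static std::vector<Handler> handlers;  // indexed by message tag

// Register a handler; its index doubles as the active-message tag.
int register_handler(Handler h) {
  handlers.push_back(std::move(h));
  return static_cast<int>(handlers.size()) - 1;
}

// Send an active message: a payload plus the tag naming its handler.
void am_send(int dest, int handler_id, const std::vector<char>& payload) {
  MPI_Send(payload.data(), static_cast<int>(payload.size()), MPI_BYTE,
           dest, handler_id, MPI_COMM_WORLD);
}

// Make progress: drain pending messages and invoke their handlers.
void am_poll() {
  int flag = 1;
  MPI_Status st;
  while (MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD,
                    &flag, &st) == MPI_SUCCESS && flag) {
    int count = 0;
    MPI_Get_count(&st, MPI_BYTE, &count);
    std::vector<char> buf(count);
    MPI_Recv(buf.data(), count, MPI_BYTE, st.MPI_SOURCE, st.MPI_TAG,
             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    handlers[st.MPI_TAG](buf, st.MPI_SOURCE);
  }
}
```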
The following summarizes recent work on algorithms for finer-grained concurrency abstractions using the TASCEL library.
Scheduling and Load Balancing
Applications often involve iterative execution of identical or slowly evolving calculations. Such applications require incremental rebalancing to improve load balance across iterations. We evaluated two distinct approaches to addressing this challenge: persistence-based load balancing and work stealing. We developed a hierarchical persistence-based rebalancing algorithm that performs localized incremental rebalancing. We also developed a retentive work stealing algorithm optimized for iterative applications on distributed memory machines. Retentive work stealing exploits persistence to incrementally rebalance work and reduce the overhead associated with random stealing.
J. Lifflander, S. Krishnamoorthy, and L. Kale. "Work stealing and persistence-based load balancers for iterative overdecomposed applications". HPDC'12
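The retention logic at the heart of retentive stealing can be sketched in a few lines. The following is a simplified, single-process illustration under assumed types (Task and Worker are hypothetical); it shows only the retention step and elides the distributed deques and steal protocol:

```cpp
// Sketch: the retention logic behind retentive work stealing.
// Tasks a worker executed in one iteration (locally or via steals) seed
// its queue for the next, so stealing only corrects residual imbalance.
#include <deque>
#include <vector>

struct Task { int id; /* application payload elided */ };

struct Worker {
  std::deque<Task> queue;      // tasks to execute this iteration
  std::vector<Task> executed;  // tasks that completed on this worker

  void run_iteration() {
    executed.clear();
    while (!queue.empty()) {
      Task t = queue.front();
      queue.pop_front();
      // execute(t) ...
      executed.push_back(t);   // retain: remember what ran here
    }
    // In the distributed setting, an idle worker would now steal from
    // the other end of a victim's deque; stolen tasks are retained too.
  }

  void seed_next_iteration() {
    // Persistence: tasks that ran here last time start here next time.
    queue.assign(executed.begin(), executed.end());
  }
};
```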
Fault Tolerance
We have designed fault tolerance mechanisms for task parallel computations operating on global data. We developed three recovery schemes that present distinct trade-offs: lazy recovery with potentially increased re-execution cost, immediate collective recovery with associated synchronization overheads, and noncollective recovery enabled by additional communication. We employ distributed-memory work stealing to dynamically rebalance the tasks onto the live processes. We demonstrated that the space and time overheads of the fault tolerance mechanism are low, that the costs incurred due to failures are small, and that the overheads decrease as per-process work grows at scale.
W. Ma and S. Krishnamoorthy. "Data-driven fault tolerance for work stealing computations". ICS'12
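The trade-offs among the three schemes can be summarized schematically. The sketch below uses hypothetical names (RecoveryScheme, lost_tasks) and elides the actual metadata tracking; it is not the paper's implementation:

```cpp
// Sketch: the three recovery trade-offs, schematically. Real recovery
// derives the lost task set from metadata maintained alongside the
// global data; lost_tasks below is a stub standing in for that logic.
#include <vector>

struct Task { int id; };

enum class RecoveryScheme { Lazy, Collective, NonCollective };

// Stub: tasks whose results were lost with the failed process.
std::vector<Task> lost_tasks(int /*failed_rank*/) { return {}; }

void recover(RecoveryScheme scheme, int failed_rank,
             std::vector<Task>& local_queue) {
  switch (scheme) {
    case RecoveryScheme::Lazy:
      // Defer: re-execute a lost task only when its result is actually
      // needed. No synchronization, but potentially more re-execution.
      break;
    case RecoveryScheme::Collective:
      // All survivors synchronize, agree on the lost set, and re-enqueue
      // it immediately. Simple, but pays the synchronization cost.
      for (const Task& t : lost_tasks(failed_rank))
        local_queue.push_back(t);
      break;
    case RecoveryScheme::NonCollective:
      // Survivors reconstruct the lost set via additional point-to-point
      // messages, avoiding a global synchronization.
      break;
  }
  // Re-enqueued tasks are rebalanced by the normal work-stealing path.
}
```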
Tracing Work Stealing
Work stealing is inherently flexible and can tolerate the variations due to faults, power, or system noise anticipated on exascale systems. However, such a flexible approach to dealing with an unbalanced distribution of work results in seemingly irregular computation structures, complicating the study of the runtime behavior of work stealing schedulers. Typical approaches to studying work stealing have resorted to tracking information about each individual task. Given that the number of tasks can be orders of magnitude greater than the number of processor cores, this approach quickly becomes intractable. We have developed an approach to efficiently trace async-finish parallel programs scheduled using work stealing, with low time and space overheads. Beyond replay-based performance analysis, we demonstrated the broader applicability of this work through two use cases: optimizing correctness tools that detect data races in async-finish programs, and designing retentive work stealing algorithms for recursive parallel programs.
J. Lifflander, S. Krishnamoorthy, and L. Kale. "Steal Tree: low-overhead tracing of work stealing schedulers". PLDI'13
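The key observation is that steals, not tasks, are the events worth recording: work between consecutive steal points executes in a deterministic order, so the schedule can be reconstructed from a compact "steal tree". A hypothetical event record (field names assumed for illustration, not the paper's format) might look like:

```cpp
// Sketch: a compact steal record. Tasks executed between consecutive
// steal points run in a deterministic order, so logging only the steals
// suffices to reconstruct (and replay) the entire schedule.
#include <cstdint>
#include <vector>

struct StealEvent {
  int32_t  victim;  // worker whose deque was stolen from
  int32_t  thief;   // worker that took the work
  uint64_t step;    // victim's position in its execution at the steal
  int32_t  level;   // nesting depth in the async-finish task tree
};

// The log grows with the number of steals, which is typically
// proportional to workers rather than tasks, keeping overheads low
// even when tasks outnumber cores by orders of magnitude.
static std::vector<StealEvent> steal_log;
```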