pH (Parallel Haskell)

Principal Investigator Arvind

Co-investigators Srinivas Devadas , Larry Rudolph

Project Website http://csg.csail.mit.edu/pubs/parallel.html

Project Start Date May 1999

Project End Date  May 2006

A formal definition of the pHlanguage itself is available as CSG Memo #369. It defines pH in terms of Haskell. Some modifications have been added to the type system, including constructor classes; these are in line with existing Haskell implementations. The original pH manifesto is available in html format as well.

The pH language is is a parallel, eagerly-evaluated variant of Haskellwith syntactic provisions for loops, barriers, and I- and M- structure storage. The eager evaluation model of pH is similar to that of Id; the current version of the pH compiler shares a back end with the Id compiler, producing code for the Monsoon dataflow machine. The front end of the pH compiler is a modification of hbcc, an ordinary Haskell compiler developed by Lennart Augustsson. Optimizations have been added to parallelize list generation and traversal, while existing optimizations have been weakened somewhat so that they behave properly in the presence of side effects. The pH Prelude is largely identical to the Haskell Prelude, permitting many programs to compile without modification in either Haskell or pH. Functions which deal with infinite data structures (such as enumFrom) have been eliminated, and the error behavior of other functions (such as array) have been modified slightly. In addition, pH provides Prelude code for parallel list operations, I-structures, and M-structures. A variation on State Transformers will permit state-manipulating programs written in a subset of pH to be run under Haskell simply by incorporating additional code. It is hoped that by retaining much of the Haskell language, pH will allow direct parallelization of existing Haskell programs, and permit better comparisons between lazy and eager evaluation models.

There are two documents describing list traversal optimization. This material should be useful to functional programmers of every stripe, and not just implementors of eager languages.

There is also a series of CSG Memos which discuss various aspects of the semantics of pH and various ways to translate barriers into control flow. Current work focuses on getting working implementations of pH running both serially and in parallel on stock architectures. These efforts divide up as follows:

Library issues and implementation, along with high-level code optimizations based on algebraic programming and types are being investigated.

Optimizing code at all levels -- Much work has been done on this subject in Id. This ought to be directly applicable to pH, assuming we retain a compatible (graph-like) world view.

Partitioning code into threads -- This is the focus of work at CSG by Satyan Coorg. The idea is to partition eager code into strict partitions. These partitions are gauranteed to run - to completion (without blocking) -- exactly when all of their input is available. The partitioning phase is thus a bridge between conventional functional (or dataflow) compilation and multithreaded architectures

A graph-like virtual machine -- Much of the work performed by an Id or pH program is analogous to the "graph building" performed by lazy compiled graph-reduction systems such as the G-machine. This observation is leading us to examine graph-like views of pH execution. Shail Aditya has been at the vanguard of this work, and there are several variations on this theme now written up; the most recent is available as CSG Memo #374

*RISC -- multithreaded generic back end: *RISC is a hopefully fairly architecture-independent description of multithreaded code, with an instruction set loosely based on that of the PowerPC. The goal is to generate code for any RISC-based machine, but initially PowerPC-based workstations and SMP's, and the *T-NG machine. The back end efforts are thus closely tied to the *T-NG project.