Pete's Log: I like reading! part 6
Entry #1151, (Coding, Hacking, & CS stuff)(posted when I was 23 years old.)
*T: A Multithreaded Massively Parallel Architecture
by Nikhil, Papadopoulos, Arvind. 19th ISCA, 1992
This paper presents the *T (pronounced Start) architecture. It is a proposed architecture for a massively parallel system. The paper presents some general problems in MPA (latency of remote loads, idling due to synchronizing) and proposes solutions.
The main component of the solution is multithreading, since multithreading can be used to mask latency. Split-phase transactions are also vital to make such a system work. Efficient message passing and handling is also necessary. The authors also decide to have separate processing elements on each node: a main processing unit, a remote memory processing element that handles requests from remote nodes for loads and stores, and a synchronizing processing element which handles responses returning to the node.
The authors' reason for the separate processing elements is that they want competitive single-thread performance, and with that goal, they don't want the main processor to be interrupted by network activity. At the same time, simple requests such as loads and stores should be handled quickly. Hence separate processing units.
The actual instruction set is very similar to the P-RISC instruction set.
I should go over this paper again sometime. The idea of putting multiple processing units on a node is interesting, but unfortunately not applicable to what I'm doing in the short term. The authors do have a bunch of code examples, which may prove enlightening as well.
Cilk: An Efficient Multithreaded Runtime System
by Robert D. Blumofe et. al., 5th Symposium on Principles and Practice of Parallel Programming
Since searches including "Multithreaded" and "Runtime" tend to pick this one up a lot, I came upon it pretty quickly. Unfortunately, it is about software multithreading, and as such does not apply very directly to what I'm looking for.
Cilk is basically an extension to C. Cilk programs get preprocessed into C programs and are then linked against a runtime library.
In Cilk, threads are nonblocking C functions that get run to completion. If a thread wants to spawn a child thread and then wait for a result from that child, it has to spawn a successor thread that waits for the data from the child, since the thread itself can't block.
The way that a successor thread gets scheduled is through continuations. The meaning the authors have chosen for the terms they use are poor. A successor thread should itself be called a continuation. The objects the authors call continuations would more appropriately be called futures. But anyway, the parent will spawn a child, specifying to that child a continuation where it should store the value it computes. It then spawns a successor which depends on that continuation as one of its input values. The successor will not be scheduled until the value of the continuation has been computed. Bad choice of terms. But it makes sense enough.
The Cilk scheduler tries to keep processors busy by employing a work-stealing technique. This sort of technique may be interesting to look at when doing internode scheduling, but right now, this paper really didn't offer me much.