Pete's Log: Multi-tasking on a dataflow-like architecture
Entry #1153, (Coding, Hacking, & CS stuff) (posted when I was 23 years old.)
I still have a backlog of summaries to write up, but in the meantime I am going to start adding summaries as I read things, since they're more easily summarized while fresh in my mind. A quick note to those who didn't like the number of summaries posted yesterday: it's Pete's Log. That's me. It's my log, for my benefit. You chose to subscribe and it's easy enough to delete entry mails you don't want to read. *grin* onwards...
Multi-tasking on a dataflow-like architecture
by Kattamuri Ekanadham, IBM tech report RC 12307 (#55198) 11/12/86
My first comment is that I would have expected an IBM tech report to be less poorly written. It was at times very frustrating to read. Beyond that, this paper was interesting. It describes an idea I've seen plenty of times before: merge concepts of dataflow architectures with concepts of traditional von Neumann architectures. Somewhere in the middle, it seems, we'll find our holy land.

This starting point is evident in figure one, which is easily described. It's a simple DAG. On the left side, we start with a node marked "imperative program," which points to a node marked "explicit parallel threads," which points to a node marked "von Neumann multi-processors." On the right side, we start with a node marked "functional program," which points to "implicit dataflow graphs," which points to "dataflow multi-processor." In the middle we have a node labeled "???", pointed at by both "imperative program" and "functional program," which in turn points at another node labeled "???". This middle-ground approach, labeled simply by question marks, has been the topic of quite a few papers I've read. The figure does a good job of summarizing the approach, though I don't like its implication that there is an architectural middle ground but not one at the programming language level. It's quite possible, though, that the implication is unintentional and I'm reading too much into the figure.
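In ASCII form, the figure looks roughly like this (my reconstruction from the description above, so take the exact layout with a grain of salt):

```
imperative program --> explicit parallel threads --> von Neumann multi-processors
         \
          +--> ??? --> ???
         /
functional program --> implicit dataflow graphs --> dataflow multi-processor
```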
So anyway, what new insight does this paper offer? An abstract architectural model is presented. I desire a name for these sorts of architectures. "Paper architectures" works well enough, but I want something wittier.
Yuk, I'm rambling. Where was I? So this abstract model is supposed to support both von Neumann programming models as well as dataflow graph programming models. The model, roughly:

- global memory and multiple processors
- programs are divided into blocks, and blocks are divided into tasks
- special instructions, used in addition to a traditional instruction set, enable task management
- tasks are run-to-completion, but if made fine grained (a small number of instructions) then we find ourselves near the dataflow end of the spectrum
- the machine has only one register set ... task context switch time is thus an issue

Then means for "compiling" both dataflow graphs and imperative programs into code for this architecture are described. Finally, ideas and issues for implementing such a machine are given.
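To make that concrete for myself, here's a toy sketch of how I picture the task machinery. To be clear, this is entirely my own invention for illustration; the paper's actual instruction set and block format are more involved, and all the names here (Task, Machine, signal) are mine:

```python
# Toy model of run-to-completion tasks. A task waits for a fixed number
# of enabling signals (dataflow-style synchronization), then runs start
# to finish -- no preemption, so the single register set never has to be
# saved mid-task.

from collections import deque

class Task:
    def __init__(self, name, needed, body, successors=()):
        self.name = name
        self.needed = needed          # signals required before this task may run
        self.received = 0
        self.body = body              # the task's straight-line code
        self.successors = successors  # tasks to signal when this one completes

class Machine:
    def __init__(self):
        self.ready = deque()          # tasks whose signal count is satisfied

    def signal(self, task):
        task.received += 1
        if task.received == task.needed:
            self.ready.append(task)

    def run(self):
        while self.ready:
            task = self.ready.popleft()
            task.body()               # run to completion: no mid-task context switch
            for succ in task.successors:
                self.signal(succ)

# Tiny example: task c waits for both a and b to finish.
m = Machine()
c = Task("c", needed=2, body=lambda: print("c runs last"))
a = Task("a", needed=1, body=lambda: print("a runs"), successors=(c,))
b = Task("b", needed=1, body=lambda: print("b runs"), successors=(c,))
m.signal(a)   # seed the two entry tasks
m.signal(b)
m.run()       # prints: a runs, b runs, c runs last
```

Make each body a single instruction and you're basically drawing a dataflow graph; make it a whole subroutine and you're back in von Neumann land.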
I will abstain from making too many negative comments about this architecture, because it provides one really cool thing that justifies most of the features I don't like about it. This architecture was designed for experimentation. The whole point of the model is that it easily accommodates dynamic adjustment of the grain size of parallelism exploited. Given this ability, one is almost presented with a knob that can be turned from "100% von Neumann" to "100% dataflow" (well, realistically the model probably can't make it to 100% on either side, but whatever). I like this idea.

I'm definitely approaching a point where I've read enough to realize that this middle ground approach seems reasonable, and I'm liking the benefits of MTA more every day, but I've yet to see any significant work that experimentally plays with the tradeoff variables present in MTA design in order to argue that one design is better than another. Since I'm not even remotely close to being well versed in the field, though, that doesn't mean such experimental results aren't out there. I've simply not come across them yet. It is also apparent that certain approaches have had better success than others. The approach taken by Tera has certainly proven implementable and promising (though I lack any recent news on how Tera is doing; my Tera knowledge is at best two years out of date). But at the same time, all the Tera papers I've read don't offer clear arguments for why the approach they chose is the right one. For the most part, the Tera papers just say "we did it like this," without much detail. Sometimes they say "we did it like this, and here's a reason why." But given how many design decisions there are, it seems that better justification needs to be given. I want to see "we considered this approach, but decided against it for reasons x, y, and z, and instead went with this other approach, because issues x, y, and z didn't affect it, and additionally, reasons q, r, and s suggested it to be a good approach." Or "experimental data obtained from simulating these two alternative approaches indicate that in situations X this approach wins, but in situations Y the other approach wins. Given our design goals, we feel situations X are more likely, so we are choosing to take the first approach."
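Back to that knob for a second. Here's a silly little sketch of how I think of it (my framing, not the paper's): take a straight-line instruction sequence and chop it into run-to-completion tasks of k instructions each. Turning k down pushes you toward the dataflow end; turning it up pushes you back toward plain von Neumann execution.

```python
def chop(instructions, k):
    """Split a straight-line instruction sequence into tasks of k
    instructions each. k = 1 approximates pure dataflow granularity;
    k = len(instructions) is one big von Neumann-style task."""
    return [instructions[i:i + k] for i in range(0, len(instructions), k)]

program = ["load r1, x", "load r2, y", "add r3, r1, r2", "store z, r3"]
print(chop(program, 1))  # four one-instruction tasks: the dataflow end
print(chop(program, 4))  # one four-instruction task: the von Neumann end
```

Obviously real grain-size adjustment has to respect data dependences rather than chop blindly, but that's the dial I'd want to turn in an experiment.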
I just realized that this can no longer be accurately described as a summary of the paper I just read. If anyone knows of any papers that do what I just described, please let me know. Otherwise, I rather like the idea of doing such experimentation myself.
Let's see, what else is there to take from this paper? The compilation techniques were unexciting, though my desire to take an advanced compiler course was yet again reinforced. The implementation section was somewhat interesting, though my main impression was that the implementation seemed excessively complex. One interesting idea was the resource management unit (RMU) that would be part of the architecture. At first the RMU was described as a separate processor which runs a resource manager program (a.k.a. system code!!!) from memory. I found this interesting because having system code run concurrently with user code is an issue I must deal with. Unfortunately, I was not happy with their further description of the resource manager. Although it was initially described as a separate processing unit, they then went on to talk about the overhead involved in switching into the RMU, which makes little sense to me if it's supposed to be a separate unit. So most of the rest of what they said pertained to having the RMU preallocate blocks in big enough chunks that it would not be called too often, and to using caching techniques to find those preallocated blocks when a new block is requested.
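The preallocation idea, as I understood it, sketched below. This is my own toy rendering, not the paper's mechanism; the chunk size, the stand-in ResourceManager, and all the names are mine:

```python
# Toy sketch of RMU block preallocation: grab blocks from the (expensive)
# resource manager in chunks, and satisfy individual block requests from
# a local cache of preallocated blocks, so the RMU is invoked far less
# often than once per block.

class ResourceManager:
    """Stands in for the RMU: hands out block IDs; pretend each call is costly."""
    def __init__(self):
        self.next_block = 0
        self.calls = 0

    def allocate_chunk(self, n):
        self.calls += 1
        blocks = list(range(self.next_block, self.next_block + n))
        self.next_block += n
        return blocks

class BlockCache:
    """Processor-local cache of preallocated blocks."""
    def __init__(self, rmu, chunk_size=8):
        self.rmu = rmu
        self.chunk_size = chunk_size
        self.free = []

    def new_block(self):
        if not self.free:  # cache empty: one expensive RMU call refills it
            self.free = self.rmu.allocate_chunk(self.chunk_size)
        return self.free.pop()

rmu = ResourceManager()
cache = BlockCache(rmu, chunk_size=8)
for _ in range(20):
    cache.new_block()
print(rmu.calls)  # 3 RMU calls for 20 block requests, instead of 20
```

Which is just the usual amortization trick: pay the expensive call once per chunk instead of once per block.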
Final thoughts: interesting paper, but not so much for any technical reasons. Also, interlibrary loans are cool. I love libraries. The librarians in the engineering library have recently become some of my favorite people.