Pete's Log: How effective is multithreading, really?
Entry #1154 (Coding, Hacking, & CS stuff), posted when I was 23 years old.
I had a really good talk with Dr. Lambert Schaelicke today. He's a new professor in the department and seems like a really nice guy. He gave me a bunch of good pointers in several areas. First, he provided some useful tips on how to go about doing research in general. Second, he gave me good advice about writing architectural simulators (something he's done a fair share of, and something I'm likely to dedicate a lot of time to myself). Third, he pointed me to some good places to look for hardware multithreading material. He also agreed with my observation that in the realm of MTA, too little focus has been placed on operating system support, and he confirmed some of my recent doubts about how useful hardware multithreading really is. He made some good points against it. Not that he's against the concept; he just played a good devil's advocate. I certainly still like the idea, but it definitely is not without problems.
So from some of the pointers he gave me, I found several promising papers. The first I read was The Effectiveness of Multiple Hardware Contexts by Radhika Thekkath and Susan J. Eggers, published in ASPLOS 1994. First I want to mention just how well written I thought this paper was. The authors did a fantastic job of explaining their process, presenting detailed results, and properly explaining what those results meant. I was quite impressed.
The paper basically presents the results of running a bunch of simulations in order to determine how effective hardware multithreading really is. The main problem they address is that hardware multithreading generally decreases cache effectiveness, because with multiple contexts, the working set addressed by the processor becomes larger. So the authors ran various programs through a bunch of simulations in which they varied the number of processors, the total number of hardware contexts, and various cache attributes. Some of the programs they used had been optimized for data locality, others had not.
The results were interesting. The locality-optimized programs generally saw improvements when additional contexts were added, while the unoptimized programs usually improved a little when the number of contexts per processor was increased to two, but generally saw decreasing performance when any additional contexts were added.
The authors then played around with cache organization. While they were able to find some improvements there, the gains generally weren't very large, though there were definite exceptions. What everything basically came down to was that the performance gains from adding processors were usually much larger than the gains from adding contexts, and in some cases adding contexts actually decreased performance. On the other hand, adding new contexts costs less than adding new processors, and in some cases more contexts on fewer processors does compare favorably to more processors. So multithreading is only a means to a modest performance improvement, but it's a cheap one. And cache organization matters a lot in multithreading.
Granted, cache organization probably has less effect on PIM ...
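Out of curiosity, the processors-vs-contexts tradeoff can be sketched with a toy analytic model. To be clear, this is entirely my own back-of-envelope sketch, not the paper's methodology, and every number in it (compute run length, miss latency, switch overhead, miss rate growth) is an assumed, made-up value chosen only to illustrate the shape of the result: a second context hides stalls, further contexts thrash the shared cache, and extra processors scale nearly linearly.

```python
# Toy saturation model of the processors-vs-contexts tradeoff.
# All constants are hypothetical, picked to illustrate the shape of
# the paper's conclusion, not taken from its data.

R = 20.0   # compute cycles between memory references (assumed)
L = 100.0  # cache-miss latency in cycles (assumed)
S = 5.0    # context-switch overhead in cycles (assumed)
M1 = 0.05  # miss rate with one context per processor (assumed)

def throughput(processors, contexts):
    """Relative throughput, in units of one fully busy processor."""
    miss = min(1.0, M1 * contexts)  # contexts compete for one cache
    # Below saturation: each extra context hides part of the miss latency.
    unsaturated = contexts * R / (R + miss * L)
    # At saturation: the processor stays busy, but every miss still
    # costs an unhidable context switch.
    saturated = R / (R + miss * S)
    return processors * min(unsaturated, saturated)

for c in (1, 2, 4, 8):
    print(f"1 processor, {c} contexts: {throughput(1, c):.2f}")
for p in (1, 2, 4):
    print(f"{p} processors, 1 context each: {throughput(p, 1):.2f}")
```

With these made-up numbers the model reproduces the qualitative finding: going from one context to two helps, additional contexts slowly hurt as the shared cache degrades, and adding processors buys much more than adding contexts (though contexts are, of course, the cheaper hardware).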
This paper does, however, fit into the category of what I wanted to see in my entry yesterday, and it has a lot of good references I want to chase down. So I'm quite happy.
must flee to Ann Arbor, no proofreading will be done to this entry.