TY - GEN
T1 - Supporting speculative multithreading on simultaneous multithreaded processors
AU - Packirisamy, Venkatesan
AU - Wang, Shengyue
AU - Zhai, Antonia
AU - Hsu, Wei Chung
AU - Yew, Pen Chung
PY - 2006
Y1 - 2006
N2 - Speculative multithreading is a technique that has been used to improve single-thread performance. Speculative multithreading architectures for chip multiprocessors (CMPs) have been extensively studied, but there have been relatively few studies on the design of speculative multithreading for simultaneous multithreading (SMT) processors. The current SMT-based designs, IMT [9] and DMT [2], use the load/store queue (LSQ) to perform dependence checking. Since the size of the LSQ is limited, this design is suitable only for small threads. In this paper we present novel cache-based architecture support for speculative simultaneous multithreading that can efficiently handle larger threads. In our architecture, the associativity of the cache is used to buffer speculative values. Our 4-thread architecture can achieve about 15% speedup over the equivalent superscalar processors and about 3% speedup on average over the LSQ-based architectures, with less complex hardware. Our scheme can also perform 14% better than the LSQ-based scheme for larger threads.
AB - Speculative multithreading is a technique that has been used to improve single-thread performance. Speculative multithreading architectures for chip multiprocessors (CMPs) have been extensively studied, but there have been relatively few studies on the design of speculative multithreading for simultaneous multithreading (SMT) processors. The current SMT-based designs, IMT [9] and DMT [2], use the load/store queue (LSQ) to perform dependence checking. Since the size of the LSQ is limited, this design is suitable only for small threads. In this paper we present novel cache-based architecture support for speculative simultaneous multithreading that can efficiently handle larger threads. In our architecture, the associativity of the cache is used to buffer speculative values. Our 4-thread architecture can achieve about 15% speedup over the equivalent superscalar processors and about 3% speedup on average over the LSQ-based architectures, with less complex hardware. Our scheme can also perform 14% better than the LSQ-based scheme for larger threads.
UR - http://www.scopus.com/inward/record.url?scp=77049104925&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77049104925&partnerID=8YFLogxK
U2 - 10.1007/11945918_19
DO - 10.1007/11945918_19
M3 - Conference contribution
AN - SCOPUS:77049104925
SN - 354068039X
SN - 9783540680390
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 148
EP - 158
BT - High Performance Computing - HiPC 2006 - 13th International Conference Proceedings
T2 - 13th International Conference on High Performance Computing, HiPC 2006
Y2 - 18 December 2006 through 21 December 2006
ER -