A barrel processor interleaves a set of B instruction streams (threads) in round-robin order, so that each thread receives an instruction issue slot once every B cycles. The barrel approach delivers high throughput efficiently in terms of circuit area and power. Its drawback is that a given thread of execution can issue only once every B cycles, which limits single-threaded performance. This disclosure describes a hybrid architecture and microarchitecture that improves single-threaded performance while preserving the bandwidth advantages for multithreaded applications.
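The round-robin issue policy of a conventional barrel processor, and why it caps single-thread speed, can be illustrated with a small model. This is a minimal sketch under stated assumptions: exactly one issue slot per cycle, strict round-robin with no stalls or skipped slots, and threads numbered 0 to B-1. The function names `barrel_schedule` and `issue_cycles` are hypothetical. The model covers only the baseline barrel scheme and does not represent the hybrid design described in this disclosure.

```python
from collections import Counter


def barrel_schedule(num_threads: int, num_cycles: int) -> list[int]:
    """Return the thread ID that owns the issue slot on each cycle.

    Assumes strict round-robin: cycle c belongs to thread c mod B.
    """
    return [cycle % num_threads for cycle in range(num_cycles)]


def issue_cycles(thread_id: int, num_threads: int, num_instructions: int) -> list[int]:
    """Return the cycles on which one thread issues its instructions.

    Under strict round-robin, a thread can issue only once every B cycles,
    so consecutive instructions from the same thread are B cycles apart.
    """
    return [thread_id + k * num_threads for k in range(num_instructions)]


if __name__ == "__main__":
    B = 4
    schedule = barrel_schedule(B, 8)
    print(schedule)                 # slot owner per cycle
    print(Counter(schedule))        # each thread gets an equal share of slots
    print(issue_cycles(0, B, 3))    # thread 0 issues on cycles 0, 4, 8
```

With B = 4, the aggregate pipeline still issues one instruction every cycle (good throughput), but thread 0 needs until cycle 8 to issue its third instruction, whereas a single-threaded pipeline under the same assumptions would issue it on cycle 2. This gap is the single-threaded limitation that the hybrid approach targets.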

