Abstract
Demand is growing for co-processors that supplement the functions of primary processors. In the Kraken architecture, co-processors take over energy-intensive tasks from the main processor, improving system performance while maintaining a favorable energy-delay product. These specialized, energy-efficient processing elements achieve significantly higher performance per watt. The high-performance co-processors targeted in this paper are descendants of conservation cores (c-cores) [1], which are automatically synthesized from application source code. The co-processor shares the coherent cache with the main processor and contains numerous processing nodes (load and store request/response nodes) that access the cache through an interconnect. This paper presents an experimental study that uses an application characterized by a heavily unrolled loop to analyze the core and interconnect with respect to area, power, and delay. The results indicate that power is consumed predominantly by the operators and mux-tree elements of the datapath. We discuss several micro-architectural and layout solutions to address these power bottlenecks.
Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
Tummala, Gopi K., "Kraken Co-Processor Interconnect", Technical Disclosure Commons, ()
https://www.tdcommons.org/dpubs_series/10387