Abstract

There is increasing demand for co-processors that supplement the functions of primary processors. In the Kraken architecture, co-processors offload energy-intensive tasks from the main processor, improving system performance while maintaining an efficient energy-delay product. Performance per watt is significantly higher in these specialized, energy-efficient processing elements than in the main processor. The high-performance co-processors targeted in this paper are descendants of conservation cores (c-cores) [1], which are automatically synthesized from application source code. Each co-processor shares a coherent cache with the main processor and contains numerous processing nodes (load and store request/response nodes) that access the cache through an interconnect. This paper presents an experimental study that uses an application characterized by a heavily unrolled loop to analyze the core and interconnect with respect to area, power, and delay. Results indicate that power is consumed predominantly by the operators and mux-tree elements of the datapath. We discuss several micro-architectural and layout solutions to address these power bottlenecks.

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Recommended Citation

Tummala, Gopi K., "Kraken Co-Processor Interconnect", Technical Disclosure Commons, (June 08, 2026)
https://www.tdcommons.org/dpubs_series/10387

Technical Disclosure Commons

Defensive Publications Series

Kraken Co-Processor Interconnect
