Abstract
The ratio of memory bandwidth to compute power (B/C ratio) of a compute die influences the performance of machine learning (ML) applications. The B/C ratio generally degrades as the compute die grows, because compute power is proportional to the area of the die while bandwidth is proportional to its perimeter. This disclosure describes techniques to increase the B/C ratio within an integrated circuit by using tiling geometries that allow a compute die to attach to two or more memory dies and by enabling bidirectional workload flows within the same architecture. The techniques also provide design flexibility in provisioning compute power, memory bandwidth, and memory capacity by allowing a non-uniform mix of high-bandwidth memory (HBM) and static random-access memory (SRAM) tiles, which helps designs keep pace with the fast evolution of ML models.
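The area-versus-perimeter argument can be illustrated with a minimal sketch. It assumes a square die, a uniform compute density, and a uniform bandwidth per unit of edge length; the function name `bc_ratio`, its parameters, and the unit constants are hypothetical and are not taken from the disclosure.

```python
def bc_ratio(side_mm, bw_per_mm=1.0, compute_per_mm2=1.0, edges_used=4):
    """Bandwidth-to-compute ratio of a square die (illustrative model only).

    Assumptions (not from the disclosure): bandwidth scales with the length
    of die edge that faces memory, and compute scales with die area.
    """
    bandwidth = bw_per_mm * side_mm * edges_used   # grows with perimeter
    compute = compute_per_mm2 * side_mm ** 2       # grows with area
    return bandwidth / compute


# Doubling the side length halves the ratio under these assumptions.
for side in (5, 10, 20):
    print(side, bc_ratio(side))
```

Under this simplified model the ratio falls as 1/side, and exposing more die edge to memory (a larger `edges_used`) raises it proportionally, which is the intuition behind attaching a compute die to two or more memory dies.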
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
n/a, "2.5-Dimensional High Bandwidth Flexible Memory for High Performance Parallelism", Technical Disclosure Commons, (February 19, 2024)
https://www.tdcommons.org/dpubs_series/6698