When a memory access to dynamic random access memory (DRAM) completes, the accessed page is closed, which consumes energy and time. For workloads with temporal access locality, this is expensive: the same page must be reopened for each subsequent access, which adds latency. Traditional CPUs include caches that absorb repeated accesses, so this behavior does not penalize accesses with temporal locality. However, hardware accelerators that do not include caches, such as machine-learning accelerators, can suffer when running workloads with temporal access locality. This disclosure describes techniques to efficiently service memory accesses for workloads that exhibit temporal locality without compromising the performance of other types of accesses. The techniques improve bandwidth efficiency for off-chip memories, especially for accesses by domain-specific hardware accelerators such as machine-learning accelerators.
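
The cost of closing the page after every access can be illustrated with a toy model. The sketch below is not the technique of this disclosure; it only counts how many times a page (row) must be opened in a single DRAM bank under two simplified policies. The function name `count_row_activations`, the `keep_open` flag, and the example trace are hypothetical and chosen for illustration, and the model ignores real timing parameters, refresh, and multiple banks.

```python
from typing import Iterable, Optional


def count_row_activations(rows: Iterable[int], keep_open: bool) -> int:
    """Count page (row) activations for one DRAM bank.

    rows:      sequence of row indices touched by successive accesses
               (an illustrative trace, not measured data).
    keep_open: False models closing the page after every access;
               True models leaving the page open until a different
               row is requested.
    """
    open_row: Optional[int] = None
    activations = 0
    for row in rows:
        if open_row != row:
            # The requested row is not open, so it must be activated.
            activations += 1
            open_row = row
        if not keep_open:
            # Close the page as soon as the access completes.
            open_row = None
    return activations


# A trace with temporal locality: the same rows are revisited back to back.
trace = [7, 7, 7, 3, 3, 7]
print(count_row_activations(trace, keep_open=False))  # one activation per access
print(count_row_activations(trace, keep_open=True))   # one activation per row change
```

Under these assumptions, the close-after-every-access policy reopens the page on each of the six accesses, while keeping the page open needs only three activations for the same trace. Each avoided activation corresponds to energy and latency saved for accesses with temporal locality.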

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.