Leveraging the vectorizability of deep-learning weight updates, this disclosure describes processing-in-memory (PIM) techniques for weight updates in a large class of deep-learning networks. Rather than moving the optimizer state to the compute die, the techniques send gradients to a die of a high-bandwidth memory (HBM) stack and perform the modest number of optimizer-update operations in compute units located on that die. Because the reads and writes of weights and optimizer state stay inside the HBM stack, the techniques can substantially reduce CPU-HBM bandwidth requirements. Weight-related memory traffic, which dominates for multilayer perceptrons and transformers, is also reduced.
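
The bandwidth argument can be illustrated with a rough, back-of-the-envelope traffic model. The sketch below is not taken from the disclosure: the function name `link_traffic_bytes`, the 4-byte value size, and the choice of two optimizer-state tensors per weight (as in an Adam-style optimizer) are all assumptions made for illustration. It also assumes that, in the conventional case, gradients are produced and consumed on the compute die and so do not cross the link during the update.

```python
def link_traffic_bytes(num_params, bytes_per_value=4, state_tensors=2):
    """Estimate bytes crossing the processor-HBM link for one weight-update step.

    Hypothetical model, not from the disclosure. Assumes an elementwise
    optimizer that keeps `state_tensors` extra values per weight
    (2 for an Adam-style optimizer, 0 for plain SGD).
    """
    tensor_bytes = num_params * bytes_per_value

    # Conventional update: read the weights and every state tensor into the
    # compute die, then write all of them back after the update.
    conventional = 2 * (1 + state_tensors) * tensor_bytes

    # PIM update: only the gradients cross the link; weights and optimizer
    # state are read and written by compute units inside the HBM stack.
    pim = tensor_bytes

    return conventional, pim


conventional, pim = link_traffic_bytes(1_000_000)
print(conventional, pim, conventional / pim)
```

Under these assumptions, the update-step traffic over the link shrinks by a factor of 2 × (1 + state tensors). The model covers only the update step and ignores the weight reads that the forward and backward passes still require.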

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.