Abstract
In the context of large language models (LLMs), memory bandwidth has not kept pace with the computational capabilities of machine learning processing units, making memory a performance bottleneck. This disclosure describes a memory controller and data retrieval techniques that use semantic search to reduce the complexity of memory access and, by strictly enforcing data requirements, to reduce the amount of data that must be transferred. In contrast to traditional memory controllers, which require precisely specified data locations, the described memory controller uses contextual intelligence to determine the importance of requested data, such that only essential data is retrieved. This effectively compresses the data and conserves resources such as storage capacity and memory bandwidth. Furthermore, the controller offloads the management of memory access patterns, freeing machine learning processing units for their primary tasks. The controller scales with, and conforms to, the requirements of artificial intelligence (AI) workloads generally and LLMs in particular.
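The abstract does not specify the retrieval mechanism; the following Python sketch illustrates one plausible form of the relevance gating described above, assuming each stored block is tagged with a semantic embedding and the controller serves only blocks whose cosine similarity to the query exceeds a threshold. The class name SemanticMemoryController, the threshold value, and the helper function are hypothetical, not part of the disclosure.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

class SemanticMemoryController:
    """Toy semantic memory controller: stores (embedding, data) blocks
    and answers semantic queries by returning only the blocks whose
    relevance to the query exceeds a threshold."""

    def __init__(self, relevance_threshold: float = 0.9):
        self.relevance_threshold = relevance_threshold
        self.blocks = []  # list of (embedding, data) pairs

    def store(self, embedding: np.ndarray, data: bytes) -> None:
        # Each block is tagged with a semantic embedding at write time.
        self.blocks.append((embedding, data))

    def fetch(self, query_embedding: np.ndarray) -> list:
        # Rank blocks by semantic relevance and transfer only those above
        # the threshold, rather than every block at a specified location.
        return [
            data
            for emb, data in self.blocks
            if cosine_similarity(emb, query_embedding) >= self.relevance_threshold
        ]

# Usage: store two blocks, then query with an embedding close to the first.
rng = np.random.default_rng(0)
ctrl = SemanticMemoryController(relevance_threshold=0.9)
e1, e2 = rng.normal(size=64), rng.normal(size=64)
ctrl.store(e1, b"weights shard 0")
ctrl.store(e2, b"weights shard 1")
assert ctrl.fetch(e1) == [b"weights shard 0"]  # only the relevant block moves
```

Because only blocks passing the relevance test cross the memory bus, the transferred volume shrinks in proportion to how selective the threshold is, which is the bandwidth saving the disclosure attributes to the controller.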
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
NA, "Semantic Search Based Memory Controller to Accelerate LLMs and Foundational Models", Technical Disclosure Commons, (November 15, 2024)
https://www.tdcommons.org/dpubs_series/7537