Abstract

Processing systems built on bit-scalable architectures can be limited by the dynamic power consumed by heavy integer arithmetic. Executing higher-precision quantized matrix multiplications on lower-precision multiply-and-accumulate (MAC) hardware typically requires computing multiple independent partial products, which may lower computational efficiency. The disclosed processing system instead derives the cross-product terms together through a single fused operation. Specifically, it adds the decomposed operand segments before multiplication and then subtracts after multiplication to isolate the cross-products. This approach reduces the number of hardware multiplications needed per calculation. By eliminating these redundant multiplications, the processing system may lower dynamic power consumption, reduce thermal output, and increase operational throughput across a variety of mathematical models.
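
The abstract does not state the exact formula. One identity consistent with its description is the Karatsuba-style decomposition, in which an operand is split as a = a_hi * 2^k + a_lo and the sum of the two cross-products is recovered as (a_hi + a_lo)(b_hi + b_lo) - a_hi*b_hi - a_lo*b_lo. The minimal sketch below assumes unsigned operands, a two-segment split, and a segment width of k = 4 bits. The function names `split`, `fused_multiply`, and `naive_multiply` are hypothetical and chosen only for illustration; this is not the disclosed hardware design.

```python
def split(x, k):
    """Split a non-negative integer into (high, low) segments, low being k bits wide."""
    mask = (1 << k) - 1
    return x >> k, x & mask


def naive_multiply(a, b, k=4):
    """Reference scheme: four independent low-precision partial products."""
    a_hi, a_lo = split(a, k)
    b_hi, b_lo = split(b, k)
    hi = a_hi * b_hi        # multiplication 1
    lo = a_lo * b_lo        # multiplication 2
    cross_1 = a_hi * b_lo   # multiplication 3
    cross_2 = a_lo * b_hi   # multiplication 4
    return (hi << (2 * k)) + ((cross_1 + cross_2) << k) + lo


def fused_multiply(a, b, k=4):
    """Assumed fused scheme: three multiplications instead of four."""
    a_hi, a_lo = split(a, k)
    b_hi, b_lo = split(b, k)
    hi = a_hi * b_hi        # multiplication 1
    lo = a_lo * b_lo        # multiplication 2
    # Pre-multiplication addition of the segments, then one multiplication.
    fused = (a_hi + a_lo) * (b_hi + b_lo)   # multiplication 3
    # Post-multiplication subtraction isolates the sum of both cross-products.
    cross = fused - hi - lo
    return (hi << (2 * k)) + (cross << k) + lo
```

For 8-bit operands split into 4-bit segments, both functions return the ordinary product a * b, but the fused version issues three multiplications rather than four. Under these assumptions the summed segments can occupy k + 1 bits, so the third multiplier would need to be one bit wider than the other two.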

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
