
Abstract

This disclosure describes hierarchical provisioning of alignment bits for tensor cores that operate on block floating-point or microscaling (MX) data types. A tensor core multiplies element pairs to form products, aligns the products within each block using an intra-block alignment bit setting, and reduces the aligned products with an intra-block adder tree to form per-block partial sums. The per-block partial sums are then aligned across blocks using an inter-block alignment bit setting and reduced with an inter-block adder tree to produce an accumulated result, which may be converted to FP32. The intra-block alignment bit setting is provisioned independently of the inter-block setting and is set to a smaller width based on the numeric characteristics of the intra-block stage, while the inter-block alignment bit setting is fully provisioned. Input-pattern-based procedures for detecting the effective intra-block and inter-block alignment behavior are also described.
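The two-stage datapath in the abstract can be modeled in a few lines. The Python sketch below is a hypothetical numeric model, not the disclosed hardware. It assumes that alignment means shifting every term to the largest exponent in its group, keeping a fixed number of bits below that exponent, and truncating (rounding toward zero) whatever is shifted out. It also assumes that each MX block carries a power-of-two shared scale exponent that is applied to the per-block partial sum before the inter-block stage. The names `align_and_sum`, `mx_dot`, `probe_intra_width`, and `probe_inter_width`, the block size of 2 used by the probes, and the widths 16 and 32 are all illustrative choices. The probes show one possible input pattern per stage (a cancelling pair of large products plus one small product) and are not the procedures described in the disclosure.

```python
import math
import struct
from typing import Callable, List, Sequence


def align_and_sum(values: Sequence[float], align_bits: int) -> float:
    """Hypothetical alignment model: shift values to the largest exponent, keep
    `align_bits` bits below it, drop shifted-out bits, add the aligned integers."""
    nonzero = [v for v in values if v != 0.0]
    if not nonzero:
        return 0.0
    e_max = max(math.frexp(v)[1] for v in nonzero)  # |v| < 2**e_max for every v
    lsb_exp = e_max - align_bits  # weight of the lowest retained bit
    # Each term becomes an integer multiple of 2**lsb_exp; lower bits are truncated.
    total = sum(math.trunc(math.ldexp(v, -lsb_exp)) for v in nonzero)
    return math.ldexp(total, lsb_exp)


def to_fp32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def mx_dot(a: Sequence[float], b: Sequence[float],
           scale_a: Sequence[int], scale_b: Sequence[int],
           block_size: int, intra_bits: int, inter_bits: int) -> float:
    """Two-level dot product. scale_a[k] and scale_b[k] are assumed
    power-of-two shared scale exponents for block k."""
    partials: List[float] = []
    for k, start in enumerate(range(0, len(a), block_size)):
        products = [x * y for x, y in zip(a[start:start + block_size],
                                          b[start:start + block_size])]
        partial = align_and_sum(products, intra_bits)  # intra-block adder tree
        partials.append(math.ldexp(partial, scale_a[k] + scale_b[k]))  # block scales
    return to_fp32(align_and_sum(partials, inter_bits))  # inter-block adder tree


Dot = Callable[[List[float], List[float]], float]


def probe_intra_width(dot: Dot, max_shift: int = 60) -> int:
    """Block 0 holds products 1 and 2**-k, block 1 holds -1 (block size 2).
    The first k that yields 0 is the intra-block width, assuming the
    inter-block stage is wider."""
    for k in range(1, max_shift + 1):
        if dot([1.0, 2.0 ** -k, -1.0, 0.0], [1.0, 1.0, 1.0, 0.0]) == 0.0:
            return k
    return max_shift


def probe_inter_width(dot: Dot, max_shift: int = 60) -> int:
    """Blocks hold 1, 2**-k and -1 separately, so only the inter-block
    alignment can drop 2**-k. The first k that yields 0 is the inter-block width."""
    for k in range(1, max_shift + 1):
        a = [1.0, 0.0, 2.0 ** -k, 0.0, -1.0, 0.0]
        if dot(a, [1.0] * len(a)) == 0.0:
            return k
    return max_shift


def device(a: List[float], b: List[float]) -> float:
    """Stand-in for the unit under test: block size 2, unit scales,
    16 intra-block bits and 32 inter-block bits (all hypothetical)."""
    n_blocks = len(a) // 2
    return mx_dot(a, b, [0] * n_blocks, [0] * n_blocks,
                  block_size=2, intra_bits=16, inter_bits=32)


print(probe_intra_width(device), probe_inter_width(device))  # prints: 16 32
```

In this model, a narrow intra-block width is tolerable because the products within one block share scale exponents and therefore tend to have similar magnitudes. The partial sums of different blocks can carry very different block scales, which is why the inter-block stage keeps the wider setting.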

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
