A conditional deep learning model that learns specialized representations on a decision tree is described. Unlike similar methods that take a probabilistic mixture-of-experts (MoE) approach, a feature-augmentation-based method is used to jointly train all network and decision parameters with back-propagation, which allows for deterministic binary decisions at both training and test time, specializing subtrees exclusively to clusters of data. Feature augmentation combines intermediate representations with the scores, or confidences, assigned to branches. Each representation is augmented with all of the scores assigned to the active branch along the computational path, encoding the entire path information, which is essential for efficient training of the decision functions. These networks are referred to as Branching Neural Networks (BNNs). Because this approach is orthogonal to many other neural network compression methods, such algorithms can be combined with it to achieve much higher compression rates and further speedups.
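To make the routing and feature-augmentation idea concrete, the following is a minimal NumPy sketch, not the authors' implementation: it assumes sigmoid branch scores, a depth-2 binary tree with hard (deterministic) decisions, and simple linear leaf experts; all function and variable names here are hypothetical.

```python
import numpy as np

def branch_score(x, w):
    # Hypothetical decision function: sigmoid confidence for the left branch.
    return 1.0 / (1.0 + np.exp(-x @ w))

def bnn_forward(x, decision_ws, experts):
    """Route x down a depth-2 binary tree with deterministic binary decisions,
    augmenting the representation with the score of every branch taken."""
    path_scores = []
    node = 0  # implicit binary tree: children of node i are 2i+1 and 2i+2
    for _ in range(2):
        s = branch_score(x, decision_ws[node])
        go_left = s >= 0.5                     # hard decision at train and test time
        path_scores.append(s if go_left else 1.0 - s)
        node = 2 * node + 1 if go_left else 2 * node + 2
    # Feature augmentation: concatenate the representation with all scores
    # collected along the active computational path, encoding the whole path.
    x_aug = np.concatenate([x, np.array(path_scores)])
    leaf = node - 3                            # leaves of a depth-2 tree are nodes 3..6
    return experts[leaf](x_aug), leaf

rng = np.random.default_rng(0)
d = 4
decision_ws = [rng.normal(size=d) for _ in range(3)]              # internal nodes 0..2
experts = [
    (lambda z, W=rng.normal(size=d + 2): z @ W) for _ in range(4)  # one expert per leaf
]
x = rng.normal(size=d)
y, leaf = bnn_forward(x, decision_ws, experts)
```

Because the decisions are deterministic, only one expert subtree is evaluated per input, in contrast to a probabilistic MoE, which mixes the outputs of all experts; the appended scores keep the decision parameters on the gradient path during training.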

This work is licensed under a Creative Commons Attribution 4.0 License.