Inventor(s)

Abstract

Techniques are described for computing online advertisement pricing adjustments using user feedback signals while separating user/context propensity from ad-candidate effects. A first machine learning model generates a state-dependent prediction V(s) of a feedback event probability from state features that exclude ad-candidate features. A second machine learning model generates a candidate-dependent prediction Q(s,a) from the state features and action features describing the ad candidate. An advantage-style combination, such as A(s,a)=Q(s,a)-V(s), produces an ad-attributable signal that controls for inherent user/context feedback propensity. The advantage output is converted to a pricing adjustment, for example UDV×Scalar×(Q-V), and provided to an auction-stage pricing layer. A multi-task neural network with feature routing may implement separate heads for V(s) and Q(s,a), with the combination performed in a pricing layer. The approach may be applied to multiple feedback types and ad formats.
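The advantage-style combination and its conversion to a pricing adjustment can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the names `Prediction`, `advantage`, and `pricing_adjustment` are hypothetical, the model outputs Q(s,a) and V(s) are supplied as plain numbers rather than produced by trained models, and UDV and Scalar are treated as opaque numeric inputs because the abstract does not define them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """Hypothetical container for the two model outputs for one ad candidate."""
    q: float  # Q(s,a): feedback probability given state and ad-candidate features
    v: float  # V(s): feedback probability given state features only


def advantage(p: Prediction) -> float:
    """A(s,a) = Q(s,a) - V(s): the ad-attributable part of the prediction."""
    return p.q - p.v


def pricing_adjustment(p: Prediction, udv: float, scalar: float) -> float:
    """UDV x Scalar x (Q - V), following the example formula in the abstract.

    udv and scalar are assumed to be plain numbers chosen elsewhere.
    """
    return udv * scalar * advantage(p)


# Two illustrative (made-up) contexts in which the same ad has the same Q(s,a),
# but the users differ in their baseline feedback propensity V(s).
high_propensity = Prediction(q=0.06, v=0.05)
low_propensity = Prediction(q=0.06, v=0.01)

adj_high = pricing_adjustment(high_propensity, udv=2.0, scalar=0.5)
adj_low = pricing_adjustment(low_propensity, udv=2.0, scalar=0.5)
```

Although Q(s,a) is identical in both contexts, the adjustment is smaller for the high-propensity user, because most of that predicted feedback is already explained by V(s). How the sign and magnitude of the adjustment are interpreted by the auction-stage pricing layer depends on the feedback type and on the choice of Scalar, which this sketch leaves open.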

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
