This disclosure describes a two-stage adversarial defense framework for detecting policy-violating content on online content platforms. The framework takes a template-matching approach that combines deep-learned semantic features with pixel-level features of an input image. In the first stage, semantic matching retrieves the k nearest neighbors of a new query image from a database of previously labeled images, and the policy-violating (positive) neighbors among them are identified. In the second stage, instance matching is performed between the query image and each positive neighbor: local features detected in the query image are matched against those of the neighbor, and the matches are geometrically verified. Based on the geometric verification, class labels are determined for the query image and used to verify its policy compliance.
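The two stages can be illustrated with a minimal sketch. It is not the disclosure's implementation: the names (`Image`, `positive_neighbors`, `match_features`, `inlier_count`, `classify`), the cosine similarity measure, and all thresholds are assumptions made for illustration. In particular, the geometric check below is a simplified translation-only consensus standing in for whatever geometric verification the framework actually uses, and the embeddings and local features are assumed to be precomputed by some feature extractor.

```python
import math
from dataclasses import dataclass


@dataclass
class Image:
    embedding: list        # semantic feature vector (assumed precomputed)
    local_features: list   # list of ((x, y), descriptor) pairs (assumed precomputed)
    label: str = ""        # "violating" or "compliant" for database images


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def positive_neighbors(query, database, k):
    """Stage 1: take the k most similar database images, keep the positives."""
    ranked = sorted(database,
                    key=lambda img: cosine(query.embedding, img.embedding),
                    reverse=True)
    return [img for img in ranked[:k] if img.label == "violating"]


def match_features(query, neighbor, max_dist):
    """Stage 2a: pair each query feature with its closest neighbor descriptor."""
    matches = []
    for q_pt, q_desc in query.local_features:
        best = min(neighbor.local_features,
                   key=lambda f: math.dist(q_desc, f[1]),
                   default=None)
        if best is not None and math.dist(q_desc, best[1]) <= max_dist:
            matches.append((q_pt, best[0]))
    return matches


def inlier_count(matches, tol):
    """Stage 2b: translation-only consensus (a simplification, see lead-in).

    Each match proposes a translation; the score is the largest number of
    matches that agree with a single proposed translation within `tol` pixels.
    """
    best = 0
    for (qx, qy), (nx, ny) in matches:
        dx, dy = nx - qx, ny - qy
        count = sum(1 for (ax, ay), (bx, by) in matches
                    if math.hypot(ax + dx - bx, ay + dy - by) <= tol)
        best = max(best, count)
    return best


def classify(query, database, k=3, max_dist=0.5, tol=2.0, min_inliers=3):
    """Label the query "violating" if any positive neighbor is geometrically verified."""
    for neighbor in positive_neighbors(query, database, k):
        matches = match_features(query, neighbor, max_dist)
        if inlier_count(matches, tol) >= min_inliers:
            return "violating"
    return "compliant"
```

In this sketch a query is flagged only when it is both semantically close to a known violating image and shares a geometrically consistent set of local features with it, so a query that merely resembles a violating image at the semantic level is not flagged on that basis alone.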

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.