Current object detection models struggle to detect small but important objects within large frames, which is a problem in applications such as security cameras or camera timers for selfies or group photographs that rely on hand gesture detection. This disclosure describes techniques that use additional signals from image frames to improve the detection rate and localization accuracy of small objects within the frame. For example, face detection can be performed, and the detected faces can serve as anchors for detecting nearby hand gestures that are relatively small. Faces can be clustered, and the regions around them cropped and enlarged, so that small but important hand gestures are magnified and easier to detect.
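The face-anchored cropping described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the disclosure's implementation: the expansion factor of 3, the merge-on-overlap clustering rule, and the function names `expand_box`, `face_regions`, and `to_frame_coords` are all hypothetical choices made for the example. The face detector, the gesture detector, and the actual image crop and resize are left out; the sketch only computes which regions to crop and how to map a detection found in an enlarged crop back to full-frame coordinates.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def expand_box(box: Box, scale: float, frame_w: int, frame_h: int) -> Box:
    """Grow a face box about its center by `scale`, clamped to the frame."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    half_w, half_h = (x2 - x1) * scale / 2, (y2 - y1) * scale / 2
    return (max(0.0, cx - half_w), max(0.0, cy - half_h),
            min(float(frame_w), cx + half_w), min(float(frame_h), cy + half_h))


def overlaps(a: Box, b: Box) -> bool:
    """True if the two boxes share any interior area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def union(a: Box, b: Box) -> Box:
    """Smallest box containing both inputs."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def face_regions(faces: List[Box], frame_w: int, frame_h: int,
                 scale: float = 3.0) -> List[Box]:
    """Expand each face box, then merge overlapping regions into clusters.

    The scale of 3.0 is an assumed value chosen so that a hand raised
    near the face is likely to fall inside the region.
    """
    regions = [expand_box(f, scale, frame_w, frame_h) for f in faces]
    merged = True
    while merged:  # repeat until no pair of regions overlaps
        merged = False
        for i in range(len(regions)):
            for j in range(i + 1, len(regions)):
                if overlaps(regions[i], regions[j]):
                    regions[i] = union(regions[i], regions[j])
                    del regions[j]
                    merged = True
                    break
            if merged:
                break
    return regions


def to_frame_coords(det: Box, region: Box, zoom: float) -> Box:
    """Map a box detected in the enlarged crop back to full-frame pixels."""
    ox, oy = region[0], region[1]
    return (ox + det[0] / zoom, oy + det[1] / zoom,
            ox + det[2] / zoom, oy + det[3] / zoom)
```

In use, each region returned by `face_regions` would be cropped from the frame, enlarged by some zoom factor, and passed to a gesture detector; `to_frame_coords` then converts any detection back to the original frame. Merging overlapping regions means a group of adjacent faces is processed as one crop rather than several near-duplicate ones.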

