Images of crowded scenes have typically been challenging for human-detection and pose-estimation algorithms. Top-down approaches rely on non-maximum suppression (NMS), which often removes valid detections when people overlap heavily, while bottom-up approaches often incorrectly group body parts of different people into a single detection. This disclosure presents techniques that combine elements of both top-down and bottom-up approaches by leveraging the observation that head-boxes overlap with each other less than body-boxes do. NMS is applied to head-boxes instead of body-boxes. Head-boxes are detected jointly with body-boxes, and each head-box is matched to its corresponding body-box. The techniques improve detection and pose-estimation results for images of crowded scenes.
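
The idea of suppressing on head-boxes rather than body-boxes can be illustrated with a minimal sketch. It is not the disclosure's implementation: it assumes each detection already carries a matched head-box, body-box, and score, and it uses plain greedy NMS with an illustrative IoU threshold of 0.5. The names `iou` and `greedy_nms` and all box coordinates are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: List[Box], scores: List[float],
               iou_threshold: float = 0.5) -> List[int]:
    """Return indices of kept boxes, highest score first.

    A box is dropped if it overlaps an already-kept box by more than
    iou_threshold (an assumed value, not taken from the disclosure).
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Hypothetical detections: two people standing close together (indices 0, 1)
# plus a duplicate detection of the first person (index 2).
heads = [(10, 10, 30, 30), (25, 12, 45, 32), (11, 11, 31, 31)]
bodies = [(0, 10, 50, 110), (10, 12, 60, 112), (1, 11, 51, 111)]
scores = [0.9, 0.8, 0.7]

kept_by_head = greedy_nms(heads, scores)   # suppress on head-boxes
kept_by_body = greedy_nms(bodies, scores)  # conventional body-box NMS

# Body-boxes of the surviving detections, recovered via the head-body pairing.
final_bodies = [bodies[i] for i in kept_by_head]
```

In this toy input, head-based suppression keeps both people and drops only the duplicate, whereas body-based suppression also discards the second person because the two body-boxes overlap heavily.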

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.