Augmented reality (AR) glasses and other AR-capable devices are often used to recognize text, QR codes, and barcodes and, more generally, to detect objects automatically. However, continuously running object detection and categorization techniques, along with specialized models for individual object categories, consumes excessive power and compute resources. This disclosure describes techniques to optimize power consumption and latency in AR glasses by using a coarse-grained classifier as a gating model. The coarse-grained classifier is trained to sort what it sees into categories such as text, QR code, barcode, and objects of interest. When a particular category has a high confidence score, the corresponding fine-grained model is triggered. For example, if the category ‘text’ has a high confidence score in a particular part of the field of view, an OCR model is triggered, e.g., to transcribe or translate the text.
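
The gating step can be illustrated with a minimal sketch. It assumes the coarse-grained classifier returns one confidence score per category for a region of the field of view. The category keys, the 0.8 threshold, the region format, and the stand-in model functions are all hypothetical and are not specified by the disclosure.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical region format: (x, y, width, height) in the field of view.
Region = Tuple[int, int, int, int]


# Stand-ins for the fine-grained models. Real models would run inference;
# these only record which model was triggered and where.
def run_ocr(region: Region) -> str:
    return f"ocr@{region}"


def run_qr_decoder(region: Region) -> str:
    return f"qr@{region}"


def run_barcode_decoder(region: Region) -> str:
    return f"barcode@{region}"


def run_object_detector(region: Region) -> str:
    return f"object@{region}"


# Assumed mapping from coarse category to its fine-grained model.
FINE_GRAINED_MODELS: Dict[str, Callable[[Region], str]] = {
    "text": run_ocr,
    "qr_code": run_qr_decoder,
    "barcode": run_barcode_decoder,
    "object": run_object_detector,
}


def dispatch(
    coarse_scores: Dict[str, float],
    region: Region,
    threshold: float = 0.8,  # assumed value; the disclosure says only "high"
) -> List[str]:
    """Trigger only the fine-grained models whose category score is high."""
    results = []
    for category, score in coarse_scores.items():
        model = FINE_GRAINED_MODELS.get(category)
        if model is not None and score >= threshold:
            results.append(model(region))
    return results
```

With scores such as `{"text": 0.93, "qr_code": 0.04, "barcode": 0.02, "object": 0.01}`, only the OCR stand-in runs. When no score reaches the threshold, no fine-grained model runs at all, which is where the power saving would come from.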

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.