After an automated assistant running on an augmented reality (AR) device or a smartphone completes coreference resolution, it applies a joint graph-based reasoning method to conduct an intelligent dialog with a user. The method draws on information from multiple data sources, such as a scene graph, a memory graph, and a knowledge graph, and this combined information enables the assistant to respond to comments the user makes during the dialog. The assistant can conduct such dialogs for shopping, visual question answering (VQA), or other interactive user activities.
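The text does not say how the graphs are represented or combined, so the following is only a minimal sketch under stated assumptions. It assumes each data source is a list of `(subject, relation, object)` triples and that coreference resolution has already mapped a pronoun such as "it" to an entity ID. The graph contents, the `build_joint_graph`, `lookup`, and `answer_attribute` helpers, and the two-hop lookup strategy are all hypothetical illustrations, not the described method itself.

```python
from collections import defaultdict

# Hypothetical graph contents for illustration only: each source is a list of
# (subject, relation, object) triples.
SCENE_GRAPH = [
    ("obj_1", "is_a", "jacket"),
    ("obj_1", "color", "red"),
    ("obj_1", "left_of", "obj_2"),
    ("obj_2", "is_a", "shoe"),
]
MEMORY_GRAPH = [
    ("user", "looked_at", "obj_1"),  # e.g. a record of prior user attention
]
KNOWLEDGE_GRAPH = [
    ("jacket", "price_usd", "80"),
    ("shoe", "price_usd", "60"),
]


def build_joint_graph(sources):
    """Merge triples from several named graphs into one adjacency map.

    Each edge keeps the name of the source graph that contributed it.
    """
    joint = defaultdict(list)
    for source_name, triples in sources.items():
        for subj, rel, obj in triples:
            joint[subj].append((rel, obj, source_name))
    return joint


def lookup(joint, node, relation):
    """Return (object, source) pairs for edges node --relation--> object."""
    return [(obj, src) for rel, obj, src in joint.get(node, []) if rel == relation]


def answer_attribute(joint, entity, attribute):
    """Answer 'what is the <attribute> of <entity>?' with at most two hops.

    First look for the attribute directly on the entity (e.g. a scene-graph
    color); otherwise hop through the entity's category ('is_a') and look
    the attribute up there (e.g. a knowledge-graph price).
    """
    direct = lookup(joint, entity, attribute)
    if direct:
        return direct[0]
    for category, _ in lookup(joint, entity, "is_a"):
        via_category = lookup(joint, category, attribute)
        if via_category:
            return via_category[0]
    return None


joint = build_joint_graph({
    "scene": SCENE_GRAPH,
    "memory": MEMORY_GRAPH,
    "knowledge": KNOWLEDGE_GRAPH,
})

# Assume coreference resolution already mapped "it" in "How much is it?" to obj_1.
resolved_entity = "obj_1"
print(answer_attribute(joint, resolved_entity, "price_usd"))  # ('80', 'knowledge')
print(answer_attribute(joint, resolved_entity, "color"))      # ('red', 'scene')
```

Returning the source name alongside each answer shows how a joint graph can still trace a fact to the scene, memory, or knowledge graph it came from. A real system would likely use richer traversal or learned reasoning rather than this fixed two-hop rule.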

