Abstract

Some automated systems for infrastructure inspection can detect defects from imagery but may lack the ability to provide descriptive context, causal reasoning, or interactive querying. A system is described that utilizes a multimodal artificial intelligence model to analyze visual data, such as pre-existing geolocalized imagery, in conjunction with natural language text prompts. The system can generate detailed analytical outputs including, for example, segmentation masks and bounding boxes for defect localization, as well as descriptive labels detailing the nature of a defect, its potential causes, and the reasoning behind its detection. This approach can facilitate a more comprehensive and interactive analysis of infrastructure conditions and could enable remote, scalable, longitudinal assessments that track defect progression over time.
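The abstract's described outputs can be illustrated as a data schema. The sketch below is hypothetical: the `DefectReport` class, its field names, and the `analyze` stub are assumptions for illustration, not the disclosed implementation; a real system would replace the hard-coded return value with a parsed response from the multimodal model.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical schema for the analytical output described above:
# defect localization (bounding box, segmentation outline) plus
# descriptive labels covering the defect type, its potential causes,
# and the model's stated reasoning for the detection.

@dataclass
class DefectReport:
    label: str                           # e.g. "transverse crack"
    bbox: Tuple[int, int, int, int]      # (x, y, width, height) in pixels
    mask_polygon: List[Tuple[int, int]]  # segmentation outline vertices
    causes: List[str]                    # hypothesized causal factors
    reasoning: str                       # rationale for the detection
    latitude: float                      # geolocation carried over from
    longitude: float                     # the source imagery

def analyze(image_id: str, prompt: str) -> List[DefectReport]:
    """Placeholder for a multimodal-model call: a real system would send
    the geolocalized image and the natural-language prompt to the model
    and parse its structured response into DefectReport objects."""
    # Hard-coded illustrative output standing in for a model response.
    return [DefectReport(
        label="transverse crack",
        bbox=(120, 340, 80, 15),
        mask_polygon=[(120, 340), (200, 345), (198, 355), (121, 350)],
        causes=["thermal cycling", "subgrade settlement"],
        reasoning="Linear low-intensity region perpendicular to the lane.",
        latitude=40.7128,
        longitude=-74.0060,
    )]
```

Because each report carries geolocation, longitudinal assessment could be sketched as comparing the `DefectReport` lists produced for the same coordinates across different capture dates.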

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
