Abstract

We propose a method for image retrieval in indoor scenes whose appearance changes as common household objects are replaced over time. To address this challenge, the method masks out these dynamic regions in either the query or the database images, so that retrieval relies on the static scene elements. Additionally, we augment the database with images rendered from unseen viewpoints using 3D Gaussian splatting, which makes it more likely that a suitable candidate match is retrieved. Each design choice is supported by a corresponding experiment. In an evaluation on a dataset collected in several indoor scenes under such changes, our method, which both augments the database and masks object regions, outperforms the baseline (no masking of query or database images and no augmentation), especially at the thresholds of (0.5 m, 15°) and (2.0 m, 60°), where it achieves a 3% improvement in precision.
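The thresholds quoted above pair a translation limit in metres with a rotation limit in degrees. The abstract does not state how precision is computed, so the following is only a minimal sketch under one common assumption: a retrieval counts as correct when the pose of the retrieved image lies within both limits of the query pose. All function names and the toy poses are hypothetical and are not taken from the paper.

```python
import math

def translation_error(t_a, t_b):
    """Euclidean distance in metres between two camera positions."""
    return math.dist(t_a, t_b)

def rotation_error_deg(R_a, R_b):
    """Angle in degrees of the relative rotation between two 3x3 rotation matrices."""
    # trace(R_a^T R_b) equals the element-wise product sum of the two matrices.
    trace = sum(R_a[i][j] * R_b[i][j] for i in range(3) for j in range(3))
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp for numerical safety
    return math.degrees(math.acos(cos_angle))

def within_threshold(query_pose, retrieved_pose, max_trans_m, max_rot_deg):
    """Assumed correctness test: both errors must stay within their limits."""
    (R_q, t_q), (R_r, t_r) = query_pose, retrieved_pose
    return (translation_error(t_q, t_r) <= max_trans_m
            and rotation_error_deg(R_q, R_r) <= max_rot_deg)

def precision_at_threshold(pairs, max_trans_m, max_rot_deg):
    """Fraction of (query, retrieved) pose pairs that pass the threshold test."""
    hits = sum(within_threshold(q, r, max_trans_m, max_rot_deg) for q, r in pairs)
    return hits / len(pairs)

def rot_z(deg):
    """Rotation about the z axis, used only to build toy poses."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

# Toy data: one query pose and two hypothetical top-1 retrievals.
query = (rot_z(0.0), (0.0, 0.0, 0.0))
pairs = [
    (query, (rot_z(10.0), (0.3, 0.0, 0.0))),  # close match
    (query, (rot_z(30.0), (1.0, 0.0, 0.0))),  # coarse match only
]

tight = precision_at_threshold(pairs, 0.5, 15.0)
loose = precision_at_threshold(pairs, 2.0, 60.0)
print(tight, loose)
```

In this toy setting the first retrieval passes both thresholds, while the second passes only the looser (2.0 m, 60°) one.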

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
