Abstract
This disclosure introduces Generative Vision Bot, a self-service, prompt-driven image analysis platform powered by generative vision-language models. It enables professionals without machine learning (ML) expertise to perform advanced computer vision tasks, such as defect detection, microscopy analysis, and scientific image interpretation, without large labeled datasets, coding expertise, or lengthy model development cycles. Using zero-shot and few-shot learning, the system produces insights from raw images in under three hours from natural-language prompts and minimal annotation (zero to five examples). The platform includes task-specific templates (FREE, TAG, RACE, TRACE), automatic prompt optimization, and an interface designed for usability and explainability. It has been shown to reduce ML engineering time and labeling effort by approximately 90 percent. Generative Vision Bot thereby broadens access to image analysis, accelerates R&D, and supports data-driven decision-making in high-impact, low-resource environments.
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
Gu, Wenfeng; Zhu, Zhen; Samael, Danutha; Soh, Chin Hock; Wong, Yee Ping; Chaim, Seng Hin; Chinn, Randy; Mohamed Kamil, Muhammad Yazid; Kok, Jun Lee; Zhang, YongGang; and Bom, Sthitie, "Generative Vision Bot: Democratizing Advanced Image Analysis with GenAI for Non-ML Experts", Technical Disclosure Commons, (August 08, 2025)
https://www.tdcommons.org/dpubs_series/8440