Testing, including user interface (UI) testing and A/B testing, is an important part of application development. Writing appropriate test assertions or checks for UI tests can be difficult for human testers, and manual testing is costly and does not scale. This disclosure describes the use of a large language model (LLM) to automate testing of software applications, including UI tests, A/B tests, and simulated manual QA tests. A dataset of previously known UI rendering errors is constructed, and each type of error is assigned a label. Prompt engineering is then used to develop a library of prompts that, when provided to an LLM along with UI screenshots, cause the LLM to return responses indicating whether the UI has errors and, if so, details about them. UI screenshots are obtained by executing test cases for the codebase under test, e.g., using a mobile device simulator, and are analyzed by the LLM using the appropriate prompts. The LLM responses include assertions that are integrated into a testing framework. A/B testing is carried out by providing the LLM with UI screenshots or results from both the control and experiment arms, along with appropriate prompts. Automated LLM-driven testing is scalable and can reduce costs, expand test coverage, and reduce errors in production.
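The screenshot-analysis flow described above can be sketched roughly as follows. This is a minimal illustration, not the disclosure's implementation: the error labels, the prompt wording, the JSON response format, and the names `PROMPT_LIBRARY`, `check_screenshot`, and `assert_ui_ok` are all assumptions made for the example. The LLM is modeled as any callable that accepts a prompt and image bytes and returns text, so no particular vendor API is implied.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical prompt library keyed by error label (labels and wording are illustrative).
PROMPT_LIBRARY: Dict[str, str] = {
    "text_truncation": "Does any text in this screenshot appear cut off?",
    "element_overlap": "Do any UI elements in this screenshot overlap each other?",
}

# Assumed response contract so the reply can be turned into an assertion.
RESPONSE_FORMAT = (
    'Reply with JSON only: {"has_error": true or false, "details": "<short description>"}'
)

# Stand-in for a real multimodal LLM client: (prompt, image_bytes) -> response text.
LLMClient = Callable[[str, bytes], str]


@dataclass
class UICheckResult:
    label: str
    has_error: bool
    details: str


def check_screenshot(
    llm: LLMClient, screenshot: bytes, labels: Optional[List[str]] = None
) -> List[UICheckResult]:
    """Run one prompt per error label against a screenshot and parse each reply."""
    results = []
    for label in labels or PROMPT_LIBRARY:
        prompt = f"{PROMPT_LIBRARY[label]}\n{RESPONSE_FORMAT}"
        raw = llm(prompt, screenshot)
        try:
            parsed = json.loads(raw)
            has_error = parsed["has_error"]
            if not isinstance(has_error, bool):
                raise TypeError("has_error must be a JSON boolean")
            results.append(UICheckResult(label, has_error, str(parsed.get("details", ""))))
        except (json.JSONDecodeError, KeyError, TypeError, AttributeError):
            # Flag unparseable replies as failures so they get reviewed, not silently passed.
            results.append(UICheckResult(label, True, f"unparseable LLM response: {raw!r}"))
    return results


def assert_ui_ok(results: List[UICheckResult]) -> None:
    """Convert LLM findings into a single test-framework assertion."""
    failures = [r for r in results if r.has_error]
    assert not failures, "; ".join(f"{r.label}: {r.details}" for r in failures)
```

In a test suite, a test case would capture screenshot bytes (for example, from a device simulator), call `check_screenshot` with a real client, and pass the results to `assert_ui_ok`, so that any reported error fails the test with the LLM's description as the message. An A/B comparison could follow the same pattern with a prompt that receives outputs from both arms.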

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.