Modern cameras, including smartphone cameras, offer a large set of features that let users capture photos and videos in a variety of conditions and according to their aesthetic preferences. However, these features are accessed through menus and complex user interfaces that are difficult for users to navigate. This disclosure describes techniques that use a large language model (LLM) to provide a natural language interface to traditional camera functionality. Users state their intent to the camera in natural language, which makes interaction more intuitive, reduces or eliminates reliance on complex interfaces, and thereby improves user experience and accessibility. The LLM is also used to perform scene understanding and to automatically suggest camera settings, shot composition, and the like. In addition, the LLM can interpret user interactions, including successes and failures, enabling camera developers to update features to meet user preferences.
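
As a rough illustration of the natural language interface described above, the following minimal Python sketch shows one way a user's spoken or typed request could be turned into camera setting changes. Everything in it is an assumption made for illustration and is not taken from this disclosure: the `CameraSettings` fields, the value ranges, the JSON reply format, and the function names are hypothetical, and `fake_llm` is a stand-in that returns a canned reply instead of calling a real model.

```python
import json
from dataclasses import asdict, dataclass, replace

# Hypothetical subset of camera state; a real camera exposes many more controls.
@dataclass(frozen=True)
class CameraSettings:
    mode: str = "photo"
    iso: int = 100
    exposure_compensation: float = 0.0
    flash: bool = False

# Assumed set of capture modes for this sketch.
ALLOWED_MODES = {"photo", "video", "portrait", "night"}


def build_prompt(user_request: str, current: CameraSettings) -> str:
    """Combine the user's request with the current settings into one prompt."""
    return (
        "You control a camera. Current settings: "
        + json.dumps(asdict(current))
        + "\nUser request: " + user_request
        + "\nReply with a JSON object containing only the settings to change."
    )


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM call; always returns the same canned reply."""
    return '{"mode": "night", "iso": 1600, "flash": false}'


def _is_number(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def apply_llm_reply(reply: str, current: CameraSettings) -> CameraSettings:
    """Validate the model's reply and return updated settings.

    Unknown keys and out-of-range values are ignored, so a malformed
    reply can never put the camera into an unsupported state.
    """
    try:
        changes = json.loads(reply)
    except json.JSONDecodeError:
        return current
    if not isinstance(changes, dict):
        return current

    validated = {}
    for key, value in changes.items():
        if key == "mode" and value in ALLOWED_MODES:
            validated[key] = value
        elif key == "iso" and _is_number(value) and 50 <= value <= 6400:
            validated[key] = int(value)
        elif key == "exposure_compensation" and _is_number(value) and -3 <= value <= 3:
            validated[key] = float(value)
        elif key == "flash" and isinstance(value, bool):
            validated[key] = value
    return replace(current, **validated)


if __name__ == "__main__":
    settings = CameraSettings()
    prompt = build_prompt("It's really dark in here, help me get a usable shot", settings)
    print(apply_llm_reply(fake_llm(prompt), settings))
```

The design choice worth noting is that the model's reply is treated as untrusted input: it is parsed, checked against the values the camera is assumed to support, and only then applied. The same pattern could carry scene-based suggestions, with the prompt describing the scene instead of (or in addition to) a user request.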

