Systems and methods for image capture in headsets are provided. A headset that includes a camera may receive an audio signal comprising a wake-word and a voice command to take a picture. A keyword spotter of the headset may detect the wake-word in the audio signal. Responsive to the keyword spotter detecting the wake-word, the system may provide the audio signal to a multi-keyword spotter. Responsive to the multi-keyword spotter detecting that a portion of the audio signal represents a voice command for taking a picture, the system may send a signal to the camera of the headset to take a picture.
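
The two-stage flow described above can be illustrated with a minimal sketch. It is not the disclosed implementation: it assumes the audio signal has already been reduced to a text transcript, and the wake-word "hey headset", the command phrases, and the names `Camera`, `spot_wake_word`, `spot_picture_command`, and `handle_audio` are all hypothetical stand-ins chosen for illustration. A real keyword spotter would operate on audio features rather than strings.

```python
from dataclasses import dataclass

# Hypothetical wake-word and command phrases; the source does not name any.
WAKE_WORD = "hey headset"
PICTURE_COMMANDS = ("take a picture", "take a photo")


@dataclass
class Camera:
    """Stand-in for the headset camera; counts capture signals received."""
    pictures_taken: int = 0

    def take_picture(self) -> None:
        self.pictures_taken += 1


def spot_wake_word(signal: str) -> bool:
    """Stage 1 (keyword spotter): detect only the single wake-word."""
    return WAKE_WORD in signal.lower()


def spot_picture_command(signal: str) -> bool:
    """Stage 2 (multi-keyword spotter): detect any picture-taking phrase
    in some portion of the signal."""
    text = signal.lower()
    return any(phrase in text for phrase in PICTURE_COMMANDS)


def handle_audio(signal: str, camera: Camera) -> bool:
    """Run stage 2 only after stage 1 fires; signal the camera on a match."""
    if not spot_wake_word(signal):
        return False  # no wake-word: the multi-keyword spotter never runs
    if not spot_picture_command(signal):
        return False  # wake-word heard, but no picture command followed
    camera.take_picture()
    return True


if __name__ == "__main__":
    cam = Camera()
    handle_audio("Hey headset, take a picture", cam)
    print(cam.pictures_taken)
```

Gating the second stage on the first means the multi-keyword spotter only processes signals in which the wake-word was already detected, which matches the "responsive to" ordering in the description.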

