This disclosure describes techniques to enhance automated speech recognition (ASR) by automatically recognizing words that users spell out. Machine learning techniques are used both to detect an explicit user intent to spell out a word and to detect spelled-out words when no such intent is stated. When it is determined that the user is spelling a word, a spelling mode is triggered in which the received letters are concatenated to form the word. With the user's permission, data including the user context, audio of the word, audio of the user spelling out the word, and the textual representation of the word is obtained and used for training. The trained machine learning model is then used to process subsequent user speech.
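The flow above can be illustrated with a minimal sketch. The disclosure uses machine learning models for intent and spelling detection. This sketch swaps in a simple rule-based stand-in so the control flow is visible: a hypothetical list of trigger phrases (`EXPLICIT_TRIGGERS`) stands in for explicit-intent detection, and a hypothetical run-length threshold (`MIN_IMPLICIT_LETTERS`) stands in for detecting spelling without stated intent. It assumes the ASR front end emits spelled letters as single-character, unpunctuated tokens. The consent-gated `TrainingExample` record is likewise a hypothetical structure that mirrors the data fields listed in the text.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical stand-in for the ML explicit-intent detector.
EXPLICIT_TRIGGERS = ("spelled", "spell it", "spelling")

# Hypothetical threshold: this many consecutive single letters are treated
# as spelling even when the user did not state an intent to spell.
MIN_IMPLICIT_LETTERS = 3


def is_letter(token: str) -> bool:
    """Assumes the ASR emits each spelled letter as one alphabetic token."""
    return len(token) == 1 and token.isalpha()


def detect_explicit_intent(text: str) -> bool:
    lowered = text.lower()
    return any(trigger in lowered for trigger in EXPLICIT_TRIGGERS)


def apply_spelling_mode(text: str) -> str:
    """Concatenate runs of spelled letters into a single word."""
    tokens = text.split()
    # With explicit intent, a shorter run of letters is enough to trigger.
    min_run = 2 if detect_explicit_intent(text) else MIN_IMPLICIT_LETTERS
    out: List[str] = []
    run: List[str] = []

    def flush() -> None:
        # Emit the pending letters either as one word or unchanged.
        if len(run) >= min_run:
            out.append("".join(run).lower())
        else:
            out.extend(run)
        run.clear()

    for tok in tokens:
        if is_letter(tok):
            run.append(tok)
        else:
            flush()
            out.append(tok)
    flush()
    return " ".join(out)


@dataclass
class TrainingExample:
    # Fields mirror the data the disclosure lists for training.
    user_context: str
    word_audio: bytes
    spelling_audio: bytes
    text: str


def maybe_collect(user_consented: bool, user_context: str, word_audio: bytes,
                  spelling_audio: bytes, text: str) -> Optional[TrainingExample]:
    """Return a training example only if the user has given permission."""
    if not user_consented:
        return None
    return TrainingExample(user_context, word_audio, spelling_audio, text)
```

For example, `apply_spelling_mode("send it to J O N please")` yields `"send it to jon please"` with no stated intent, while `"I need a cab"` is left unchanged because its single-letter words never form a long enough run. Lowering the threshold only when explicit intent is detected is one simple way to trade off false triggers on ordinary single-letter words against missed spellings.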

