A virtual, intelligent, or computational assistant (also referred to simply as an “assistant”) is described that is configured to properly pronounce word junctures when converting text to speech (e.g., when synthesizing audio data for output to a user). Example word junctures that the assistant may properly pronounce include, but are not limited to, false geminates, affricates, and other such letter/word combinations. For instance, when performing text-to-speech on the text “black cat”, the assistant may determine that the consecutive combination of the words “black” and “cat” is a false geminate because the last consonant phoneme in “black” is the same as the consonant phoneme at the start of “cat” (i.e., “black” ends with the consonant phoneme /k/ and “cat” starts with the consonant phoneme /k/). As such, the assistant may pronounce the text “black cat” differently than it would pronounce “black” and “cat” separately. Specifically, the assistant may avoid repeating the consonant phoneme /k/ when pronouncing “cat.”
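
The false-geminate check described above can be illustrated with a minimal Python sketch. It is not the assistant's actual implementation: the toy lexicon `LEXICON`, the consonant set `CONSONANTS`, and the function names `is_false_geminate` and `phonemes_for_phrase` are all hypothetical, and the phoneme symbols are ARPAbet-style labels chosen only for illustration. The sketch compares the final phoneme of one word with the initial phoneme of the next and, when they are the same consonant, omits the repeated phoneme from the second word.

```python
from typing import Dict, List

# Hypothetical toy lexicon: word -> phoneme sequence (ARPAbet-style symbols).
# A real system would consult a full pronunciation dictionary or a
# grapheme-to-phoneme model; this table exists only for illustration.
LEXICON: Dict[str, List[str]] = {
    "black": ["B", "L", "AE", "K"],
    "cat": ["K", "AE", "T"],
    "big": ["B", "IH", "G"],
    "dog": ["D", "AO", "G"],
}

# Consonant phonemes that appear in the toy lexicon (assumption).
CONSONANTS = {"B", "D", "G", "K", "L", "T"}


def is_false_geminate(first: str, second: str) -> bool:
    """Return True if `first` ends with the same consonant phoneme
    that `second` starts with."""
    last = LEXICON[first][-1]
    initial = LEXICON[second][0]
    return last in CONSONANTS and last == initial


def phonemes_for_phrase(words: List[str]) -> List[str]:
    """Concatenate per-word phonemes, omitting the repeated consonant
    at each false-geminate juncture."""
    result: List[str] = []
    for i, word in enumerate(words):
        phones = LEXICON[word]
        if i > 0 and is_false_geminate(words[i - 1], word):
            # Skip the duplicated consonant at the start of this word.
            phones = phones[1:]
        result.extend(phones)
    return result


print(phonemes_for_phrase(["black", "cat"]))
print(phonemes_for_phrase(["big", "dog"]))
```

Under these assumptions, “black cat” yields the sequence B L AE K AE T, with a single /k/ at the juncture, while “big dog” is left unchanged because /g/ and /d/ differ.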

