Languages differ in the way they express gender. Some languages are gender-neutral, while others have masculine, feminine, or neuter forms. When translating gender-neutral text into a language with grammatical gender, there can be more than one valid translation. For example, the Turkish query “O bir doktor” can be translated into English as either “He is a doctor” or “She is a doctor.” Historically, machine translation systems have not been able to generate more than one gender-specific translation. Put differently, the system would “choose” a single masculine form, feminine form, or other gender variant as the translation output. In neural machine translation, this “choice” reflects bias in the data used to train the translation model.
This disclosure presents techniques to generate both a masculine and a feminine translation for gender-neutral text, thereby reducing or eliminating gender bias in machine translation. The technique includes three main components: detection, generation of alternatives, and validation. The detection component determines whether a given query is gender-ambiguous. If the detection component triggers, two translations for the query are generated: one masculine and one feminine. Finally, the validation step verifies that both translations are of high quality before they are shown to users.
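The control flow of the three components can be illustrated with a minimal Python sketch. It is not the disclosure's implementation. Every name in it (`is_gender_ambiguous`, `translate`, `is_valid_pair`, `gender_aware_translate`, `TOY_TRANSLATIONS`, `GENDERED_PAIRS`) is hypothetical, the lookup table stands in for a real translation model, and the detection and validation rules are deliberately simplistic placeholders. The fallback to a single default translation when detection does not trigger or validation fails is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical toy data standing in for a real translation model.
TOY_TRANSLATIONS = {
    ("o bir doktor", "masculine"): "He is a doctor",
    ("o bir doktor", "feminine"): "She is a doctor",
    ("bu bir kitap", None): "This is a book",
}

# Hypothetical list of (masculine, feminine) English word pairs.
GENDERED_PAIRS = {("he", "she"), ("him", "her"), ("his", "her")}


@dataclass
class Result:
    default: Optional[str] = None
    masculine: Optional[str] = None
    feminine: Optional[str] = None


def is_gender_ambiguous(query: str) -> bool:
    """Detection (toy rule): trigger on the gender-neutral Turkish pronoun 'o'."""
    return "o" in query.lower().split()


def translate(query: str, gender: Optional[str] = None) -> Optional[str]:
    """Generation (toy lookup): return the translation for the requested gender."""
    return TOY_TRANSLATIONS.get((query.lower(), gender))


def is_valid_pair(masculine: Optional[str], feminine: Optional[str]) -> bool:
    """Validation (toy check): the pair must differ only in known gendered words."""
    if masculine is None or feminine is None:
        return False
    m_tokens = masculine.lower().split()
    f_tokens = feminine.lower().split()
    if len(m_tokens) != len(f_tokens):
        return False
    diffs = [(m, f) for m, f in zip(m_tokens, f_tokens) if m != f]
    return bool(diffs) and all(pair in GENDERED_PAIRS for pair in diffs)


def gender_aware_translate(query: str) -> Result:
    """Run detection, then generation of alternatives, then validation."""
    if is_gender_ambiguous(query):
        masculine = translate(query, "masculine")
        feminine = translate(query, "feminine")
        if is_valid_pair(masculine, feminine):
            return Result(masculine=masculine, feminine=feminine)
    # Assumed fallback: show a single default translation.
    return Result(default=translate(query))


if __name__ == "__main__":
    print(gender_aware_translate("O bir doktor"))
    print(gender_aware_translate("Bu bir kitap"))
```

In this sketch both translations are shown only when all three stages succeed. In a real system each toy function would presumably be replaced by a learned or otherwise more robust component.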
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
Kuczmarski, James and Johnson, Melvin, "Gender-Aware Natural Language Translation", Technical Disclosure Commons, (October 08, 2018)