Abstract

Machine translation is widely used to translate text between language pairs, and one of its applications is content localization. Different regions of the world use different measurement units (e.g., acres vs. hectares), so correctly translating and converting measurement units is an important part of content localization. Current machine translation models have low accuracy when translating numbers and cannot handle unit conversions. This disclosure describes techniques to train a machine learning model to generate accurate translations of numbers, including unit conversions. A base model is trained on tokenized input text in which numbers are split into individual digits. The parameters of the trained base model are then used to initialize a custom model. This custom model is fine-tuned on training data that has been augmented with annotations, e.g., different values and units for each measurement in the source text. The trained custom model can deliver correct number translations and unit conversions and can be used for content localization.
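The two data-preparation steps described above are digit-level tokenization and augmenting source text with annotations that give each measurement in another unit. They can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the disclosed implementation. The regex-based unit detection, the single acre-to-hectare conversion table, the `[= value unit]` annotation format, and the helper names `tokenize` and `annotate` are all hypothetical.

```python
import re

# 1 international acre = 4046.8564224 m^2 and 1 hectare = 10,000 m^2 (exact).
ACRE_TO_HECTARE = 0.40468564224

# Hypothetical conversion table: source unit -> (target unit, factor).
CONVERSIONS = {
    "acre": ("hectare", ACRE_TO_HECTARE),
    "acres": ("hectares", ACRE_TO_HECTARE),
}

# A number (integer or decimal) followed by a supported unit word.
MEASUREMENT_RE = re.compile(r"(\d+(?:\.\d+)?)\s+(acres?)\b")

# One token is a single digit, a run of letters, or a single punctuation mark.
TOKEN_RE = re.compile(r"\d|[^\W\d_]+|[^\w\s]")


def tokenize(text: str) -> list[str]:
    """Tokenize text, splitting every number into individual digit tokens."""
    return TOKEN_RE.findall(text)


def annotate(text: str) -> str:
    """Append the converted value and unit after each recognized measurement.

    The "[= value unit]" annotation format is an illustrative assumption,
    not a format taken from the disclosure.
    """
    def add_conversion(match: re.Match) -> str:
        value = float(match.group(1))
        target_unit, factor = CONVERSIONS[match.group(2)]
        # Round the converted value to two decimal places for the annotation.
        return f"{match.group(0)} [= {value * factor:.2f} {target_unit}]"

    return MEASUREMENT_RE.sub(add_conversion, text)


if __name__ == "__main__":
    example = annotate("The farm covers 5 acres.")
    print(example)            # The farm covers 5 acres [= 2.02 hectares].
    print(tokenize(example))  # digits such as 2, ., 0, 2 become separate tokens
```

Because every number is broken into single digits, the tokenizer never needs a separate vocabulary entry for each multi-digit number. The annotated sentences would then serve as augmented fine-tuning data for the custom model.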

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Recommended Citation

Hong, Haijie and Sun, Lu, "Number Translation and Unit Conversion Using Machine Learning", Technical Disclosure Commons, (September 27, 2021)
https://www.tdcommons.org/dpubs_series/4621


Technical Disclosure Commons

Defensive Publications Series

Number Translation and Unit Conversion Using Machine Learning
