Abstract

Structured data gathered from multiple diverse sources often arrives in a variety of formats and at differing levels of completeness. Some of the missing data may be available in unstructured form, while other attributes may be unknown. This disclosure describes the use of a chatbot, e.g., one powered by a large language model, to automatically complete and homogenize structured data. The chatbot is given a prompt that contains the available structured and unstructured data and lists the missing attributes. The prompt also includes an instruction that specifies the response format. The chatbot may extract values for the missing attributes from the unstructured input or generate them independently. The chatbot response includes a value for each missing attribute, along with a confidence level for each value. The structured data is homogenized by incorporating the generated values and can then be used for subsequent queries. When data from such a dataset is returned in query responses, it may be subjected to a confidence level threshold that depends on the query and/or application type.
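The flow described above can be illustrated with a minimal sketch. It is not the disclosure's implementation: the function names, the JSON response format, the 0-to-1 confidence scale, the choice to treat source-provided values as confidence 1.0, and the sample record are all assumptions made for illustration. The chatbot call itself is left out, and a hard-coded string stands in for its response.

```python
import json


def build_prompt(record, notes, missing):
    """Assemble a prompt from known fields, free text, and the attributes to fill."""
    return (
        "Known structured data:\n" + json.dumps(record, sort_keys=True) + "\n"
        "Unstructured notes:\n" + notes + "\n"
        "Missing attributes: " + ", ".join(missing) + "\n"
        "Respond only with JSON of the form "
        '{"<attribute>": {"value": ..., "confidence": <number from 0 to 1>}}.'
    )


def merge_response(record, response_text, missing):
    """Add generated values to the record, storing a confidence next to every value."""
    generated = json.loads(response_text)
    # Assumption: values that came from the source data are treated as fully trusted.
    completed = {key: {"value": value, "confidence": 1.0} for key, value in record.items()}
    for attribute in missing:
        entry = generated.get(attribute)
        if entry is not None:
            completed[attribute] = {
                "value": entry["value"],
                "confidence": float(entry["confidence"]),
            }
    return completed


def filter_by_confidence(completed, threshold):
    """Return only the values whose confidence meets the query's threshold."""
    return {
        key: entry["value"]
        for key, entry in completed.items()
        if entry["confidence"] >= threshold
    }


# Made-up sample data, for illustration only.
record = {"name": "Cafe Aurora", "city": "Lisbon"}
notes = "Cafe Aurora opens daily at 8am; reviewers mention vegan pastries."
missing = ["opening_time", "price_range"]

prompt = build_prompt(record, notes, missing)

# Stand-in for the chatbot's reply; a real system would send `prompt` to a model.
fake_response = (
    '{"opening_time": {"value": "08:00", "confidence": 0.9},'
    ' "price_range": {"value": "moderate", "confidence": 0.4}}'
)

completed = merge_response(record, fake_response, missing)
strict_view = filter_by_confidence(completed, 0.8)
lenient_view = filter_by_confidence(completed, 0.3)
```

In this sketch, an application that needs high reliability would query with the stricter threshold and receive only the extracted opening time, while a more tolerant application would also see the low-confidence price range.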

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
