Abstract

Existing automation solutions for data processing pipelines, including those that use machine learning techniques, lack customization and flexibility. They fail to address specific needs or to account for unique data requirements, such as privacy constraints or stringent validation rules for certain types of data. This disclosure describes techniques that support multiple use cases in data processing pipelines through self-service tools and provide a framework for the various stages of data pipeline creation, benefiting stakeholders such as data scientists, machine learning (ML) engineers, and data engineers. A large language model (LLM)-backed engine powers a chatbot and automatically generates certain files, saving time in data engineering efforts. The LLM can be adapted using prompt engineering, retrieval-augmented generation (RAG), or both.

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
