Inventor(s)

N/A

Abstract

Artificial intelligence (AI) agents typically serve a single human as a client, but such a single-human focus can be suboptimal in contexts where the AI agent performs tasks for teams or other groups of humans. This disclosure describes techniques to develop and evaluate AI agents capable of robust operation in multi-human environments. In a data collection phase, group-chat or virtual-meeting experiments are organized in which participants role-play conversations to accomplish predefined tasks such as scheduling a meeting or planning an event. Communication transcripts from these experiments capture the complexities of group interaction. In a data augmentation phase, the collected data is expanded to cover a wide array of multi-human, multi-agent scenarios. In an evaluation phase, human and model-based raters assess agent performance against rubric dimensions such as effectiveness, factuality, efficiency, appropriateness, and coherence. Results from human and model-based evaluation are synthesized and visualized to arrive at a holistic understanding of the agent's capabilities and to identify areas for improvement. In this manner, datasets for agent creation and improvement in a multi-human context can be generated, and the performance of AI agents in multi-human environments can be evaluated.
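To make the evaluation phase concrete, the following is a minimal Python sketch of how a model-based rater might score a group-chat transcript against the rubric dimensions listed in the abstract. All names here (build_rater_prompt, call_rater_model, the 1-5 scale, and the prompt wording) are hypothetical illustrations rather than the disclosed method; call_rater_model is a stub standing in for a real rater-model call.

    """Minimal sketch of model-based rubric evaluation; all names are hypothetical."""
    import json
    import statistics

    # Rubric dimensions named in the abstract; the 1-5 scale is an assumption.
    RUBRIC = ["effectiveness", "factuality", "efficiency", "appropriateness", "coherence"]

    def build_rater_prompt(transcript, dimension):
        """Render a group-chat transcript and ask a rater model to score one dimension."""
        lines = "\n".join(f"{turn['speaker']}: {turn['text']}" for turn in transcript)
        return (
            "Below is a multi-human group chat in which an AI agent assists the group.\n"
            f"Rate the agent's {dimension} from 1 (poor) to 5 (excellent).\n"
            'Respond with JSON: {"score": <int>, "rationale": <string>}.\n\n' + lines
        )

    def call_rater_model(prompt):
        """Stub for the rater LLM call; returns a canned response for illustration."""
        return json.dumps({"score": 4, "rationale": "stub rating"})

    def rate_transcript(transcript):
        """Score one transcript on every rubric dimension and attach the mean."""
        scores = {
            d: json.loads(call_rater_model(build_rater_prompt(transcript, d)))["score"]
            for d in RUBRIC
        }
        scores["mean"] = statistics.mean(scores.values())
        return scores

    if __name__ == "__main__":
        chat = [
            {"speaker": "Alice", "text": "Can we all meet Thursday afternoon?"},
            {"speaker": "Bob", "text": "I'm free after 3 pm."},
            {"speaker": "Agent", "text": "Scheduled Thursday 3:30 pm; invites sent to both of you."},
        ]
        print(rate_transcript(chat))

In practice, the stub would be replaced by a call to an actual rater model, and per-dimension scores from many transcripts would be aggregated alongside human ratings for the synthesis and visualization step described above.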

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
